Title
Impact of text classification on natural language processing applications
Creator
Šandrih, Branislava, 1991-, 57190153
Copyright date
2020
Select license
Attribution 3.0 Serbia (CC BY 3.0)
License description
You permit copying, distribution and public communication of the work, as well as adaptations, provided that the author's name is credited in the manner specified by the author or licensor, even for commercial purposes. This is the most permissive of all the licenses. Basic description of the license: http://creativecommons.org/licenses/by/3.0/rs/deed.sr_LATN Full text of the license: http://creativecommons.org/licenses/by/3.0/rs/legalcode.sr-Latn
Language
English
Cobiss-ID
Theses Type
Doctoral dissertation
description
Defense date: 08.07.2020.
Other responsibilities
mentor
Kartelj, Aleksandar, 1986-, 57190665
committee member
Pavlović-Lažetić, Gordana, 1955-, 12443239
committee member
Filipović, Vladimir, 1968-, 12759399
committee member
Krstev, Cvetana, 1952-, 12149607
committee member
Mitkov, Ruslan, 57703177
Academic Expertise
Natural sciences and mathematics
Academic Title
-
University
Univerzitet u Beogradu
Alternative title
Uticaj klasifikacije teksta na primene u obradi prirodnih jezika
Publisher
[B. Šandrih]
Format
145 pp.
description
Computer Science - Natural Language Processing
Abstract (en)
The main goal of this dissertation is to put different text classification tasks in
the same frame, by mapping the input data into the common vector space of linguistic
attributes. Subsequently, several classification problems of great importance for natural
language processing are solved by applying the appropriate classification algorithms.
The dissertation deals with the validation of bilingual translation pairs. The
goal is to construct a classifier that substitutes for human evaluation and decides,
using a variety of linguistic information and methods, whether a pair is a proper
translation between the two languages.
In dictionaries it is useful to have a sentence that demonstrates the use of a particular
dictionary entry. Deciding whether a sentence serves this purpose is called the classification
of good dictionary examples. In this thesis, a method is developed which automatically
estimates whether an example is good or bad for a specific dictionary entry.
Two cases of short message classification are also discussed in this dissertation. In the
first case, the classes are the authors of the messages, and the task is to assign each message
to its author from a fixed set of authors. This task is called authorship identification. The
second case is opinion mining, or sentiment analysis. Starting from the assumption that a
short message either carries a positive or negative attitude about something or is purely
informative, the classes are: positive, negative and neutral.
These tasks are of great importance in the field of natural language processing, and the
proposed solutions are language-independent and based on machine learning methods: support
vector machines, decision trees and gradient boosting. For all of these tasks, the
effectiveness of the proposed methods is demonstrated for the Serbian language.
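As a rough illustration of the idea in the abstract, the sketch below maps raw text into one fixed-length vector of simple attributes, so that different classification tasks can share the same input space. Everything in it is an assumption made for illustration: the four attributes, the function names (linguistic_attributes, train_centroids, predict) and the toy messages are hypothetical, and the nearest-centroid rule is only a dependency-free stand-in for the support vector machines, decision trees and gradient boosting named in the abstract. It is not the dissertation's method.

```python
import math
from collections import Counter


def linguistic_attributes(text: str) -> list[float]:
    """Map a text to a fixed-length vector (hypothetical attribute set)."""
    tokens = text.split()
    n_tokens = len(tokens)
    n_chars = len(text)
    avg_token_len = sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0
    n_punct = sum(1 for ch in text if ch in ".,;:!?")
    n_upper = sum(1 for ch in text if ch.isupper())
    return [
        float(n_tokens),                         # number of whitespace tokens
        avg_token_len,                           # mean token length
        n_punct / n_chars if n_chars else 0.0,   # punctuation ratio
        n_upper / n_chars if n_chars else 0.0,   # uppercase ratio
    ]


def train_centroids(samples):
    """Average the attribute vectors of each class (stand-in classifier)."""
    sums, counts = {}, Counter()
    for text, label in samples:
        vec = linguistic_attributes(text)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest in Euclidean distance."""
    vec = linguistic_attributes(text)
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Invented toy messages, used only to exercise the functions above.
toy_samples = [
    ("GREAT!!! LOVE IT!!!", "positive"),
    ("WOW!!! SO GOOD!!!", "positive"),
    ("the bus leaves at nine from the main station", "neutral"),
    ("the meeting is moved to the second floor room", "neutral"),
]
toy_centroids = train_centroids(toy_samples)
print(predict(toy_centroids, "NICE!!! THANKS!!!"))
```

The same linguistic_attributes function could feed an authorship or translation-pair classifier without change, which is the point of a common attribute space. A real system would use richer linguistic features and one of the learning methods listed in the abstract.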
Abstract (sr)
The main goal of the dissertation is to put different text classification tasks in
the same frame, by mapping the input data into the same vector space of linguistic attributes...
Authors Key words
natural language processing, machine learning, computational linguistics,
text classification, terminology extraction, authorship identification, sentiment classification,
classification of good dictionary examples
Authors Key words
natural language processing, machine learning, computational linguistics, text
classification, terminology extraction, authorship identification, sentiment analysis, selection
of good dictionary examples
Classification
004.85:519.765(043.3)
Type
Text