E teze

General Metadata

Title

Impact of text classification on natural language processing applications

Creator

Šandrih, Branislava, 1991-, 57190153

2020

License

Attribution 3.0 Serbia (CC BY 3.0)

License description

You permit reproduction, distribution, and public communication of the work, and of adaptations, provided that the author's name is credited in the manner specified by the author or licensor, even for commercial purposes. This is the most permissive of all the licenses. Basic description of the license: http://creativecommons.org/licenses/by/3.0/rs/deed.sr_LATN Full text of the license agreement: http://creativecommons.org/licenses/by/3.0/rs/legalcode.sr-Latn

Language

English

Cobiss-ID

20766985

Academic metadata

Theses Type

Doctoral dissertation

description

Date of defense: 08.07.2020.

Other responsibilities

mentor

Kartelj, Aleksandar, 1986-, 57190665

committee member

Pavlović-Lažetić, Gordana, 1955-, 12443239

committee member

Filipović, Vladimir, 1968-, 12759399

committee member

Krstev, Cvetana, 1952-, 12149607

committee member

Mitkov, Ruslan, 57703177

Academic Expertise

Natural sciences and mathematics

Academic Title

University

Univerzitet u Beogradu

Other Theses Metadata

Alternative title

Uticaj klasifikacije teksta na primene u obradi prirodnih jezika

Publisher

[B. Šandrih]

Format

145 pp.

description

Computer Science - Natural Language Processing

Abstract (en)

The main goal of this dissertation is to put different text classification tasks in the same frame by mapping the input data into a common vector space of linguistic attributes. Several classification problems of great importance for natural language processing are then solved by applying appropriate classification algorithms. The dissertation addresses the validation of bilingual translation pairs, where the final goal is to construct a classifier that substitutes for human evaluation and decides, using a variety of linguistic information and methods, whether a pair is a proper translation between the two languages. In dictionaries, it is useful to have a sentence that demonstrates the use of a particular dictionary entry. Selecting such sentences is called the classification of good dictionary examples. In this thesis, a method is developed that automatically estimates whether an example is good or bad for a specific dictionary entry. Two cases of short message classification are also discussed. In the first case, the classes are the authors of the messages, and the task is to assign each message to its author from that fixed set. This task is called authorship identification. The other classification of short messages considered is called opinion mining, or sentiment analysis. Starting from the assumption that a short message carries a positive or negative attitude about something, or is purely informative, the classes can be positive, negative and neutral. These tasks are of great importance in the field of natural language processing, and the proposed solutions are language-independent and based on machine learning methods: support vector machines, decision trees and gradient boosting. For all of these tasks, the effectiveness of the proposed methods is demonstrated for the Serbian language.

Abstract (sr)

The main goal of the dissertation is to put different text classification tasks in the same frame by mapping the input data into the same vector space of linguistic attributes...

Authors Key words

natural language processing, machine learning, computational linguistics, text classification, terminology extraction, authorship identification, sentiment classification, classification of good dictionary examples

Authors Key words

natural language processing, machine learning, computational linguistics, text classification, terminology extraction, authorship identification, sentiment analysis, selection of good dictionary examples

Classification

004.85:519.765(043.3)

Type

Text
