Efficient Classification of Text Documents Using Word Embeddings' Distributions

Rynkun, Szymon

Master's thesis

License

Closed access

dc.abstract.en	Artificial intelligence is currently developing rapidly on all fronts. Although new, benchmark-breaking Natural Language Processing solutions appear every month, classifying highly specialized documents, such as patent files, remains a challenge. In this work, I present a document (paragraph) classification method that relies on word embedding clouds and their marginal distributions. These distributions are then used as features for an artificial neural network classifier. The method is evaluated on patent descriptions; although it does not achieve high performance, it provides insight into the nature of the problem.
dc.abstract.pl	Artificial intelligence is currently undergoing rapid development. Although new, breakthrough natural language processing solutions are presented every month, classifying texts written in specialized language, such as patent documents, remains a challenge. In this thesis I present a document classification method based on word embedding clouds and their marginal distributions. These distributions are then used as input data for a classifier based on an artificial neural network. The method was tested on descriptions taken from patent documents and, although it did not achieve high scores, it offers insight into the nature of the problem.
dc.affiliation	Uniwersytet Warszawski
dc.affiliation.department	Wydział Psychologii (Faculty of Psychology)
dc.contributor.author	Rynkun, Szymon
dc.date.accessioned	2025-01-09T12:49:28Z
dc.date.available	2025-01-09T12:49:28Z
dc.date.defence	2024-07-18
dc.date.issued	2024
dc.date.submitted	2024-07-04
dc.description.promoter	Szczuka, Marcin
dc.description.reviewer	Zadrożny, Adam
dc.description.reviewer	Szczuka, Marcin
dc.identifier.apd	228900
dc.identifier.uri	https://repozytorium.uw.edu.pl//handle/item/164989
dc.language	en
dc.language.other	pl
dc.publisher	Uniwersytet Warszawski
dc.rights	ClosedAccess
dc.subject.en	Natural Language Processing
dc.subject.en	Machine Learning
dc.subject.en	Text Classification
dc.subject.en	Word Embeddings
dc.subject.en	Marginal Distributions
dc.subject.pl	Natural language processing
dc.subject.pl	Machine learning
dc.subject.pl	Text classification
dc.subject.pl	Word embeddings
dc.subject.pl	Marginal distributions
dc.title	Efficient Classification of Text Documents Using Word Embeddings' Distributions
dc.title.alternative	Efficient classification of text documents using word embedding distributions
dc.type	MasterThesis
dspace.entity.type	Publication
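The abstract describes turning each document's cloud of word embeddings into marginal distributions that then serve as features for a neural network classifier. The following is a minimal sketch of that idea, not the thesis's actual implementation: it assumes (an assumption, not stated in the record) that each per-dimension marginal is estimated with a fixed-range histogram, and the function name `marginal_histogram_features`, the bin count and the value range are all hypothetical.

```python
import numpy as np


def marginal_histogram_features(embeddings, n_bins=10, value_range=(-1.0, 1.0)):
    """Hypothetical helper: map a document's embedding cloud to a fixed-length vector.

    embeddings: array of shape (n_words, dim), one row per word vector.
    Returns an array of shape (dim * n_bins,) holding, for each embedding
    dimension, the normalized histogram (an empirical estimate of that
    dimension's marginal distribution over the document's words).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    n_words, dim = embeddings.shape
    if n_words == 0:
        raise ValueError("document has no word vectors")
    low, high = value_range
    edges = np.linspace(low, high, n_bins + 1)
    features = np.empty((dim, n_bins))
    for j in range(dim):
        # Assumption: clip so out-of-range values land in the outermost bins.
        column = np.clip(embeddings[:, j], low, high)
        counts, _ = np.histogram(column, bins=edges)
        features[j] = counts / n_words  # fraction of words falling in each bin
    return features.ravel()


# Usage: 50 random 4-dimensional vectors stand in for a real document.
rng = np.random.default_rng(0)
doc = rng.uniform(-1.0, 1.0, size=(50, 4))
x = marginal_histogram_features(doc, n_bins=5)
print(x.shape)  # (20,)
```

Because the output length depends only on the embedding dimension and the bin count, documents of any length map to vectors of the same size, which is what allows them to be fed to a neural network with a fixed input layer.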