Journal article
License

ClosedAccess (closed access)

New Parallel Corpora of Baltic and Slavic Languages — Assumptions of Corpus Construction

dc.abstract.en: In this article, we describe the design principles of the ten newly published CLARIN-PL corpora of Slavic and Baltic languages. In relation to other non-commercial online corpora, we highlight the distinctive features of these CLARIN-PL corpora: resource selection, preprocessing, manual segmentation at the sentence level, lemmatisation, annotation and metadata. We also present current and planned work on the development of the CLARIN-PL Balto–Slavic corpora.
dc.affiliation: Uniwersytet Warszawski
dc.contributor.author: Roszko, Danuta
dc.contributor.author: Duszkin, Maksim
dc.contributor.author: Roszko, Roman
dc.date.accessioned: 2024-01-25T13:48:27Z
dc.date.available: 2024-01-25T13:48:27Z
dc.date.issued: 2021
dc.description.finance: No-cost publication
dc.identifier.issn: 0302-9743
dc.identifier.uri: https://repozytorium.uw.edu.pl//handle/item/113604
dc.identifier.weblink: https://www.springer.com/series/1244
dc.language: eng
dc.pbn.affiliation: linguistics
dc.relation.ispartof: Lecture Notes in Computer Science
dc.relation.pages: 172-183
dc.rights: ClosedAccess
dc.sciencecloud: nosend
dc.title: New Parallel Corpora of Baltic and Slavic Languages — Assumptions of Corpus Construction
dc.type: JournalArticle
dspace.entity.type: Publication