Journal article
New Parallel Corpora of Baltic and Slavic Languages — Assumptions of Corpus Construction
dc.abstract.en | In this article, we describe the design principles of the ten newly published CLARIN-PL corpora of Slavic and Baltic languages. In relation to other non-commercial online corpora, we highlight the distinctive features of these CLARIN-PL corpora: resource selection, preprocessing, manual segmentation at the sentence level, lemmatisation, annotation and metadata. We also present current and planned work on the development of the CLARIN-PL Balto–Slavic corpora. |
dc.affiliation | Uniwersytet Warszawski |
dc.contributor.author | Roszko, Danuta |
dc.contributor.author | Duszkin, Maksim |
dc.contributor.author | Roszko, Roman |
dc.date.accessioned | 2024-01-25T13:48:27Z |
dc.date.available | 2024-01-25T13:48:27Z |
dc.date.issued | 2021 |
dc.description.finance | No-cost publication |
dc.identifier.issn | 0302-9743 |
dc.identifier.uri | https://repozytorium.uw.edu.pl//handle/item/113604 |
dc.identifier.weblink | https://www.springer.com/series/1244 |
dc.language | eng |
dc.pbn.affiliation | linguistics |
dc.relation.ispartof | Lecture Notes in Computer Science |
dc.relation.pages | 172-183 |
dc.rights | ClosedAccess |
dc.sciencecloud | nosend |
dc.title | New Parallel Corpora of Baltic and Slavic Languages — Assumptions of Corpus Construction |
dc.type | JournalArticle |
dspace.entity.type | Publication |