Efficient enumerations in words

Zuba, Wiktor

Praca doktorska

Pliki

Licencja

Korzystanie z tego materiału możliwe jest zgodnie z właściwymi przepisami o dozwolonym użytku lub o innych wyjątkach przewidzianych w przepisach prawa. Korzystanie w szerszym zakresie wymaga uzyskania zgody uprawnionego.

Efficient enumerations in words

dc.abstract.en	The dissertation presented here consists of a set of results in text processing algorithms from four different scientific publications. The papers focus on algorithms and data structures that allow efficient enumeration and counting of certain classes of factors. Most of the results are based on different variations of the same key property - regularity of the set of occurrences of factors of specific types. In ”Efficient Representation and Counting of Antipower Factors in Words” we focused on a class of k-antipowers (words consisting of k pairwise different segments of the same length). The results of the paper consist of effective algorithms for enumerating all occurrences, counting all occurrences and counting different k-antipower factors in a string, working in O(nk log k + output), O(nk log k) and O(nk^4 log k log n) time, respectively. The paper ”Counting Distinct Patterns in Internal Dictionary Matching” deals with a problem in which we are given an input word T and its internal dictionary D. We want to answer questions of the form: ”How many distinct words from D are the factors of T[i . . j]?”. We have shown few algorithms and data structures, that allows us to do that effectively. The algorithms differ in exploited properties and complexities (each one can be more effective from the other depending on the ratio between T and D). In ”The Number of Repetitions in 2D-Strings” we focused on the extensions of the notions of squares and runs to the case of two-dimensional strings. We gave new bounds on the maximal number of their occurrences in strings and effective algorithms for enumerating them. In the fourth paper titled ”Efficient Enumeration of Distinct Factors Using Package Representations”, we proposed a new compact representation of string factors. It allows us to obtain new effective algorithms for enumeration and counting of distinct factors it contains. In particular it allowed us to obtain a new simple algorithm for finding distinct squares and a faster algorithm for finding all distinct antipowers in a string (improving one of the results from the first of enclosed papers).
dc.abstract.en	Przedstawiona rozprawa stanowi zbiór wyników prac na temat algorytmów przetwarzających dane tekstowe. Prace skupiają się na zaprezentowaniu algorytmów i struktur danych pozwalających na efektywne znajdowanie i zliczanie podsłów należących do konkretnych klas podsłów. Innym elementem wspólnym poniższych prac jest to, ze wykorzystują one regularności w słowach związane z wystąpieniami podsłów szczególnego typu. W pracy ”Efficient Representation and Counting of Antipower Factors in Words” zajęliśmy się klasa k-antypotęg (słów składających się z k parami rożnych członów o tej samej długości). Rezultatem pracy są efektywne algorytmy wypisywania wszystkich wystąpień, zliczania wszystkich wystąpień oraz zliczania rożnych k-antypotęg w słowie, działające w czasach odpowiednio O(nk log k + wynik), O(nk log k) i O(nk^4 log k log n). Praca ”Counting Distinct Patterns in Internal Dictionary Matching” przedstawia problem w którym mając dane słowo wejściowe T oraz słownik jego podsłów D chcielibyśmy móc efektywnie odpowiadać na pytania ”Ile rożnych słów ze słownika D jest podsłowami słowa T[i . . j]?”. Przedstawiliśmy kilka algorytmów/struktur danych, które pozwalają na rozwiązanie tego problemu. Algorytmy różnią się wykorzystywanymi założeniami oraz złożonościami (każdy może być lepszy od pozostałych w zależności od zastosowania, oraz stosunku wielkości T i D). W pracy ”The Number of Repetitions in 2D-Strings” zajęliśmy się uogólnieniami pojęć kwadratów i maksymalnych powtórzeń na teksty dwuwymiarowe. Podaliśmy nowe ograniczenia na maksymalna ich ilość w tekstach, oraz efektywne algorytmy ich wyznaczania. Na końcu przedstawiam pracę ”Efficient Enumeration of Distinct Factors Using Package Representations”, w której wprowadziliśmy nowy, skompresowany sposób reprezentacji wystąpień wielu podsłów. Pozwala ona na otrzymanie nowych, efektywnych algorytmów wyznaczania i zliczania rożnych słów przez nią reprezentowanych. W szczególności pozwoliło nam to na otrzymanie nowego prostego algorytmu wyznaczania kwadratów i szybszego algorytmu wyznaczania antypotęg (poprawiając jeden z wyników pierwszej z przedstawionych prac).
dc.affiliation.department	Wydział Matematyki, Informatyki i Mechaniki
dc.contributor.author	Zuba, Wiktor
dc.date.accessioned	2021-10-08T06:56:00Z
dc.date.available	2021-10-08T06:56:00Z
dc.date.defence	2021-10-19
dc.date.issued	2021-10-08
dc.description.additional	Link archiwalny https://depotuw.ceon.pl/handle/item/4022
dc.description.promoter	Rytter, Wojciech
dc.identifier.uri	https://repozytorium.uw.edu.pl//handle/item/4022
dc.language.iso	en
dc.rights	FairUse
dc.subject.en	internal dictionary
dc.subject.en	two-dimensional string
dc.subject.en	runs
dc.subject.en	antipower
dc.subject.en	subword
dc.subject.en	factor
dc.subject.en	counting
dc.subject.en	enumeration
dc.subject.en	data structure
dc.subject.en	algorithm
dc.subject.en	słownik wewnętrzny
dc.subject.en	słowo dwuwymiarowe
dc.subject.en	maksymalne powtórzenie
dc.subject.en	antypotęga
dc.subject.en	podsłowo
dc.subject.en	zliczanie
dc.subject.en	wyliczanie
dc.subject.en	struktura danych
dc.subject.en	algorytm
dc.title	Efficient enumerations in words
dc.title.alternative	Efektywne zliczanie w słowach
dc.type	DoctoralThesis
dspace.entity.type	Publication

Licencja

Efficient enumerations in words

Opcje