Licencja
Efficient enumerations in words
Abstrakt (EN)
The dissertation presented here consists of a set of results in text processing algorithms from four different scientific publications. The papers focus on algorithms and data structures that allow efficient enumeration and counting of certain classes of factors. Most of the results are based on different variations of the same key property - regularity of the set of occurrences of factors of specific types. In ”Efficient Representation and Counting of Antipower Factors in Words” we focused on a class of k-antipowers (words consisting of k pairwise different segments of the same length). The results of the paper consist of effective algorithms for enumerating all occurrences, counting all occurrences and counting different k-antipower factors in a string, working in O(nk log k + output), O(nk log k) and O(nk^4 log k log n) time, respectively. The paper ”Counting Distinct Patterns in Internal Dictionary Matching” deals with a problem in which we are given an input word T and its internal dictionary D. We want to answer questions of the form: ”How many distinct words from D are the factors of T[i . . j]?”. We have shown few algorithms and data structures, that allows us to do that effectively. The algorithms differ in exploited properties and complexities (each one can be more effective from the other depending on the ratio between T and D). In ”The Number of Repetitions in 2D-Strings” we focused on the extensions of the notions of squares and runs to the case of two-dimensional strings. We gave new bounds on the maximal number of their occurrences in strings and effective algorithms for enumerating them. In the fourth paper titled ”Efficient Enumeration of Distinct Factors Using Package Representations”, we proposed a new compact representation of string factors. It allows us to obtain new effective algorithms for enumeration and counting of distinct factors it contains. In particular it allowed us to obtain a new simple algorithm for finding distinct squares and a faster algorithm for finding all distinct antipowers in a string (improving one of the results from the first of enclosed papers).