Artykuł w czasopiśmie
Brak miniatury
Licencja
Linear Time Construction of Cover Suffix Tree and Applications
dc.abstract.en | The Cover Suffix Tree (CST) of a string T is the suffix tree of T with additional explicit nodes corresponding to halves of square substrings of T. In the CST an explicit node corresponding to a substring C of T is annotated with two numbers: the number of non-overlapping consecutive occurrences of C and the total number of positions in T that are covered by occurrences of C in T. Kociumaka et al. (Algorithmica, 2015) have shown how to compute the CST of a length-n string in (n log n) time. We give an algorithm that computes the same data structure in (n) time assuming that T is over an integer alphabet and discuss its implications. A string C is a cover of text T if occurrences of C in T cover all positions of T; C is a seed of T if occurrences and overhangs (i.e., prefix-suffix occurrences) of C in T cover all positions of T. An α-partial cover (α-partial seed) of text T is a string C whose occurrences in T (occurrences and overhangs in T, respectively) cover at least α positions of T. Kociumaka et al. (Algorithmica, 2015; Theor. Comput. Sci., 2018) have shown that knowing the CST of a length-n string T, one can compute a linear-sized representation of all seeds of T as well as all shortest α-partial covers and seeds in T for a given α in (n) time. Thus our result implies linear-time algorithms computing these notions of quasiperiodicity. The resulting algorithm computing seeds is substantially different from the previous one (Kociumaka et al., SODA 2012, ACM Trans. Algorithms, 2020); in particular, it is non-recursive. Kociumaka et al. (Algorithmica, 2015) proposed an (n log n)-time algorithm for computing a shortest α-partial cover for each α = 1,…,n; we improve this complexity to (n). Our results are based on a new combinatorial characterization of consecutive overlapping occurrences of a substring S of T in terms of the set of runs (see Kolpakov and Kucherov, FOCS 1999) in T. This new insight also leads to an (n)-sized index for reporting overlapping consecutive occurrences of a given pattern P of length m in the optimal (m+output) time, where output is the number of occurrences reported. In comparison, a general index for reporting bounded-gap consecutive occurrences of Navarro and Thankachan (Theor. Comput. Sci., 2016) uses (n log n) space. |
dc.affiliation | Uniwersytet Warszawski |
dc.conference.country | Holandia |
dc.conference.datefinish | 2023-09-08 |
dc.conference.datestart | 2023-09-04 |
dc.conference.place | Amsterdam |
dc.conference.series | European Symposium on Algorithms |
dc.conference.series | European Symposium on Algorithms |
dc.conference.seriesshortcut | ESA |
dc.conference.shortcut | ESA 2023 |
dc.conference.weblink | https://algo-conference.org/2023/esa/ |
dc.contributor.author | Radoszewski, Jakub |
dc.date.accessioned | 2024-01-25T05:11:39Z |
dc.date.available | 2024-01-25T05:11:39Z |
dc.date.copyright | 2023-08-30 |
dc.date.issued | 2023 |
dc.description.accesstime | AT_PUBLICATION |
dc.description.finance | Publikacja bezkosztowa |
dc.description.version | FINAL_PUBLISHED |
dc.identifier.doi | 10.4230/LIPICS.ESA.2023.89 |
dc.identifier.uri | https://repozytorium.uw.edu.pl//handle/item/111289 |
dc.identifier.weblink | https://drops-beta.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.89 |
dc.language | eng |
dc.pbn.affiliation | computer and information sciences |
dc.relation.pages | 89:1-17 |
dc.rights | CC-BY |
dc.sciencecloud | nosend |
dc.subject.en | cover (quasiperiod) |
dc.subject.en | seed |
dc.subject.en | suffix tree |
dc.subject.en | run (maximal repetition) |
dc.title | Linear Time Construction of Cover Suffix Tree and Applications |
dc.type | JournalArticle |
dspace.entity.type | Publication |