Artykuł w czasopiśmie
Brak miniatury
Licencja

CC-BYCC-BY - Uznanie autorstwa
 

Linear Time Construction of Cover Suffix Tree and Applications

dc.abstract.enThe Cover Suffix Tree (CST) of a string T is the suffix tree of T with additional explicit nodes corresponding to halves of square substrings of T. In the CST an explicit node corresponding to a substring C of T is annotated with two numbers: the number of non-overlapping consecutive occurrences of C and the total number of positions in T that are covered by occurrences of C in T. Kociumaka et al. (Algorithmica, 2015) have shown how to compute the CST of a length-n string in (n log n) time. We give an algorithm that computes the same data structure in (n) time assuming that T is over an integer alphabet and discuss its implications. A string C is a cover of text T if occurrences of C in T cover all positions of T; C is a seed of T if occurrences and overhangs (i.e., prefix-suffix occurrences) of C in T cover all positions of T. An α-partial cover (α-partial seed) of text T is a string C whose occurrences in T (occurrences and overhangs in T, respectively) cover at least α positions of T. Kociumaka et al. (Algorithmica, 2015; Theor. Comput. Sci., 2018) have shown that knowing the CST of a length-n string T, one can compute a linear-sized representation of all seeds of T as well as all shortest α-partial covers and seeds in T for a given α in (n) time. Thus our result implies linear-time algorithms computing these notions of quasiperiodicity. The resulting algorithm computing seeds is substantially different from the previous one (Kociumaka et al., SODA 2012, ACM Trans. Algorithms, 2020); in particular, it is non-recursive. Kociumaka et al. (Algorithmica, 2015) proposed an (n log n)-time algorithm for computing a shortest α-partial cover for each α = 1,…,n; we improve this complexity to (n). Our results are based on a new combinatorial characterization of consecutive overlapping occurrences of a substring S of T in terms of the set of runs (see Kolpakov and Kucherov, FOCS 1999) in T. This new insight also leads to an (n)-sized index for reporting overlapping consecutive occurrences of a given pattern P of length m in the optimal (m+output) time, where output is the number of occurrences reported. In comparison, a general index for reporting bounded-gap consecutive occurrences of Navarro and Thankachan (Theor. Comput. Sci., 2016) uses (n log n) space.
dc.affiliationUniwersytet Warszawski
dc.conference.countryHolandia
dc.conference.datefinish2023-09-08
dc.conference.datestart2023-09-04
dc.conference.placeAmsterdam
dc.conference.seriesEuropean Symposium on Algorithms
dc.conference.seriesEuropean Symposium on Algorithms
dc.conference.seriesshortcutESA
dc.conference.shortcutESA 2023
dc.conference.weblinkhttps://algo-conference.org/2023/esa/
dc.contributor.authorRadoszewski, Jakub
dc.date.accessioned2024-01-25T05:11:39Z
dc.date.available2024-01-25T05:11:39Z
dc.date.copyright2023-08-30
dc.date.issued2023
dc.description.accesstimeAT_PUBLICATION
dc.description.financePublikacja bezkosztowa
dc.description.versionFINAL_PUBLISHED
dc.identifier.doi10.4230/LIPICS.ESA.2023.89
dc.identifier.urihttps://repozytorium.uw.edu.pl//handle/item/111289
dc.identifier.weblinkhttps://drops-beta.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.89
dc.languageeng
dc.pbn.affiliationcomputer and information sciences
dc.relation.pages89:1-17
dc.rightsCC-BY
dc.sciencecloudnosend
dc.subject.encover (quasiperiod)
dc.subject.enseed
dc.subject.ensuffix tree
dc.subject.enrun (maximal repetition)
dc.titleLinear Time Construction of Cover Suffix Tree and Applications
dc.typeJournalArticle
dspace.entity.typePublication