Artykuł w czasopiśmie
Brak miniatury
Licencja

CC-BYCC-BY - Uznanie autorstwa
 

Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications

Uproszczony widok
cris.lastimport.scopus2024-02-12T20:29:11Z
dc.abstract.enAn elastic-degenerate (ED) string T is a sequence of n sets T[1],…,T[n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the size of T, respectively. The language of T is defined as ℒ(T) = {S_1 ⋯ S_n : S_i ∈ T[i] for all i ∈ [1,n]}. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T₁ and T₂ of lengths n₁ and n₂, cardinalities m₁ and m₂, and sizes N₁ and N₂, respectively, we show the following: - There is no ((N₁N₂)^{1-ε})-time algorithm, thus no ((N₁m₂+N₂m₁)^{1-ε})-time algorithm and no ((N₁n₂+N₂n₁)^{1-ε})-time algorithm, for any constant ε > 0, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false. - There is no combinatorial ((N₁+N₂)^{1.2-ε}f(n₁,n₂))-time algorithm, for any constant ε > 0 and any function f, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false. - An (N₁log N₁log n₁+N₂log N₂log n₂)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T₁ and T₂ are given in a compact representation, we show that the problem is NP-complete. - An (N₁m₂+N₂m₁)-time algorithm for EDSI. - An Õ(N₁^{ω-1}n₂+N₂^{ω-1}n₁)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size. We also show that the techniques we develop have applications outside of ED string comparison.
dc.affiliationUniwersytet Warszawski
dc.conference.countryFrancja
dc.conference.datefinish2023-06-28
dc.conference.datestart2023-06-26
dc.conference.placeMarne-la-Vallée
dc.conference.seriesCombinatorial Pattern Matching
dc.conference.seriesCombinatorial Pattern Matching
dc.conference.seriesshortcutCPM
dc.conference.shortcutCPM 2023
dc.conference.weblinkhttps://cpm2023.u-pem.fr/
dc.contributor.authorZuba, Wiktor
dc.contributor.authorSweering, Michelle
dc.contributor.authorRadoszewski, Jakub
dc.contributor.authorPissis, Solon P.
dc.contributor.authorPisanti, Nadia
dc.contributor.authorMoses, Njagi
dc.contributor.authorGabory, Esteban
dc.date.accessioned2024-01-24T19:50:47Z
dc.date.available2024-01-24T19:50:47Z
dc.date.copyright2023-06-21
dc.date.issued2023
dc.description.accesstimeAT_PUBLICATION
dc.description.financePublikacja bezkosztowa
dc.description.versionFINAL_PUBLISHED
dc.identifier.doi10.4230/LIPICS.CPM.2023.11
dc.identifier.urihttps://repozytorium.uw.edu.pl//handle/item/103369
dc.identifier.weblinkhttps://drops.dagstuhl.de/opus/volltexte/2023/17965/
dc.languageeng
dc.pbn.affiliationcomputer and information sciences
dc.relation.pages11:1-20
dc.rightsCC-BY
dc.sciencecloudnosend
dc.subject.enelastic-degenerate string
dc.subject.ensequence comparison
dc.subject.enlanguages intersection
dc.subject.enpangenome
dc.subject.enacronym identification
dc.titleComparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications
dc.typeJournalArticle
dspace.entity.typePublication