Artykuł w czasopiśmie
Brak miniatury
Licencja
Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications
cris.lastimport.scopus | 2024-02-12T20:29:11Z |
dc.abstract.en | An elastic-degenerate (ED) string T is a sequence of n sets T[1],…,T[n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the size of T, respectively. The language of T is defined as ℒ(T) = {S_1 ⋯ S_n : S_i ∈ T[i] for all i ∈ [1,n]}. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T₁ and T₂ of lengths n₁ and n₂, cardinalities m₁ and m₂, and sizes N₁ and N₂, respectively, we show the following: - There is no ((N₁N₂)^{1-ε})-time algorithm, thus no ((N₁m₂+N₂m₁)^{1-ε})-time algorithm and no ((N₁n₂+N₂n₁)^{1-ε})-time algorithm, for any constant ε > 0, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false. - There is no combinatorial ((N₁+N₂)^{1.2-ε}f(n₁,n₂))-time algorithm, for any constant ε > 0 and any function f, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false. - An (N₁log N₁log n₁+N₂log N₂log n₂)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T₁ and T₂ are given in a compact representation, we show that the problem is NP-complete. - An (N₁m₂+N₂m₁)-time algorithm for EDSI. - An Õ(N₁^{ω-1}n₂+N₂^{ω-1}n₁)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size. We also show that the techniques we develop have applications outside of ED string comparison. |
dc.affiliation | Uniwersytet Warszawski |
dc.conference.country | Francja |
dc.conference.datefinish | 2023-06-28 |
dc.conference.datestart | 2023-06-26 |
dc.conference.place | Marne-la-Vallée |
dc.conference.series | Combinatorial Pattern Matching |
dc.conference.series | Combinatorial Pattern Matching |
dc.conference.seriesshortcut | CPM |
dc.conference.shortcut | CPM 2023 |
dc.conference.weblink | https://cpm2023.u-pem.fr/ |
dc.contributor.author | Zuba, Wiktor |
dc.contributor.author | Sweering, Michelle |
dc.contributor.author | Radoszewski, Jakub |
dc.contributor.author | Pissis, Solon P. |
dc.contributor.author | Pisanti, Nadia |
dc.contributor.author | Moses, Njagi |
dc.contributor.author | Gabory, Esteban |
dc.date.accessioned | 2024-01-24T19:50:47Z |
dc.date.available | 2024-01-24T19:50:47Z |
dc.date.copyright | 2023-06-21 |
dc.date.issued | 2023 |
dc.description.accesstime | AT_PUBLICATION |
dc.description.finance | Publikacja bezkosztowa |
dc.description.version | FINAL_PUBLISHED |
dc.identifier.doi | 10.4230/LIPICS.CPM.2023.11 |
dc.identifier.uri | https://repozytorium.uw.edu.pl//handle/item/103369 |
dc.identifier.weblink | https://drops.dagstuhl.de/opus/volltexte/2023/17965/ |
dc.language | eng |
dc.pbn.affiliation | computer and information sciences |
dc.relation.pages | 11:1-20 |
dc.rights | CC-BY |
dc.sciencecloud | nosend |
dc.subject.en | elastic-degenerate string |
dc.subject.en | sequence comparison |
dc.subject.en | languages intersection |
dc.subject.en | pangenome |
dc.subject.en | acronym identification |
dc.title | Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications |
dc.type | JournalArticle |
dspace.entity.type | Publication |