License
Feasibility Algorithms for the Duplication-Loss Cost
Abstract (EN)
Gene duplications are a dominant force in creating genetic novelty, and studying their evolutionary history benefits a range of research areas. The gene duplication model, which was introduced more than 40 years ago, is widely used to infer duplication histories by resolving the discordance between the evolutionary history of a gene family and the species tree through which this family has evolved. Today, for many gene families, lower bounds on the number of gene duplications that occurred along each edge of the species tree, called duplication scenarios, can be derived, for example from genome duplications. Recently, the gene duplication model has been augmented to include duplication scenarios and to address the question of whether such a scenario is feasible for a given gene family. Non-feasibility of a duplication scenario for a gene family can provide a strong indication that this family might not be well-resolved, and identifying well-resolved gene families is a challenging task in evolutionary biology. However, genome duplications are often followed by episodes of gene losses, and lost genes can explain non-feasible duplication scenarios. Here, we address this major shortcoming of the augmented duplication model by proposing a gene duplication model that incorporates duplication-loss scenarios. We describe efficient algorithms that decide whether a duplication-loss scenario is feasible for a gene family and, if so, compute a gene tree for the family that infers the minimum number of duplication-loss events satisfying the scenario.
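To make the notion of a duplication scenario concrete, here is a minimal Python sketch of a much simpler task than the one the abstract describes. It takes one fixed gene tree, maps it into the species tree with the standard least-common-ancestor (LCA) mapping, counts the duplications that mapping implies, and checks them against per-node lower bounds. It is not the paper's algorithm: the feasibility problem in the abstract asks whether any gene tree for the family satisfies the scenario, and gene losses are ignored here. The nested-tuple tree encoding, the function names (`clusters`, `lca`, `count_duplications`, `satisfies`), and the convention of identifying a species-tree edge by the leaf set of the node it enters are all assumptions made for illustration.

```python
from collections import Counter

# Assumed encoding: a tree is a nested 2-tuple; a leaf is a species name (str).
# A species-tree node (and the edge entering it) is identified by its leaf set.

def clusters(tree):
    """Return (leaf set of the root, list of leaf sets of all nodes)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    left_set, left_all = clusters(tree[0])
    right_set, right_all = clusters(tree[1])
    here = left_set | right_set
    return here, left_all + right_all + [here]

def lca(species_clusters, leafset):
    """Smallest species-tree cluster containing leafset, i.e. the LCA node."""
    return min((c for c in species_clusters if leafset <= c), key=len)

def count_duplications(gene_tree, species_tree):
    """Count duplications per species-tree node under the LCA mapping."""
    _, sp = clusters(species_tree)
    dups = Counter()

    def visit(g):
        if isinstance(g, str):
            return lca(sp, frozenset([g]))
        m_left, m_right = visit(g[0]), visit(g[1])
        m = lca(sp, m_left | m_right)
        # A gene node is a duplication if it maps to the same species
        # node as at least one of its children.
        if m == m_left or m == m_right:
            dups[m] += 1
        return m

    visit(gene_tree)
    return dups

def satisfies(dups, lower_bounds):
    """Check that every lower bound in the scenario is met (losses ignored)."""
    return all(dups[node] >= k for node, k in lower_bounds.items())

species_tree = (("A", "B"), "C")
gene_tree = (("A", "B"), ("A", "C"))   # each gene labelled by its species
dups = count_duplications(gene_tree, species_tree)
root = frozenset({"A", "B", "C"})
print(satisfies(dups, {root: 1}))                    # bound at the root
print(satisfies(dups, {frozenset({"A", "B"}): 1}))   # bound above (A,B)
```

In this toy instance the gene tree's root maps to the same species node as its right child, so the only implied duplication sits at the species-tree root. A scenario that demands a duplication on the edge above (A,B) is therefore not met by this particular gene tree, although another gene tree for the same family might meet it.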