Doctoral thesis
License

CC-BY - Attribution

Algorithms and Computational Models in Chemical Analysis

Author
Skoraczyński, Grzegorz
Supervisor
Miasojedow, Błażej
Publication date
2023-09-26
Abstract (EN)

In the present work, we address two problems in computational chemistry: retention time alignment and synthetic accessibility scoring. For the former, we present Alignstein, an algorithm for LC-MS retention time alignment by feature matching. We show that the algorithm finds the correct correspondence even for signals with swapped elution order. We achieve this by using a generalization of the Wasserstein distance as a dissimilarity measure for mass spectra and features. It allows us to incorporate all signal information and to compare features not only by monoisotopic mass but also by their spatial properties or signal distribution. We validate the algorithm on publicly available benchmark datasets, obtaining competitive results. Finally, we show that it can detect the information contained in the tandem mass spectrum by the spatial properties of LC-MS chromatograms.
For the latter problem, we design three different synthetic accessibility scores. The first is based on a manually prepared set of descriptors computed on molecules from the database. This model uses stochastic gradient descent to model the distribution of descriptors and predict the likelihood of a molecule's structure. The second model is based on the same set of descriptors but applies supervised learning to predict compound synthetic accessibility. It requires creating the part of the dataset that represents infeasible molecules, for which we use the bootstrap method. The last model is based on semi-supervised learning for outlier detection: One-Class SVM. It does not require creating the part of the dataset corresponding to non-existent molecules. Moreover, we trained it on extended-connectivity fingerprints, which allows for capturing all possible structural patterns. In this work, we discuss the applicability of these scores as a pre-retrosynthesis heuristic and their limitations, and we verify the correctness of their predictions.
One of the challenges of designing new synthetic accessibility scores is their verification with a ground-truth dataset. To this end, we assess whether synthetic accessibility scores (SAscore, SCScore, RAscore, SYBA, and the previously described OCSVM-based score) can reliably predict the outcomes and complexity of retrosynthesis planning performed by the AiZynthFinder tool. Moreover, through in-depth analysis of AiZynthFinder search trees, we assess whether synthetic accessibility scores can speed up retrosynthesis planning by better prioritizing partial synthetic routes.

Keywords (EN)
synthetic accessibility
Wasserstein distance
retention time alignment
syntezowalność
odległość Wassersteina
uliniowienie czasu retencji
Alternative title
Algorytmy i modele obliczeniowe w analizie chemicznej
Defense date
2023-10-06
Open access license
Attribution