Knot or not? Identifying unknotted proteins in knotted families with sequence-based Machine Learning model

dc.abstract.enKnotted proteins, although scarce, are crucial structural components of certain protein families, and their roles continue to be a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold (AF), this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Utilizing this dataset, we develop a machine learning (ML) model capable of accurately predicting the presence of knots in protein structures solely from their amino acid sequences. We tested the model's capabilities on 100 proteins whose structures had not yet been predicted by AF and found agreement with our local prediction in 92% of cases. From the point of view of structural biology, we found that all potentially knotted proteins predicted by AF can be classified into only 17 families. This allows us to discover the presence of unknotted proteins in families with a highly conserved knot. We found only three new protein families (UCH, DUF4253, and DUF2254) that contain both knotted and unknotted proteins, and we demonstrate that deletions within the knot core could potentially account for the observed unknotted (trivial) topology. Finally, we have shown that in the majority of knotted families (11 out of 15), the knotted topology is strictly conserved in functional proteins with very low sequence similarity. We have conclusively demonstrated that proteins AF predicts as unknotted are structurally accurate in their unknotted configurations. However, these proteins often represent nonfunctional fragments that lack significant portions of the knot core (amino acid sequence).
dc.affiliationUniwersytet Warszawski
dc.affiliation.departmentWydział Fizyki
dc.affiliation.departmentWydział Matematyki, Informatyki i Mechaniki
dc.affiliation.otherCentrum Nowych Technologii UW CeNT
dc.contributor.authorSikora, Maciej
dc.contributor.authorKlimentova, Eva
dc.contributor.authorUchal, Dawid
dc.contributor.authorSramkova, Denisa
dc.contributor.authorPerlińska, Agata
dc.contributor.authorNguyen, Mai Lan
dc.contributor.authorKorpacz, Marta
dc.contributor.authorMalinowska, Roksana
dc.contributor.authorNowakowski, Szymon
dc.contributor.authorRubach, Paweł
dc.contributor.authorSimecek, Petr
dc.contributor.authorSułkowska, Joanna Ida
dc.date.accessioned2024-10-29T08:13:57Z
dc.date.available2024-10-29T08:13:57Z
dc.date.issued2024-06-18
dc.description.grantnumber#UMO-2018/31/B/NZ1/04016
dc.description.grantnumber2021/43/I/NZ1/03341
dc.description.grantnumberReg. No. 204/07/1592
dc.description.granttitleNational Science Centre
dc.description.granttitleNational Science Centre
dc.description.granttitle“Biological code of knots – identification of knotted patterns in biomolecules via AI approach” Grant Agency of Czech Republic
dc.description.number7
dc.description.volume33
dc.identifier.doi10.1002/pro.4998
dc.identifier.issn0961-8368
dc.identifier.orcid0009-0003-0289-276X
dc.identifier.orcid0009-0006-0298-1975
dc.identifier.orcid0000-0002-3746-5114
dc.identifier.orcid0000-0003-2806-1190
dc.identifier.orcid0000-0001-8117-0391
dc.identifier.orcid0009-0005-5269-7174
dc.identifier.orcid0000-0002-1939-9512
dc.identifier.orcid0000-0001-5487-609X
dc.identifier.orcid0000-0002-2922-7183
dc.identifier.orcid0000-0003-2452-0724
dc.identifier.urihttps://repozytorium.uw.edu.pl//handle/item/160569
dc.languageen
dc.pbn.affiliationbiological sciences
dc.relation.ispartofProtein Science
dc.relation.pagese4998
dc.rightsCC-BY
dc.sciencecloudsend
dc.share.typeOPEN_REPOSITORY
dc.subject.enKnots in proteins
dc.subject.enProtein topology
dc.subject.enSPOUT family proteins
dc.subject.enDeep learning
dc.subject.enAlphaFold
dc.titleKnot or not? Identifying unknotted proteins in knotted families with sequence-based Machine Learning model
dc.typeJournalArticle
dspace.entity.typePublication