When Does Additional Information Improve Accuracy of RNA Secondary Structure Prediction?

Logan Rose, Luis Sanchez Giraldo, Duc Nguyen, Matthew Wheeler, David Murrugarra

Research output: Article › peer-review

Abstract

The secondary structure of an RNA sequence plays an important role in determining its function, and accurate prediction of the structure is still a major goal in computational biology. Improvements in the prediction accuracy of the secondary structure can be achieved via auxiliary information. In this paper, we study features based on suboptimal formations competing with the minimum free energy formation and investigate their role in determining the improvement of accuracy via auxiliary information, which we call directability. We introduce a similarity measure among competing substructures called profiles. We then present an n-dimensional representation of the profiles, which allows the use of topological data analysis (i.e., persistence landscapes) to obtain different metrics that represent topological features, and we build random forest classifiers using these novel features. We show that the similarity feature is more important for classifiers trained on sequences with similar structures, while the topological features are more important for classifiers trained on sequences with dissimilar structures. We perform extensive testing on two sets of RNA sequences, studying the sensitivity of the classification accuracy and of the feature importance.

Original language: English
Pages (from-to): 10701-10712
Number of pages: 12
Journal: Journal of Chemical Information and Modeling
Volume: 65
Issue number: 19
DOI
Status: Published - Oct 13, 2025

Bibliographical note

Publisher Copyright:
© 2025 American Chemical Society

Funding

Part of this work was supported by a pilot grant (to D.M., L.S.G., and L.R.) from the University of Kentucky AI/ML Hub initiative. D.M. was partially supported by a Collaboration grant (# 850896) from the Simons Foundation and a grant (NSF: # 2424633) from the National Science Foundation. D.N. was partially supported by the National Science Foundation (NSF: # 2516126, # 2151802, and # 2534947). M.W. was partially supported by the National Science Foundation (NSF: # 2424633) and the National Institutes of Health (NIH: R01-AI135128).

Funders    Funder number
Simons Foundation    850896
National Science Foundation    2424633, 2516126, 2151802, 2534947
University of Kentucky    pilot grant (no number given)
National Institutes of Health (NIH)    R01-AI135128

    ASJC Scopus subject areas

    • General Chemistry
    • General Chemical Engineering
    • Computer Science Applications
    • Library and Information Sciences
