Machine Learning for RNA Secondary Structure Prediction: a review of current methods and challenges / Sacco, G., Bussi, G., Sanguinetti, G.. - In: RNA. - ISSN 1355-8382. - 32:4(2026), pp. 443-456. [10.1261/rna.080840.125]

Machine Learning for RNA Secondary Structure Prediction: a review of current methods and challenges

Sacco, Giuseppe;Bussi, Giovanni;Sanguinetti, Guido

2026-01-01

Abstract

Predicting the secondary structure of RNA is a core challenge in computational biology, essential for understanding molecular function and designing novel therapeutics. The field has evolved from foundational but accuracy-limited thermodynamic approaches to a new data-driven paradigm dominated by machine learning and deep learning. These models learn folding patterns directly from data, leading to significant performance gains. This review surveys the modern landscape of these methods, covering single-sequence, evolutionary-based, and hybrid models that blend machine learning with biophysics. A central theme is the field's "generalization crisis," where powerful models were found to fail on new RNA families, prompting a community-wide shift to stricter, homology-aware benchmarking. In response to the underlying challenge of data scarcity, RNA foundation models have emerged, learning from massive, unlabeled sequence corpora to improve generalization. Finally, we look ahead to the next set of major hurdles, including the accurate prediction of complex motifs like pseudoknots, scaling to kilobase-length transcripts, incorporating the chemical diversity of modified nucleotides, and shifting the prediction target from static structures to the dynamic ensembles that better capture biological function. We also highlight the need for a standardized, prospective benchmarking system to ensure unbiased validation and accelerate progress.

Year	2026
Journal	RNA
Volume	32
Issue	4
First page	443
Last page	456
DOI	https://dx.doi.org/10.1261/rna.080840.125
Full text via DOI	10.1261/rna.080840.125
URL	https://arxiv.org/abs/2511.02622
All authors	Sacco, Giuseppe; Bussi, Giovanni; Sanguinetti, Guido
Appears in types	1.8 Review in journal

Files in this item:

File	Access	Description	Type	License	Size	Format
RNA-2026-Sacco-rna.080840.125.pdf	Open access	Postprint	Post-print document	Creative Commons	899 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/149890
