Subclonal reconstruction of tumors by using machine learning and population genetics / Caravagna, G.; Heide, T.; Williams, M. J.; Zapata, L.; Nichol, D.; Chkhaidze, K.; Cross, W.; Cresswell, G. D.; Werner, B.; Acar, A.; Chesler, L.; Barnes, C. P.; Sanguinetti, G.; Graham, T. A.; Sottoriva, A. - In: NATURE GENETICS. - ISSN 1061-4036. - 52 (2020), pp. 898-907. [10.1038/s41588-020-0675-5]

Subclonal reconstruction of tumors by using machine learning and population genetics

Sanguinetti, G.
2020-01-01

Abstract

Most cancer genomic data are generated from bulk samples composed of mixtures of cancer subpopulations, as well as normal cells. Subclonal reconstruction methods based on machine learning aim to separate those subpopulations in a sample and infer their evolutionary history. However, current approaches are entirely data driven and agnostic to evolutionary theory. We demonstrate that systematic errors occur in the analysis if evolution is not accounted for, and this is exacerbated with multi-sampling of the same tumor. We present a novel approach for model-based tumor subclonal reconstruction, called MOBSTER, which combines machine learning with theoretical population genetics. Using public whole-genome sequencing data from 2,606 samples from different cohorts, new data and synthetic validation, we show that this method is more robust and accurate than current techniques in single-sample, multiregion and longitudinal data. This approach minimizes the confounding factors of nonevolutionary methods, thus leading to more accurate recovery of the evolutionary history of human cancers.
Year: 2020
Volume: 52
First page: 898
Last page: 907
Authors: Caravagna, G.; Heide, T.; Williams, M. J.; Zapata, L.; Nichol, D.; Chkhaidze, K.; Cross, W.; Cresswell, G. D.; Werner, B.; Acar, A.; Chesler, L.; Barnes, C. P.; Sanguinetti, G.; Graham, T. A.; Sottoriva, A.
Files in this record:
File: s41588-020-0675-5-compresso.pdf (not available)
Type: Publisher's version (PDF)
License: Not specified
Size: 9.2 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/117194
Citations
  • PubMed Central 23
  • Scopus 55
  • Web of Science (ISI) 50