FAIR Data in Practice: Riverine Litter Data Standardization and Nanopore Sequencing Workflow Validation / Zidaric, Samuel. - (2025 May 26).
FAIR Data in Practice: Riverine Litter Data Standardization and Nanopore Sequencing Workflow Validation
ZIDARIC, SAMUEL
2025-05-26
Abstract
This thesis examines how the FAIR (Findable, Accessible, Interoperable, Reusable) data principles can be put into practice, and the challenges involved, in two very different scientific domains: riverine litter data management and nanopore sequencing data analysis. It responds to a growing need for structured data management frameworks in the wake of exponential growth in scientific data generation, as many datasets in active use are fragmented, undocumented, and difficult to reuse. The first project sought to standardize and migrate a large riverine litter dataset, comprising 12,143 samples from 716 micro research campaigns across waterways around the world, from a heterogeneous Excel format to a normalized MySQL database. The second project developed an automated Python-based tool for validating and analyzing Oxford Nanopore sequencing outputs, which improved quality assessment and metadata extraction across a variety of output file formats. Both applications showed marked improvements in data utility despite challenges that included data heterogeneity, persistent identifier adoption, and limited resources. The riverine litter database was fully migrated to standardized terminology and hierarchical classification systems, enabling cross-continental comparisons. The sequencing analysis tool implemented automated quality assessments, context-aware reports, and tiered metadata extraction, which shortened the time-to-insight for sequencing run assessments. Although the FAIR principles required considerable domain-specific adaptation, the results showed that, once the adaptations were completed, the practical benefits outweighed the cost of adaptation in the form of improved data discoverability, decreased redundancy, and improved reproducibility.
The project highlights the economic impact of FAIR adoption, including the risk of duplicated and wasted research effort when FAIR principles are not adopted, and stresses the need for institutional policies, specialized training, and long-lasting supportive technical infrastructure for further FAIR implementation across science.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Zidaric_Samuel_Thesis_MDMC2024-2025.pdf | Open access | Thesis | Not specified | 1.3 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


