This thesis presents developments toward new data services within the EU NFFA-EUROPE project. The work originates from the need to rationalize and organize large scientific datasets using a FAIR approach. The activity builds on results obtained in previous MHPC work and tackles some of the issues with the FAIR principles that are emerging as the original datasets grow in size and variety. More specifically, the overall goal of the thesis is to set up well-organized data services to manage all the SEM images coming from different sources and partners within the NFFA-EUROPE project. The specific goals of this thesis are the following:
• Create a Python application to collect and enrich metadata for SEM images coming from different sources.
• Develop a massively parallel processing approach to reduce the time needed to collect metadata from a large number of images.
• Plan and develop an easy-to-set-up, portable computational ecosystem based on Kubernetes and Spark to accomplish the above goals, so that it can be easily deployed on different computational infrastructures.
• Measure the performance of the massive data processing on different computational infrastructures.
Data management tools for NFFA-EUROPE project / SALEH HASSANIN KHALIL, Ahmed Mohamed. - (2019 Dec 20).
Data management tools for NFFA-EUROPE project
SALEH HASSANIN KHALIL, Ahmed Mohamed
2019-12-20
File | Size | Format | |
---|---|---|---|
Saleh Hossanin.pdf (open access). Description: MHPC Thesis. Type: Thesis. License: Not specified | 3.37 MB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.