Multi-label classification of computed tomography scan reports / Zampieri, M. - (2019 Dec 20).

Multi-label classification of computed tomography scan reports

Zampieri, Matteo

2019-12-20

Abstract

The digitalization of clinical reports and the ever-growing use of electronic health records make it possible to collect huge amounts of data. These data can be used to explore strategies that help both patients and clinical personnel, either as inference tools that can meaningfully inform diagnostic decisions or as a general research pool. This project makes use of Computed Tomography scan reports of patients with metastatic breast cancer. The aim of the thesis is to explore methods for multi-label text classification. Each report of interest is assigned a varying number of tags, depending on the locations of the metastases inferred from the report, which comes in the form of a free-text description. To address this problem, I used a set of algorithms, namely logistic regression (multinomial and one-vs-rest), k-Nearest-Neighbors (with 'uniform' and 'distance' weights), Multi-k-Nearest-Neighbors, and Support Vector Classifier; these algorithms were fed with different types of text representations (TF-IDF and doc2vec). Moreover, the fastText library was explored for its integrated word embedding and text classification capabilities. Finally, I used Fast-Bert, an open-source extension of Google's BERT, specifically to perform text classification. The results were not satisfactory, owing to the small size and the high class imbalance of the dataset. However, the investigation of different techniques has shed light on the promising possibilities of some of the strategies used.
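To make the multi-label setting concrete, here is a minimal sketch of one of the combinations named in the abstract: TF-IDF vectors fed to a k-nearest-neighbours classifier with either 'uniform' or 'distance' weighting. It is not the thesis code. The report snippets, the label names (`liver`, `bone`, `lung`), the helper names, the cosine similarity measure and the "more than half of the weighted votes" rule are all assumptions made for illustration, and the sketch uses only the Python standard library rather than the libraries the thesis may have used.

```python
import math
from collections import Counter

# Hypothetical toy corpus: invented report snippets and site labels,
# NOT data from the thesis.
TRAIN = [
    ("multiple hepatic lesions consistent with liver metastases", {"liver"}),
    ("new hypodense liver lesion in segment seven", {"liver"}),
    ("lytic bone lesions in the thoracic spine", {"bone"}),
    ("sclerotic bone metastases in the pelvis and spine", {"bone"}),
    ("liver lesions and lytic bone lesions in the spine", {"liver", "bone"}),
    ("pulmonary nodules in both lungs", {"lung"}),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every training token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(text, idf):
    """L2-normalised sparse TF-IDF vector; unseen tokens are dropped."""
    tf = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


IDF = fit_idf([text for text, _ in TRAIN])
TRAIN_VECS = [(tfidf(text, IDF), labels) for text, labels in TRAIN]
ALL_LABELS = sorted(set().union(*(labels for _, labels in TRAIN)))


def predict(text, k=3, weights="uniform"):
    """Labels carried by more than half of the (weighted) k nearest reports."""
    query = tfidf(text, IDF)
    scored = [(cosine(query, vec), labels) for vec, labels in TRAIN_VECS]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    if weights == "distance":
        # Inverse cosine distance; the small constant avoids division by zero.
        votes = [(1.0 / (1.0 - sim + 1e-9), labels) for sim, labels in neighbours]
    else:
        votes = [(1.0, labels) for _, labels in neighbours]
    total = sum(w for w, _ in votes)
    return {
        label
        for label in ALL_LABELS
        if sum(w for w, labels in votes if label in labels) / total > 0.5
    }


if __name__ == "__main__":
    print(predict("hypodense liver lesion", k=1))
```

Each label is decided independently, so a report can receive zero, one or several tags. With a dataset as small and imbalanced as the one described, rare labels will seldom win the vote, which is consistent with the difficulties the abstract reports.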

Defense date: 20 Dec 2019
SISSA supervisor(s): Heltai, Luca
External supervisor(s): Bortolussi, Luca
Appears in types: 8.4 Master thesis in High Performance Computing (HPC)

Files in this item:

File	Size	Format
Zampieri.pdf (open access; description: MHPC Thesis; type: Thesis; license: not specified)	3.41 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11767/116065
