Exploiting Text Data to Improve Critical Care Mortality Prediction

The quantity, quality, and availability of unstructured clinical notes have increased significantly, motivating numerous machine learning approaches that leverage such data to improve predictive capabilities in medical settings. However, whether the properties of the patient group under observation influence the effectiveness of including unstructured data sources remains an open question. Including unstructured clinical notes adds both an acquisition cost (for example, a clinician recording the notes and the records being converted to an appropriate digital format) and a computational cost (for example, more complex and computationally expensive machine learning algorithms). It is therefore important to understand the potential benefits of these unstructured data sources before attempting to use them. We empirically evaluate the performance impact of including unstructured clinical notes in mortality prediction by reproducing 29 previously published studies in this area. We use two common feature extraction methods, Word2Vec and Bag-of-Words, with two existing machine learning models, XGBoost and Logistic Regression. Our results indicate that the performance of these off-the-shelf approaches differs significantly depending on the properties of the patient group under study. Additionally, we identify several key findings that can be used to predict, from properties of a patient group, whether including data from unstructured clinical notes will be beneficial.
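
To make the comparison described above concrete, the following is a minimal sketch of one of the four combinations, Bag-of-Words features with logistic regression, trained once on structured features alone and once on structured features plus note text. It is not the authors' pipeline. The toy notes, the single structured feature, the labels, and all function names are hypothetical and exist only for illustration. The logistic regression is a plain gradient-descent version written with the standard library so the example is self-contained, and accuracy is measured on the training data purely for brevity.

```python
import math
from collections import Counter


def bag_of_words(notes):
    """Map each note to a vector of raw token counts over a shared vocabulary."""
    tokenized = [note.lower().split() for note in notes]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of rows whose thresholded prediction matches the label."""
    probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
    preds = [1 if p >= 0.5 else 0 for p in probs]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical toy cohort: one scaled structured feature per patient,
# one short free-text note per patient, and a binary mortality label.
structured = [[0.7], [0.7], [0.3], [0.3]]
notes = [
    "patient unresponsive intubated",
    "patient stable alert",
    "patient unresponsive intubated",
    "patient stable alert",
]
labels = [1, 0, 1, 0]

# Model A: structured features only.
w_a, b_a = fit_logistic_regression(structured, labels)
acc_structured = accuracy(structured, labels, w_a, b_a)

# Model B: structured features concatenated with Bag-of-Words counts.
vocab, bow = bag_of_words(notes)
combined = [s + v for s, v in zip(structured, bow)]
w_b, b_b = fit_logistic_regression(combined, labels)
acc_combined = accuracy(combined, labels, w_b, b_b)

print("structured only:", acc_structured)
print("structured + notes:", acc_combined)
```

In this toy cohort the structured feature is deliberately uninformative, so any gain comes from the note text. On a real cohort the same side-by-side comparison would need a held-out test set and a metric suited to imbalanced outcomes, and the size of the gain may vary with the patient group, which is the question the study examines.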

December 7, 2020