Abstract: Intensive care unit (ICU) readmission is associated with longer hospitalization, higher mortality, and other adverse outcomes. Early recognition of patients at risk of ICU readmission can help prevent deterioration and lower treatment costs. With the abundance of electronic health records (EHRs), it has become popular to build clinical decision tools by applying machine learning techniques to large-scale healthcare data. To this end, we designed data-driven predictive models to estimate the risk of ICU readmission. The discharge summary of each hospital admission was represented using natural language processing algorithms, and the Unified Medical Language System (UMLS) was used to standardize inconsistent terminology across discharge summaries. Five machine learning classifiers, including naïve Bayes, support vector machine, logistic regression, and gradient boosting decision tree, were combined with two feature representations, Bag-of-Words and Bag-of-CUIs, to construct predictive configurations. The best configuration yielded a competitive AUC of 0.748. High-contribution words and medical terms were further examined to ensure that they were clinically meaningful, and the two feature representations were compared. Our work suggests that natural language processing can automatically extract meaningful information from discharge summaries and warn clinicians of unplanned 30-day readmission upon discharge.
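
As a rough illustration of two ideas named in the abstract, the Bag-of-Words representation and the AUC metric, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the whitespace tokenizer, the function names `bag_of_words` and `auc`, and the toy notes, labels, and scores are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def bag_of_words(texts):
    """Map each text to a vector of word counts over a shared, sorted vocabulary.

    Assumption: lowercase whitespace tokenization, a simplification of
    whatever preprocessing a real study would use.
    """
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy discharge notes; not taken from any real dataset.
    notes = [
        "patient stable discharged home",
        "patient septic ventilator dependent",
    ]
    vocab, vectors = bag_of_words(notes)
    print(vocab)
    print(vectors)

    # Hypothetical labels (1 = readmitted) and model risk scores.
    print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

In a setup like the one the abstract describes, count vectors of this kind would be passed to a classifier such as logistic regression, and the classifier's predicted risk scores would be evaluated with AUC. A Bag-of-CUIs representation would follow the same pattern, with UMLS concept identifiers in place of raw word tokens.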