(Left) Predicted vs. experimental flash point values on the full integrated distribution test set for a deep learning method. (Middle) Xiaoyu Sun and (Right) Nathaniel J. Krakauer of Skunkworks, co-lead authors of the paper on this work.
The goal of this work was to explore the efficacy of deep learning vs. more traditional human-designed featurization for predicting the flash points of organic molecules. Flash points play an important role in preventing flammability hazards, and large databases of measured values exist, although millions of compounds remain unmeasured. To rapidly extend existing data to new compounds, many researchers have used quantitative structure-property relationship (QSPR) analysis to predict flash points. In recent years, graph-based deep learning (GBDL) has emerged as a powerful alternative to traditional QSPR. In the paper we assess GBDL models by comparing them against 12 previous QSPR studies that used more traditional methods. Our results show that GBDL yields performance comparable to, though slightly worse than, the previous QSPR studies. To further explore GBDL models, we collected the largest flash point dataset to date, which contains 10,575 unique molecules. Overall, our results showed that deep learning was not clearly advantageous for this problem. This project involved 9 Skunkworks researchers over a few years. The research has been published in: Sun, Xiaoyu, Nathaniel J. Krakauer, Alexander Politowicz, Wei Ting Chen, Qiying Li, Zuoyi Li, Xianjia Shao, et al. 2020. “Assessing Graph-Based Deep Learning Models for Predicting Flash Point.” Molecular Informatics 39 (6): 1–14. https://doi.org/10.1002/minf.201900101.
(Left) CT scan of a pancreatic cyst and (Right) Adam Awe, a medical student and lead author, presenting the results at the Shapiro forum at the UW Medical School.
Current diagnostic and treatment modalities for pancreatic cysts (PCs) are invasive and are associated with patient morbidity. The goal of this project was to develop and evaluate machine learning algorithms to delineate mucinous from non-mucinous PCs using non-invasive CT-based radiomics. This work used features extracted from CT images to assess the nature of pancreatic cysts, with the long-term goal of using non-invasive CT to determine the potential health risks of such cysts and assist doctors in deciding whether to perform surgery. Overall, 99 patients and 103 PCs were included in the analyses. Eighty (78%) patients had mucinous PCs on surgical pathology. Using multiple fivefold cross validations, the texture-features-only and combined XGBoost mucinous classifiers demonstrated an area under the curve of 0.72 ± 0.14 and 0.73 ± 0.14, respectively. By SHAP analysis, root mean square, mean attenuation, and kurtosis were the most predictive features in the texture-features-only model. Root mean square, cyst location, and mean attenuation were the most predictive features in the combined model. Machine learning principles can be applied to PC texture features to create a mucinous phenotype classifier. Model performance did not improve with the combined model. However, the specific radiomic, radiologic, and clinical features most predictive in our models can be identified using SHAP analysis. This project was in collaboration with the visionary Machine Learning for Medical Imaging (ML4MI) program at UW. It involved 7 students across multiple departments on the UW campus. This work was published in: Awe, Adam M, Michael M Vanden Heuvel, Tianyuan Yuan, Victoria R Rendell, Mingren Shen, Agrima Kampani, Shanchao Liang, Dane D Morgan, Emily R Winslow, and Meghan G Lubner. 2022. “Machine Learning Principles Applied to CT Radiomics to Predict Mucinous Pancreatic Cysts.” Abdominal Radiology 47 (1): 221–31. https://doi.org/10.1007/s00261-021-03289-0.
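For readers unfamiliar with how a figure such as "0.72 ± 0.14" arises from multiple fivefold cross validations, the following is a minimal sketch of the bookkeeping, not the authors' code. It assumes a toy one-feature scorer in place of XGBoost and uses synthetic data that only mirrors the cohort size (103 cysts, 80 mucinous). All function names (`auc`, `stratified_folds`, `repeated_cv_auc`, `fit_mean_direction`) and the choice of 10 repeats are hypothetical.

```python
import random
import statistics


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with roughly equal class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv_auc(labels, fit, k=5, repeats=10, seed=0):
    """Mean and standard deviation of held-out AUC over repeats x k folds.

    fit(train_idx) must return a function mapping a sample index to a score.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            score = fit(train_idx)
            aucs.append(auc([labels[i] for i in test_idx],
                            [score(i) for i in test_idx]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Synthetic stand-in data (assumption): 103 cysts, 80 labelled mucinous (1),
# and a single made-up texture feature.
data_rng = random.Random(42)
labels = [1] * 80 + [0] * 23
feature = [data_rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]


def fit_mean_direction(train_idx):
    """Toy model (not XGBoost): orient the feature by the training class means."""
    pos = [feature[i] for i in train_idx if labels[i] == 1]
    neg = [feature[i] for i in train_idx if labels[i] == 0]
    sign = 1.0 if statistics.mean(pos) >= statistics.mean(neg) else -1.0
    return lambda i: sign * feature[i]


mean_auc, std_auc = repeated_cv_auc(labels, fit_mean_direction)
print(f"AUC = {mean_auc:.2f} ± {std_auc:.2f}")
```

With only about 20 cysts per held-out fold, and roughly 4 or 5 of them non-mucinous, per-fold AUC values can vary widely, which is one plausible reason a spread on the order of ± 0.14 appears in a cohort of this size.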