Informatics Skunkworks members are working on a range of research projects which are summarized in the table below. Prospective participants are encouraged to reach out to the listed team leads to inquire about participation in their respective projects.
In addition to projects hosted by mentors at UW, there are a number of projects hosted by collaborators at Boise State University (BSU) and students are able to participate remotely if they have strong interest in a project at another institution. Please see the Boise projects page here.
More detailed information such as project requirements can be found in the full project spreadsheet which feeds into the project table below. Project mentors interested in adding new projects or editing existing ones should reach out to Dane Morgan or Ben Afflerbach (email@example.com, firstname.lastname@example.org) via email or the Skunkworks Slack Workspace to receive the editable link to the project spreadsheet.
Skunkworks Successful Project Highlights
For select research projects that have concluded we have created a brief Project Highlight which summarizes the goals, structure, and outcomes of the project.
|Index||Short Name||Title||Project Description||Application deadline||Requirements: hrs/wk||Requirements: skills/courses/other||Contact: First Name||Contact: Last Name||Contact: Department||Contact: Email||Contact: Phone||Team leads (Names, Departments, Emails)||Number of Positions (max)||Funds available?||Credits available||Response date||Relevant links||Status||Date last updated|
|1||Brain imaging||Removing motion effects in brain MRI||Unlike a camera, which takes an image with exposure times of about 1/100th of a second and therefore “freezes” motion, magnetic resonance imaging (MRI) of the human brain takes several minutes and is therefore sensitive to motion, even subtle motions of a fraction of a millimeter. We have developed a successful method to correct for motion, but it is computationally slow and does not correct for rapid motions like those that could occur from a crying baby. This project will develop deep learning motion correction methods that are computationally faster (near real time) and offer improved motion correction for challenging cases with rapid motions. Methods that image at higher spatial resolution also introduce artifacts similar to those caused by motion, so potential extensions and applications of this work include cortical assessments at higher image spatial resolution.||Ongoing||10||Python, basic machine learning||Steven||Kecskemeti||Medical Physics||email@example.com||Steven Kecskemeti, Medical Physics, firstname.lastname@example.org||4||No||Yes||Ongoing||Active-Full||01/06/2022|
|7||MAST-ML Coding||Development of the Materials Simulation Toolkit for Machine Learning||The Materials Simulation Toolkit for Machine Learning (MAST-ML) is an automated, open-source toolkit designed to accelerate data-driven materials research. We are interested in augmenting our current MAST-ML software with new analysis routines central to understanding the performance of machine learning models as they pertain to materials research. Some examples include a robust model error estimation framework, establishing the domain of valid model performance, and materials-centric statistical tests, such as training and validating models on certain data subsets as defined, for example, by materials within some composition threshold. This project will provide opportunities for learning skills in machine learning, data analysis, Python programming, and materials science.||Inactive||10||Python||Ryan||Jacobs||Materials Science and Engineering||Ryan Jacobs||Ryan Jacobs, MS&E, email@example.com||4||No||Yes||Ongoing||https://github.com/uw-cmg/MAST-ML||Active-Full||15/09/2023|
|11||BLADDER||A New Paradigm for Systems Physiology Modeling: Biomechanistic Learning Augmentation with Deep Differential Equation Representations (BLADDER)||A major unsolved challenge that cuts across many domains of computational science and engineering is to bridge the gap between data-driven machine-learning approaches and insight-rich mechanistic modeling by differential equations. ML scales with the data, but offers little or no mechanistic insight; in contrast, ODE models can incorporate physical, chemical, or biological mechanisms and yield insights into how systems work, but they don’t scale well with ‘big data.’ In the domain of human health, Stimulating Peripheral Activity to Relieve Conditions (SPARC), a program of the National Institutes of Health (NIH), is seeking to bridge this gap. Our multi-investigator two-year project (Fall 2020 to Fall 2022) is bringing together computer scientists, mathematicians, biologists, and engineers to create a hybrid computational modeling approach, guided by data from laboratory experiments, linking ODE models using neural networks. Priority in the selection of student co-workers will be given to those with experience or interest in ODE modeling of physical, chemical, or biological systems and a strong foundation in computational science or engineering. More details: https://projectreporter.nih.gov/project_info_description.cfm?aid=10206953||ongoing||10||Python||John||Yin||Chemical and Biological Engineering, Wisconsin Institute for Discovery (WID)||John Yin||608-316-4323||John Yin, CBE/WID, firstname.lastname@example.org||4||Possibly (summer 2021)||Possibly||Ongoing||https://www.youtube.com/watch?v=2zL2E50oRr8||Active-Full||25/01/2023|
|13||CNN-based Conductivity Prediction||Predicting the ionic conductivity of polymer-ceramic composite solid electrolytes using 3D Convolutional Neural Network||Li-ion batteries in current use usually contain flammable liquid electrolytes. Nonflammable solid-state electrolytes have been proposed to replace the liquid electrolytes, making an all-solid-state battery that is safe, durable, and particularly attractive for powering large devices such as electric vehicles and drones. Although significant progress has been made through almost five decades of development, hallmarked by the discovery of various solid electrolytes with excellent Li-ion conductivity that allows for fast charging, it is still challenging to achieve fast Li-ion transport across the electrolyte-electrode interface. To overcome this barrier, a polymer-ceramic composite electrolyte has been proposed, which has the potential to enable fast Li-ion transport both in the electrolyte and across the electrode-electrolyte interface. The purpose of this project is to train a 3D convolutional neural network (CNN) model for fast and accurate prediction of the effective Li-ion conductivity of composite polymer-ceramic solid electrolytes. This is a regression task, and we will have datasets of both the input “microstructure” of the composite and the target Li-ion conductivity “three numbers” ready. Extensive experience with CNN regression tasks is required. We already have a preliminary 3D CNN model for you to start with. Students with good writing skills are preferred. The student is expected to work with a relatively high degree of independence, complete the majority of CNN model training, testing, and data representation, and have a weekly discussion with the faculty advisor and a third-year PhD student. The student needs to be enrolled in independent study for credit. 
We will cover the expense needed to run CNN on Euler at WACC https://wacc.wisc.edu/.||ongoing||10||Relatively experienced in CNN regression; Good skills in writing science||Jiamian||Hu||Materials Science and Engineering||Jiamian Hu||Jiamian Hu, MS&E, email@example.com||1||No||Yes||Ongoing||Active-Full||25/01/2023|
|14||COoL||Chemical Origins of Life (COoL): a reaction network approach||The chemistry which led to the origin of life on the early Earth remains poorly understood. One of the major challenges is the combinatorial explosion that occurs when unspecific chemistries are used to synthesize sequence-specific polymers, like proteins. We aim to investigate the kinetics of amino acids forming peptides in prebiotic systems, a process that contains hundreds of potential reactions and thus requires advanced network reconstruction tools. Similar tasks have been addressed by modelling tools; however, these tools have not yet been adapted to the specific constraints of origins of life systems. The goal of this project is to adapt the Rule Input Network Generator (RING) to study prebiotic peptide reactions. Students will be co-mentored by Prof. John Yin (UW Madison, Chemical and Biological Engineering) and Prof. Srinivas Rangarajan (Lehigh University, Chemical and Biomolecular Engineering). Priority will be given to students with an interest in computational network modeling and a background in chemical engineering or computational science.||ongoing||10||domain specific English-like reaction language||John||Yin||Chemical and Biological Engineering, Wisconsin Institute for Discovery (WID)||John Yin||John Yin, CBE/WID, firstname.lastname@example.org||4||possibly summer 2021||yes||Ongoing||https://wid.wisc.edu/featured-science/origins-of-life-and-john-yin/||Active-Full||25/01/2023|
|19||ML tools for AM||Machine learning tool for identifying defect formation mechanisms from in-situ high-speed X-ray imaging data||Defect formation in metal additive manufacturing (AM, also known as 3D printing) severely hinders its application because defects can cause failure of parts. Keyhole-induced porosity is one of the most common and detrimental defects in metal additive manufacturing. However, the formation mechanism is not fully understood. Here, the real-time keyhole porosity generation process during additive manufacturing has been experimentally recorded by high-speed high-resolution X-ray imaging. The first objective of this work is to develop an automated approach using machine vision to identify the keyhole (a depression cavity in liquid metal induced by intensive vaporization of materials during laser heating) and keyhole-induced pores in thousands of X-ray images. The algorithm is expected to achieve fast and accurate marking/labelling of the keyhole profile and pore profile in given X-ray images. After all the images are processed, the second objective is to use machine learning to explore the correlations between keyhole fluctuations and pore formation. This project will provide opportunities for learning skills in machine vision, machine learning, and X-ray imaging of additive manufacturing processes.||Ongoing||10||Python||Qilin||Guo||Mechanical Engineering||Qilin GUO||608-234-2906||Lianyi Chen, email@example.com||8||Yes||Yes||Ongoing||Active-Full||29/08/2023|
|21||Computational Design||Undergraduate Research in Computational Design||The Computational Design and Manufacturing (CDM) Lab is interested in advancing computer methods for design and manufacturing applications. Current research activities in the CDM Lab focus on machine learning and topology optimization for design and additive manufacturing applications. Example projects for undergraduates include training deep neural networks for optimal design of heat sinks and heat exchangers, and training convolutional neural networks for computer-aided design (CAD) modeling such as constructing NURBS surfaces.||Sept. 1, 2023||10||Students with interest in mathematics and computer programming (Python and/or C++) are encouraged to apply.||Xiaoping||Qian||Mechanical Engineering||Xiaoping Qian||608-890-1925||Xiaoping Qian, Mech E, firstname.lastname@example.org||No||Yes||Ongoing||Active-Full||25/01/2023|
|22||Kidney projects-Cancer||Deep Learning to identify aggressive tumor features in Renal cell carcinoma (kidney cancer)||As the volume of computed tomography (CT) performed for a variety of indications continues to increase, the incidence of renal cell carcinoma (RCC) has also continued to rise. Spatial heterogeneity is a common feature of RCC, with multiple studies demonstrating variability within tumors with respect to pathologic features, genomics, and RNA/protein expression. This heterogeneity gives rise to a spectrum of biologic and clinical behavior, with an increasingly less aggressive management approach in more indolent disease and a move towards nephron-sparing approaches in cases where intervention is warranted. Pathologic markers of tumor aggressiveness such as higher nuclear grade or presence of sarcomatoid features may only be present in a small portion of the tumor but may profoundly impact treatment decisions and prognosis. These small areas can be challenging to identify on biopsy, and although radiomic features provide more global assessment and have shown some promise in capturing and characterizing tumor heterogeneity, some aggressive tumor features have remained elusive at imaging. The purpose of this project is to apply deep learning to previously segmented renal cell carcinomas on CT images with known pathologic features to help identify an imaging signature for aggressive features (high nuclear grade, sarcomatoid features). We have manually segmented approximately 124 large renal cell carcinomas (> 7cm) into volumetric regions of interest (ROIs), which represent the nuclear grade cohort. We have segmented a second data set of 45 RCCs with sarcomatoid features, with 49 size-matched non-sarcomatoid RCCs to serve as controls, with extracted radiomics features on which we can perform similar analysis. The team would create a deep learning model to evaluate the CT images in this patient cohort with known kidney cancers. 
The model may require coding to incorporate the mask (image segmentations) with the CT image stacks showing the kidney tumor, or could also be run without the annotations on the data.||Sept. 1, 2023||10||Machine learning, ?CNNs||Meghan||Lubner||Radiology, UWSMPH||email@example.com||608-263-9028||Meghan Lubner (radiology), Andrew Wentland (radiology), E. Jason Abel (urology), Dane Morgan (Material Sciences)||0||Possibly||Possibly||Ongoing||Active-Full||28/07/2023|
|23||Kidney projects-benign masses||Machine and Deep Learning to predict clinical outcomes in angiomyolipomas||Angiomyolipomas are histologically benign tumors that occur in the kidneys, either sporadically or associated with syndromes such as tuberous sclerosis. They consist of fat, muscle, and abnormal vessels. Because of these abnormal vessels, these lesions can be at risk for spontaneous and, at times, life-threatening hemorrhage. Based on small clinical series, a size threshold of 4 cm has been associated with hemorrhage and need for treatment. We have collected a series of patients with angiomyolipomas imaged over time with at least one baseline computed tomography study (many patients in the cohorts have both imaging and clinical follow-up). We have segmented the tumors and extracted a large number of radiomics features. This extracted data would be amenable to machine learning analysis to try to identify radiomic features that may predict bleeding risk or need for intervention. This could be performed with XGBoost models, for example. In addition, this segmented image data could be used as input to a deep learning model that could be used to predict similar outcomes. In theory, a model similar to the one in the project above could be re-trained and applied to this data.||Sept. 1, 2023||10||Machine learning, ?CNNs||Meghan||Lubner||Radiology, UWSMPH||firstname.lastname@example.org||608-263-9028||Meghan Lubner (radiology), Andrew Wentland (radiology), E. Jason Abel (urology), Dane Morgan (Material Sciences)||0||Possibly||Possibly||Ongoing||Active-Full||25/01/2023|
|25||Automated Feature Selection||Best practices for selecting and validating feature sets for accurate machine learning models||Building accurate and transferable machine learning models depends critically on the feature set used to numerically represent the data of interest. Here, we are interested in developing new practical methods of feature selection, with an emphasis on evaluating our methods on a number of experimental and computed materials property datasets. This work will leverage and build on the MAST-ML package, which was developed in our group to streamline and automate construction of machine learning models. This work will not only build practical machine learning and data science experience but may also yield a generally useful method for constructing robust feature sets for producing accurate models of numerous materials properties.||Sept. 1, 2023||10||Python, MAST-ML familiarity||Benjamin||Afflerbach||Materials Science and Engineering||Benjamin Afflerbach||512-934-4497||Benjamin Afflerbach, MS&E, email@example.com||5||No||Yes||Ongoing||Active-Full||29/08/2022|
|26||Hirsch Index||Factors Controlling the Hirsch Index||The Hirsch Index (HI) is the largest number n such that a researcher has n papers with at least n citations each. It is perhaps the most widely used single number to assess the impact of scientific researchers. It has some known issues that make its implications difficult to assess, e.g., a strong bias toward higher values for people working in large fields where there are many papers. However, there are likely many other issues that are less clear. One possible issue is that HI does not distinguish work which a researcher led intellectually from work they just supported or in which they were guided by others. In this project we seek to determine the effect of such factors on HI. We will collect data on researchers and attempt to quantify the impact on HI of work done under guidance of their PhD/postdoc advisors or coauthored works not led by them, ideally as a function of key variables like time, research HI, advisor's HIs, etc. This project will likely involve a lot of data extraction, text processing, and correlative analysis. It may involve some Natural Language Processing and/or machine learning but it is likely that this will not play a role in the early stages of the work. Meetings for this project are held online via MS Teams.||Sept. 1, 2023||10||Python||Maciej||Polak||Materials Science and Engineering||firstname.lastname@example.org||Maciej Polak, Izabela Szlufarska, Dane Morgan (MS&E)||6||No||Yes||Ongoing||https://en.wikipedia.org/wiki/H-index||Active-Full||29/08/2022|
|27||Article abstracts||Assessing the impact of scientific papers based on their abstracts||Each journal article begins with an abstract, a short (usually below 200 words) paragraph in which the main objective as well as the key results of the paper are summarized. A good abstract contains all the relevant information about the content of the paper and its relevance in the field. In this project we will explore whether we could assess the potential impact of the whole paper based solely on its abstract, with the use of Natural Language Processing (NLP) models. The project will involve working with journal publishers and journal database APIs, fine-tuning and applying various NLP models, and processing text. Meetings for this project are held online via MS Teams.||Sept. 1, 2023||10||Python, basics of string/text processing||Maciej||Polak||Materials Science and Engineering||email@example.com||Maciej Polak (firstname.lastname@example.org), Dane Morgan (email@example.com), MS&E||6||No||Yes||Ongoing||Active-Full||05/09/2023|
|28||Generative Machine Learning||Assessing generative models for predicting materials structure and properties||This project is a remote and in-person project and is open to students from outside of UW-Madison. One of the fundamental goals of materials science is to be able to understand and predict a potential material's properties without having to first synthesize it. Machine learning models can potentially give us this information by training on large datasets of properties and building up an understanding of the aspects of each material that impact its properties. One of these aspects that has been hard to handle is the structure of the material. Intuitively we might know, for example, that a liquid, solid, and gas of the same elements might have dramatically different properties. One reason for that is that the atoms are arranged into drastically different structures. The same is true for materials we describe as single crystals, polycrystals, or amorphous, to name a few. All of these descriptions exist to tell scientists some information about the underlying structure of atoms. By using this information as the input along with deep learning diffusion models, we can ask the model to first encode the property and structural information, and then decode that information again to give new predictions about materials with those targeted properties. That's the ultimate goal of the project: to use generative models to predict novel materials with targeted properties of interest!||Sept. 1, 2023||10||Ideally experience working with Python deep learning packages (PyTorch or TensorFlow). 
Familiarity with running code from the command line will be useful as we'll be using remote computing resources based in Linux.||Benjamin||Afflerbach||Materials Science and Engineering||firstname.lastname@example.org||Benjamin Afflerbach, email@example.com, Dane Morgan (firstname.lastname@example.org), MS&E||20||No||Yes (UW-Madison)||9/11||will be building off cdvae code here initially: https://github.com/txie-93/cdvae||Active-Full||14/08/2023|
|29||LLM Reliability||Language Model Use for Automated Essay Feedback: An Error Analysis||Writing scientific explanations is an authentic science practice and hence central to science learning. Past research demonstrates that students struggle with learning to write “good” scientific explanations (e.g., causal, complete, backed with evidence). Meanwhile, teachers are limited in time to provide timely feedback to mitigate these struggles. Hence, recent research has investigated using NLP to automatically score students’ essays, provide automated feedback for revision, and report student progress to teachers to help students. However, the success of these NLP programs relies on accurately inferring student understanding from their written explanations. Erroneous inference of student writing, when transferred to downstream pedagogical decisions, harms student learning and engagement. Hence, an open question for reliable NLP use in supporting students’ science learning that we will explore in this project is: What kinds of errors does NLP technology make in inferring student understanding from their written science explanations? Moreover, LLMs are known to systematically fail on rare groups in ways not obvious in aggregate evaluation. Hence, a secondary research question for our analysis is: In what ways do these errors reflect social and linguistic biases in NLP algorithms? 
After training a large language model to infer scientific concepts from student writing, we will conduct systematic error analysis using state-of-the-art NLP tools and algorithms.||Ongoing||9||- Comfortable with programming in Python, especially reading, writing and manipulating CSV files, familiarity with data processing and analysis libraries - (Preferred) Experience with machine learning or data science through coursework or past projects.||Shamya||Karumbaiah||Educational Psychology||email@example.com||Shamya Karumbaiah, EdPsych, firstname.lastname@example.org||6||No||Yes||Ongoing||Apply here: https://forms.gle/ZESTmXr5eyjdEUcY9||Active-Full||06/09/2023|
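
For readers considering the Hirsch Index project (Index 26), the quantity under study can be computed in a few lines. The sketch below assumes the standard definition (the largest n such that n papers have at least n citations each); the function name `hirsch_index` and the sample citation counts are hypothetical illustrations, not project code or project data.

```python
def hirsch_index(citations):
    """Return the largest n such that n papers have at least n citations each.

    `citations` is a list of per-paper citation counts (illustrative input).
    """
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger n can qualify.
            break
    return h


# Hypothetical citation counts for one researcher.
example_citations = [10, 8, 5, 4, 3]
print(hirsch_index(example_citations))  # prints 4
```

Sorting first keeps the logic easy to check by hand: four papers have at least four citations each, but there are not five papers with at least five.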