Research in Scientific Machine Learning

In the field of scientific machine learning, TOELT applies advanced mathematical frameworks and machine learning models to analyze complex physical processes. Their recent innovations address limitations of traditional methods in handling noisy, sparse data, especially in applications like high-speed spectroscopy and error-adjusted metrics in scientific measurements. This work holds promise for both improving experimental reliability and facilitating real-time, data-driven decisions across scientific disciplines.

Selected Papers and Books

Fundamental Mathematical Concepts for Machine Learning in Science, Springer Nature, (2024 )

Michelucci, U. (2024). Fundamental Mathematical Concepts for Machine Learning in Science, Springer.

The book "Fundamental Mathematical Concepts for Machine Learning in Science" by Umberto Michelucci is designed to bridge the gap for scientists in fields such as physics, biology, and chemistry who seek to harness machine learning without an extensive computational background. By focusing on core mathematical concepts — such as calculus, linear algebra, and probability theory — the book provides readers with the necessary groundwork to understand and apply machine learning techniques effectively in scientific contexts. Each chapter is crafted to be self-contained, making the material accessible and manageable, even for those with limited prior experience in the topics covered.

The book goes beyond foundational mathematics by connecting each mathematical concept directly to applications in scientific machine learning. For instance, linear algebra topics are explored through the lens of data representation, essential for working with multidimensional data, while calculus is discussed in the context of gradient-based optimization, a key technique for model training. Statistical concepts are carefully selected to aid in understanding data variability and uncertainty, fundamental for experiments in natural sciences. This contextual approach ensures that each mathematical area is not only explained theoretically but is also grounded in practical, science-driven applications.

In addressing the unique needs of scientific machine learning, the book emphasizes techniques relevant to real-world data challenges faced in scientific research. From handling noisy measurements to working with sparse data, Michelucci’s approach empowers scientists to build reliable, interpretable models suited to their specific experimental needs. This book serves as both a foundational resource for newcomers and a practical reference for seasoned scientists, fostering an understanding of machine learning that enhances and complements traditional scientific methods.

Deep learning domain adaptation to understand physico-chemical processes from fluorescence spectroscopy small datasets and application to the oxidation of olive oil Nature Scientific Reports, 14(1), 22291, (2024)

Michelucci, U., & Venturini, F. (2024). Deep learning domain adaptation to understand physico-chemical processes from fluorescence spectroscopy small datasets and application to the oxidation of olive oil. Scientific Reports, 14(1), 22291.

Revolutionizing Food Quality and Sustainability through AI and Spectroscopy

At the intersection of artificial intelligence and food science, our research is transforming how we understand and maintain the quality of extra virgin olive oil (EVOO) and other food products. Using fluorescence spectroscopy data and innovative deep learning techniques, we've developed a powerful new approach to predict, interpret, and optimize food quality over time.

Our Solution: Domain Adaptation and Interpretability in AI

Fluorescence spectroscopy is widely recognized for its accuracy and detail in detecting molecular changes, making it invaluable in fields like food science, environmental monitoring, and medical diagnostics. However, traditional methods have struggled with small, sparse datasets and the complex nature of spectral data. To overcome these challenges, we apply domain adaptation—leveraging pretrained neural networks designed for image recognition to analyze spectroscopic data. By adapting the MobileNetv2 model, we've successfully trained AI to predict critical quality indicators in olive oil, even on limited datasets.

Our work goes beyond prediction. We’ve developed an Information Elimination Algorithm (IEA) that identifies the most relevant spectral bands linked to chemical changes in EVOO, like oxidation. This novel interpretability technique effectively turns our model into a tool for scientific discovery, uncovering how molecular components—such as chlorophyll and oxidation products—evolve during oil ageing. By making AI’s insights transparent, our approach enables a deeper understanding of food chemistry, supporting producers in making data-driven quality assessments.

Impact on Industry and Future Applications

The ability to accurately predict and understand food quality without large datasets opens exciting possibilities. From enhancing product shelf-life to supporting real-time quality monitoring, this research sets a new standard for AI’s role in food science. Our model can be easily adapted to other products and processes, such as tracking the freshness of perishable goods or detecting chemical changes in pharmaceuticals.

New metric formulas that include measurement errors in machine learning for natural sciences. Expert Systems with Applications, 224, 120013 (2023)

Michelucci, U., & Venturini, F. (2023). New metric formulas that include measurement errors in machine learning for natural sciences. Expert Systems with Applications, 224, 120013.

This paper addresses a critical gap in machine learning applications within physics and other sciences. The authors present new formulas for commonly used metrics—mean squared error (MSE), mean absolute error (MAE), and accuracy—that incorporate measurement errors in target variables, a factor often overlooked in model evaluations. By incorporating errors into these metrics, the revised formulas yield more conservative, realistic performance estimates, reducing overly optimistic results common in traditional metrics.

The paper begins by highlighting how ignoring measurement errors can lead to false interpretations in scientific modeling, especially in fields where experimental data contain inherent uncertainties. The authors then mathematically derive revised formulas that consider these uncertainties, providing models with error-adjusted assessments across regression and classification tasks. These formulas are not only model-independent but also widely applicable, covering scenarios with variable measurement precision across samples.

Ultimately, the research emphasizes the importance of accounting for measurement variability to provide scientifically accurate assessments of machine learning performance. This approach has broad implications, suggesting that the revised metrics should be adopted across scientific domains where precise measurement is essential, enabling researchers to draw more reliable conclusions from their models.