BiGmax Workshop 2021

Europe/Berlin
Virtually

Peter Benner (MPIDyktS), Peter Fratzl (MPIKG), Richard Weinkamer (MPIKG)
Description

The BiGmax Workshop 2021 on Big-Data-Driven Materials Science will be held virtually on April 14-15, 2021.

The workshop aims to present results and new insights in data-driven materials science. These can be based on approaches from statistical and machine learning, compressed sensing, and other recent technologies from mathematics, computer science, statistics, and information technology.

Participants
  • Alaukik Saxena
  • Amir Omranpoor
  • Andreas Leitherer
  • Andreas Marek
  • Arghya Dutta
  • Baptiste Gault
  • Byung Chul Yeo
  • Christian Liebscher
  • Christoph Freysoldt
  • Christoph Koch
  • Dagmar Kainmueller
  • Dierk Raabe
  • Gerhard Dehm
  • Gerhard Weikum
  • Haobo Li
  • Jaber Rezaei Mianroodi
  • Jan Rost
  • Janine Holzmann
  • Janis Sälker
  • Jilles Vreeken
  • Jon Eunan Quinlivan Dominguez
  • Karsten Reuter
  • Kurt Kremer
  • Laurenz Rettig
  • Leigh Stephenson
  • Luca Bertinetti
  • Luca Curcuraci
  • Luca Ghiringhelli
  • Madhulika Mazumder
  • Mao Yang
  • Marco Personeni
  • Mariana Rossi
  • Markus Buehler
  • Markus Kuehbach
  • Markus Rampp
  • Markus Scheidgen
  • Matthias Scheffler
  • Maximilian Rummler
  • Michael Ortiz
  • Navyanth Kusampudi
  • Nicolas Fabas
  • Niels Cautaerts
  • Pablo Castro Latorre
  • Pawan Goyal
  • Peter Benner
  • Peter Fratzl
  • Ralph Ernstorfer
  • Riccardo Farris
  • Richard Weinkamer
  • Thomas Purcell
  • Timoteo Colnaghi
  • Tom Rothe
  • Tristan Bereau
  • Vasileios Athanasiou
  • Vincent Stimper
  • Wolfgang Wagermaier
  • Xian Patrick
  • Ye Wei
  • Yue Li
  • Ziyuan Rao
  • Wednesday, 14 April
    • 13:30
      Welcome
    • Session I: Computational material synthesis (Chair: Peter Fratzl)
      • 1
        Computational material synthesis: Atomistic and molecular dynamics, and bio-inspired AI (30 min talk + 15 min discussion)

        Nature produces a variety of materials with many functions, often out of simple and abundant materials, and at low energy. Such systems - examples of which include silk, bone, nacre or diatoms - provide broad inspiration for engineering. Here we explore the translation of biological composites to engineering applications, using a variety of tools including molecular modeling, AI and machine learning, and experimental synthesis and characterization. We review a series of studies focused on the mechanical behavior of materials, especially fracture, and how these phenomena can be modeled using a combination of molecular dynamics and machine learning. We also present various case studies of hierarchical material optimization using genetic algorithms, applied to 3D printed composites, protein design, and a translation of protein folding to music and back, to offer a broad bio-inspired AI-driven material synthesis platform. As an example we present a recent study in which we translated JS Bach’s Goldberg Variations into protein form, and elucidated salient features of the resulting molecular conformations and material functions.

        Speaker: Markus Buehler (invited) (MIT)
      • 2
        AI-driven discovery of material "genes": application to CO2 activation on semiconductor oxide surfaces (12 min talk + 3 min discussion)

        Using subgroup discovery (SGD), an AI approach that discovers statistically exceptional subgroups in a dataset, we develop a strategy for a rational design of catalytic materials. SGD allows for the identification of distinct, possibly competing mechanisms of a catalytic activation. Here, it is applied to the problem of converting CO$_2$ into useful chemicals. We demonstrate that the bending of CO$_2$, previously proposed as the indicator of activation, is insufficient to account for the good catalytic performance of experimentally characterized oxide surfaces. Instead, our approach identifies the asymmetric strong elongation of the molecular C-O bond as a more accurate indicator.

        Speaker: Luca Ghiringhelli (Fritz Haber Institute of the Max Planck Society)
      • 3
        Design of high-entropy Invar alloys via machine learning (12 min talk + 3 min discussion)

        Invar alloys exhibit a very low thermal expansion coefficient (TEC), below 2×10$^{-6}$ K$^{-1}$, around room temperature. There is a strong impetus to design novel Invar alloys with better physical, mechanical and chemical properties. Here, we develop and apply an active learning strategy to accelerate the design of novel Invar alloys in a practically infinite compositional space of quaternary and quinary high-entropy alloys (HEAs). We demonstrate that this strategy has great potential to accelerate the discovery of novel functional materials, especially those with a large unexplored phase space of compositions, crystal structures, and microstructures.

        Speaker: Ziyuan Rao (Max-Planck-Institut für Eisenforschung GmbH)
      • 4
        Data-driven search for drug–membrane permeability models (12 min talk + 3 min discussion)

        The passive drug–membrane permeability of a drug molecule quantifies its capacity to cross cell membranes on the way to its target. In this contribution, I will present results from our work in which we used the sure-independence screening and sparsifying operator (SISSO) to find equations for the permeability coefficient that combine both the hydrophobicity and the acidity of the drugs. The predicted equations provide, on average, more accurate permeability values than the existing models for a diverse and exhaustive class of small drug molecules. Analysis of the inhomogeneous solubility-diffusion model in several asymptotic acidity regimes further leads to a rationalization of the equations.

        Speaker: Arghya Dutta (Max Planck Institute for Polymer Research)
      • 5
        Data driven approaches to design damage tolerant Dual-phase steel microstructures (12 min talk + 3 min discussion)

        Identifying representations of the microstructure and characterizing its features are crucial for developing damage-tolerant dual-phase (DP) steels. These representations serve as variables to establish a structure-property relationship and to design microstructures with a desired mechanical property. However, the complex nature of the DP steel microstructure poses a challenge, and existing characterization methods are limited in how well they can encode this information using handcrafted features.
        To tackle this challenge, I will introduce machine learning models that
        1. extract features automatically from synthetic DP steel images and represent the microstructure in a low-dimensional latent space;
        2. conditionally generate damage-tolerant microstructure patterns with a desired yield stress.

        Speaker: Navyanth Kusampudi
    • 15:30
      Coffee break
    • Session II: Mathematical approaches in materials science (Chair: Peter Benner)
      • 6
        Model-Free Data-Driven Science: Cutting out the Middleman (30 min talk + 15 min discussion)
        Speaker: Michael Ortiz (invited) (Caltech/Universität Bonn)
      • 7
        Using phase-field models to coarse-grain data sets from scanning transmission electron microscopy (12 min talk + 3 min discussion)

        To extract transferable insights from scanning transmission electron microscopy (STEM), one must deal with noise arising from electron scattering and from the investigated sample. This noise hinders a quantitative analysis of the observations, notably when the features of interest lie in the gradients of the raw data. Physics-informed neural networks have been proposed as a means to incorporate compliance with physical equations that are chosen a priori. We show here that phase-field models can help to efficiently coarse-grain STEM video sequences of phase transformations.

        Speaker: Christoph Freysoldt (MPI Eisenforschung)
      • 8
        High-performance data-analytics and AI at MPCDF (12 min talk + 3 min discussion)

        For several years, in addition to classical (simulation-based) HPC usage, we have observed a steadily growing need among our users for support of high-performance data-analytics (HPDA) and AI workflows.
        In this presentation, we will give an overview of the HPC clusters available at MPCDF for HPDA and AI workflows, describe the available (HPC-optimized) software stack, and present recently introduced services such as Jupyter Notebooks as a Service (JNaS) and containers for software deployment. Furthermore, by means of a few use cases, we will demonstrate how users can run data-analytics or AI algorithms at MPCDF on several HPC cluster nodes.

        Speaker: Andreas Marek (MPCDF)
      • 9
        Uncovering the Relationship Between Thermal Conductivity and Anharmonicity with Symbolic Regression (12 min talk + 3 min discussion)

        Quantitatively understanding the link between anharmonicity and thermal conductivity, $\kappa$, is pivotal to the search for better thermal insulators. To help find this link we present new descriptors of $\kappa$ based on our new measure of anharmonicity, $\sigma^\mathrm{A}$. Using an updated sure-independence screening and sparsifying operator (SISSO) method, we find analytical expressions with symbolic regression and generate expressions for $\kappa_L$ that are competitive with those previously reported in the literature using only a third of the primary features. Finally, we discuss the implications of the new models on future materials design.

        Speaker: Thomas Purcell (Fritz Haber Institute)
      • 10
        Resampling Base Distributions of Normalizing Flows (12 min talk + 3 min discussion)

        Normalizing flows are a popular class of models for approximating probability distributions. However, in many tasks, such as image generation benchmarks, they are still outperformed by autoregressive models and generative adversarial networks. This is in part because their invertible nature limits their ability to model target distributions with a complex topological structure. Several approaches have been proposed to solve this problem, but they sacrifice invertibility and thereby the tractability of the log-likelihood as well as other desirable properties. In this work, we introduce a base distribution for normalizing flows based on learned rejection sampling, allowing them to model complex topologies without giving up bijectivity. We applied our model to various sample problems, such as image generation and the approximation of Boltzmann distributions, and it outperforms the baseline qualitatively and quantitatively.

        Speaker: Vincent Stimper (Max Planck Institute for Intelligent Systems)
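The idea of a rejection-sampling base distribution in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `resampled_base` and `gap_acceptance` are hypothetical, the acceptance function is written by hand rather than learned, and the cap on the number of tries (`max_tries`) is simply one way to guarantee termination.

```python
import random


def resampled_base(accept_prob, n_samples, rng, max_tries=100):
    """Draw from a standard normal proposal, keeping each draw z with
    probability accept_prob(z). After max_tries - 1 rejections the next
    draw is kept regardless, so the loop always terminates."""
    samples = []
    while len(samples) < n_samples:
        for attempt in range(max_tries):
            z = rng.gauss(0.0, 1.0)
            if rng.random() < accept_prob(z) or attempt == max_tries - 1:
                samples.append(z)
                break
    return samples


def gap_acceptance(z):
    # Hand-written stand-in for a learned acceptance function (assumption):
    # it carves a gap around zero out of the Gaussian proposal.
    return 1.0 if abs(z) >= 0.5 else 0.0


rng = random.Random(0)
samples = resampled_base(gap_acceptance, n_samples=1000, rng=rng)
```

The resulting samples occupy two separated regions, a simple one-dimensional example of a support that a single Gaussian base distribution does not have. The bijective flow layers applied on top of such a base are not shown.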
    • 18:15
      Meeting of the Steering Committee
  • Thursday, 15 April
    • Session III: Analysis of images and multidimensional data (Chair: Claudia Draxl)
      • 11
        3D image analysis (30 min talk + 15 min discussion)

        Microscopy data is often large and 3D, and thus convolutional neural networks (CNNs) need to be applied in a tile-and-stitch manner to cope with GPU memory constraints. For pixel-wise predictions obtained with UNet-style CNNs via tile-and-stitch, issues with discontinuities at output tile boundaries have been reported. However, a formal analysis of the causes has been lacking. In particular, it had not been understood how inconsistencies can arise even in the case of valid padding. Our work shows that the potential for discontinuities to arise is intricately tied to the shift equivariance properties of the employed CNNs. Our theoretical analysis entails simple rules for designing CNNs that are necessary to avoid discontinuities when predictions are obtained in a tile-and-stitch manner.

        Speaker: Dagmar Kainmueller (invited) (MDC Berlin)
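As a hedged illustration of tile-and-stitch with valid padding, the sketch below uses a single stride-1 convolution in one dimension. This is only the simplest case, where tiled and untiled outputs agree exactly. It does not reproduce the design rules from the talk, and UNet-style networks with downsampling need additional care. The function names and the toy signal are assumptions made for this example.

```python
def conv1d_valid(signal, kernel):
    """Valid (no padding) 1D cross-correlation.

    The output has len(signal) - len(kernel) + 1 entries.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def tile_and_stitch(signal, kernel, out_tile):
    """Process the signal in tiles of `out_tile` outputs each and concatenate.

    Every input tile carries len(kernel) - 1 extra samples of context, so
    neighbouring output tiles meet without gap or overlap.
    """
    k = len(kernel)
    total_out = len(signal) - k + 1
    stitched = []
    for start in range(0, total_out, out_tile):
        stop = min(start + out_tile, total_out)
        tile = signal[start : stop + k - 1]  # input window including context
        stitched.extend(conv1d_valid(tile, kernel))
    return stitched


# Toy data (assumption): an arbitrary deterministic signal and smoothing kernel
signal = [float((3 * i * i + 7 * i) % 11) for i in range(40)]
kernel = [0.25, 0.5, 0.25]

full = conv1d_valid(signal, kernel)
tiled = tile_and_stitch(signal, kernel, out_tile=8)
assert tiled == full  # a stride-1 valid convolution stitches without seams
```

With pooling or strided layers, a network is in general equivariant only to shifts by multiples of its total stride, so tile offsets can no longer be chosen freely.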
      • 12
        Towards an automatized data analysis of large 3d volumetric data (12 min talk + 3 min discussion)

        Recent developments in bio-imaging technologies have allowed researchers to collect ever larger tomographic datasets that contain an immense amount of detail. To achieve a quantitative understanding, however, these datasets need to be cleaned up and segmented. These two tasks are tedious, very time-consuming, and still performed mostly manually. In our work we aim to develop a full workflow from 3D image pre-processing to DL-based 3D segmentation and analysis of large volumetric datasets. Here we present the pre-processing pipeline, including handling of metadata, the plan for the implementation of the segmentation tools, and an example of large volumetric data analysis.

        Speakers: Luca Curcuraci (Max-Planck-Institut für Kolloid- und Grenzflächenforschung), Markus Kühbach (Fritz Haber Institute, NOMAD Laboratory), Luca Bertinetti (Max Planck Institute of Colloids and Interfaces)
      • 13
        Convolutional neural network-assisted recognition of nanoscale L1$_2$ ordered structures in face-centred cubic alloys (12 min talk + 3 min discussion)

        L1$_2$-type nano-ordered structures are typically fully coherent with the FCC matrix, which makes them challenging to characterize. Spatial distribution maps are used to probe local order within reconstructed atom probe tomography (APT) data. However, it is almost impossible to manually analyse the complete point cloud in search of the partial crystallographic information retained within the data. Here, we propose an intelligent L1$_2$-ordered structure recognition method based on convolutional neural networks. The approach was successfully applied to reveal the 3D distribution of L1$_2$-type nanoparticles with an average radius of 2.54 nm in an Al-Li-Mg system. The minimum radius of a detectable nanodomain is as small as 5 Å.

        Speaker: Yue Li (Max-Planck-Institut für Eisenforschung GmbH)
      • 14
        Artificial-Intelligence-Driven Characterization of Crystallographic Interfaces from Electron Microscopy (12 min talk + 3 min discussion)

        Characterizing crystallographic interfaces in synthetic nanomaterials is an important step in the design of novel materials. Trained materials scientists can assign interface structures of materials by looking at high-resolution imaging and diffraction data obtained by aberration-corrected scanning transmission electron microscopy (STEM). However, STEM datasets cannot be fully exploited due to the lack of automatic analysis tools. Here, we present AI-STEM, a newly developed AI tool, based on a Bayesian neural network, for accurately extracting the key features of (poly)crystalline materials from atomic-resolution STEM images. It achieves excellent predictive performance in identifying crystal structure and lattice misorientations on experimental images.

        Speaker: Byung Chul Yeo (Pukyong National University)
    • 11:00
      Coffee break
    • Session III: Continuation of the previous session (Chair: Richard Weinkamer)
      • 15
        Machine-learning-enhanced time-of-flight mass spectrometry analysis (12 min talk + 3 min discussion)

        Mass spectrometry is a widespread approach for determining the constituents of a material. Atoms and molecules are removed from the material and collected; a critical subsequent step is to infer their correct identities from patterns in their mass-to-charge ratios and relative isotopic abundances. However, this identification step still mainly relies on individual users' expertise, which makes its standardization challenging and hinders efficient data processing. Here, we introduce an approach that leverages modern machine learning techniques to identify peak patterns in time-of-flight mass spectra within microseconds, outperforming human users without loss of accuracy.

        Speaker: Ye Wei (Max Planck Institute for Iron Research)
      • 16
        Robust recognition and exploratory analysis of crystal structures via Bayesian deep learning (12 min talk + 3 min discussion)

        Due to their ability to recognize complex patterns, neural networks can drive a paradigm shift in the analysis of materials-science data. As a major improvement, we introduce a crystal-structure identification method based on Bayesian deep learning that is robust to structural noise and can treat more than 100 crystal structures. While being trained on ideal structures only, our method correctly characterizes strongly perturbed single- and polycrystalline systems, from both synthetic and experimental resources. Robust crystal classification, principled uncertainty estimates, and exploratory analysis of internal neural-network representations (via unsupervised learning) enable hitherto hindered investigations of noisy atomic structural data.

        Speaker: Andreas Leitherer (Fritz Haber Institute of the Max Planck Society)
    • Poster session (meet us in gather.town): (Chair: Richard Weinkamer)
      • 17
        Learning Dynamics of STEM by Enforcing Physical Consistency with Phase-Field Models

        In this poster, we present the research goals of a recently BiGmax-funded project on learning the dynamics of scanning transmission electron microscopy (STEM) by enforcing physical consistency with phase-field models. The primary idea of this project is to develop machine learning (ML)-based modeling of an interpretable coarse-grained dynamic model from in situ STEM video sequences that fulfills a suitable dynamical phase-field equation. The modeling approach aims to discover governing equations from the video sequence data and prior physics knowledge, in a form that is directly compatible with analytic theories or subsequent ML-based analysis.

        Speaker: Pawan Goyal (Max Planck Institute for Dynamics of Complex Technical Systems)
      • 18
        Consistent atom probe representation for machine learning and data mining

        To correlate the mechanical properties of Al alloys with chemical segregation observed in Atom Probe Tomography (APT), we have developed two approaches. In the first, we collect composition statistics from APT datasets for 2×2×2 nm voxels. These voxel compositions are then clustered in compositional space using Gaussian mixture models to automatically identify key phases and their corresponding statistical descriptors. In the second, we employ SOAP (Smooth Overlap of Atomic Positions) descriptors to encode the local chemical and structural environment around each atom in an APT dataset. By applying a pairwise similarity criterion to the SOAP vectors, atoms lying in similar atomic environments (phases) are identified.

        Speaker: Alaukik Saxena (Max-Planck-Institut für Eisenforschung GmbH)
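The first approach in the abstract above starts from per-voxel composition statistics. A minimal sketch of that binning step is given here under stated assumptions: the function name `voxel_compositions`, the input layout `(x, y, z, species)` with coordinates in nm, and the toy atoms are all hypothetical. The subsequent clustering with Gaussian mixture models is not shown.

```python
from collections import Counter, defaultdict


def voxel_compositions(atoms, voxel_size=2.0):
    """Group atoms into cubic voxels and return per-voxel atomic fractions.

    `atoms` is an iterable of (x, y, z, species) with coordinates in nm.
    Returns {voxel_index: {species: fraction}}.
    """
    counts = defaultdict(Counter)
    for x, y, z, species in atoms:
        index = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        counts[index][species] += 1
    return {
        index: {sp: n / sum(c.values()) for sp, n in c.items()}
        for index, c in counts.items()
    }


# Toy point cloud (assumption): six atoms falling into two 2 nm voxels
atoms = [
    (0.5, 0.5, 0.5, "Al"),
    (1.5, 0.2, 1.9, "Al"),
    (0.1, 1.0, 1.0, "Mg"),
    (1.0, 1.0, 1.0, "Al"),
    (2.5, 0.5, 0.5, "Al"),
    (3.9, 1.0, 0.1, "Li"),
]
compositions = voxel_compositions(atoms)
```

Each voxel's fraction dictionary can then be turned into a fixed-length composition vector, one entry per element, before clustering in compositional space.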
      • 19
        Gigascale electron event processing for band structure mapping

        Mapping of the electronic band structures of materials using momentum microscopy requires processing single-electron events of a few to hundreds of gigabytes. We construct a flexible computational workflow that allows user interaction with billion-count single-electron events in these band mapping experiments. We demonstrate its compatibility with large facility and tabletop experimental setups. The workflow is open source and offers an end-to-end recipe from data source to database. Both the workflow and processed data can be archived for reuse, providing the infrastructure for documenting the data provenance for high-throughput materials characterization.

        Speakers: R. Patrick Xian (Fritz Haber Institute of the Max Planck Society), Dr Laurenz Rettig (Fritz Haber Institute of the Max Planck Society), Dr Ralph Ernstorfer (Fritz Haber Institute of the Max Planck Society)
    • 12:45
      Lunch break
    • Session IV: Ab initio methods in materials science and FAIR data (Chair: Matthias Scheffler)
      • 20
        Knowledge-Based Approaches in Catalysis and Energy Modelling (30 min talk + 15 min discussion)

        Data sciences are now also entering theoretical catalysis and energy-related research with full might. Automated workflows and the training of machine learning approaches with first-principles data generate predictive-quality insight into elementary processes and process energetics at an undreamed-of pace. Computational screening and data mining allow these databases to be explored for promising materials and correlations such as structure-property relationships to be extracted. At present, these efforts are still largely based on highly reductionist models that break down the complex interdependencies of working catalysts and energy conversion devices into a tractable number of so-called descriptors, i.e. microscopic parameters that are believed to govern the macroscopic function. Static, predefined databases are also generally the norm. Future efforts will concentrate on using artificial intelligence also in the actual generation and reinforced improvement of the reductionist models, and on devising active learning approaches that generate the truly required data on demand. In this talk, I will briefly survey these developments, providing examples from our own research, in particular on data-efficient approaches to reaction kinetics and active machine learning for the design of organic semiconductors.

        Speaker: Karsten Reuter (invited) (FHI Berlin)
      • 21
        NOMAD – A FAIR Data Sharing Platform for Materials Science (12 min talk + 3 min discussion)

        As an integral part of the FAIR-DI/FAIRmat initiatives, NOMAD is extending its scope. NOMAD is evolving from a central repository for publishing data from electronic-structure codes into a federated data management network that covers all branches of materials science. Instead of just using NOMAD to publish final results, we want to show how on-site installations of NOMAD can help to manage your local daily research data, how we plan to build a distributed network of NOMAD servers, how we combine data artefacts from various materials science domains, and how we plan to involve a larger community.

        Speaker: Markus Scheidgen (Humboldt Universität zu Berlin / Fritz Haber Institut der Max Planck Gesellschaft)
      • 22
        On Software Tools Which Assist Electron Microscopists with Sharing Metadata and Numerical Results in Accordance with the FAIR Data Stewardship Principles (12 min talk + 3 min discussion)

        Microscopy and spectroscopy experiments and the associated computational and theoretical analyses of data from such experiments are the resources of laboratory and data-processing workflows that yield numerical data and contextualization through metadata. The purpose of such experiments is ideally accurate and precise delivery of quantitative evidence in support of or against a formulated set of research hypotheses.

        The large variety of hardware and software tools, together with individual datasets that are becoming larger and faster to acquire, makes comprehensive documentation of data and metadata a challenging task. These challenges have consequences for how findable, accessible, interoperable, and reproducible experimental research currently is, and thus for how efficiently such data can be exchanged between scientists.

        In response, the German Research Foundation has issued a cross-disciplinary call to form a number of national consortia that will build a national research data infrastructure for experimentalists. One proposed consortium is FAIRmat, whose aim is to build such an infrastructure for the methods of the condensed-matter physics community.

        In this talk, I will report an example of the methods used and the software tools that FAIRmat will develop. Specifically, the example will show how we organize the metadata of electron microscopy experiments using a common metadata schema. We will report on the role and value of parsers as tools within automated protocols for filling in the respective metadata schema. Our results indicate that it is possible to find a common schema to store detailed microscopy (meta)data.

        Speaker: Markus Kühbach
      • 23
        Approaches to FAIR TEM data (12 min talk + 3 min discussion)

        Transmission electron microscopy data is rich in quantitative information about materials, information that could in theory be coupled to atomistic simulations, but extracting and harnessing that information is non-trivial. Machine learning approaches may facilitate this, but these are hampered by the limited availability and interoperability of the data. In this talk we present approaches, progress, and challenges to making TEM data more F.A.I.R. (findable, accessible, interoperable and reusable) with the ultimate aim to bridge the gap between TEM, machine learning and atomistics.

        Speaker: Niels Cautaerts (Max-Planck-Institut für Eisenforschung)
    • 15:30
      Wrap up of the BiGmax Workshop