BiGmax Workshop 2022

Europe/Berlin
University Conference Centre in Bochum

Gerhard Dehm, Gerhard Weikum, Joerg Neugebauer
Description

The BiGmax Workshop 2022 on Big-Data-Driven Materials Science will be held at the University Conference Centre in Bochum from April 11 to 13, 2022, under 2G+ conditions.

The workshop aims to present results and new insights in data-driven materials science. These can be based on approaches from statistical and machine learning, compressed sensing, and other recent technologies from mathematics, computer science, statistics, and information technology.

Keynote speakers:

Participants
  • Adisorn Panasawatwong
  • Alaukik Saxena
  • Ali Ahmadian
  • Andreas Leitherer
  • Aytekin Demirci
  • Baoying Dou
  • Christian Liebscher
  • Christian Theobalt
  • Christoph Freysoldt
  • Colin Ophus
  • Erwin Laure
  • Gerhard Dehm
  • Gerhard Weikum
  • Herzain Isaac Rivera Arrieta
  • Hongbin Zhang
  • Jaber Rezaei Mianroodi
  • Jörg Neugebauer
  • Kishan Govind
  • Lekshmi Sreekala
  • Luca Curcuraci
  • Luca Ghiringhelli
  • Marjolein Dijkstra
  • Matthias Scheffler
  • Mohammad Sarkari Khorrami
  • Murali Uddagiri
  • Navyanth Kusampudi
  • Nico Fransaert
  • Pavlo Potapenko
  • Pawan Goyal
  • Peter Benner
  • Ralf Drautz
  • Ray Miyazaki
  • Sandeep Reddy Bukka
  • Santiago Rigamonti
  • Shalini Bhatt
  • Shyam Katnagallu
  • Vincent Stimper
  • Xuyang Zhou
  • Yilun Gong
  • Yue Li
  • Ziyuan Rao
    • 14:00 14:15
      Welcome 15m
    • 14:15 16:15
      Session I
      • 14:15
        From BiGmax to FAIR Data Sciences 1h

        FAIR research data management is of fundamental importance for new discoveries in materials science and related disciplines. This is even more so when "FAIR" is interpreted as "Findable and AI Ready". To facilitate the transition to FAIR data management across all scientific disciplines and to leverage the hidden treasures in available experimental and computational data sets, Germany has established the National Research Data Infrastructure (NFDI). In this talk, we will discuss in particular those NFDI consortia, and their goals, that have close links to, or even have their roots in, BiGmax.

        Speakers: Peter Benner (MPI Magdeburg), Santiago Rigamonti (Humboldt-Universität zu Berlin)
      • 15:15
        Robust recognition and exploratory analysis of crystal structures via Bayesian deep learning (canceled) 30m

        Due to their ability to recognize complex patterns, neural networks can drive a paradigm shift in the analysis of materials-science data. As a major improvement, we introduce a crystal-structure identification method based on Bayesian deep learning that is robust to structural noise and can treat more than 100 crystal structures. While being trained on ideal structures only, our method correctly characterizes strongly perturbed single- and polycrystalline systems, from both synthetic and experimental resources. Robust crystal classification, principled uncertainty estimates, and exploratory analysis of internal neural-network representations (via unsupervised learning) enable hitherto hindered investigations of noisy atomic structural data.

        https://cloud.fhi-berlin.mpg.de:8443/getlink/fiA1s3hPPgAmDotS7GAiqupa/BiGmax_talk_Leitherer_April_11_2022.pdf

        Speaker: Andreas Leitherer (Fritz Haber Institute of the Max Planck Society)
      • 15:45
        Short Poster Introductions 30m
    • 16:15 17:00
      Coffee Break 45m
    • 17:00 18:30
      Session II
      • 17:00
        Machine learning and Inverse design of soft materials 1h

        Predicting the emergent properties of a material from a microscopic description is a scientific challenge. Machine learning and reverse-engineering have opened new paradigms in the understanding and design of materials. However, the soft-matter field has lagged far behind in embracing this approach for materials design. The main difficulty stems from the importance of entropy, the ubiquity of multi-scale and many-body interactions, and the prevalence of non-equilibrium and active matter systems. The abundance of exotic soft-matter phases with (partial) orientational and positional order, such as liquid crystals, quasicrystals, and plastic crystals, along with the omnipresent thermal noise, makes the classification of these states of matter using ML tools highly non-trivial. In this talk, I will address questions like: Can we use machine learning to autonomously identify local structures [1], detect phase transitions, classify phases and find the corresponding order parameters [2] in soft-matter systems? Can we identify the kinetic pathways for phase transformations [3]? And can we use machine learning to coarse-grain our models [4,5]? Finally, I will show how one can use machine learning to reverse-engineer the particle interactions to stabilize nature’s impossible phase of matter, namely quasicrystals [6].

        [1] Unsupervised learning for local structure detection in colloidal systems
        E. Boattini, M. Dijkstra, and L. Filion, The Journal of Chemical Physics 151, 154901 (2019).
        [2] Classifying crystals of rounded tetrahedra and determining their order parameters using dimensionality reduction
        R. van Damme, G.M. Coli, R. van Roij, and M. Dijkstra, ACS Nano 14, 15144-15153 (2020).
        [3] An artificial neural network reveals the nucleation mechanism of a binary colloidal AB13 crystal
        G.M. Coli and M. Dijkstra, ACS Nano 15 (3), 4335-4346 (2021).
        [4] Machine learning many-body potentials for colloidal systems
        G. Campos-Villalobos, E. Boattini, L. Filion and M. Dijkstra, The Journal of Chemical Physics 155 (17), 174902 (2021).
        [5] Machine learning free-energy functionals using density profiles from simulations
        P. Cats, S. Kuipers, S. de Wind, R. van Damme, G.M. Coli, M. Dijkstra and R. van Roij, APL Materials 9, 031109 (2021).
        [6] Inverse design of soft materials via a deep learning–based evolutionary strategy
        G.M. Coli, E. Boattini, L. Filion, and M. Dijkstra, Science Advances 8 (3), eabj6731 (2022).

        Speaker: Marjolein Dijkstra (Utrecht University)
      • 18:00
        Automatizing the analysis of 3d structures in biological samples 30m

        Segmentation and analysis of structures in 3d biological samples can be an ambiguous operation because the data are difficult to visualize. Machine learning models may help with this kind of task, but they can remain opaque about the scientific reasons leading to a particular result. Thanks to recent advances in the field of explainable machine learning, human-interpretable explanations can still be obtained, suggesting possible directions for investigation. In this talk, a procedure for the automatic analysis of 3d texture-like properties in biological samples and for the extraction of human-interpretable explanations is briefly presented, together with practical applications to real data.

        Speaker: Luca Curcuraci (Max-Planck-Institut für Kolloid- und Grenzflächenforschung)
    • 18:30 19:00
      Break 30m
    • 19:00 20:30
      Poster Session / Steering Committee
      • 19:00
        A machine-learning-based approach for the elastoplastic response of polycrystalline materials 1h 30m

        We developed a machine-learning-based approach for computing the elastoplastic mechanical response of polycrystalline structures. In particular, a deep neural network based on U-Net and applied recursively is proposed as a surrogate model for predicting the von Mises stress field under quasi-static tensile loading. We show that the model can accurately predict both the average response and the local von Mises stress field in history-dependent elastoplastic problems. The trained model can predict the nonlinear mechanical response of any grain structure orders of magnitude faster than conventional numerical approaches such as spectral solvers.

        Speaker: Mohammad Sarkari Khorrami (Max-Planck Institut für Eisenforschung)
      • 19:00
        AI with experimental and theoretical data: role of the support material for CO2 hydrogenation 1h 30m

        The performance of heterogeneous catalysts is governed by an intricate interplay of several multi-scale processes. Thus, it is rather challenging to identify the most relevant parameters for the design of the catalyst and its support material. Here, we combine experimental and theoretical descriptive parameters characterizing cobalt nanoparticles dispersed on SiO2 supports modified with Ti, Zr, Al, Ca, or Mg, and adopt the sure-independence-screening-and-sparsifying-operator (SISSO) AI approach to identify correlations describing the selectivity of these materials measured for the CO2 hydrogenation to methanol.

        Speaker: Ray Miyazaki (Fritz Haber Institute of the Max Planck Society)
      • 19:00
        Hydrogen Adsorption on Pd Surfaces and Its Effect on CO2 Activation 1h 30m

        An accurate description of the surface of Pd-based catalysts under reaction conditions is a critical step toward a deeper understanding of catalyst reactivity. Herein, by modeling the phase diagram of the (111) and (100) surfaces of face-centered cubic Pd via ab initio atomistic thermodynamics, we predict the stable hydrogen coverages for a wide range of temperatures and H2 pressures. The hydrogen coverage at the experimental conditions used for CO2 hydrogenation plays a major role in the reactivity, as it hinders the chemisorption of activated CO2. The calculated data will serve as a basis for subsequent subgroup-discovery analysis on CO2 activation.

        Speaker: Herzain Isaac Rivera Arrieta (Fritz Haber Institute of the Max Planck Society)
      • 19:00
        Machine learning to push the limits of time-of-flight secondary ion mass spectrometry 1h 30m

        Time-of-flight secondary ion mass spectrometry (ToF-SIMS) obtains chemical information on a sub-micron scale. Traditionally, experts analyze the spectra in a time-consuming manner, and the complexity of the data limits what can be extracted by inspection. Machine learning could push the limits of ToF-SIMS in several respects. Machine-learning-enhanced identification of atomic and molecular fragments could increase the effectiveness of ToF-SIMS, especially for biological samples with convoluted spectra. Interlaced measurements of ToF-SIMS and scanning probe microscopy (SPM) allow the chemical maps to become three-dimensional. Additionally, images generated by image fusion based on deep learning could allow rapid examination of material composition.

        Speaker: Nico Fransaert (X-LAB, Hasselt University)
      • 19:00
        Measuring complexity & synthetic Hamilton matrices 1h 30m

        Photo-electron spectra obtained with intense pulses generated by free-electron lasers through self-amplified spontaneous emission are intrinsically noisy and vary from shot to shot. We extract the purified spectrum, corresponding to a Fourier-limited pulse, with the help of a deep neural network. It is trained on a huge number of spectra, which was made possible by an extremely efficient propagation of the Schrödinger equation with synthetic Hamilton matrices and random realizations of fluctuating pulses. We show that the trained network is sufficiently generic that it can purify atomic or molecular spectra dominated by resonant two- or three-photon ionization, non-linear processes which are particularly sensitive to pulse fluctuations. This is possible without training on those systems. This purification method implies that the spectra contain hidden information that could never be extracted analytically. By viewing the network from the perspective of an autoencoder model, we can then measure the complexity (information) of any given spectrum dataset.

        Speaker: Adisorn Panasawatwong (Max Planck Institute for the Physics of Complex Systems)
    • 10:00 12:00
      Session III
      • 10:00
        Programmatic and Deep Learning Analysis Pipelines for 4D-STEM Materials Science Experiments 1h

        Many materials science studies use scanning transmission electron microscopy (STEM) to characterize atomic-scale structure. Conventional STEM imaging experiments produce only a few intensity values at each probe position. However, modern high-speed detectors allow us to measure a full 2D diffraction pattern over a grid of 2D probe positions, forming a four-dimensional (4D)-STEM dataset. These 4D-STEM datasets record information about the local phase, orientation, deformation, and other parameters, for both crystalline and amorphous materials. However, 4D-STEM datasets can contain millions of images and therefore require highly automated and robust software codes in order to extract the target properties. In this talk, I will introduce our open source py4DSTEM analysis toolkit, and show how we use these codes to perform data-intensive studies of materials over functional length scales. I will also demonstrate some applications of modern machine learning tools to perform measurements on electron diffraction patterns where property signals have been scrambled by multiple scattering of the electron beam. All of our analysis, simulation, and machine learning codes and datasets are freely available for download, as we try to adhere to FAIR data principles.

        Speaker: Colin Ophus (Lawrence Berkeley National Laboratory)
      • 11:00
        Coffee Break 30m
      • 11:30
        Development of an open source tool for automated crystal orientation mapping in the STEM 30m

        We present the development of an open source tool within the Python library pyxem for automated crystal orientation mapping in the scanning transmission electron microscope (STEM). An efficient and flexible template matching algorithm is developed, where simulated electron diffraction patterns are compared to experimental patterns obtained from scanning precession nanobeam electron diffraction. The tool is scalable, allowing the use of multi-core and GPU-accelerated computation for fast analysis of the multidimensional data. Special emphasis is placed on strategies for sharing and analyzing such complex datasets in a FAIR manner.

        Speaker: Christian Liebscher (MPIE)
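
        The template matching idea in the abstract above, comparing an experimental pattern against a library of simulated patterns and keeping the best match, can be sketched in a few lines. This is only an illustrative sketch using NumPy, not the pyxem implementation: the function names, the zero-mean normalized correlation score, and the random toy "templates" are all assumptions made for the example.

```python
import numpy as np

def normalized_correlation(pattern, template):
    """Zero-mean normalized correlation between two equally sized 2D arrays."""
    p = pattern.astype(float).ravel()
    t = template.astype(float).ravel()
    p = p - p.mean()
    t = t - t.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    if denom == 0.0:
        return 0.0  # a flat image carries no structure to match
    return float(p @ t / denom)

def best_template(pattern, templates):
    """Return the index of the best-matching template and all scores."""
    scores = np.array([normalized_correlation(pattern, t) for t in templates])
    return int(np.argmax(scores)), scores

# Toy data (hypothetical): five random "simulated" patterns, and an
# "experimental" pattern that is template 3 plus a little noise.
rng = np.random.default_rng(0)
templates = rng.random((5, 16, 16))
pattern = templates[3] + 0.05 * rng.standard_normal((16, 16))

index, scores = best_template(pattern, templates)
print("best matching template:", index)
```

        In a real orientation-mapping workflow each template would correspond to a simulated diffraction pattern for one candidate orientation, and the loop over templates is the part that benefits from multi-core or GPU execution.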
    • 12:00 13:30
      Group Picture / Lunch Break 1h 30m
    • 13:30 17:45
      Session IV
      • 13:30
        Neural Methods for Reconstruction and Rendering of Real World Scenes 1h

        In the Visual Computing and Artificial Intelligence Department at MPI for Informatics, we investigate research questions at the intersection of computer graphics, computer vision and artificial intelligence. In this presentation, I will talk about some of the recent work we did on new methods for reconstructing high quality computer graphics models (shape, motion, appearance, material, illumination etc.) of real world scenes from sparse or even monocular video data.
        These methods bring together neural network-based and explicit model-based approaches and pave the way for better real world understanding from sparse camera data. I will also talk about new neural rendering approaches that combine explicit model-based and neural network based concepts for image formation in new ways. They enable new means to synthesize highly realistic imagery and videos of real world scenes under user control.

        Speaker: Christian Theobalt (MPI for Informatics)
      • 14:30
        Scientific Machine Learning for discovery of Phase Field models 30m

        In this work, concepts from scientific machine learning are employed to learn continuum phase field models directly from experimental Scanning Transmission Electron Microscopy (STEM) data. Currently, we assume that the continuum model is known to take the form of Cahn-Hilliard/Allen-Cahn equations with an a priori expression for the free energy function. The unknown parameters of the continuum model are estimated using physics-informed neural networks (PINN). First, the PINN approach is validated on a synthetic dataset generated from a Cahn-Hilliard equation with known parameters. Later, it is applied to raw and noisy experimental data.

        Speaker: Sandeep Reddy Bukka (MPI Magdeburg)
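
        For readers unfamiliar with the model class named in the abstract above, the commonly used textbook forms of the two equations are given below. The abstract does not state which free energy or which parameters were used, so the symbols here (bulk free energy density f, gradient coefficient \kappa, mobility M, kinetic coefficient L) are generic assumptions; in a PINN setting, quantities such as M, L, and \kappa would be the unknown parameters to be estimated.

```latex
% Free energy functional with bulk density f and gradient coefficient \kappa
F[c] = \int_V \left[ f(c) + \frac{\kappa}{2}\,\lvert \nabla c \rvert^{2} \right] \mathrm{d}V

% Cahn-Hilliard equation (conserved field c, mobility M)
\frac{\partial c}{\partial t}
  = \nabla \cdot \left( M \, \nabla \left[ \frac{\partial f}{\partial c} - \kappa \nabla^{2} c \right] \right)

% Allen-Cahn equation (non-conserved field \phi, kinetic coefficient L)
\frac{\partial \phi}{\partial t}
  = -L \left( \frac{\partial f}{\partial \phi} - \kappa \nabla^{2} \phi \right)
```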
      • 15:00
        Coffee Break 30m
      • 15:30
        Poster Session continued 45m
      • 16:15
        Data mining potential in atom probe tomography 30m

        Atom probe tomography is now an established near atomic-scale characterization technique. However, traditional analysis often fails to resolve the subtle inherent details of field evaporation processes occurring near defects or multiple phases. We present two cases employing unconventional data mining routines on experimental data to extract valuable physical insights, supported by simulations. First, we utilize the correlations exhibited by field desorption and field evaporation in mass spectra to enable analytical field ion microscopy. Second, the real energy deficit due to defects is resolved from mass spectra using an approach that we term field evaporation energy loss spectroscopy.

        Speaker: Shyam Katnagallu (Max-Planck-Institut für Eisenforschung)
      • 16:45
        A materials informatics framework to discover patterns in atom probe tomography data 30m

        Atom probe tomography (APT) is a unique technique that provides 3D elemental distribution with near atomic resolution for a given material. However, the large amount of data acquired during the experiment and the complexity of the 3D microstructures make it challenging to fully quantify APT data. Here, taking APT measurements of an Fe-doped Sm-Co alloy as an example, we present an approach based on unsupervised machine learning to extract different phases in the data. On top of this method, we have built a PCA-based workflow to quantify in-plane compositional and thickness fluctuations, and relative orientations of the precipitates.

        Speaker: Alaukik Saxena (Max-Planck-Institut für Eisenforschung GmbH (Helmholtz School for Data Science in Life, Earth and Energy (HDS-LEE)))
      • 17:15
        Quantitative three-dimensional imaging of chemical short-range order via machine learning enhanced atom probe tomography 30m

        Chemical short-range order (CSRO), referring to specific elements self-organising within a disordered matrix, can modify the properties of materials. CSRO is typically characterized via two-dimensional microscopy techniques that fail to capture three-dimensional atomistic architectures. Here, we present a machine-learning-enhanced approach for three-dimensional imaging of CSRO in body-centred-cubic Fe-18Al alloys. After validating our method against artificial data, we unearth non-statistical B2-CSRO instead of the generally expected D03-CSRO. We propose quantitative correlations among annealing temperature, CSRO, nano-hardness, and electrical resistivity. The proposed strategy can be generally employed to investigate short/medium/long-range ordering phenomena in a vast array of materials.

        Speaker: Yue Li (Max-Planck-Institut für Eisenforschung GmbH)
    • 18:30 20:30
      Conference Dinner 2h
    • 20:30 21:30
      Panel Discussion
    • 09:30 12:30
      Session V
      • 09:30
        Atomic Cluster Expansion and application to modelling of materials 1h

        Classical and machine learning interatomic potentials alike incorporate design choices that reflect the intuition of their authors and that are justified only a posteriori by the performance of the model. Design choices comprise, for example, the form of the embedding function of an embedded atom potential or a specific angular dependence of a descriptor in a machine learning potential.
        The atomic cluster expansion (ACE) [1-3] takes a different route. Based on the broad assumption of locality, it establishes a complete and orthonormal basis for the space of local atomic configurations. The ACE basis functions immediately comply with the basic symmetry requirements of atomic-scale physics: they are invariant under translation, rotation, inversion and permutation of atoms. This enables the systematic expansion and convergence of atomic-scale properties in analogy to quantum mechanics, where one is used to converging basis functions for the accurate representation of energies and forces. Completeness also enables ACE to represent common machine learning descriptors and potentials.
        ACE has been implemented in the LAMMPS molecular dynamics simulation software package, and its numerical efficiency is competitive with or superior to that of other ML potentials [4]. After an introduction to ACE, I will discuss the parameterization of ACE from first-principles reference data and the computation of thermodynamic and mechanical properties.
        Three factors are critical for obtaining accurate and transferable ACE: (i) an extensive, diverse and high-quality reference dataset, (ii) a robust and efficient training procedure, and (iii) a thorough validation including assessment of uncertainty. I will show how our parameterization strategy incorporates the three factors and enables near-automatic construction and convergence of ACE [5].
        I will then discuss ACE for a number of elements, compounds and molecules and review their properties against reference data. Analysis of mechanical properties and automated free energy and phase diagram calculations [6] will be presented.

        [1] R. Drautz, Atomic cluster expansion for accurate and transferable interatomic potentials, Phys. Rev. B 99, 014104 (2019).
        [2] G. Dusson, M. Bachmayr, G. Csanyi, R. Drautz, S. Etter, C. van der Oord, and C. Ortner, Atomic cluster expansion: Completeness, efficiency and stability (2020), arXiv:1911.03550v3.
        [3] R. Drautz, Atomic cluster expansion of scalar, vectorial, and tensorial properties including magnetism and charge transfer, Phys. Rev. B 102, 024104 (2020).
        [4] Y. Lysogorskiy, C. van der Oord, A. Bochkarev, S. Menon, M. Rinaldi, T. Hammerschmidt, M. Mrovec, A. Thompson, G. Csanyi, C. Ortner, et al., Performant implementation of the atomic cluster expansion (PACE): Application to copper and silicon, npj Computational Materials (2021).
        [5] A. Bochkarev, Y. Lysogorskiy, S. Menon, M. Qamar, M. Mrovec, and R. Drautz, Efficient parametrization of the atomic cluster expansion, Phys. Rev. Materials 6, 013804 (2022).
        [6] S. Menon, Y. Lysogorskiy, J. Rogal, and R. Drautz, Automated free-energy calculation from atomistic simulations, Phys. Rev. Materials 5, 103801 (2021).

        Speaker: Ralf Drautz (Ruhr-Universität Bochum)
      • 10:30
        Coffee Break 30m
      • 11:00
        Inverse design of multicomponent crystalline materials 30m

        Autonomous materials discovery with desired properties is one of the ultimate goals of materials science. We implemented and applied constrained crystal deep convolutional generative adversarial networks to design unreported (meta-)stable crystal structures. Using an image-based continuous latent space, the physical properties can be optimized while exploring a big chemical space. Our approach has been successfully applied to predict stable binary and multicomponent systems. This paves the way to achieve the inverse design of crystalline materials with optimal properties.

        Speaker: Hongbin Zhang (TU Darmstadt)
      • 11:30
        Partial order-disorder transitions in thermoelectric clathrates: toward non-linear modelling of materials properties 30m

        Intermetallic clathrate alloys are promising materials for thermoelectric applications. Their cage-like unit cell allows for tailoring the electronic properties through doping. Yet, a realistic theoretical description is hard to achieve due to the complex interplay between temperature, (dis)order and electronic properties. In this work, we show a novel approach to compute the temperature-dependent band structure of alloys and apply it to the clathrate Ba8AlxSi46-x. By doing so, we i) anticipate favorable concentration and temperature ranges for thermoelectric applications, and ii) highlight the need for improved non-linear modelling of complex materials, advancing ideas based on cluster expansion.

        Speaker: Santiago Rigamonti (Humboldt-Universität zu Berlin)
      • 12:00
        Improving Normalizing Flows to Sample from Boltzmann Distributions 30m

        Sampling from Boltzmann distributions through normalizing flows promises to be computationally much cheaper than molecular dynamics (MD) simulations. However, flows struggle to approximate complicated target distributions due to topological constraints and still heavily rely on MD samples to be trained on. Here, we present two lines of research addressing these issues, the former by introducing a more expressive base distribution for normalizing flows and the latter through a novel bootstrapping training procedure using only samples from the flow as well as the density of the target.

        Speaker: Vincent Stimper (Max Planck Institute for Intelligent Systems)
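
        The abstract above mentions training with only model samples and the target density. As a loosely related illustration (not the speakers' method), the sketch below shows how samples from a tractable model can be combined with an unnormalized Boltzmann density through self-normalized importance weights. Everything in it is an assumption made for the example: a fixed Gaussian stands in for a trained flow, the energy is a 1D double well, and energies are expressed in units of kT.

```python
import numpy as np

def double_well_energy(x):
    """Hypothetical reduced energy U(x)/kT of a 1D double well."""
    return (x**2 - 1.0) ** 2

def log_gaussian(x, sigma):
    """Log-density of a zero-mean Gaussian, standing in for a flow's log q(x)."""
    return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
sigma = 1.5
samples = sigma * rng.standard_normal(10_000)  # draws from the stand-in model

# Unnormalized log-weights: log p_target(x) - log q(x), with p_target ~ exp(-U).
log_w = -double_well_energy(samples) - log_gaussian(samples, sigma)
log_w -= log_w.max()            # stabilize the exponentials
weights = np.exp(log_w)
weights /= weights.sum()        # self-normalize

# Effective sample size: close to len(samples) only if q matches the target well.
ess = 1.0 / np.sum(weights**2)
mean_x2 = np.sum(weights * samples**2)  # reweighted estimate of <x^2>
print(f"effective sample size: {ess:.0f} of {len(samples)}")
```

        A low effective sample size signals that the model distribution covers the target poorly, which is the kind of mismatch that more expressive base distributions and improved training procedures aim to reduce.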
    • 12:30 13:00
      Lunch and Farewell 30m