# Source Code

## LACOS Software Package

### Introduction

Metals that we encounter in daily life are mostly alloys, created by mixing two or more elements in specific ratios that control their properties (mechanical, physical, chemical, etc.). Almost all alloys are __solid solutions__, in which the atoms sit on the sites of a crystal lattice but which element occupies which site follows no periodic pattern. The arrangement of atoms on the lattice sites (called the configuration) varies across the material and changes each time the material is synthesized. This is in contrast to ordered crystalline materials such as common salt (NaCl), where Na and Cl occupy the same well-defined positions in the lattice every time it is prepared. For ordered crystalline materials, it is possible to define a unit cell, consisting of a few atoms, that can be repeated periodically in all directions to generate the bulk material. In solid solutions, such a unit cell cannot be defined because the lattice sites are occupied randomly. Solid solutions also differ from amorphous materials (such as silica glass, SiO₂), in which even the periodic lattice framework is absent.

*Why is the arrangement of atoms on the lattice important?* Thanks to rapid progress in atomic-scale modeling of materials over the last few decades, if the atomic configuration of a material is known, its properties can today be predicted with good accuracy using quantum-mechanical calculations based on Density Functional Theory (DFT). These calculations, also referred to as first-principles or ab initio modeling, can be used to search for materials with required properties without synthesizing them in the lab (in-silico high-throughput screening). For ordered crystalline materials, the occupation of lattice sites in the unit cell can be obtained from experiments (and corresponds to the ground-state, or lowest-energy, configuration), which in turn can be used to calculate the material's properties using DFT. In the absence of a unit cell in solid solutions, material properties are obtained by averaging over all "probable" configurations of atoms on the lattice sites. Thermodynamically, the probability of a configuration σ is determined by its energy E(σ) through the Boltzmann factor $e^{-E(\sigma)/k_B T}$, where $k_B$ is the Boltzmann constant and $T$ the temperature: the lower the energy, the higher the probability that the configuration occurs. Interestingly, in solid solutions the lowest-energy configuration changes with system size (the number of atoms), which is why a unit cell cannot be defined for them.
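The Boltzmann-weighted average described above can be sketched in a few lines of Python. This is an illustration only, not LACOS code: the function names are hypothetical, and the three configuration energies and the property values (for example a lattice parameter) are made-up numbers.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_probabilities(energies_ev, temperature_k):
    """Normalized probabilities p(σ) = exp(-E(σ)/kB T) / Z over a list of configurations."""
    beta = 1.0 / (K_B_EV * temperature_k)
    e_min = min(energies_ev)  # shifting by the minimum avoids overflow; it cancels in p(σ)
    weights = [math.exp(-beta * (e - e_min)) for e in energies_ev]
    z = sum(weights)  # partition function (up to the constant shift factor)
    return [w / z for w in weights]


def ensemble_average(values, energies_ev, temperature_k):
    """Boltzmann-weighted average <A> = Σ p(σ) A(σ) of a per-configuration property."""
    probs = boltzmann_probabilities(energies_ev, temperature_k)
    return sum(p * a for p, a in zip(probs, values))


# Made-up example: three configurations with energies (eV) relative to the lowest one,
# and a hypothetical property (e.g. a lattice parameter in Å) computed for each.
energies = [0.00, 0.05, 0.20]
lattice_param = [3.52, 3.55, 3.60]
for t in (300, 1500):
    probs = boltzmann_probabilities(energies, t)
    print(t, [round(p, 3) for p in probs], round(ensemble_average(lattice_param, energies, t), 4))
```

At 300 K the lowest-energy configuration dominates the average, while at 1500 K the higher-energy configurations contribute noticeably. In a real solid solution the sum runs over an astronomically large number of configurations, which is the difficulty discussed next.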

*Why not use DFT to find all probable configurations?* Determining all the low-energy configurations becomes computationally challenging as the number of atoms increases, because the size of the configuration space grows exponentially. A binary alloy such as Ni-Al with only 100 atoms has $2^{100} \approx 1.3 \times 10^{30}$ possible configurations. Using lattice symmetries to remove equivalent configurations reduces this number, but it remains very large. The configuration space grows even faster as the number of components (elements) increases: with $K$ components on $N$ sites there are $K^N$ configurations. DFT calculations, though very accurate, are computationally expensive, and routine calculations are restricted to fewer than about 200 atoms. In such a scenario, searching for "probable" low-energy configurations in the full configuration space of even 50 atoms by direct DFT is computationally prohibitive, even on a supercomputer.
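The counting argument above can be checked directly with Python's integer arithmetic. The helper names are hypothetical. By the orbit-counting argument, lattice symmetry can shrink these numbers by at most a factor equal to the number of symmetry operations of the supercell (translations times point-group operations), which grows only linearly with the number of sites, so the exponential growth remains.

```python
from math import comb, factorial


def n_configurations(n_sites, n_components):
    """Every site can hold any of K components: K**N configurations."""
    return n_components ** n_sites


def n_configurations_fixed_composition(counts):
    """Arrangements with a fixed number of atoms of each species (multinomial coefficient)."""
    n = sum(counts)
    result = factorial(n)
    for c in counts:
        result //= factorial(c)  # each partial quotient is an exact integer
    return result


print(n_configurations(100, 2))                          # binary alloy, 100 sites
print(n_configurations_fixed_composition([50, 50]))      # fixed 50:50 composition
print(comb(100, 50) == n_configurations_fixed_composition([50, 50]))
print(f"{n_configurations(100, 4):.3e}")                 # quaternary alloy, 100 sites
```

Fixing the composition (for example exactly 50 Ni and 50 Al atoms) removes only a small fraction of the $2^{100}$ configurations, and going from two to four components squares the count.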

*How can the low-energy configurations of solid solutions be accessed?* Since direct DFT calculation is not practical, one instead builds a model of the configurational energy. Any such model should have the following two characteristics: (a) it should give energies with nearly the same accuracy as DFT (the closer to the DFT energy, the better the model); (b) it should perform the calculations at minimal computational cost (the faster, the better). Such a model can then be used either to enumerate all configurations directly (for small systems) or, combined with statistical methods such as Monte Carlo, to sample the low-energy configurations.
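Direct enumeration with a cheap model can be sketched for a toy system: the code below visits every arrangement of 6 A and 6 B atoms on a 12-site periodic ring (924 arrangements) and evaluates a hypothetical pair-interaction energy in place of DFT. The ring geometry, the energy model and the parameters `j1`, `j2` are assumptions chosen for illustration, not a LACOS model.

```python
from itertools import combinations


def ring_energy(config, j1, j2):
    """Hypothetical cheap energy model on a periodic 1D ring, occupations +1 (A) / -1 (B):
    E = J1 * Σ_i s_i s_{i+1} + J2 * Σ_i s_i s_{i+2}."""
    n = len(config)
    return sum(j1 * config[i] * config[(i + 1) % n] + j2 * config[i] * config[(i + 2) % n]
               for i in range(n))


def ground_state_by_enumeration(n_sites, n_a, j1, j2):
    """Evaluate every arrangement of n_a A atoms on n_sites sites and keep the lowest energy."""
    best_energy, best_config = None, None
    for a_sites in combinations(range(n_sites), n_a):
        config = [-1] * n_sites
        for i in a_sites:
            config[i] = 1
        e = ring_energy(config, j1, j2)
        if best_energy is None or e < best_energy:
            best_energy, best_config = e, config
    return best_energy, best_config


energy, config = ground_state_by_enumeration(n_sites=12, n_a=6, j1=0.10, j2=0.02)
print(round(energy, 4), "".join("A" if s == 1 else "B" for s in config))
```

With these made-up parameters (J1 > 0 favors unlike nearest neighbours) the search returns the alternating ABAB… arrangement. The approach only works because each energy evaluation is nearly free; with one DFT run per arrangement, and with the counts shown earlier for realistic system sizes, enumeration quickly becomes impossible.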

**Cluster expansion: an approach to model solid solutions.** Currently, the state-of-the-art approach is to perform a cluster expansion (CE) of the configurational energy E(σ). The basis set for CE is formed by clusters of lattice sites (also called "figures"), geometrical entities each represented by a correlation function $\Phi_\alpha(\sigma)$ that can be calculated readily for any cluster given its site occupations. Mathematically, the CE is written as

$$E(\sigma) = \sum_{\alpha} J_{\alpha}\, \Phi_{\alpha}(\sigma),$$

where the summation is over a set of clusters $\{\alpha\}$ and the coefficients $J_\alpha$ (effective cluster interactions) are the parameters of the model. Both the cluster set $\{\alpha\}$ and the $J_\alpha$ are obtained by "training" the CE energy model on a database of energies $E_{\mathrm{DFT}}$ calculated with DFT for a set of configurations. Given the training set and the model energy function, the problem of obtaining $\{\alpha\}$ and $J_\alpha$ is similar to the problems addressed by machine learning methods today. Once an accurate CE model has been developed, it can be used in Monte Carlo simulations that "generate" the required low-energy configurations (through a Markov chain). In a recent work, this DFT+CEMC framework (DFT combined with cluster expansion and Monte Carlo) was used to reveal atomic-scale details in quaternary Ni-Al-Ti-Co alloys, the base composition for many Ni-based superalloys used in aircraft engines and gas turbines (see the reference below).
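A minimal sketch of a cluster-expansion fit, assuming the simplest possible setting: a binary system on a periodic 1D ring with occupations $\sigma_i = \pm 1$, a fixed (hypothetical) cluster set {empty, point, nearest-neighbour pair, next-nearest-neighbour pair}, and $\Phi_\alpha$ taken as the average of the product of occupations over all equivalent clusters. The training energies are synthetic stand-ins generated from made-up $J_\alpha$ plus noise, not DFT data, and the fit is plain least squares rather than the model selection used in PyClex.

```python
import numpy as np


def correlation_functions(config):
    """Φ_α for a periodic 1D ring with occupations ±1, averaged over equivalent clusters."""
    s = np.asarray(config, dtype=float)
    return np.array([
        1.0,                             # empty cluster
        s.mean(),                        # point cluster
        np.mean(s * np.roll(s, -1)),     # nearest-neighbour pair
        np.mean(s * np.roll(s, -2)),     # next-nearest-neighbour pair
    ])


def ce_energy(config, eci):
    """Cluster-expansion energy per site: E(σ) = Σ_α J_α Φ_α(σ)."""
    return float(correlation_functions(config) @ eci)


def fit_eci(configs, energies):
    """Least-squares fit of J_α to a training set of (configuration, energy) pairs."""
    X = np.array([correlation_functions(c) for c in configs])
    eci, *_ = np.linalg.lstsq(X, np.asarray(energies, dtype=float), rcond=None)
    return eci


rng = np.random.default_rng(0)
true_eci = np.array([-4.0, 0.05, 0.12, -0.03])  # made-up "reference" J_α (eV/site)
train = [rng.choice([-1, 1], size=16) for _ in range(40)]
# Stand-in for E_DFT: the reference model plus small noise (no real DFT data here).
e_train = [ce_energy(c, true_eci) + rng.normal(scale=1e-3) for c in train]
eci = fit_eci(train, e_train)
print(np.round(eci, 3))
```

The fitted coefficients recover the reference values closely. In a real CE the hard part is choosing $\{\alpha\}$ from many candidate clusters, which is the model-selection problem that PyClex addresses.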

*How and where will modeling solid solutions help?* As mentioned at the beginning, almost all alloys in use today, from complex engineering structures to consumer products, are solid solutions. Some of these can be very complex, such as the Ni-based superalloys used in power-generation turbines and aircraft engines, which contain more than eight elements. There is an ever-growing demand to tune the composition of these materials to optimize various properties. Achieving this with a purely experimental approach requires significant investment in R&D, and the "Edisonian" trial-and-error approach practiced for many decades has reached its limits. It is therefore imperative to understand and correlate material properties with the fundamental physics at the atomic scale, creating a systematic methodology to design and discover new materials and alloys. The need for atomic-scale modeling of solid solutions also arises when connecting engineering (or parametric) models to the chemical composition of the material. If the parameters of such models are calculated accurately using ab initio calculations, the errors and ambiguity that arise from using fitted or empirical values can be removed.

### LACOS Package

The above set of questions led to the development of the LACOS package, a suite of codes for ab initio modeling of multicomponent solid-solution materials. The general philosophy of the package is to construct models of the configurational energy E(σ) by combining statistical techniques with accurate DFT calculations (performed by external codes). Currently the LACOS package (an abbreviation of **L**attice **A**tomic **C**onfiguration **S**imulation **P**ackage) contains two different implementations: one based on the DFT+CEMC technique, and the other on a new machine learning approach. The DFT+CEMC implementation follows the framework used in the reference given below, with additional advanced algorithms that make model selection more robust and automate repetitive tasks (such as the DFT calculations needed to generate the database). The various components and the communication flow between them are shown in the figure and described briefly below.

**ConfMaker**

This script creates symmetry-distinct configurations based on the composition range specified for each component of the material. It also creates the directory structure, with the required input files, for a DFT calculation of each configuration (currently, LACOS supports only the VASP software). The code is written in Python and can include vacancies in the structure.
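How ConfMaker identifies symmetry-distinct configurations is not described here; the sketch below only illustrates the idea on a toy periodic 1D ring, where the symmetry operations are the translations and the reflection. Each arrangement is mapped to a canonical form (its lexicographically smallest symmetry image), and arrangements with the same canonical form are equivalent. Vacancies are treated as an extra species labelled `"Va"`; all names here are hypothetical.

```python
from itertools import permutations


def canonical_form(config):
    """Lexicographically smallest image of a ring configuration under all translations
    and the reflection, i.e. the symmetry operations of a periodic 1D chain."""
    n = len(config)
    images = []
    for seq in (tuple(config), tuple(config)[::-1]):
        for shift in range(n):
            images.append(seq[shift:] + seq[:shift])
    return min(images)


def distinct_configurations(species_counts):
    """Symmetry-distinct ring configurations for a fixed composition,
    e.g. {"Ni": 4, "Al": 2, "Va": 2} where "Va" marks a vacancy."""
    pool = [sp for sp, count in species_counts.items() for _ in range(count)]
    # set(...) drops duplicate orderings of identical atoms; fine for small toy rings only
    unique_orderings = set(permutations(pool))
    return sorted({canonical_form(p) for p in unique_orderings})


for comp in ({"Ni": 2, "Al": 2}, {"Ni": 4, "Al": 2, "Va": 2}):
    print(comp, len(distinct_configurations(comp)))
```

For a real 3D supercell the same idea applies with the space-group operations of the lattice, and brute-force permutation is replaced by smarter enumeration, since the number of orderings explodes quickly.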

**DFTrunner**

The entire directory structure created by ConfMaker can be passed to this script, which submits the calculations to a computing cluster in batches. The script monitors the calculations, checks for convergence (resubmitting if required), and parses the output files to create a summary file containing the important results of each calculation (energy, final atomic coordinates, magnetic moment, band gap, etc.). These files (with prefix DBinfo*) form the DFT database used for training cluster-expansion or machine-learning models of the configurational energy.
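The parse-and-summarize step can be sketched as follows. This is not DFTrunner's code, and the DBinfo record layout below is a hypothetical JSON-lines format. The sketch assumes VASP's OSZICAR convention of one line per ionic step (`N F= ... E0= ... d E = ... mag= ...`) and assumes that OUTCAR contains the phrase `reached required accuracy` when an ionic relaxation has converged.

```python
import json
import re

# One ionic-step line of VASP's OSZICAR looks like
#    1 F= -.43186031E+02 E0= -.43184860E+02  d E =-.431860E+02  mag=     4.0000
STEP_RE = re.compile(r"^\s*(\d+)\s+F=\s*(\S+)\s+E0=\s*(\S+)")
MAG_RE = re.compile(r"mag=\s*(\S+)")
# Message assumed to mark a converged ionic relaxation in OUTCAR.
CONVERGED_MARKER = "reached required accuracy"


def parse_oszicar(text):
    """Return one dict per ionic step: free energy F, E0 and (if spin-polarized) mag."""
    steps = []
    for line in text.splitlines():
        m = STEP_RE.match(line)
        if m is None:
            continue  # electronic-step (DAV:/RMM:) and header lines are skipped
        mag = MAG_RE.search(line)
        steps.append({
            "step": int(m.group(1)),
            "F": float(m.group(2)),
            "E0": float(m.group(3)),
            "mag": float(mag.group(1)) if mag else None,
        })
    return steps


def summarize_run(oszicar_text, outcar_text):
    """Decide what to do with a finished job and collect the values worth storing."""
    steps = parse_oszicar(oszicar_text)
    if not steps:
        return {"status": "failed"}  # no ionic step completed: inspect by hand
    last = steps[-1]
    converged = CONVERGED_MARKER in outcar_text
    return {
        "status": "converged" if converged else "resubmit",
        "n_ionic_steps": len(steps),
        "energy_F": last["F"],
        "energy_E0": last["E0"],
        "magnetic_moment": last["mag"],
    }


def append_dbinfo(path, config_id, summary):
    """Append one JSON line per configuration (a hypothetical DBinfo-style record)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"config": config_id, **summary}) + "\n")
```

A runner would call `summarize_run` on each finished job directory, resubmit the jobs marked `"resubmit"`, and append the converged ones to the database. For single-point (static) calculations the relaxation marker is not expected to appear, so a different convergence test would be needed there.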

**PyClex**

An abbreviation of **Py**thon **Cl**uster **Ex**pansion, PyClex performs cluster expansion on the DFT database generated by DFTrunner. The code can be used for multisite, multicomponent solid solutions. The "best" set of clusters {α} for the cluster expansion is obtained using model-selection techniques. The algorithms implemented in PyClex for model selection broadly fall into three categories:

- *Heuristic techniques*: Genetic Algorithm (GA), modified GA for sparse model selection
- *Approximate L₀-minimization using "greedy" algorithms*: Orthogonal Matching Pursuit (OMP), smoothed L₀ algorithm (SL0)
- *L₁-minimization for sparse model selection*: LASSO, Split-Bregman method, Iterative hard thresholding (IHT)

A Bayesian approach to model selection is currently being implemented and will be made available in a future version. PyClex is written in Python and uses the Scikit-Learn library for machine learning. The computationally heavy parts of the code are handled by C functions (coupled to Python through *ctypes*) with *OpenMP* parallelization.
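As an illustration of the greedy L₀ family listed above, here is a minimal Orthogonal Matching Pursuit in NumPy. It is a generic textbook OMP, not PyClex's implementation. The columns of `X` play the role of the correlation functions Φ_α of candidate clusters evaluated on the training configurations, and `y` the training energies. Scikit-Learn provides ready-made `sklearn.linear_model.OrthogonalMatchingPursuit` and `sklearn.linear_model.Lasso` estimators, but whether PyClex uses them is not stated. The demo data are synthetic.

```python
import numpy as np


def orthogonal_matching_pursuit(X, y, n_nonzero):
    """Greedy sparse selection (OMP).

    X: (n_configs, n_candidate_clusters) matrix of correlation functions Φ_α
    y: (n_configs,) target energies; returns (coefficients, sorted selected column indices).
    """
    if n_nonzero < 1:
        raise ValueError("n_nonzero must be >= 1")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    active = []
    residual = y.copy()
    for _ in range(n_nonzero):
        # Pick the candidate cluster most correlated with what is still unexplained.
        scores = np.abs(X.T @ residual) / norms
        if active:
            scores[active] = -np.inf  # never pick the same cluster twice
        active.append(int(np.argmax(scores)))
        # Re-fit all selected coefficients jointly (the "orthogonal" step).
        sol, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ sol
    coef = np.zeros(X.shape[1])
    coef[active] = sol
    return coef, sorted(active)


# Synthetic stand-in for a CE fit: 20 candidate clusters, only 3 truly contribute.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 20))
j_true = np.zeros(20)
j_true[[0, 3, 7]] = [2.0, -1.5, 1.0]
y_demo = X_demo @ j_true + rng.normal(scale=1e-3, size=100)
coef, support = orthogonal_matching_pursuit(X_demo, y_demo, n_nonzero=3)
print(support, np.round(coef[support], 3))
```

Real CE correlation matrices are far more strongly correlated than this random stand-in, which makes support recovery harder; in practice the number of clusters would be chosen by a criterion such as cross-validation rather than fixed in advance.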

**CPyMonC**

This is a Monte Carlo code for finite-temperature simulations (the name is derived from **C**-**P**ython **M**onte **C**arlo). The "best" {α} and $J_\alpha$ obtained from PyClex are fed as input to this code. Monte Carlo simulation gives access to finite-temperature configurations of the material, from which various thermodynamic quantities can be calculated. The C functions of the code, which handle the core Monte Carlo loops, are parallelized using OpenMP.
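The move set and ensemble used by CPyMonC are not described here. The sketch below shows a standard canonical-ensemble (fixed-composition) Metropolis scheme under stated assumptions: a periodic 1D ring, a hypothetical pair model with parameters `j1`, `j2` standing in for the fitted cluster expansion, and moves that swap two unlike atoms and are accepted with probability min(1, exp(-ΔE/k_B T)). ΔE is computed locally from the clusters that contain the two swapped sites.

```python
import math
import random

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def ring_energy(config, j1, j2):
    """Hypothetical pair model on a periodic ring, occupations +1 (A) / -1 (B)."""
    n = len(config)
    return sum(j1 * config[i] * config[(i + 1) % n] + j2 * config[i] * config[(i + 2) % n]
               for i in range(n))


def local_energy(config, i, j1, j2):
    """Sum of all pair-cluster terms that contain site i (assumes at least 5 sites)."""
    n = len(config)
    return config[i] * (j1 * (config[(i - 1) % n] + config[(i + 1) % n])
                        + j2 * (config[(i - 2) % n] + config[(i + 2) % n]))


def metropolis_swap_mc(config, j1, j2, temperature_k, n_steps, seed=0):
    """Canonical (fixed-composition) Metropolis MC with swaps of two unlike atoms."""
    rng = random.Random(seed)
    config = list(config)  # work on a copy
    n = len(config)
    beta = 1.0 / (K_B_EV * temperature_k)
    e = ring_energy(config, j1, j2)
    for _ in range(n_steps):
        i, j = rng.randrange(n), rng.randrange(n)
        if config[i] == config[j]:
            continue  # swapping identical species changes nothing
        before = local_energy(config, i, j1, j2) + local_energy(config, j, j1, j2)
        config[i], config[j] = config[j], config[i]
        # A bond between i and j (if any) is unchanged by the swap, so it cancels here.
        delta = local_energy(config, i, j1, j2) + local_energy(config, j, j1, j2) - before
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            e += delta  # accept
        else:
            config[i], config[j] = config[j], config[i]  # reject: undo the swap
    return config, e


start = [1] * 10 + [-1] * 10  # phase-separated start: 10 A then 10 B on a 20-site ring
for t in (300.0, 3000.0):
    final, e_final = metropolis_swap_mc(start, j1=0.05, j2=0.01, temperature_k=t, n_steps=20000)
    print(t, round(e_final, 3), "".join("A" if s == 1 else "B" for s in final))
```

A production code would also discard an equilibration period and accumulate thermodynamic averages (energy, short-range order, etc.) over the remaining steps; this sketch returns only the final state and its energy.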

**PyMLCE**

Though DFT+CEMC is the state of the art in modeling solid solutions, it can become computationally expensive for routine calculations of material properties, especially for multicomponent materials. Sometimes one is interested only in the low-energy configurations of the system. A new method has recently been developed to access these low-energy configurations using a machine learning model of E(σ) built on descriptors of the real-space distribution of atoms. The PyMLCE (Python Machine Learning of Configuration Energy) code implements this method and can be used to obtain representative low-energy configurations for further DFT calculations.
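The descriptors and learning algorithm used in PyMLCE are not specified here, so the following is only a generic sketch of the surrogate-screening idea. It assumes a hypothetical real-space descriptor (the fraction of unlike A-B pairs in each of the first three neighbour shells of a periodic 1D ring), a closed-form ridge regression as the machine-learning model, and synthetic training energies in place of DFT. The trained model ranks many candidate configurations, and only the lowest-ranked ones would be passed on to DFT.

```python
import numpy as np


def shell_descriptor(config, max_shell=3):
    """Hypothetical real-space descriptor: fraction of unlike (A-B) pairs in each
    neighbour shell k = 1..max_shell of a periodic 1D ring."""
    s = np.asarray(config)
    return np.array([np.mean(s != np.roll(s, -k)) for k in range(1, max_shell + 1)])


def _with_intercept(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression w = (XᵀX + λI)⁻¹ Xᵀy; the intercept column is
    penalized too, which is negligible for a tiny λ."""
    Xa = _with_intercept(np.asarray(X, dtype=float))
    return np.linalg.solve(Xa.T @ Xa + lam * np.eye(Xa.shape[1]),
                           Xa.T @ np.asarray(y, dtype=float))


def predict(w, X):
    return _with_intercept(np.asarray(X, dtype=float)) @ w


def screen_low_energy(candidates, w, n_keep):
    """Rank candidate configurations by predicted energy and keep the n_keep lowest."""
    X = np.array([shell_descriptor(c) for c in candidates])
    order = np.argsort(predict(w, X))
    return [candidates[i] for i in order[:n_keep]]


rng = np.random.default_rng(2)
base = np.array([1] * 8 + [-1] * 8)  # 8 A and 8 B atoms on a 16-site ring


def reference_energy(c):
    """Made-up stand-in for DFT energies (no real data)."""
    d = shell_descriptor(c)
    return 1.0 - 0.2 * d[0] + 0.05 * d[1]


train = [rng.permutation(base) for _ in range(40)]
w = fit_ridge([shell_descriptor(c) for c in train], [reference_energy(c) for c in train])
candidates = [rng.permutation(base) for _ in range(500)]
for c in screen_low_energy(candidates, w, n_keep=3):
    pred = predict(w, [shell_descriptor(c)])[0]
    print("".join("A" if s == 1 else "B" for s in c), round(float(pred), 4))
```

The design choice is that the model only needs to rank configurations correctly near the bottom of the energy landscape, not reproduce every energy, because the selected candidates are re-evaluated with DFT anyway.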

The LACOS package is currently aimed at bulk systems and is expected to evolve to include surfaces, thus extending cluster expansion to model catalysts, the electrochemistry of materials, near-surface alloys, etc. Integration of the package with CINEMAS, the in-house material science modeling platform, is also planned.

Dr. Mahesh Chandran (www.ikst.res.in/mahesh.shtml) is the developer of the LACOS package.

*ConfMaker and DFTrunner create a database of quantum-mechanically calculated quantities (such as configuration energies).*

## Reference

- Mahesh Chandran, "Multiscale ab initio simulation of Ni-based alloys: Real-space distribution of atoms in γ+γ′ phase," *Computational Materials Science* 108, 192 (2015).