# Source Code

## LACOS Software Package

### Introduction

Metals that we encounter in our daily life are mostly alloys made by mixing two or more elements in some specific ratio. Almost all alloys are __solid solutions__, in which the arrangement of atoms on the lattice looks almost random or disordered, varying from region to region in the material and changing each time it is synthesized. This is in contrast to ordered crystalline materials, like common salt (NaCl), in which Na and Cl atoms occupy well-defined positions on the lattice that are the same for every crystal. For such an ordered crystal, one can define a unit cell consisting of a few atoms that can be periodically repeated in space in three directions to generate the bulk material. This also implies that all fundamental properties of an ordered crystalline material can be understood from its unit cell. For a solid solution, the absence of atomic ordering on the lattice means such a unit cell cannot be defined. It is important to distinguish solid solutions from amorphous materials like glass (SiO_{2}), where the lattice framework itself is absent. Thus, in terms of *orderliness of atoms*, solid solutions fall between fully ordered crystalline materials and disordered amorphous materials.

*Why should we care about atomic configuration?* The arrangement (or *configuration*) of atoms on the lattice is a reflection of the interaction, and hence the bonding, between atoms in the material. The configuration determines varied properties of a material; a good example is carbon, which can take the graphitic or diamond structure with widely different properties. From an applied science and technology perspective, the configuration of atoms in a material is critical information that aids in accurately predicting and modeling its properties. This has been made possible by significant progress in material modeling based on Density Functional Theory (DFT). DFT calculations, which treat atoms and electrons using quantum physics, give us the power to probe the fundamental nature of material properties (hence they are also referred to as first-principles or *ab initio* modeling). Questions like "what would happen to a material property if atom X is replaced by atom Y?" can now be answered in detail, making it possible to perform "experiments" on a computer without synthesizing anything in the lab (*in-silico experiments*).

Due to the quantum treatment of atoms, DFT calculations are computationally expensive and hence are limited to not more than 100-1000 atoms (most routine calculations are limited to fewer than 100 atoms). For most ordered crystalline materials this is not a bottleneck, as the number of atoms in their unit cell is generally much less than 100. If the unit-cell structure is known (from experiments), their fundamental properties can be predicted using DFT. For solid solutions, direct use of DFT has limitations. The absence of a unit cell means that material properties must be obtained by averaging over all "probable" configurations. Thermodynamically, the probability of a configuration is determined by its energy through the Boltzmann factor, e^{-E(σ)/k_{B}T}, which implies that the probability of occurrence of a configuration is higher at lower energies. Hence, to find these low-energy configurations, one needs to search the space of all possible arrangements of atoms on the lattice.
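As a small numerical illustration of the Boltzmann weighting described above, the sketch below (plain Python, with energies in eV and the Boltzmann constant in eV/K; the energy values are made up for illustration) computes relative occurrence probabilities for a handful of configuration energies:

```python
import math

def boltzmann_probabilities(energies_eV, T=300.0):
    """Relative occurrence probabilities of configurations from the
    Boltzmann factor exp(-E/(kB*T)).  Energies in eV, temperature in K."""
    kB = 8.617333e-5  # Boltzmann constant in eV/K
    e_min = min(energies_eV)  # shift energies for numerical stability
    weights = [math.exp(-(E - e_min) / (kB * T)) for E in energies_eV]
    Z = sum(weights)  # partition function over this set of configurations
    return [w / Z for w in weights]

# Lower-energy configurations dominate the ensemble:
probs = boltzmann_probabilities([0.00, 0.05, 0.10], T=300.0)
```

Shifting by the minimum energy does not change the probabilities (the shift cancels between numerator and partition function) but avoids overflow for large energy differences.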

*Why not use DFT directly to perform the search?* Searching the configuration space of a solid solution requires calculating the energy of any configuration accurately and rapidly. Why rapidly? Because the configurational space is enormous. To illustrate the size, consider a region containing 100 atoms in a binary alloy with two elements, say Ni and Al. This region has 2^{100} possible configurations (since each site can be either Ni or Al). Even if one invokes lattice symmetries to remove equivalent configurations, the remaining number is still far too large for DFT calculations. Extrapolating this to more than 100 atoms and more components (ternary, quaternary, quinary, *etc*.), the size of the configurational space becomes astronomical. Searching such a large space for low-energy configurations requires rapid calculation of energy, which is not possible even with the biggest supercomputers of today (can a quantum computer do this? – we leave this for a future generation to answer).
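The combinatorial explosion is easy to check directly; the snippet below reproduces the 2^{100} count quoted above for a binary alloy:

```python
# Number of atomic configurations on an n-site lattice with k species:
# each site independently takes one of k occupations.
def n_configurations(n_sites, n_species):
    return n_species ** n_sites

binary_100 = n_configurations(100, 2)  # the 2**100 count from the text
```

For comparison, a ternary alloy on the same 100 sites already has 3^{100} configurations, roughly 5 × 10^{47}.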

*How to access the probable configurations in a solid solution?* Since direct DFT calculation is not practical, one resorts to an alternative route: create a model of the configuration energy. The model should satisfy the following two criteria: (a) its energies should be as close to DFT as possible, and (b) its computational cost should be minimal (in both time and memory). Such a model can be used either to directly enumerate all configurations (for a reasonable representation of the system), or combined with statistical methods like Monte Carlo to access the low-energy "probable" configurations of the solid solution.

*Cluster expansion: an approach to model solid solutions*. Currently, the state-of-the-art approach is to perform a cluster expansion (CE) of the configuration energy E(σ). The basis set for CE is formed by clusters of lattice sites (also called "figures"), represented by correlation functions ϕ that can be calculated readily for any cluster given its site occupations. Mathematically, the CE is represented by the linear equation **E = ΦJ**, where **J** is a vector of model parameters, **E** is a vector of energies and **Φ** is a matrix (formed by the basis functions). A practical implementation of CE first solves the inverse problem to determine the parameters *J*. Since the requirement is that the CE model should be as accurate as DFT, the set of clusters (or "figures") and the corresponding *J*_{i}'s are obtained by *training* the model with a database of DFT energies. Thanks to advancements in Machine Learning methods, this can be performed very efficiently and accurately. To search for low-energy configurations at *finite temperatures*, the CE model is combined with Monte Carlo simulation. In a recent work, the DFT+CEMC framework has been used to reveal atomic-scale details in quaternary Ni-Al-Ti-Co alloys that form the base composition for many Ni-based superalloys (see reference [1] below).
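To make the inverse problem concrete, here is a minimal NumPy sketch (not the package's actual API) of recovering the parameters **J** from the linear relation **E = ΦJ** in the over-determined case, where ordinary least squares suffices; the matrix sizes and the "true" **J** are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cluster-expansion fit: E = Phi @ J.  Rows of Phi hold the
# correlation functions of training configurations; J holds the
# effective cluster interactions to be determined from DFT energies.
n_configs, n_clusters = 50, 8
Phi = rng.standard_normal((n_configs, n_clusters))
J_true = np.array([1.5, -0.8, 0.3, 0.0, 0.1, 0.0, -0.05, 0.0])
E_dft = Phi @ J_true  # stand-in for a DFT training database

# Over-determined case (M > N): ordinary least squares recovers J.
J_fit, *_ = np.linalg.lstsq(Phi, E_dft, rcond=None)
```

In the realistic setting, the number of candidate clusters exceeds the number of DFT energies (M < N), which is where the sparse-selection machinery described below becomes necessary.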

*Where would CE modeling help?* As mentioned in the beginning, most alloys and materials that we use today, from complex engineering structures to consumer products, are solid solutions: for example, Ni-based superalloys used in the hot-gas path of turbines for power generation and aircraft engines, cathode materials for batteries and solid oxide fuel cells (SOFC) in electric vehicles, III-V semiconductor materials for low-power electronic applications, *etc*. There is ever-growing demand to tune the composition of these materials to optimize various properties. Following a purely experimental approach to achieve this increases the cost of R&D. Moreover, the "Edisonian" trial-and-error approach practiced for many decades has reached its limits. It is imperative to understand and correlate material properties to fundamental physics at the atomic scale, creating a systematic methodology to design new materials and alloys. The need for modeling solid solutions at the atomic scale also appears when connecting engineering models (or parametric models) to the chemistry/chemical composition of the material. Parameters of such models, when calculated accurately using *ab initio* methods, help remove the error/ambiguity that arises when using fitted or empirical values.

*LACOS* Package

To solve the problem of solid solutions, the LACOS package, a suite of codes for performing *ab initio* modeling of multicomponent solid solutions, has been developed. The general philosophy of the package is to construct models of the configuration energy E(σ) by combining statistical techniques with accurate DFT calculations (performed by external codes). Currently the LACOS package - an abbreviation of __L__attice __A__tomic __Co__nfiguration __S__imulation package - contains two different implementations, one based on the DFT+CEMC technique and the other based on a machine-learning approach. The DFT+CEMC technique follows the framework used in reference [1] below, with additional advanced algorithms to perform __sparse feature selection__. The sparse feature selection employs compressive sensing (CS) theory from the signal-processing domain, which allows sparse signal reconstruction from under-sampled linear measurements (see reference [2] for an introduction to compressive sensing theory). If the data matrix - the matrix **Φ** in the case of cluster expansion - satisfies the RIP (restricted isometry property), or alternatively has low coherence, CS theory guarantees the sparsest solution by solving the *L*_{0}-norm minimization problem. In LACOS, several "greedy" iterative algorithms for *L*_{0} minimization and algorithms for convex *L*_{1} minimization are available to construct a sparse CE model. Also, various repetitive tasks have been automated. LACOS is written in Python, with C modules to take care of the compute-intense parts of the modelling. The various components of the package and the flow between them are shown in the figure above and described briefly below.
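Low coherence of **Φ** is a property one can compute directly. The helper below (a NumPy sketch, not part of LACOS) evaluates the mutual coherence, i.e. the largest absolute inner product between distinct normalized columns, which CS theory wants to be small:

```python
import numpy as np

def mutual_coherence(Phi):
    """Mutual coherence of a data matrix: the largest absolute inner
    product between distinct columns after normalizing each column to
    unit length.  Low coherence favours sparse recovery."""
    cols = Phi / np.linalg.norm(Phi, axis=0)  # normalize columns
    G = np.abs(cols.T @ cols)                 # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                  # ignore self inner products
    return G.max()

# Orthonormal columns are the ideal case: coherence is exactly zero.
mu_id = mutual_coherence(np.eye(4))
```

Mutual coherence always lies between 0 (orthogonal columns) and 1 (two parallel columns), which makes it a cheap proxy for the much harder-to-verify RIP condition.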

*ConfMaker*

This code is used to create symmetry-distinct configurations based on the composition ranges for each component of the material. It creates the directory structure with the required input files for the DFT calculations (currently, LACOS supports only the VASP software). The script can also select configurations that are "almost" orthogonal to each other in cluster space (a requirement for sparse CE).
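The idea of enumerating symmetry-distinct configurations can be illustrated on a toy 1D lattice. The sketch below (plain Python, not ConfMaker's algorithm) enumerates binary decorations of a small ring at fixed composition and keeps one representative per translation class, a stand-in for full space-group symmetry reduction:

```python
from itertools import combinations

def distinct_ring_configs(n_sites, n_solute):
    """Enumerate binary decorations of an n-site ring at fixed
    composition, keeping one representative per translation class
    (a toy stand-in for space-group symmetry reduction)."""
    seen, reps = set(), []
    for occ in combinations(range(n_sites), n_solute):
        config = tuple(1 if i in occ else 0 for i in range(n_sites))
        # Canonical form: lexicographically smallest cyclic rotation.
        canon = min(config[r:] + config[:r] for r in range(n_sites))
        if canon not in seen:
            seen.add(canon)
            reps.append(canon)
    return reps

reps = distinct_ring_configs(6, 2)  # 6-site ring, 2 solute atoms
```

Of the 15 raw decorations, only 3 survive translation reduction (the two solute atoms at separation 1, 2 or 3), showing how symmetry shrinks the list of DFT calculations needed.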

*DFTrunner*

This code automates the generation of a DFT database. The directory tree created by *ConfMaker* becomes an input to this code, which submits the calculations to a computing cluster in batches. The script automates monitoring the calculations, checking for convergence (and resubmitting if required), and parsing the outputs to create a database file (with prefix *DBinfo**). The database is used for training CE or machine-learning models of the configurational energy.

*PyClex*

An abbreviation of __Py__thon __Cl__uster __Ex__pansion, *PyClex* performs cluster expansion on the database generated by *DFTrunner*. The code can be used for multisite, multicomponent solid solutions. The "best" set of clusters {α} is obtained using model selection techniques. Specifically, the linear equation **E=ΦJ** is solved in the under-determined condition M<N, where M×N is the size of the matrix **Φ**; the matrix **Φ** is constructed such that it has low coherence. The various algorithms implemented in PyClex for model selection fall broadly into the following categories:

- *Heuristic techniques*: Genetic Algorithm (GA), modified GA for sparse model selection
- *Approximate L*_{0}*-minimization* using "greedy" algorithms: Orthogonal Matching Pursuit (OMP), CoSaMP, Iterative Hard Thresholding (IHT) and its variants
- *L*_{1}*-minimization* for sparse model selection: LASSO, Split-Bregman method, convex optimization (using cvxpy), ISTA/FISTA
- *Mixed-norm L*_{2}*/L*_{q}*-minimization* for *block*-sparse feature selection: iteratively reweighted least squares (IRLS) algorithm, to be used for multicomponent materials
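As an illustration of greedy sparse selection in the under-determined M<N setting, the sketch below uses scikit-learn's `OrthogonalMatchingPursuit` (the same library family PyClex builds on, though this is not PyClex code) to recover a 4-cluster model from 30 synthetic "energies" over 60 candidate clusters; all sizes and coefficient values are invented:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(42)

# Under-determined sparse recovery (M < N): 30 training energies,
# 60 candidate clusters, only 4 of which are truly active.
M, N, k = 30, 60, 4
Phi = rng.standard_normal((M, N))       # random matrix -> low coherence
J_true = np.zeros(N)
J_true[[3, 17, 28, 51]] = [1.2, -0.7, 0.4, 0.9]
E = Phi @ J_true                        # synthetic noiseless energies

# Greedy L0-style selection: pick k columns that best explain E.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(Phi, E)
support = np.flatnonzero(omp.coef_)     # indices of selected clusters
```

With noiseless data and a well-conditioned random **Φ**, OMP identifies the active clusters exactly; with noisy DFT energies, cross-validation over the sparsity level `k` is the usual safeguard.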

The PyClex code is written in Python and uses Scikit-Learn library for machine learning. Heavy computation part of the code is handled by C functions with OpenMP parallelization.

__CPyMonC__

This is a Monte Carlo code to perform finite-temperature simulations (the name is derived from __C__-__Py__thon __Mon__te __C__arlo). The "best" {α} and J_{α} from PyClex are inputs to this code. Monte Carlo simulation gives access to finite-T configurations of the material system, thus allowing various thermodynamic quantities to be calculated. The C functions of the code that handle the core Monte Carlo loop are parallelized using OpenMP.
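The core Metropolis loop that such a code accelerates can be sketched in a few lines. The toy below (plain Python with a single nearest-neighbour pair interaction standing in for a full CE Hamiltonian; all parameter values are invented) accepts a trial flip when it lowers the energy, or otherwise with Boltzmann probability:

```python
import math
import random

def metropolis_sweeps(spins, J, T, n_sweeps, rng):
    """Metropolis Monte Carlo on a 1D binary lattice (+1/-1 site
    occupations) with periodic boundaries and one nearest-neighbour
    interaction J in eV -- a toy stand-in for a CE Hamiltonian."""
    kB = 8.617333e-5  # Boltzmann constant in eV/K
    n = len(spins)
    for _ in range(n_sweeps * n):
        i = rng.randrange(n)
        left, right = spins[i - 1], spins[(i + 1) % n]
        dE = 2.0 * J * spins[i] * (left + right)  # cost of flipping site i
        # Metropolis criterion: always accept downhill moves, accept
        # uphill moves with probability exp(-dE / kB*T).
        if dE <= 0.0 or rng.random() < math.exp(-dE / (kB * T)):
            spins[i] = -spins[i]
    return spins

rng = random.Random(7)
spins = [rng.choice([-1, 1]) for _ in range(64)]
final = metropolis_sweeps(spins, J=0.05, T=100.0, n_sweeps=200, rng=rng)
```

Thermodynamic averages (e.g. short-range order parameters) are then accumulated over the configurations visited after equilibration; in a real CE Hamiltonian, `dE` is instead built from the change in the correlation functions of all selected clusters touching site `i`.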

__PyMLCE__

Though DFT+CEMC is the state of the art in modeling solid solutions, this method can become computationally expensive for routine calculations, especially for multicomponent materials. Sometimes one is interested in accessing only the low-energy configurations of the system. Recently, a new method has been developed to access these low-energy configurations using a machine-learning model of E(σ). The *PyMLCE* (Python Machine Learning of Configuration Energy) code implements this method and can be used to obtain low-energy representative structures of the material for further DFT calculations.
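In spirit, such surrogate screening looks like the following sketch (scikit-learn Ridge regression on synthetic descriptors; an illustration of the idea, not PyMLCE's actual model): train a cheap regressor on a small labelled set, predict energies for a large candidate pool, and keep the lowest-energy candidates for DFT follow-up.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)

# Hidden "true" linear energy model used only to generate toy data.
n_train, n_pool, n_feat = 40, 500, 10
w = rng.standard_normal(n_feat)
X_train = rng.standard_normal((n_train, n_feat))
y_train = X_train @ w + 0.01 * rng.standard_normal(n_train)  # noisy "DFT"

# Train the cheap surrogate on the small labelled set.
model = Ridge(alpha=1e-3).fit(X_train, y_train)

# Screen a large pool of candidate configurations by predicted energy
# and keep the ten lowest for expensive DFT follow-up.
X_pool = rng.standard_normal((n_pool, n_feat))
pred = model.predict(X_pool)
lowest_idx = np.argsort(pred)[:10]
```

The payoff is that only the shortlisted candidates, rather than the full pool, need a DFT calculation.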

The LACOS package is currently aimed at bulk systems and is expected to evolve in the future to include surfaces, thus extending cluster expansion to model catalysts, electrochemistry of materials, near-surface alloys, *etc*. Integration of the package with the in-house developed modeling platform CINEMAS will also be undertaken in the coming years.

Dr. Mahesh Chandran (www.ikst.res.in/mahesh) is the developer of the LACOS package.

__References__

1. Mahesh Chandran, *Multiscale ab initio simulation of Ni-based alloys: Real-space distribution of atoms in γ+γ' phase*, Computational Materials Science **108**, 192 (2015).
2. M. A. Davenport, M. F. Duarte, Y. C. Eldar and G. Kutyniok, in *Compressed Sensing: Theory and Applications*, Cambridge University Press, Cambridge (2012).