# Source Code

## LACOS Software Package

### Introduction

Metals that we encounter in our daily life are mostly alloys, made by mixing two or more elements in
some specific ratio. Almost all alloys are __solid solutions__, in which the arrangement of
atoms on the lattice looks almost random or disordered, varying from region to region in the material
and also changing each time it is synthesized. This is in contrast to ordered crystalline materials,
like common salt (NaCl), in which Na and Cl atoms occupy well-defined positions on the lattice that are
the same for all crystals. For such an ordered crystal, one can define a unit cell consisting of a few
atoms that can be periodically repeated in space in three directions to generate the bulk material.
This also implies that all fundamental material properties of an ordered crystalline material can be
understood from its unit cell. For a solid solution, the absence of atomic ordering on the lattice
means such a unit cell cannot be defined. It is important to distinguish solid solutions from
amorphous materials like glass (SiO_{2}), where the lattice framework itself is absent. Thus, in terms of
*orderliness of atoms* in materials, solid solutions fall in between fully ordered crystalline materials and disordered amorphous materials.

*Why should we care about atomic configuration?* The arrangement (or *configuration*) of
atoms on the lattice is a reflection of the interactions, and hence the bonding, between atoms in the
material. The configuration determines varied properties of a material; a good example is carbon,
which can take the graphite or diamond structure, with widely different properties. From an applied science
and technology perspective, the configuration of atoms in a material is critical information that aids
in accurately predicting and modeling its properties. This has been made possible by the
significant progress in material modeling based on Density Functional Theory (DFT). DFT
calculations, which treat atoms and electrons using quantum physics, have given us the power to probe
the fundamental nature of material properties (hence, they are also referred to as first-principles
or *ab initio* modeling). Questions like "what would happen to a material property if atom X is
replaced by atom Y?" can now be answered in detail, which makes it possible to perform
"experiments" on a computer without synthesizing anything in the lab (*in-silico experiments*).

Due to the quantum treatment of atoms, DFT calculations are computationally expensive and hence are
limited to not more than 100-1000 atoms (most routine calculations are limited to fewer than 100
atoms). For most ordered crystalline materials this is not a bottleneck, as the number of atoms in
their unit cell is generally much less than 100. If their unit-cell structure is known (from
experiments), their fundamental properties can be predicted using DFT. For solid solutions, direct
use of DFT has limitations. The absence of a unit cell means that material properties must be obtained
by averaging over all "probable" configurations. Thermodynamically, the probability of a
configuration is determined by its energy through the Boltzmann factor, e^{-E(σ)/k_{B}T},
which implies that the probability of occurrence of a configuration is higher at lower
energies. Hence, to find these low-energy configurations, one needs to search the space of all
possible arrangements of atoms on the lattice.
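As a worked example of the Boltzmann weighting above, the sketch below (plain Python, with hypothetical configuration energies) normalizes the factors e^{-E(σ)/k_{B}T} into relative probabilities; note how a small energy difference already produces a strong preference at room temperature:

```python
import math

def boltzmann_probabilities(energies_eV, T_kelvin):
    """Relative Boltzmann weights p(sigma) ~ exp(-E(sigma)/kB*T), normalized to 1."""
    kB = 8.617333262e-5  # Boltzmann constant in eV/K
    # Subtract the minimum energy for numerical stability; this shifts every
    # exponent by the same amount and leaves the normalized ratios unchanged.
    e_min = min(energies_eV)
    weights = [math.exp(-(e - e_min) / (kB * T_kelvin)) for e in energies_eV]
    z = sum(weights)  # partition function (up to the constant shift)
    return [w / z for w in weights]

# Three hypothetical configuration energies (eV); lower energy -> higher probability.
probs = boltzmann_probabilities([-1.00, -0.95, -0.80], T_kelvin=300.0)
```

At 300 K, k_{B}T is only about 0.026 eV, so the 0.05 eV gap between the first two configurations already makes the lowest-energy one dominate.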

*Why not use DFT directly to perform the search?* Searching the configuration space of a solid
solution requires calculating the energy of any configuration accurately and rapidly. Why rapidly?
Because the configurational space is enormous. To illustrate its size, consider a region
containing 100 atoms in a binary alloy of two elements, say Ni and Al. This region has
2^{100} possible configurations (since each site can be either Ni or Al). Even if one
invokes lattice symmetries to remove equivalent configurations, the remaining number is still far too
large for applying DFT calculations. Extrapolating this to >100 atoms and more components
(ternary, quaternary, quinary, *etc*.), the size of the configurational space becomes astronomical.
Searching such a large space for low-energy configurations requires rapid calculation of energies, which
is not possible even with the biggest supercomputers of today (can quantum computers do this? – we
leave this for future generations to answer).
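The combinatorics above can be checked directly; the snippet below (plain Python) counts the configurations of a 100-site binary region, both unconstrained and at one fixed 50:50 composition:

```python
import math

def n_configurations(n_sites, n_species):
    """Total configurations when each of n_sites can hold any of n_species."""
    return n_species ** n_sites

# Unconstrained binary region of 100 sites: 2^100 configurations.
print(n_configurations(100, 2))  # ~1.27e30

# Even at a single fixed composition (50 Ni + 50 Al), the count is the
# binomial coefficient C(100, 50) -- still astronomically large.
print(math.comb(100, 50))  # ~1.01e29
```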

*How to access the probable configurations in a solid solution?* Since direct DFT calculation is
not practical, one resorts to an alternative route: create a model of the configuration energy. The model
should satisfy the following two criteria: (a) the accuracy of its energies should be as close to DFT as
possible; (b) the computational cost of the model should be minimal (in both time and memory). Such a
model can be used either to directly enumerate all configurations (for a reasonable representation
of the system), or in combination with statistical methods like Monte Carlo to access the low-energy
"probable" configurations of the solid solution.

*Cluster expansion: an approach to model solid solutions*. Currently, the state-of-the-art
approach is to perform a cluster expansion (CE) of the configuration energy E(σ). The
basis set for CE is formed by clusters of lattice sites (also called "figures"), each represented by a
correlation function ϕ that can be calculated readily for any cluster given its site
occupations. Mathematically, the CE is represented by the linear equation **E = ΦJ**, where
**J** is a vector of model parameters, **E** is a vector of energies, and **Φ** is a matrix
formed by the basis functions. A practical implementation of CE first solves the inverse problem to
determine the parameters **J**. Since the requirement is that the CE model should be as accurate as
DFT, the set of clusters (or "figures") {α_{i}} and the corresponding *J*_{i}'s are obtained by
*training* the model on a database of DFT energies. Thanks to the advancements in Machine Learning
methods, this can be performed very efficiently and accurately. To search for low-energy configurations at
*finite temperatures*, the CE model is combined with Monte Carlo simulation. In a recent work, the DFT+CEMC framework has been used to reveal atomic-scale details in quaternary Ni-Al-Ti-Co alloys that form the base composition for many Ni-based superalloys (see reference [1] below).
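To make the linear structure E = ΦJ concrete, here is a minimal sketch on a toy 1-D binary ring with only the empty, point, and nearest-neighbour pair clusters; the ±1 site variables and the "true" J values are illustrative, not LACOS's actual figures or interactions:

```python
import numpy as np

rng = np.random.default_rng(0)

def correlations(config):
    """Toy correlation functions for a 1-D binary ring with site variables +/-1:
    [empty cluster, point cluster, nearest-neighbour pair]."""
    s = np.asarray(config, dtype=float)
    return np.array([1.0, s.mean(), (s * np.roll(s, 1)).mean()])

# Hypothetical "true" effective cluster interactions J (one per cluster).
J_true = np.array([-4.0, 0.3, -0.8])

# Training set: the matrix Phi (one row per configuration) and the energies
# E = Phi @ J, with a little noise standing in for DFT numerical scatter.
configs = [rng.choice([-1, 1], size=20) for _ in range(50)]
Phi = np.array([correlations(c) for c in configs])
E = Phi @ J_true + rng.normal(scale=1e-3, size=len(configs))

# The inverse problem: recover J from (Phi, E) by least squares.
J_fit, *_ = np.linalg.lstsq(Phi, E, rcond=None)
```

With enough training configurations the fitted J converges to the underlying interactions; evaluating the model for a new configuration is then a single dot product, which is what makes CE fast enough for Monte Carlo.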

*Where would CE modeling help?* As mentioned in the beginning, most alloys and materials that we
use today, from complex engineering structures to consumer products, are solid solutions: for example,
Ni-based superalloys used in the hot-gas path of turbines for power generation and aircraft engines,
cathode materials for batteries and solid oxide fuel cells (SOFC) in electric vehicles, III-V
semiconductor materials for low-power electronic applications, *etc*. There is ever-growing
demand to tune the composition of these materials to optimize various properties. Following an
experimental-only approach to achieve this increases the cost of R&D. Moreover, the "Edisonian"
trial-and-error approach practiced for many decades has reached its limits. It is imperative to
understand and correlate material properties with fundamental physics at the atomic scale, creating a
systematic methodology for designing new materials and alloys. The need for modeling solid solutions at
the atomic scale also appears when connecting engineering models (or parametric models) to the
chemistry/chemical composition of the material. Parameters of such models, when calculated accurately
using *ab initio* methods, help remove the error/ambiguity that arises when using fitted or empirical
values.

### *LACOS* Package

To solve the problem of modeling solid solutions, the LACOS package, a suite of codes for performing
*ab initio* modeling of multicomponent solid solutions, has been developed. The general
philosophy of the package is to construct models of the configuration energy E(σ) by
combining statistical techniques with accurate DFT calculations (performed by external codes).
Currently the LACOS package - an abbreviation of __L__attice __A__tomic __Co__nfiguration
__S__imulation package - contains two different implementations, one based on the DFT+CEMC technique
and the other based on a machine learning approach. The DFT+CEMC technique follows the framework used
in reference [1] below, with additional advanced algorithms to perform __sparse feature
selection__. The sparse feature selection employs compressive sensing (CS) theory from
the signal processing domain, which allows sparse signal reconstruction from under-sampled linear
measurements (see reference [2] for an introduction to compressive sensing theory). If the data
matrix - the matrix **Φ** in the case of cluster expansion - satisfies the RIP
(restricted isometry property), or alternatively has low coherence, CS theory guarantees the
sparsest solution by solving the *L*_{0}-norm minimization problem. In LACOS, several
"greedy" iterative algorithms for *L*_{0} minimization and algorithms for convex
*L*_{1} minimization are available to construct sparse CE models. Also, various
repetitive tasks have been automated. LACOS is written in Python with C modules to take care of the
compute-intensive parts of the modeling. The various components of the package and the flow between them
are shown in the figure above and described briefly below.
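The "low coherence" criterion mentioned above can be computed directly; the sketch below implements the standard mutual-coherence measure of a data matrix, which is the practical proxy for the RIP (the RIP itself is impractical to verify directly):

```python
import numpy as np

def mutual_coherence(Phi):
    """Mutual coherence of a matrix: the largest absolute normalized inner
    product between two distinct columns. Lower coherence means the columns
    are closer to orthogonal, which favours sparse recovery."""
    cols = Phi / np.linalg.norm(Phi, axis=0)  # normalize each column
    gram = np.abs(cols.T @ cols)              # pairwise |cosine| similarities
    np.fill_diagonal(gram, 0.0)               # ignore each column with itself
    return gram.max()

# Orthogonal columns give coherence 0; duplicated columns give coherence 1.
print(mutual_coherence(np.eye(4)))  # 0.0
```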

*ConfMaker*

This code is used to create symmetry-distinct configurations based on the composition ranges for
each component of the material. It creates the directory structure with the required input files for the
DFT calculations (currently, LACOS supports only the VASP software). The script can also select
configurations that are "almost" orthogonal to each other in cluster space (a requirement for
sparse CE).
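The symmetry reduction that ConfMaker performs can be illustrated at toy scale; the sketch below keeps one representative per rotation class of a small 1-D binary ring (ConfMaker itself works with the full space-group symmetry of the lattice, not just ring rotations):

```python
from itertools import product

def distinct_ring_configs(n_sites, species=("Ni", "Al")):
    """Enumerate configurations on a 1-D ring, keeping one representative per
    equivalence class under rotation. A canonical form (the lexicographically
    smallest rotation) identifies each class."""
    seen, reps = set(), []
    for config in product(species, repeat=n_sites):
        rotations = [config[i:] + config[:i] for i in range(n_sites)]
        canon = min(rotations)
        if canon not in seen:
            seen.add(canon)
            reps.append(canon)
    return reps

# 2^4 = 16 raw configurations on a 4-site ring collapse to 6 distinct ones.
reps = distinct_ring_configs(4)
```

Even this tiny example shows the payoff: symmetry reduction shrinks the number of DFT calculations needed without losing any physically distinct configuration.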

*DFTrunner*

This code automates the generation of a DFT database. The directory tree created by *ConfMaker*
becomes an input to this code, which submits the calculations to a computing cluster in batches. The
script automates monitoring the calculations, checking for convergence (and resubmitting if
required), and parsing the outputs to create a database file (with prefix *DBinfo*). The database
is used for training CE or machine learning models of the configurational energy.

*PyClex*

An abbreviation of __Py__thon __Cl__uster __Ex__pansion, *PyClex* performs cluster
expansion on the database generated by *DFTrunner*. The code can be used for multisite,
multicomponent solid solutions. The "best" set of clusters {α} is obtained using model
selection techniques. Specifically, the linear equation **E = ΦJ** is solved under the
under-determined condition M < N, where M×N is the size of the matrix **Φ**; the matrix
**Φ** is constructed such that it has low coherence. The various algorithms implemented in PyClex
for model selection broadly fall into the following categories:

- *Heuristic techniques*: Genetic Algorithm (GA), modified GA for sparse model selection
- *Approximate L_{0}-minimization* using "greedy" algorithms: Orthogonal Matching Pursuit (OMP), CoSaMP, Iterative Hard Thresholding (IHT) and its variants
- *L_{1}-minimization* for sparse model selection: LASSO, Split-Bregman method, convex optimization (using cvxpy), ISTA/FISTA
- *Mixed-norm L_{2}/L_{q}-minimization* for *block*-sparse feature selection: the iterative reweighted least squares (IRLS) algorithm, to be used for multicomponent materials

The PyClex code is written in Python and uses the Scikit-Learn library for machine learning. The heavy computational parts of the code are handled by C functions with OpenMP parallelization.
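As a sketch of the "greedy" L_{0} family listed above, here is a minimal Orthogonal Matching Pursuit in plain NumPy; the random Φ and the 3-sparse J are synthetic, and PyClex's actual implementation (C + Scikit-Learn) is more elaborate:

```python
import numpy as np

def omp(Phi, E, n_nonzero):
    """Minimal Orthogonal Matching Pursuit: greedily add the column of Phi
    most correlated with the current residual, then re-fit the coefficients
    on the selected support by least squares."""
    residual = E.copy()
    support = []
    J = np.zeros(Phi.shape[1])
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        idx = int(np.argmax(np.abs(Phi.T @ residual)))  # best-matching column
        if idx not in support:
            support.append(idx)
        # Least-squares re-fit restricted to the current support.
        coef, *_ = np.linalg.lstsq(Phi[:, support], E, rcond=None)
        residual = E - Phi[:, support] @ coef
    J[support] = coef
    return J

# Synthetic test: recover a 3-sparse J from 60 random "measurements" of a
# 100-column (under-determined, M < N) matrix.
rng = np.random.default_rng(1)
Phi = rng.normal(size=(60, 100))
J_true = np.zeros(100)
J_true[[5, 17, 60]] = [1.5, -2.0, 0.7]
J_hat = omp(Phi, Phi @ J_true, n_nonzero=3)
```

This is the under-determined M < N setting described above: 60 energies suffice to pin down 3 active clusters out of 100 candidates because the solution is sparse.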

__CPyMonC__

This is a Monte Carlo code for performing finite-temperature simulations (the name is derived from
__C__-__Py__thon __Mon__te __C__arlo). The "best" {α} and J_{α} from
PyClex are inputs to this code. Monte Carlo simulation gives access to finite-T configurations of
the material system, thus allowing various thermodynamic quantities to be calculated. The C
functions of the code that handle the core Monte Carlo loop are parallelized using OpenMP.
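The core Metropolis loop that such a code performs can be sketched as follows; the 1-D ring, the single pair interaction `J_pair`, and the chosen temperature are illustrative stand-ins for a full CE Hamiltonian, not CPyMonC's actual model:

```python
import math
import random

def metropolis_sweep(spins, J_pair, T, rng):
    """One Metropolis sweep on a 1-D binary ring with the toy Hamiltonian
    E = -J_pair * sum_i s_i * s_{i+1}. A proposed site flip is accepted
    with probability min(1, exp(-dE / kB*T))."""
    kB = 8.617333262e-5  # Boltzmann constant in eV/K
    n = len(spins)
    for _ in range(n):
        i = rng.randrange(n)
        # Only the two bonds touching site i change when it flips.
        dE = 2.0 * J_pair * spins[i] * (spins[(i - 1) % n] + spins[(i + 1) % n])
        if dE <= 0.0 or rng.random() < math.exp(-dE / (kB * T)):
            spins[i] = -spins[i]

rng = random.Random(0)
spins = [rng.choice([-1, 1]) for _ in range(64)]
for _ in range(200):  # equilibrate at low temperature
    metropolis_sweep(spins, J_pair=0.05, T=100.0, rng=rng)
```

Averaging observables over many such equilibrated configurations is what yields the finite-temperature thermodynamic quantities mentioned above.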

__PyMLCE__

Though DFT+CEMC is the state of the art in modeling solid solutions, the method can become
computationally expensive for routine calculations, especially for multicomponent materials.
Sometimes one is interested in accessing only the low-energy configurations of the system. Recently,
a new method has been developed to access these low-energy configurations using a machine learning
model of E(σ). The *PyMLCE* (Python Machine Learning of Configuration Energy) code
implements this method and can be used to obtain low-energy representative structures of the
material, which can then be used for further DFT calculations.
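The text does not spell out the ML model PyMLCE uses, so the sketch below only illustrates the screening idea generically: fit a cheap surrogate of E(σ) on a small labeled set, then rank a large configuration pool by predicted energy. The descriptor, the synthetic "DFT" labels, and the choice of ridge regression are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def features(config):
    """Hypothetical descriptor: [constant, net composition, NN pair sum]."""
    s = np.asarray(config, dtype=float)
    return np.array([1.0, s.sum(), (s * np.roll(s, 1)).sum()])

def fake_dft_energy(config):
    """Synthetic stand-in for a DFT energy, used only to label the toy data."""
    s = np.asarray(config, dtype=float)
    return -0.5 * (s * np.roll(s, 1)).sum() + 0.1 * s.sum()

# Small "DFT-labeled" training set and a large unlabeled pool to screen.
train = [rng.choice([-1, 1], size=16) for _ in range(80)]
pool = [rng.choice([-1, 1], size=16) for _ in range(1000)]
X = np.array([features(c) for c in train])
y = np.array([fake_dft_energy(c) for c in train])

# Ridge regression in closed form as a cheap surrogate for E(sigma).
lam = 1e-8
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Rank the pool by predicted energy; keep the 10 lowest for follow-up DFT.
pred = np.array([features(c) @ w for c in pool])
low_energy = [pool[i] for i in np.argsort(pred)[:10]]
```

The point of the workflow is the last step: only the handful of surrogate-selected structures, not the whole pool, goes on to expensive DFT.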

The LACOS package is currently aimed at bulk systems and is expected to evolve in the future to include
surfaces, thus extending cluster expansion to model catalysts, electrochemistry of materials,
near-surface alloys, *etc*. Integration of the package with the in-house developed modeling platform
CINEMAS will also be undertaken in the coming years.

Dr. Mahesh Chandran (www.ikst.res.in/mahesh) is the developer of the LACOS package.

__References__

1. Mahesh Chandran, *Multiscale ab initio simulation of Ni-based alloys: Real-space distribution of atoms in γ+γ′ phase*, Computational Materials Science **108**, 192 (2015).
2. M. A. Davenport, M. F. Duarte, Y. C. Eldar, G. Kutyniok, in *Compressed Sensing: Theory and Applications*, Cambridge University Press, Cambridge (2012).