Getting Started¶
Comparing crystals¶
amd.compare() extracts crystals from one or more CIFs and compares them by AMD or PDD. For example, to compare all crystals in a CIF by AMD with k = 100:
import amd
df = amd.compare('crystals.cif', by='AMD', k=100)
To compare by PDD, use by='PDD'. The distance matrix is returned as a pandas DataFrame.
amd.compare() can also take two paths, in which case all crystals in one file are compared with those in the other.
If csd-python-api is installed, amd.compare() can also accept lists of CSD refcodes, or other formats.
Read, calculate descriptors and compare separately¶
amd.compare() reads crystals, calculates AMD or PDD, and compares them. It is sometimes useful to do these steps separately, e.g. to save the descriptors to a file. The code above using amd.compare() is equivalent to the following:
import amd
import pandas as pd
from scipy.spatial.distance import squareform
crystals = list(amd.CifReader('crystals.cif')) # read crystals
amds = [amd.AMD(crystal, 100) for crystal in crystals] # calculate AMDs
dm = squareform(amd.AMD_pdist(amds)) # compare AMDs pairwise
names = [crystal.name for crystal in crystals]
df = pd.DataFrame(dm, index=names, columns=names)
Here, amd.AMD_pdist() is used to compare the AMDs pairwise, returning a condensed distance matrix (see scipy.spatial.distance.squareform(), which converts it to a symmetric 2D distance matrix). There is an equivalent function for comparing PDDs, amd.PDD_pdist(). There are also two cdist functions, amd.AMD_cdist() and amd.PDD_cdist(), which take two collections of descriptors and compare everything in one collection with everything in the other, returning a 2D distance matrix.
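The difference between a condensed matrix, its square form and a cdist-style result can be seen in a small standard-library sketch. The toy vectors below are stand-ins rather than real AMDs, and the helper names (chebyshev, pdist_condensed, to_square, cdist_full) are hypothetical and are not part of amd or SciPy; they only imitate the output layout described above, using the Chebyshev metric.

```python
from itertools import combinations

def chebyshev(u, v):
    """Largest absolute elementwise difference between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

def pdist_condensed(vectors):
    """Distances for every unordered pair (i < j), listed row by row."""
    return [chebyshev(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)]

def to_square(condensed, n):
    """Expand a condensed list into a symmetric n x n matrix with a zero diagonal."""
    square = [[0.0] * n for _ in range(n)]
    for d, (i, j) in zip(condensed, combinations(range(n), 2)):
        square[i][j] = square[j][i] = d
    return square

def cdist_full(set_a, set_b):
    """Distances from every vector in set_a to every vector in set_b."""
    return [[chebyshev(u, v) for v in set_b] for u in set_a]

# Toy stand-ins for AMD vectors with k = 3 (not real crystal data).
toy = [[1.0, 2.0, 3.0], [1.5, 2.0, 3.5], [1.0, 4.0, 3.0]]

condensed = pdist_condensed(toy)         # one entry per pair: (0,1), (0,2), (1,2)
square = to_square(condensed, len(toy))  # 3 x 3 symmetric matrix
cross = cdist_full(toy[:2], toy[2:])     # 2 x 1 matrix, first two vs the last
```

For n items the condensed form holds n(n-1)/2 values, which is why it is preferred for storage, while the square form is more convenient for labelling rows and columns in a DataFrame.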
Write crystals or their descriptors to a file¶
Python's pickle module is an easy way to store crystals or their descriptors.
import amd
import pickle
crystals = list(amd.CifReader('crystals.cif'))
with open('crystals.pkl', 'wb') as f: # write
    pickle.dump(crystals, f)
with open('crystals.pkl', 'rb') as f: # read
    crystals = pickle.load(f)
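Descriptors can be stored the same way. The sketch below keeps them in a dictionary keyed by crystal name so that each descriptor stays attached to its label after reloading. The values are toy lists standing in for real descriptors, and the names and file path are placeholders chosen for illustration.

```python
import os
import pickle
import tempfile

# Toy stand-ins keyed by crystal name (not real data). In practice the
# values would be the descriptors returned by amd.AMD() or amd.PDD().
descriptors = {
    'crystal_1': [1.0, 2.0, 3.0],
    'crystal_2': [1.5, 2.0, 3.5],
}

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'descriptors.pkl')
    with open(path, 'wb') as f:  # write
        pickle.dump(descriptors, f)
    with open(path, 'rb') as f:  # read
        restored = pickle.load(f)
```

Loading a pickle file can execute arbitrary code, so only unpickle files from a trusted source.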
List of optional parameters¶
amd.compare() reads crystals, computes their invariants and compares them in one function for convenience. It accepts most of the optional parameters from each of these steps, all of which are listed below.
Reader options¶
Parameters of amd.CifReader or amd.CSDReader.
reader (default ase) controls the backend package used to parse the file. Accepts ase, pycodcif, pymatgen, gemmi and ccdc (if these packages are installed). The ccdc reader can read formats accepted by ccdc.io.EntryReader.
remove_hydrogens (default False) removes hydrogen atoms from the structure.
disorder (default skip) controls how disordered structures are handled. The default skips any crystal with disorder, since disorder conflicts somewhat with the periodic set model. Alternatively, ordered_sites removes atoms with disorder and all_sites includes all atoms regardless.
show_warnings (default True) chooses whether to print warnings during reading, e.g. from disordered structures or crystals with missing data.
heaviest_component (default False, CSD Python API only) removes all but the heaviest molecule in the asymmetric unit, intended for removing solvents.
molecular_centres (default False, CSD Python API only) uses centres of molecules instead of atoms as the motif of the periodic set.
families (default False, CSD Python API only) interprets the list of strings given as CSD refcode families and reads all crystals in those families.
PDD options¶
Parameters of amd.PDD(). amd.AMD() does not accept any optional parameters.
collapse (default True) chooses whether to collapse rows of PDDs which are similar enough (elementwise).
collapse_tol (default 0.0001) is the tolerance for collapsing PDD rows into one. The merged row is the average of those collapsed.
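To make the effect of collapse_tol concrete, here is a simplified sketch of row collapsing. It is not the library's implementation: the function collapse_rows is hypothetical, the grouping is a plain greedy pass against the first row of each group, and it assumes each PDD row carries a weight and that the weights of merged rows are summed.

```python
def collapse_rows(weights, rows, tol=1e-4):
    """Greedily merge rows whose entries all differ by at most tol.

    Illustrative only: each row is compared with the first row of every
    existing group and joins the first group that matches.
    """
    groups = []  # each group is a list of row indices
    for i, row in enumerate(rows):
        for group in groups:
            representative = rows[group[0]]
            if all(abs(a - b) <= tol for a, b in zip(row, representative)):
                group.append(i)
                break
        else:
            groups.append([i])  # no close group found, start a new one

    n_cols = len(rows[0])
    # Assumption: weights of merged rows add up.
    merged_weights = [sum(weights[i] for i in g) for g in groups]
    # The merged row is the plain average of the rows in the group.
    merged_rows = [[sum(rows[i][c] for i in g) / len(g) for c in range(n_cols)]
                   for g in groups]
    return merged_weights, merged_rows

# Toy rows (not real crystal data): the first two differ by 0.00005 in one entry.
toy_weights = [0.25, 0.25, 0.5]
toy_rows = [[1.0, 2.0], [1.00005, 2.0], [1.5, 2.5]]

merged_weights, merged_rows = collapse_rows(toy_weights, toy_rows, tol=1e-4)
```

With the default tolerance the first two toy rows merge into one row of weight 0.5; with a much smaller tolerance they would stay separate.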
Comparison options¶
The first parameter below, metric, is available to amd.PDD_pdist(), amd.PDD_cdist(), amd.AMD_pdist() and amd.AMD_cdist(). n_jobs and verbose only apply to PDD comparisons and low_memory only applies to AMD comparisons.
metric (default chebyshev) chooses the metric used to compare AMDs or PDD rows. See SciPy’s cdist/pdist for a list of accepted metrics.
n_jobs (requires by='PDD', default None) is the number of cores to use for multiprocessing (passed to joblib.Parallel). Pass -1 to use the maximum.
verbose (requires by='PDD', default 0) controls the verbosity level, increasing with larger numbers. This is passed to joblib.Parallel; see its documentation for details.
low_memory (requires by='AMD' and metric='chebyshev', default False) uses a slower algorithm with a smaller memory footprint, better for large input sizes.
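As a rough illustration of how options from the different steps might be combined in a single amd.compare() call, the sketch below collects a few of the keywords listed in this section into one dictionary. The helper compare_with_options and the file name crystals.cif are placeholders, and the exact set of keywords accepted may depend on the installed version.

```python
# Keyword options taken from the parameter lists in this section.
pdd_options = {
    'by': 'PDD',
    'k': 100,
    'remove_hydrogens': True,     # reader option
    'disorder': 'ordered_sites',  # reader option
    'collapse_tol': 1e-4,         # PDD option
    'metric': 'chebyshev',        # comparison option
    'n_jobs': -1,                 # comparison option, PDD only
}

def compare_with_options(path, **options):
    """Hypothetical helper: forward keyword options to amd.compare()."""
    import amd  # imported here so the options can be inspected without the package
    return amd.compare(path, **options)

# Example call, assuming 'crystals.cif' exists in the working directory:
# df = compare_with_options('crystals.cif', **pdd_options)
```

Keeping the options in a dictionary makes it easy to reuse the same settings across several files, or to record them alongside the resulting distance matrix.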