Using PDDs¶
Calculation¶
The pointwise distance distribution (PDD) of a crystal is given by amd.PDD()
.
It accepts a crystal and an integer k, returning the \(\text{PDD}_k\) as a 2D
NumPy array with k + 1 columns, the weights of each row being in the first column.
If you have a .cif file, use amd.CifReader
to read the crystals
(see Reading cifs). If csd-python-api is installed, amd.CSDReader
accepts CSD refcodes (see Reading from the CSD).
# get PDDs of crystals in a .cif
reader = amd.CifReader('file.cif')
pdds = [amd.PDD(crystal, 100) for crystal in reader]
# get PDDs of DEBXIT01 and DEBXIT02 from the CSD
crystals = list(amd.CSDReader(['DEBXIT01', 'DEBXIT02']))
pdds = [amd.PDD(crystal, 50) for crystal in crystals]
You can also give the coordinates of motif points and unit cell as a tuple of numpy arrays, in Cartesian form:
# PDD (k=10) of 3D cubic lattice
motif = np.array([[0,0,0]])
cell = np.identity(3)
cubic_lattice = (motif, cell)
cubic_pdd = amd.PDD(cubic_lattice, 10)
The object returned by amd.PDD()
is a NumPy array with k + 1 columns.
Calculation options¶
amd.PDD()
accepts a few optional arguments (not relevant to amd.AMD()
):
amd.PDD(
periodic_set, # input crystal (a PeriodicSet or tuple)
k, # k value (number of neighbours)
lexsort=True, # lexicograpgically sort rows
collapse=True, # collpase identical rows (within tolerance)
collapse_tol=1e-4, # tolerance for collapsing rows
return_row_groups=False # return info about which rows collapsed
)
lexsort
lexicograpgically orders the rows of the PDD, and collapse
merges rows
if all elements of rows are within collapse_tol
. The definition of PDD requires both
in order to technically satisfy invariance (that two isometric sets have equal PDDs). But,
Earth mover’s distance does not depend on row order, so lexsort=False
makes no
difference to comparisons. Setting collapse=False
will also not change Earth mover’s
distances, but comparisons may take longer.
Sometimes it’s useful to know which rows of the PDD came from which motif points in the input.
Setting return_row_groups=True
makes the function return a tuple (pdd, groups)
, where
groups[i]
contains the indices of the point(s) corresponding to pdd[i]
. Note that these
indices are for the asymmetric unit, whose indices in periodic_set.motif
are
accessible through periodic_set.asym_unit
if it exists.
Comparison¶
The Earth mover’s distance is
the appropriate metric to compare PDDs. The amd.compare
module contains functions
for these comparisons.
amd.PDD_pdist()
and amd.PDD_cdist()
are like SciPy’s functions
pdist
and cdist
. pdist
takes one set and compares all elements pairwise,
whereas cdist
takes two sets and compares elements in one with the other.
cdist
returns a 2D distance matrix, but pdist
returns a condensed distance matrix
(see SciPy’s pdist function).
# compare crystals in file1.cif with those in file2.cif by PDD, k=100
pdds1 = [amd.PDD(crystal, 100) for crystal in amd.CifReader('file1.cif')]
pdds2 = [amd.PDD(crystal, 100) for crystal in amd.CifReader('file2.cif')]
distance_matrix = amd.PDD_cdist(pdds1, pdds2)
# compare everything in file1.cif with each other
condensed_dm = amd.PDD_pdist(pdds1)
You can compare one PDD with another with amd.EMD()
:
# compare DEBXIT01 and DEBXIT02 by PDD, k=100
pdds = [amd.PDD(crystal, 100) for crystal in amd.CSDReader(['DEBXIT01', 'DEBXIT02'])]
distance = amd.EMD(pdds[0], pdds[1])
amd.EMD()
, amd.PDD_pdist()
and amd.PDD_cdist()
all accept
an optional argument metric
, which can be anything accepted by SciPy’s pdist/cdist functions.
The metric used to compare PDDs is always Earth mover’s distance, but this still requires another metric
between the rows of PDDs (so technically there’s a different Earth mover’s distance for each choice of metric).
Comparison options and multiprocessing¶
amd.PDD_pdist()
and amd.PDD_cdist()
share the following optional arguments:
metric
(defaultchebyshev
) chooses the metric used to compare PDD rows. See SciPy’s cdist/pdist for a list of accepted metrics.n_jobs
(new in 1.2.3, defaultNone
) is the number of cores to use for multiprocessing (passed tojoblib.Parallel
). Pass -1 to use the maximum.backend
(defaultmultiprocessing
) is the parallelization backend implementation for PDD comparisons.verbose
(requiresby='PDD'
, defaultFalse
) controls the verbosity level. With parallel processing the verbose argument ofjoblib.Parallel
is used, otherwisetqdm
is used.