amd.compare module
Functions for comparing AMDs and PDDs of crystals.
- amd.compare.compare(crystals, crystals_=None, by='AMD', k=100, **kwargs)
Given one or two collections of periodic sets, refcodes or .cif files, compare them and return a DataFrame of the distance matrix. The default is to compare by AMD with k=100. Accepts most keyword arguments accepted by CifReader, CSDReader and the compare functions; for a full list, see the documentation's Quick Start page. Note that using refcodes requires csd-python-api.
- Parameters
crystals (array or list of arrays) – One or a collection of paths, refcodes, file objects or periodicset.PeriodicSet objects.
crystals_ (array or list of arrays, optional) – One or a collection of paths, refcodes, file objects or periodicset.PeriodicSet objects.
by (str, default 'AMD') – Invariant to compare by, either ‘AMD’ or ‘PDD’.
k (int, default 100) – k value to use for the invariants (length of AMD, or number of columns in PDD).
- Returns
df – DataFrame of the distance matrix for the given crystals compared by the chosen invariant.
- Return type
pandas.DataFrame
- Raises
ValueError – If by is not ‘AMD’ or ‘PDD’, if either given set has no valid crystals to compare, or if crystals or crystals_ is of an invalid type.
Examples
Compare everything in a .cif (default, AMD with k=100):
df = amd.compare('data.cif')
Compare everything in one cif with all crystals in all cifs in a directory (PDD, k=50):
df = amd.compare('data.cif', 'dir/to/cifs', by='PDD', k=50)
Examples (csd-python-api only)
Compare two crystals by CSD refcode (PDD, k=50):
df = amd.compare('DEBXIT01', 'DEBXIT02', by='PDD', k=50)
Compare everything in a refcode family (AMD, k=100):
df = amd.compare('DEBXIT', families=True)
- amd.compare.EMD(pdd: numpy.ndarray, pdd_: numpy.ndarray, metric: Optional[str] = 'chebyshev', return_transport: Optional[bool] = False, **kwargs)
Earth mover’s distance (EMD) between two PDDs, also known as the Wasserstein metric.
- Parameters
pdd (numpy.ndarray) – PDD of a crystal.
pdd_ (numpy.ndarray) – PDD of a crystal.
metric (str or callable, default 'chebyshev') – EMD between PDDs requires defining a distance between PDD rows. By default, the Chebyshev (L-infinity) distance is chosen, as with AMDs. Accepts any metric accepted by scipy.spatial.distance.cdist().
return_transport (bool, default False) – Return a tuple (distance, transport_plan) with the optimal transport.
- Returns
emd – Earth mover’s distance between two PDDs.
- Return type
float
- Raises
ValueError – Thrown if pdd and pdd_ do not have the same number of columns (k value).
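As a rough illustration of what the Earth mover's distance computes, the sketch below solves the underlying transport problem with scipy.optimize.linprog. It assumes (this is not stated above) that the first column of each PDD holds row weights summing to 1 and the remaining columns hold distances. The helper name emd_sketch is hypothetical, and this is not the package's own implementation.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def emd_sketch(pdd, pdd_, metric="chebyshev"):
    """Illustrative EMD between two PDD-like arrays (not the amd implementation).

    Assumption: column 0 holds row weights summing to 1; the rest are distances.
    """
    if pdd.shape[1] != pdd_.shape[1]:
        raise ValueError("pdd and pdd_ must have the same number of columns")
    w, w_ = pdd[:, 0], pdd_[:, 0]
    # Ground distances between every pair of rows (weights excluded).
    cost = cdist(pdd[:, 1:], pdd_[:, 1:], metric=metric)
    m, n = cost.shape
    # Equality constraints on the flattened plan f[i, j] (row-major):
    # each row i ships out w[i], each column j receives w_[j].
    a_eq = np.zeros((m + n, m * n))
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        a_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([w, w_])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None))
    return float(res.fun), res.x.reshape(m, n)


# Toy PDD-like arrays with k = 2 (made-up values, weights in the first column).
pdd_a = np.array([[0.5, 1.0, 2.0],
                  [0.5, 1.0, 3.0]])
pdd_b = np.array([[1.0, 1.0, 2.5]])
distance, plan = emd_sketch(pdd_a, pdd_b)
```

Here both rows of pdd_a must be moved entirely onto the single row of pdd_b, so the plan is forced and the distance is the weighted average of the two row distances.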
- amd.compare.AMD_cdist(amds: Union[numpy.ndarray, List[numpy.ndarray]], amds_: Union[numpy.ndarray, List[numpy.ndarray]], metric: str = 'chebyshev', low_memory: bool = False, **kwargs) -> numpy.ndarray
Compare two sets of AMDs with each other, returning a distance matrix. This function is essentially identical to scipy.spatial.distance.cdist() with the default metric chebyshev.
- Parameters
amds (array_like) – A list of AMDs.
amds_ (array_like) – A list of AMDs.
metric (str or callable, default 'chebyshev') – Usually AMDs are compared with the Chebyshev (L-infinity) distance. Can take any metric accepted by scipy.spatial.distance.cdist().
low_memory (bool, default False) – Use a slower but more memory efficient method for large collections of AMDs (Chebyshev metric only).
- Returns
dm – A distance matrix of shape (len(amds), len(amds_)). dm[i, j] is the distance (given by metric) between amds[i] and amds_[j].
- Return type
numpy.ndarray
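Because the function is described as essentially scipy.spatial.distance.cdist() with a Chebyshev default, the following sketch calls SciPy directly on toy arrays standing in for AMDs (the values are made up for illustration) to show the shape and meaning of the result. It does not call the package itself.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Toy stand-ins for AMDs with k = 3 (made-up values, one row per crystal).
amds = np.array([[1.0, 1.5, 2.0],
                 [1.0, 1.2, 2.4]])
amds_ = np.array([[1.0, 1.5, 2.0],
                  [0.8, 1.5, 2.1],
                  [1.3, 1.6, 2.0]])

# Chebyshev (L-infinity) distance: largest absolute coordinate difference.
dm = cdist(amds, amds_, metric="chebyshev")

# The same entry computed by hand for i = 1, j = 2.
by_hand = np.max(np.abs(amds[1] - amds_[2]))
```

The result has one row per entry of the first collection and one column per entry of the second, so dm[1, 2] matches the hand-computed value.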
- amd.compare.AMD_pdist(amds: Union[numpy.ndarray, List[numpy.ndarray]], metric: str = 'chebyshev', low_memory: bool = False, **kwargs) -> numpy.ndarray
Compare a set of AMDs pairwise, returning a condensed distance matrix. This function is essentially identical to scipy.spatial.distance.pdist() with the default metric chebyshev.
- Parameters
amds (array_like) – An array/list of AMDs.
metric (str or callable, default 'chebyshev') – Usually AMDs are compared with the Chebyshev (L-infinity) distance. Can take any metric accepted by scipy.spatial.distance.pdist().
low_memory (bool, default False) – Optionally use a slightly slower but more memory efficient method for large collections of AMDs (Chebyshev metric only).
- Returns
A condensed distance matrix, i.e. a square distance matrix collapsed into a vector that keeps only the upper half (the entries above the diagonal). See scipy.spatial.distance.squareform() to convert to a square distance matrix, or for more on condensed distance matrices.
- Return type
numpy.ndarray
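The relationship between the condensed and square forms can be seen with SciPy alone. The sketch below uses toy arrays standing in for AMDs (made-up values) and calls scipy.spatial.distance.pdist() directly rather than the package function.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy stand-ins for AMDs with k = 3 (made-up values, one row per crystal).
amds = np.array([[1.0, 1.5, 2.0],
                 [1.0, 1.2, 2.4],
                 [1.3, 1.6, 2.0]])

# Condensed form: one entry per unordered pair, ordered (0,1), (0,2), (1,2).
cdm = pdist(amds, metric="chebyshev")

# Square form: symmetric matrix with zeros on the diagonal.
dm = squareform(cdm)
```

For n items the condensed vector has n * (n - 1) / 2 entries, which is why it is preferred for large collections.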
- amd.compare.PDD_cdist(pdds: List[numpy.ndarray], pdds_: List[numpy.ndarray], metric: str = 'chebyshev', n_jobs=None, verbose=0, **kwargs) -> numpy.ndarray
Compare two sets of PDDs with each other, returning a distance matrix.
- Parameters
pdds (List[numpy.ndarray]) – A list of PDDs.
pdds_ (List[numpy.ndarray]) – A list of PDDs.
metric (str or callable, default 'chebyshev') – Usually PDD rows are compared with the Chebyshev (L-infinity) distance. Can take any metric accepted by scipy.spatial.distance.cdist().
n_jobs (int, default None) – Maximum number of concurrent jobs for parallel processing with joblib. Set to -1 to use the maximum possible. Note that for small inputs (< 100), using parallel processing may be slower than the default n_jobs=None.
verbose (int, default 0) – The verbosity level. Higher = more verbose, see joblib.Parallel.
- Returns
A distance matrix of shape (len(pdds), len(pdds_)). The \(ij\)th entry is the Earth mover’s distance between pdds[i] and pdds_[j].
- Return type
numpy.ndarray
- amd.compare.PDD_pdist(pdds: List[numpy.ndarray], metric: str = 'chebyshev', n_jobs=None, verbose=0, **kwargs) -> numpy.ndarray
Compare a set of PDDs pairwise, returning a condensed distance matrix.
- Parameters
pdds (List[numpy.ndarray]) – A list of PDDs.
metric (str or callable, default 'chebyshev') – Usually PDD rows are compared with the Chebyshev (L-infinity) distance. Can take any metric accepted by scipy.spatial.distance.pdist().
n_jobs (int, default None) – Maximum number of concurrent jobs for parallel processing with joblib. Set to -1 to use the maximum possible. Note that for small inputs (< 100), using parallel processing may be slower than the default n_jobs=None.
verbose (int, default 0) – The verbosity level. Higher = more verbose, see joblib.Parallel for more.
- Returns
A condensed distance matrix, i.e. a square distance matrix collapsed into a vector that keeps only the upper half (the entries above the diagonal). See scipy.spatial.distance.squareform() to convert to a square distance matrix, or for more on condensed distance matrices.
- Return type
numpy.ndarray
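To look up a single pair in a condensed distance matrix without expanding it to square form, the position can be computed directly. The helper below is a sketch that assumes the pair ordering of SciPy's condensed format; condensed_index is a hypothetical name, not part of the package.

```python
import numpy as np


def condensed_index(i, j, n):
    """Index of the pair (i, j), i != j, in a condensed matrix of n items."""
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if i > j:
        i, j = j, i  # distances are symmetric
    return n * i - i * (i + 1) // 2 + (j - i - 1)


# Check against the order in which the upper triangle is enumerated.
n = 5
rows, cols = np.triu_indices(n, k=1)
indices = [condensed_index(i, j, n) for i, j in zip(rows, cols)]
```

With this, the distance between items i and j is the entry of the condensed vector at condensed_index(i, j, n), where n is the number of PDDs compared.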
- amd.compare.emd(pdd: numpy.ndarray, pdd_: numpy.ndarray, metric: Optional[str] = 'chebyshev', return_transport: Optional[bool] = False, **kwargs)
Alias for amd.EMD().