amd.compare module

Functions for comparing AMDs and PDDs of crystals.

amd.compare.compare(crystals, crystals_=None, by='AMD', k=100, **kwargs)

Given one or two sets of periodic sets, refcodes or cifs, compare them and return a DataFrame of the distance matrix. The default is to compare by AMD with k=100. Accepts most keyword arguments accepted by the CifReader, CSDReader and compare functions; for a full list see the documentation Quick Start page. Note that using refcodes requires csd-python-api.

Parameters
  • crystals (array or list of arrays) – One or a collection of paths, refcodes, file objects or periodicset.PeriodicSet objects.

  • crystals_ (array or list of arrays, optional) – One or a collection of paths, refcodes, file objects or periodicset.PeriodicSet objects.

  • by (str, default 'AMD') – Invariant to compare by, either ‘AMD’ or ‘PDD’.

  • k (int, default 100) – k value to use for the invariants (length of AMD, or number of columns in PDD).

Returns

df – DataFrame of the distance matrix for the given crystals compared by the chosen invariant.

Return type

pandas.DataFrame

Raises

ValueError – If by is not ‘AMD’ or ‘PDD’, if either given set has no valid crystals to compare, or if crystals or crystals_ is of an invalid type.

Examples

Compare everything in a .cif (default, AMD with k=100):

df = amd.compare('data.cif')

Compare everything in one cif with all crystals in all cifs in a directory (PDD, k=50):

df = amd.compare('data.cif', 'dir/to/cifs', by='PDD', k=50)

Examples (csd-python-api only)

Compare two crystals by CSD refcode (PDD, k=50):

df = amd.compare('DEBXIT01', 'DEBXIT02', by='PDD', k=50)

Compare everything in a refcode family (AMD, k=100):

df = amd.compare('DEBXIT', families=True)

amd.compare.EMD(pdd: numpy.ndarray, pdd_: numpy.ndarray, metric: Optional[str] = 'chebyshev', return_transport: Optional[bool] = False, **kwargs)

Earth mover’s distance (EMD) between two PDDs, also known as the Wasserstein metric.

Parameters
  • pdd (numpy.ndarray) – PDD of a crystal.

  • pdd_ (numpy.ndarray) – PDD of a crystal.

  • metric (str or callable, default 'chebyshev') – EMD between PDDs requires defining a distance between PDD rows. By default, the Chebyshev (L-infinity) distance is chosen, as with AMDs. Accepts any metric accepted by scipy.spatial.distance.cdist().

  • return_transport (bool, default False) – If True, return a tuple (distance, transport_plan) that includes the optimal transport plan.

Returns

emd – Earth mover’s distance between the two PDDs. If return_transport is True, a tuple (distance, transport_plan) is returned instead.

Return type

float

Raises

ValueError – Thrown if pdd and pdd_ do not have the same number of columns (k value).
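
To make the definition concrete, here is a minimal sketch of an Earth mover’s distance between two PDD-like arrays, written as a small linear program. It is not the library’s implementation. It assumes that column 0 of each array holds row weights summing to 1 and that the remaining columns hold distances; the function name emd_sketch and the toy arrays are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def emd_sketch(pdd, pdd_, metric="chebyshev"):
    """Illustrative EMD between two PDD-like arrays (not the amd code).

    Assumption: column 0 holds row weights that sum to 1, and the
    remaining columns hold the distances compared by `metric`.
    """
    if pdd.shape[1] != pdd_.shape[1]:
        raise ValueError("pdd and pdd_ must have the same number of columns")

    weights, weights_ = pdd[:, 0], pdd_[:, 0]
    # Ground distance between every row of pdd and every row of pdd_.
    cost = cdist(pdd[:, 1:], pdd_[:, 1:], metric=metric)
    m, n = cost.shape

    # The flow f[i, j] is flattened row-major. Each row of f must sum to
    # weights[i] and each column of f must sum to weights_[j].
    row_sums = np.kron(np.eye(m), np.ones((1, n)))
    col_sums = np.kron(np.ones((1, m)), np.eye(n))
    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([weights, weights_]),
        bounds=(0, None),
    )
    if not result.success:
        raise RuntimeError(result.message)
    return float(result.fun), result.x.reshape(m, n)


# Toy inputs (made-up values): two weighted rows against a single row.
pdd_a = np.array([[0.5, 1.0, 2.0],
                  [0.5, 1.5, 2.5]])
pdd_b = np.array([[1.0, 1.0, 2.0]])

distance, plan = emd_sketch(pdd_a, pdd_b)
print(distance)  # approximately 0.25: half the weight moves a Chebyshev distance of 0.5
```

The returned pair mirrors the (distance, transport_plan) tuple described for return_transport; a dedicated optimal transport solver would normally be faster than a general linear program.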

amd.compare.AMD_cdist(amds: Union[numpy.ndarray, List[numpy.ndarray]], amds_: Union[numpy.ndarray, List[numpy.ndarray]], metric: str = 'chebyshev', low_memory: bool = False, **kwargs) numpy.ndarray

Compare two sets of AMDs with each other, returning a distance matrix. This function is essentially identical to scipy.spatial.distance.cdist() with the default metric chebyshev.

Parameters
  • amds (array_like) – A list of AMDs.

  • amds_ (array_like) – A list of AMDs.

  • metric (str or callable, default 'chebyshev') – Usually AMDs are compared with the Chebyshev (L-infinity) distance. Can take any metric accepted by scipy.spatial.distance.cdist().

  • low_memory (bool, default False) – Use a slower but more memory efficient method for large collections of AMDs (Chebyshev metric only).

Returns

dm – A distance matrix of shape (len(amds), len(amds_)). dm[i][j] is the distance (given by metric) between amds[i] and amds_[j].

Return type

numpy.ndarray
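
Since the function is described as essentially identical to scipy.spatial.distance.cdist() with the Chebyshev metric, the shape and meaning of the result can be illustrated with SciPy alone. The AMD values below are made up for illustration (k=3, one row per crystal).

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical AMDs with k=3; each row stands in for one crystal.
amds = np.array([[1.0, 1.5, 2.0],
                 [1.0, 1.4, 2.2]])
amds_ = np.array([[1.0, 1.5, 2.0],
                  [0.9, 1.5, 2.5],
                  [1.2, 1.6, 2.1]])

# Entry [i, j] compares amds[i] with amds_[j].
dm = cdist(amds, amds_, metric="chebyshev")
print(dm.shape)  # (2, 3)

# The Chebyshev distance is the largest absolute coordinate difference.
manual = np.max(np.abs(amds[1] - amds_[1]))
print(np.isclose(dm[1, 1], manual))  # True
```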

amd.compare.AMD_pdist(amds: Union[numpy.ndarray, List[numpy.ndarray]], metric: str = 'chebyshev', low_memory: bool = False, **kwargs) numpy.ndarray

Compare a set of AMDs pairwise, returning a condensed distance matrix. This function is essentially identical to scipy.spatial.distance.pdist() with the default metric chebyshev.

Parameters
  • amds (array_like) – An array/list of AMDs.

  • metric (str or callable, default 'chebyshev') – Usually AMDs are compared with the Chebyshev (L-infinity) distance. Can take any metric accepted by scipy.spatial.distance.pdist().

  • low_memory (bool, default False) – Optionally use a slightly slower but more memory efficient method for large collections of AMDs (Chebyshev metric only).

Returns

Returns a condensed distance matrix, i.e. a square distance matrix collapsed into a vector that keeps only the entries above the diagonal. See scipy.spatial.distance.squareform() to convert it to a square distance matrix, or for more on condensed distance matrices.

Return type

numpy.ndarray
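
The condensed form can be illustrated with scipy.spatial.distance.pdist(), which this function is described as essentially identical to, together with squareform(). The AMD values below are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical AMDs with k=3 for three crystals.
amds = np.array([[1.0, 1.5, 2.0],
                 [1.0, 1.4, 2.2],
                 [1.5, 1.6, 2.1]])

# Condensed vector of length n * (n - 1) / 2, ordered (0,1), (0,2), (1,2).
condensed = pdist(amds, metric="chebyshev")
print(condensed.shape)  # (3,)

# Expand to a symmetric square matrix with a zero diagonal.
square = squareform(condensed)
print(square.shape)  # (3, 3)
```

The condensed form stores each unordered pair once, which roughly halves memory use for large collections.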

amd.compare.PDD_cdist(pdds: List[numpy.ndarray], pdds_: List[numpy.ndarray], metric: str = 'chebyshev', n_jobs=None, verbose=0, **kwargs) numpy.ndarray

Compare two sets of PDDs with each other, returning a distance matrix.

Parameters
  • pdds (List[numpy.ndarray]) – A list of PDDs.

  • pdds_ (List[numpy.ndarray]) – A list of PDDs.

  • metric (str or callable, default 'chebyshev') – Usually PDD rows are compared with the Chebyshev (L-infinity) distance. Can take any metric accepted by scipy.spatial.distance.cdist().

  • n_jobs (int, default None) – Maximum number of concurrent jobs for parallel processing with joblib. Set to -1 to use the maximum possible. Note that for small inputs (< 100), using parallel processing may be slower than the default n_jobs=None.

  • verbose (int, default 0) – The verbosity level. Higher values are more verbose; see joblib.Parallel for more.

Returns

Returns a distance matrix of shape (len(pdds), len(pdds_)). The (i, j)-th entry is the Earth mover’s distance between pdds[i] and pdds_[j].

Return type

numpy.ndarray

amd.compare.PDD_pdist(pdds: List[numpy.ndarray], metric: str = 'chebyshev', n_jobs=None, verbose=0, **kwargs) numpy.ndarray

Compare a set of PDDs pairwise, returning a condensed distance matrix.

Parameters
  • pdds (List[numpy.ndarray]) – A list of PDDs.

  • metric (str or callable, default 'chebyshev') – Usually PDD rows are compared with the Chebyshev (L-infinity) distance. Can take any metric accepted by scipy.spatial.distance.pdist().

  • n_jobs (int, default None) – Maximum number of concurrent jobs for parallel processing with joblib. Set to -1 to use the maximum possible. Note that for small inputs (< 100), using parallel processing may be slower than the default n_jobs=None.

  • verbose (int, default 0) – The verbosity level. Higher values are more verbose; see joblib.Parallel for more.

Returns

Returns a condensed distance matrix, i.e. a square distance matrix collapsed into a vector that keeps only the entries above the diagonal. See scipy.spatial.distance.squareform() to convert it to a square distance matrix, or for more on condensed distance matrices.

Return type

numpy.ndarray
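
For a distance that is not a plain vector metric, such as EMD between PDDs, a condensed result can be pictured as one distance per unordered pair. The sketch below is not the library’s implementation: the helper names are hypothetical, and plain numbers with an absolute difference stand in for PDDs and EMD.

```python
from itertools import combinations


def condensed_pairwise(items, dist):
    """Distances for all pairs i < j, in the order (0,1), (0,2), ..., (1,2), ..."""
    return [dist(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)]


def condensed_index(i, j, n):
    """Position of the pair (i, j), with i != j, in a condensed vector over n items."""
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


# Stand-ins: numbers instead of PDDs, absolute difference instead of EMD.
items = [0.0, 1.0, 3.0, 6.0]
condensed = condensed_pairwise(items, lambda a, b: abs(a - b))
print(condensed)  # [1.0, 3.0, 6.0, 2.0, 5.0, 3.0]

# Look up the distance between items 3 and 1 without building a square matrix.
print(condensed[condensed_index(3, 1, len(items))])  # 5.0
```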

amd.compare.emd(pdd: numpy.ndarray, pdd_: numpy.ndarray, metric: Optional[str] = 'chebyshev', return_transport: Optional[bool] = False, **kwargs)

Alias for amd.EMD().