amd.compare module

Functions for comparing AMDs or PDDs, finding nearest neighbours and minimum spanning trees.

amd.compare.emd(pdd, pdd_, metric='chebyshev', return_transport=False, **kwargs)

Earth mover’s distance between two PDDs.

Parameters
  • pdd (ndarray) – A PDD given by calculate.PDD().

  • pdd_ (ndarray) – A PDD given by calculate.PDD().

  • metric (str or callable, optional) – Usually PDD rows are compared with the Chebyshev/l-infinity distance. Any metric (plus keyword arguments) supported by scipy.spatial.distance.cdist can be used.

Returns

Earth mover’s distance between PDDs, where rows of the PDDs are compared with metric.

Return type

float

Raises

ValueError – Raised if pdd and pdd_ do not have the same number of columns.
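For intuition, the Earth mover's distance between two PDDs can be posed as a small linear program: move the row weights of one PDD onto the rows of the other at minimum total cost, where the cost of moving between two rows is their metric distance. The sketch below is a hypothetical helper (emd_sketch), not the library's implementation. It assumes that column 0 of each PDD holds the row weights (each column summing to 1) and that the remaining columns hold the distances, and it uses scipy.optimize.linprog as a swapped-in solver.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def emd_sketch(pdd, pdd_, metric="chebyshev", **kwargs):
    """Earth mover's distance between two PDD-like arrays (illustrative sketch).

    Assumption: column 0 holds row weights (each summing to 1) and the
    remaining columns hold the distances being compared.
    """
    pdd = np.asarray(pdd, dtype=float)
    pdd_ = np.asarray(pdd_, dtype=float)
    if pdd.shape[1] != pdd_.shape[1]:
        raise ValueError("PDDs must have the same number of columns")
    w, w_ = pdd[:, 0], pdd_[:, 0]
    # Ground distances between every pair of rows.
    cost = cdist(pdd[:, 1:], pdd_[:, 1:], metric=metric, **kwargs)
    n, m = cost.shape
    # Flow f[i, j] >= 0, flattened row-major: row sums equal w, column sums equal w_.
    a_eq = np.vstack([np.kron(np.eye(n), np.ones(m)), np.kron(np.ones(n), np.eye(m))])
    b_eq = np.concatenate([w, w_])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return float(res.fun)


a = [[0.5, 1.0, 2.0], [0.5, 1.5, 2.5]]
b = [[1.0, 1.0, 2.0]]
# All mass must flow to b's single row: 0.5 * 0 + 0.5 * 0.5, i.e. 0.25 up to solver tolerance.
print(emd_sketch(a, b))
```

A dense linear program is only practical for small PDDs; dedicated optimal transport solvers scale much better.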

amd.compare.AMD_cdist(amds: Union[numpy.ndarray, List[numpy.ndarray]], amds_: Union[numpy.ndarray, List[numpy.ndarray]], k: Optional[int] = None, metric: str = 'chebyshev', low_memory: bool = False, **kwargs) → numpy.ndarray

Compare two sets of AMDs with each other, returning a distance matrix.

Parameters
  • amds (array_like) – An array/list of AMDs.

  • amds_ (array_like) – An array/list of AMDs.

  • k (int, optional) – If None, compare entire AMDs. Set k to an int to compare for a specific k (less than the maximum).

  • low_memory (bool, optional) – If True, use a slower but more memory-efficient method for large collections of AMDs (Chebyshev/l-infinity distance only).

  • metric (str or callable, optional) – Usually AMDs are compared with the Chebyshev/l-infinity distance. Any metric (plus keyword arguments) supported by scipy.spatial.distance.cdist can be used.

Returns

Returns a distance matrix of shape (len(amds), len(amds_)). The \(ij\)-th entry is the distance between amds[i] and amds_[j] given by metric.

Return type

ndarray
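The default (non-low-memory) behaviour amounts to a scipy.spatial.distance.cdist call on the stacked AMDs. In the minimal sketch below, amd_cdist_sketch is a hypothetical helper, and it assumes that comparing "for a specific k" means keeping the first k entries of each AMD.

```python
import numpy as np
from scipy.spatial.distance import cdist


def amd_cdist_sketch(amds, amds_, k=None, metric="chebyshev", **kwargs):
    """Distance matrix between two sets of AMDs (illustrative sketch)."""
    amds = np.asarray(amds, dtype=float)
    amds_ = np.asarray(amds_, dtype=float)
    if k is not None:
        # Assumption: comparing "for a specific k" keeps the first k AMD entries.
        amds, amds_ = amds[:, :k], amds_[:, :k]
    return cdist(amds, amds_, metric=metric, **kwargs)


amds = [[1.0, 2.0, 3.0], [1.5, 2.0, 3.5]]
amds_ = [[1.0, 2.0, 4.0]]
print(amd_cdist_sketch(amds, amds_))       # shape (2, 1), values [[1.0], [0.5]]
print(amd_cdist_sketch(amds, amds_, k=2))  # values [[0.0], [0.5]]
```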

amd.compare.AMD_pdist(amds: Union[numpy.ndarray, List[numpy.ndarray]], k: Optional[int] = None, low_memory: bool = False, metric: str = 'chebyshev', **kwargs) → numpy.ndarray

Compare a set of AMDs pairwise, returning a condensed distance matrix.

Parameters
  • amds (array_like) – An array/list of AMDs.

  • k (int, optional) – If None, compare whole AMDs (largest k). Set k to an int to compare for a specific k (less than the maximum).

  • low_memory (bool, optional) – If True, use a slightly slower but more memory-efficient method for large collections of AMDs (Chebyshev/l-infinity distance only).

  • metric (str or callable, optional) – Usually AMDs are compared with the Chebyshev/l-infinity distance. Any metric (plus keyword arguments) supported by scipy.spatial.distance.cdist can be used.

Returns

Returns a condensed distance matrix: the square distance matrix collapsed into a vector that keeps only the entries above the diagonal. Use scipy.spatial.distance.squareform to convert it back to a square distance matrix.

Return type

ndarray
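One way a low-memory Chebyshev comparison can be organised is to compute one row of the upper triangle at a time and write it straight into the condensed vector, so the full square matrix is never built. The sketch below is a hypothetical helper, not necessarily how low_memory is implemented in this module. It checks the result against scipy.spatial.distance.pdist and expands it with squareform.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def amd_pdist_low_memory_sketch(amds):
    """Condensed Chebyshev distances computed row by row (illustrative sketch)."""
    amds = np.asarray(amds, dtype=float)
    n = len(amds)
    out = np.empty(n * (n - 1) // 2)
    start = 0
    for i in range(n - 1):
        # Distances from amds[i] to every later AMD, in pdist's condensed order.
        block = np.abs(amds[i + 1:] - amds[i]).max(axis=1)
        out[start:start + len(block)] = block
        start += len(block)
    return out


amds = np.array([[1.0, 2.0], [1.5, 2.0], [1.0, 3.0]])
cdm = amd_pdist_low_memory_sketch(amds)  # values [0.5, 1.0, 1.0] for pairs (0,1), (0,2), (1,2)
assert np.allclose(cdm, pdist(amds, metric="chebyshev"))
dm = squareform(cdm)  # 3x3 symmetric matrix with a zero diagonal
```

Peak extra memory is one block of length at most n - 1 instead of an n-by-n-by-k broadcast.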

amd.compare.PDD_cdist(pdds: List[numpy.ndarray], pdds_: List[numpy.ndarray], k: Optional[int] = None, metric: str = 'chebyshev', verbose: bool = False, **kwargs) → numpy.ndarray

Compare two sets of PDDs with each other, returning a distance matrix.

Parameters
  • pdds (list of ndarrays) – A list of PDDs.

  • pdds_ (list of ndarrays) – A list of PDDs.

  • k (int, optional) – If None, compare whole PDDs (largest k). Set k to an int to compare for a specific k (less than the maximum).

  • metric (str or callable, optional) – Usually PDD rows are compared with the Chebyshev/l-infinity distance. Any metric (plus keyword arguments) supported by scipy.spatial.distance.cdist can be used.

  • verbose (bool, optional) – If True, print an ETA to the terminal, since comparing large collections can take some time.

Returns

Returns a distance matrix of shape (len(pdds), len(pdds_)). The \(ij\)-th entry is the Earth mover’s distance between pdds[i] and pdds_[j].

Return type

ndarray
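Unlike the AMD case, each entry needs its own Earth mover's distance computation, so the matrix is filled pair by pair. The sketch below is hypothetical: pdd_cdist_sketch takes the EMD routine as a parameter (emd), and it assumes column 0 of each PDD holds the weights, so restricting to k keeps columns 0 to k. The usage example passes single_row_emd, a stand-in that is exact only for PDDs with a single row of weight 1.

```python
import numpy as np


def pdd_cdist_sketch(pdds, pdds_, emd, k=None):
    """Fill a (len(pdds), len(pdds_)) matrix with emd(pdds[i], pdds_[j]).

    `emd` is any callable comparing two PDDs (a hypothetical parameter).
    Assumption: column 0 holds weights, so truncating to k keeps columns 0..k.
    """
    if k is not None:
        pdds = [p[:, :k + 1] for p in pdds]
        pdds_ = [p[:, :k + 1] for p in pdds_]
    dm = np.empty((len(pdds), len(pdds_)))
    for i, p in enumerate(pdds):
        for j, q in enumerate(pdds_):
            dm[i, j] = emd(p, q)
    return dm


def single_row_emd(p, q):
    # For PDDs with one row of weight 1, EMD reduces to the Chebyshev row distance.
    return float(np.abs(p[0, 1:] - q[0, 1:]).max())


pdds = [np.array([[1.0, 1.0, 2.0]]), np.array([[1.0, 1.5, 2.0]])]
pdds_ = [np.array([[1.0, 1.0, 3.0]])]
print(pdd_cdist_sketch(pdds, pdds_, single_row_emd))       # values [[1.0], [1.0]]
print(pdd_cdist_sketch(pdds, pdds_, single_row_emd, k=1))  # values [[0.0], [0.5]]
```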

amd.compare.PDD_pdist(pdds: List[numpy.ndarray], k: Optional[int] = None, metric: str = 'chebyshev', verbose: bool = False, **kwargs) → numpy.ndarray

Compare a set of PDDs pairwise, returning a condensed distance matrix.

Parameters
  • pdds (list of ndarrays) – A list of PDDs.

  • k (int, optional) – If None, compare whole PDDs (largest k). Set k to an int to compare for a specific k (less than the maximum).

  • metric (str or callable, optional) – Usually PDD rows are compared with the Chebyshev/l-infinity distance. Any metric (plus keyword arguments) supported by scipy.spatial.distance.cdist can be used.

  • verbose (bool, optional) – If True, print an ETA to the terminal, since comparing large collections can take some time.

Returns

Returns a condensed distance matrix: the square distance matrix collapsed into a vector that keeps only the entries above the diagonal. Use scipy.spatial.distance.squareform to convert it back to a square distance matrix.

Return type

ndarray
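A pairwise comparison only needs the n(n-1)/2 pairs with i < j, visited in the same order scipy uses for condensed matrices. The hypothetical sketch below also shows one simple way a verbose ETA could be estimated (average time per comparison so far multiplied by the comparisons remaining); the real progress output may differ. The emd parameter and the single_row_emd stand-in are assumptions, as in the PDD_cdist sketch.

```python
import time

import numpy as np


def pdd_pdist_sketch(pdds, emd, verbose=False):
    """Condensed PDD distance vector in scipy's pdist order (illustrative sketch).

    `emd` is any callable comparing two PDDs (a hypothetical parameter).
    """
    n = len(pdds)
    total = n * (n - 1) // 2
    cdm = np.empty(total)
    idx = 0
    start = time.perf_counter()
    for i in range(n - 1):
        for j in range(i + 1, n):  # pairs (i, j) with i < j, row by row
            cdm[idx] = emd(pdds[i], pdds[j])
            idx += 1
        if verbose:
            # Naive ETA: average time per comparison so far times comparisons left.
            eta = (time.perf_counter() - start) / idx * (total - idx)
            print(f"{idx}/{total} comparisons done, ETA {eta:.1f}s")
    return cdm


def single_row_emd(p, q):
    # Exact EMD only when each PDD is a single row of weight 1.
    return float(np.abs(p[0, 1:] - q[0, 1:]).max())


pdds = [np.array([[1.0, 1.0, 2.0]]), np.array([[1.0, 1.5, 2.0]]), np.array([[1.0, 1.0, 3.0]])]
print(pdd_pdist_sketch(pdds, single_row_emd))  # values [0.5, 1.0, 1.0] for pairs (0,1), (0,2), (1,2)
```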

amd.compare.filter(n: int, pdds: List[numpy.ndarray], pdds_: Optional[List[numpy.ndarray]] = None, k: Optional[int] = None, low_memory: bool = False, metric: str = 'chebyshev', verbose: bool = False, **kwargs) → Tuple[numpy.ndarray, numpy.ndarray]

For each item in pdds, find its n nearest items in pdds_ by AMD distance, then compare the item with only these candidates using PDDs. This trades off the speed of AMD comparisons against the accuracy of PDD comparisons.

If pdds_ is None, this effectively sets pdds_ = pdds, i.e. it builds an ‘AMD neighbourhood graph’ on a single set whose edge weights are PDD distances.

Parameters
  • n (int) – Number of nearest neighbours to find.

  • pdds (list of ndarrays) – A list of PDDs.

  • pdds_ (list of ndarrays, optional) – A list of PDDs.

  • k (int, optional) – If None, compare entire PDDs. Set k to an int to compare for a specific k (less than the maximum).

  • low_memory (bool, optional) – If True, use a slightly slower but more memory-efficient method for large collections of AMDs (Chebyshev/l-infinity distance only).

  • metric (str or callable, optional) – Usually PDD rows are compared with the Chebyshev/l-infinity distance. Any metric (plus keyword arguments) supported by scipy.spatial.distance.cdist can be used.

  • verbose (bool, optional) – If True, print an ETA to the terminal, since comparing large collections can take some time.

Returns

For the \(i\)-th item in pdds and \(0 \le j < n\), distance_matrix[i][j] is the PDD distance from pdds[i] to its (j+1)-th nearest neighbour in pdds_ (among the candidates kept by the AMD filter), and indices[i][j] is the index of that neighbour in pdds_.

Return type

tuple of ndarrays (distance_matrix, indices)
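The two-stage idea can be sketched as follows: rank all of pdds_ cheaply by AMD, keep the n best candidates per item, then re-rank only those candidates by PDD distance. This is a hypothetical sketch (filter_sketch, pdd_to_amd and the emd parameter are illustrative names) that handles two distinct sets only. It assumes an AMD is the weighted average of the PDD rows, with column 0 holding the weights.

```python
import numpy as np
from scipy.spatial.distance import cdist


def pdd_to_amd(pdd):
    # Assumption: an AMD is the weighted average of PDD rows (column 0 = weights).
    return np.average(pdd[:, 1:], weights=pdd[:, 0], axis=0)


def filter_sketch(n, pdds, pdds_, emd):
    """AMD prefilter, then PDD re-ranking (illustrative sketch, two sets only)."""
    amds = np.array([pdd_to_amd(p) for p in pdds])
    amds_ = np.array([pdd_to_amd(p) for p in pdds_])
    amd_dm = cdist(amds, amds_, metric="chebyshev")
    # Step 1 (cheap): keep the n nearest candidates of each item by AMD.
    cand = np.argsort(amd_dm, axis=1, kind="stable")[:, :n]
    # Step 2 (expensive, only n comparisons per item): PDD distances to the candidates.
    dists = np.array([[emd(pdds[i], pdds_[j]) for j in row] for i, row in enumerate(cand)])
    # Re-rank candidates so column 0 is the nearest by PDD distance.
    order = np.argsort(dists, axis=1, kind="stable")
    return np.take_along_axis(dists, order, axis=1), np.take_along_axis(cand, order, axis=1)


def single_row_emd(p, q):
    # Exact EMD when each PDD is a single row of weight 1.
    return float(np.abs(p[0, 1:] - q[0, 1:]).max())


pdds = [np.array([[1.0, 1.0, 2.0]])]
pdds_ = [np.array([[1.0, 1.0, 2.0]]), np.array([[1.0, 1.2, 2.0]]), np.array([[1.0, 5.0, 5.0]])]
distance_matrix, indices = filter_sketch(2, pdds, pdds_, single_row_emd)
# indices -> [[0, 1]]; pdds_[2] is never compared by PDD
```

The cost is one cheap AMD matrix plus len(pdds) * n EMD computations, instead of len(pdds) * len(pdds_) of them.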

amd.compare.AMD_mst(amds: numpy.typing.ArrayLike, k: Optional[int] = None, low_memory: bool = False, metric: str = 'chebyshev', **kwargs) → List[Tuple[int, int, float]]

Return list of edges in a minimum spanning tree based on AMDs.

Parameters
  • amds (ndarray or list of ndarrays) – An array/list of AMDs.

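  • k (int, optional) – If None, compare whole AMDs (largest k). Set k to an int to compare for a specific k (less than the maximum).

  • low_memory (bool, optional) – If True, use a slightly slower but more memory-efficient method for large collections of AMDs (Chebyshev/l-infinity distance only).

  • metric (str or callable, optional) – Usually AMDs are compared with the Chebyshev/l-infinity distance. Any metric (plus keyword arguments) supported by scipy.spatial.distance.cdist can be used.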

Returns

Each tuple (i, j, w) is an edge in the minimum spanning tree, where i and j are the indices of its nodes and w is the AMD distance between them.

Return type

list of tuples
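Because every pair of AMDs is cheap to compare, the graph here is complete, and a dense O(n²) Prim's algorithm on the square distance matrix is a natural fit. The sketch below is hypothetical (mst_edges_sketch and amd_mst_sketch are illustrative names) and is not necessarily the algorithm this module uses. Unlike scipy's sparse routines, it keeps zero-weight edges between identical AMDs.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def mst_edges_sketch(dm):
    """Prim's algorithm on a dense square distance matrix -> [(i, j, w), ...]."""
    dm = np.asarray(dm, dtype=float)
    n = len(dm)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dm[0].copy()             # cheapest known edge weight from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree endpoint of that cheapest edge
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = dm[j] < best
        best = np.where(closer, dm[j], best)
        parent = np.where(closer, j, parent)
    return edges


def amd_mst_sketch(amds, metric="chebyshev"):
    amds = np.asarray(amds, dtype=float)
    return mst_edges_sketch(squareform(pdist(amds, metric=metric)))


amds = [[1.0, 2.0], [1.5, 2.0], [1.0, 3.0]]
print(amd_mst_sketch(amds))  # [(0, 1, 0.5), (0, 2, 1.0)]
```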

amd.compare.PDD_mst(pdds: List[numpy.ndarray], amd_filter_cutoff: Optional[int] = None, k: Optional[int] = None, metric: str = 'chebyshev', verbose: bool = False, **kwargs) → List[Tuple[int, int, float]]

Return list of edges in a minimum spanning tree based on PDDs.

Parameters
  • pdds (list of ndarrays) – A list of PDDs.

  • amd_filter_cutoff (int, optional) – If specified, apply the AMD filter of filter(). This is the n passed to filter(), i.e. the number of neighbours each item is connected to in the neighbourhood graph.

  • k (int, optional) – If None, compare whole PDDs (largest k). Set k to an int to compare for a specific k (less than the maximum).

  • metric (str or callable, optional) – Usually PDD rows are compared with the Chebyshev/l-infinity distance. Any metric (plus keyword arguments) supported by scipy.spatial.distance.cdist can be used.

  • verbose (bool, optional) – If True, print an ETA to the terminal, since comparing large collections can take some time.

Returns

Each tuple (i, j, w) is an edge in the minimum spanning tree, where i and j are the indices of its nodes and w is the PDD distance between them.

Return type

list of tuples
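With amd_filter_cutoff set, only the n AMD-nearest neighbours of each item are compared by PDD, so the graph is sparse. A minimum spanning tree can then be taken on that neighbourhood graph. The hypothetical sketch below builds a sparse graph from (distance_matrix, indices) arrays shaped like those returned by filter() for a single set, and it uses scipy.sparse.csgraph.minimum_spanning_tree as a swapped-in routine. If the neighbourhood graph is disconnected, the result is a spanning forest rather than a tree.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree


def mst_from_neighbours_sketch(nn_dm, inds):
    """Minimum spanning tree (or forest) of a k-nearest-neighbour graph (illustrative sketch)."""
    m, n = nn_dm.shape
    rows = np.repeat(np.arange(m), n)  # item i is linked to each of its n neighbours
    graph = coo_matrix((nn_dm.ravel(), (rows, inds.ravel())), shape=(m, m)).tocsr()
    # scipy's csgraph routines drop zero-weight edges, so exact duplicates (distance 0) are lost.
    tree = minimum_spanning_tree(graph).tocoo()
    return [(int(i), int(j), float(w)) for i, j, w in zip(tree.row, tree.col, tree.data)]


nn_dm = np.array([[0.5, 1.0], [0.5, 1.0], [1.0, 1.0]])
inds = np.array([[1, 2], [0, 2], [0, 1]])
edges = mst_from_neighbours_sketch(nn_dm, inds)
print(sum(w for _, _, w in edges))  # 1.5
```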

amd.compare.neighbours_from_distance_matrix(n: int, dm: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]

Given a distance matrix, find the n nearest neighbours of each item.

Parameters
  • n (int) – Number of nearest neighbours to find for each item.

  • dm (ndarray) – 2D distance matrix or 1D condensed distance matrix.

Returns

For item i, nn_dm[i][j] is the distance from item i to its (j+1)-th nearest neighbour, and inds[i][j] is the index of that neighbour (column 0 holds the first nearest neighbour).

Return type

tuple of ndarrays (nn_dm, inds)
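A simple way to get the n nearest neighbours is to sort each row of the square distance matrix and keep the first n columns. The hypothetical sketch below (neighbours_sketch) assumes that a 1D condensed input describes a single set, so it expands the input with squareform and masks the zero self-distances on the diagonal. It assumes a 2D input compares two sets and leaves it unchanged. A stable sort makes ties resolve to the lower index.

```python
import numpy as np
from scipy.spatial.distance import squareform


def neighbours_sketch(n, dm):
    """n nearest neighbours per row of a distance matrix (illustrative sketch)."""
    dm = np.asarray(dm, dtype=float)
    if dm.ndim == 1:
        # Assumption: condensed input describes one set, so exclude each item itself.
        dm = squareform(dm)
        np.fill_diagonal(dm, np.inf)
    inds = np.argsort(dm, axis=-1, kind="stable")[:, :n]
    return np.take_along_axis(dm, inds, axis=-1), inds


cdm = np.array([0.5, 1.0, 1.0])  # condensed form of a 3-item distance matrix
nn_dm, inds = neighbours_sketch(1, cdm)
# inds -> [[1], [0], [0]], nn_dm -> [[0.5], [0.5], [1.0]]
```

For large matrices, np.argpartition followed by a sort of only the first n columns avoids sorting whole rows.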

amd.compare.mst_from_distance_matrix(dm)