amd.compare module
Functions for comparing AMDs or PDDs, finding nearest neighbours and minimum spanning trees.
- amd.compare.emd(pdd, pdd_, metric='chebyshev', return_transport=False, **kwargs)
Earth mover’s distance between two PDDs.
- Parameters
pdd (ndarray) – A PDD given by calculate.PDD().
pdd_ (ndarray) – A PDD given by calculate.PDD().
metric (str or callable, optional) – Usually rows are compared with the Chebyshev/l-infinity distance. Can take any metric + kwargs accepted by scipy.spatial.distance.cdist.
- Returns
Earth mover’s distance between the PDDs, where rows of the PDDs are compared with metric.
- Return type
float
- Raises
ValueError – Raised if pdd and pdd_ do not have the same number of columns.
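To make the quantity concrete, the sketch below solves the underlying transport problem directly with scipy.optimize.linprog. It is not the library's implementation. It assumes that the first column of each PDD holds row weights summing to 1 and that the remaining columns hold distances; that layout and the helper name emd_sketch are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def emd_sketch(pdd, pdd_, metric="chebyshev"):
    """Hypothetical helper: Earth mover's distance between two weighted row sets.

    Assumes column 0 holds weights (summing to 1) and the rest are distances.
    """
    w, rows = pdd[:, 0], pdd[:, 1:]
    w_, rows_ = pdd_[:, 0], pdd_[:, 1:]
    if rows.shape[1] != rows_.shape[1]:
        raise ValueError("pdd and pdd_ must have the same number of columns")

    cost = cdist(rows, rows_, metric=metric)  # ground distance between rows
    m, n = cost.shape

    # Flow variables f[i, j] are flattened row-major into a vector of length m*n.
    a_eq = np.zeros((m + n, m * n))
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0      # total flow out of row i equals w[i]
    for j in range(n):
        a_eq[m + j, j::n] = 1.0               # total flow into row j equals w_[j]
    b_eq = np.concatenate([w, w_])

    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return float(res.fun)


# Made-up PDDs: weights in column 0, two distance columns.
a = np.array([[0.5, 1.0, 2.0],
              [0.5, 1.5, 2.5]])
b = np.array([[1.0, 1.0, 2.0]])
print(emd_sketch(a, b))
```

Here half of the weight in a already sits on the row of b and the other half has to move a Chebyshev distance of 0.5, so the minimal cost is 0.25.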
- amd.compare.AMD_cdist(amds: Union[numpy.ndarray, List[numpy.ndarray]], amds_: Union[numpy.ndarray, List[numpy.ndarray]], k: Optional[int] = None, metric: str = 'chebyshev', low_memory: bool = False, **kwargs) numpy.ndarray
Compare two sets of AMDs with each other, returning a distance matrix.
- Parameters
amds (array_like) – An array/list of AMDs.
amds_ (array_like) – An array/list of AMDs.
k (int, optional) – If None, compare entire AMDs. Set k to an int to compare for a specific k (less than the maximum).
low_memory (bool, optional) – Optionally use a slower but more memory efficient method for large collections of AMDs (Chebyshev/l-inf distance only).
metric (str or callable, optional) – Usually AMDs are compared with the Chebyshev/l-infinity distance. Can take any metric + kwargs accepted by scipy.spatial.distance.cdist.
- Returns
A distance matrix of shape (len(amds), len(amds_)). The \(ij\)th entry is the distance between amds[i] and amds_[j] given by the metric.
- Return type
ndarray
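The shape and meaning of the returned matrix can be illustrated with scipy.spatial.distance.cdist alone. The sketch below uses made-up vectors in place of real AMDs and assumes that comparing for a specific k means truncating each AMD to its first k entries; it does not call the library.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two small made-up sets of AMD-like vectors (each row is one AMD, here with 3 entries).
amds = np.array([[1.0, 1.5, 2.0],
                 [1.0, 1.2, 1.9]])
amds_ = np.array([[1.0, 1.5, 2.0],
                  [0.8, 1.4, 2.3],
                  [1.1, 1.1, 1.1]])

k = 2  # assumption: compare only the first k entries of each AMD
dm = cdist(amds[:, :k], amds_[:, :k], metric="chebyshev")

print(dm.shape)  # (2, 3), i.e. (len(amds), len(amds_))
print(dm[0, 1])  # distance between amds[0] and amds_[1]
```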
- amd.compare.AMD_pdist(amds: Union[numpy.ndarray, List[numpy.ndarray]], k: Optional[int] = None, low_memory: bool = False, metric: str = 'chebyshev', **kwargs) numpy.ndarray
Compare a set of AMDs pairwise, returning a condensed distance matrix.
- Parameters
amds (array_like) – An array/list of AMDs.
k (int, optional) – If None, compare whole AMDs (largest k). Set k to an int to compare for a specific k (less than the maximum).
low_memory (bool, optional) – Optionally use a slightly slower but more memory efficient method for large collections of AMDs (Chebyshev/l-inf distance only).
metric (str or callable, optional) – Usually AMDs are compared with the Chebyshev/l-infinity distance. Can take any metric + kwargs accepted by scipy.spatial.distance.cdist.
- Returns
A condensed distance matrix: the upper triangle of the square distance matrix (excluding the diagonal) flattened into a vector. Use scipy.spatial.distance.squareform to convert to a square distance matrix.
- Return type
ndarray
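The relationship between the condensed and square forms can be seen with SciPy directly. The sketch below uses made-up vectors in place of real AMDs and scipy.spatial.distance.pdist in place of AMD_pdist, so it only illustrates the output layout.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Made-up AMD-like vectors, one per row.
amds = np.array([[1.0, 1.5, 2.0],
                 [1.0, 1.2, 1.9],
                 [0.8, 1.4, 2.3]])

condensed = pdist(amds, metric="chebyshev")  # length n*(n-1)/2 = 3
square = squareform(condensed)               # symmetric (3, 3) matrix, zero diagonal

# Entries of the condensed vector are ordered by pair: (0,1), (0,2), (1,2).
print(condensed)
print(square)
```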
- amd.compare.PDD_cdist(pdds: List[numpy.ndarray], pdds_: List[numpy.ndarray], k: Optional[int] = None, metric: str = 'chebyshev', verbose: bool = False, **kwargs) numpy.ndarray
Compare two sets of PDDs with each other, returning a distance matrix.
- Parameters
pdds (list of ndarrays) – A list of PDDs.
pdds_ (list of ndarrays) – A list of PDDs.
k (int, optional) – If None, compare whole PDDs (largest k). Set k to an int to compare for a specific k (less than the maximum).
metric (str or callable, optional) – Usually PDD rows are compared with the Chebyshev/l-infinity distance. Can take any metric + kwargs accepted by scipy.spatial.distance.cdist.
verbose (bool, optional) – Optionally print an ETA to terminal as large collections can take some time.
- Returns
A distance matrix of shape (len(pdds), len(pdds_)). The \(ij\)th entry is the distance between pdds[i] and pdds_[j] given by Earth mover’s distance.
- Return type
ndarray
- amd.compare.PDD_pdist(pdds: List[numpy.ndarray], k: Optional[int] = None, metric: str = 'chebyshev', verbose: bool = False, **kwargs) numpy.ndarray
Compare a set of PDDs pairwise, returning a condensed distance matrix.
- Parameters
pdds (list of ndarrays) – A list of PDDs.
k (int, optional) – If None, compare whole PDDs (largest k). Set k to an int to compare for a specific k (less than the maximum).
metric (str or callable, optional) – Usually PDD rows are compared with the Chebyshev/l-infinity distance. Can take any metric + kwargs accepted by scipy.spatial.distance.cdist.
verbose (bool, optional) – Optionally print an ETA to terminal as large collections can take some time.
- Returns
A condensed distance matrix: the upper triangle of the square distance matrix (excluding the diagonal) flattened into a vector. Use scipy.spatial.distance.squareform to convert to a square distance matrix.
- Return type
ndarray
- amd.compare.filter(n: int, pdds: List[numpy.ndarray], pdds_: Optional[List[numpy.ndarray]] = None, k: Optional[int] = None, low_memory: bool = False, metric: str = 'chebyshev', verbose: bool = False, **kwargs) Tuple[numpy.ndarray, numpy.ndarray]
For each item in pdds, get the n nearest items in pdds_ by AMD, then compare the item to these nearest items with PDDs. Tries to compromise between the speed of AMDs and the accuracy of PDDs.
If pdds_ is None, this essentially sets pdds_ = pdds, i.e. it builds an ‘AMD neighbourhood graph’ for one set whose weights are PDD distances.
- Parameters
n (int) – Number of nearest neighbours to find.
pdds (list of ndarrays) – A list of PDDs.
pdds_ (list of ndarrays, optional) – A list of PDDs.
k (int, optional) – If None, compare entire PDDs. Set k to an int to compare for a specific k (less than the maximum).
low_memory (bool, optional) – Optionally use a slightly slower but more memory efficient method for large collections of AMDs (Chebyshev/l-inf distance only).
metric (str or callable, optional) – Usually PDD rows are compared with the Chebyshev/l-infinity distance. Can take any metric + kwargs accepted by scipy.spatial.distance.cdist.
verbose (bool, optional) – Optionally print an ETA to terminal as large collections can take some time.
- Returns
For the \(i\)th item in pdds and some \(j<n\), distance_matrix[i][j] is the distance from pdds[i] to its j-th nearest neighbour in pdds_ (after the AMD filter). indices[i][j] is the index of said neighbour in pdds_.
- Return type
tuple of ndarrays (distance_matrix, indices)
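The two-stage idea (shortlist with a cheap distance, then re-rank with an expensive one) can be sketched generically. The code below is not the library's implementation: two_stage_neighbours, the cheap descriptors and the fine_table lookup are all hypothetical stand-ins for AMDs and PDD distances.

```python
import numpy as np
from scipy.spatial.distance import cdist


def two_stage_neighbours(n, coarse, coarse_, fine_distance):
    """Hypothetical helper: shortlist n candidates cheaply, then re-rank them.

    coarse, coarse_ : 2D arrays of cheap descriptors (standing in for AMDs).
    fine_distance   : callable (i, j) -> float, the expensive comparison
                      (standing in for the PDD distance).
    """
    coarse_dm = cdist(coarse, coarse_, metric="chebyshev")
    # Indices of the n closest candidates per row, by the cheap distance.
    shortlist = np.argsort(coarse_dm, axis=1)[:, :n]

    # Evaluate the expensive distance only on the shortlisted pairs.
    dists = np.empty(shortlist.shape)
    for i, row in enumerate(shortlist):
        dists[i] = [fine_distance(i, j) for j in row]

    # Re-order each shortlist by the expensive distance.
    order = np.argsort(dists, axis=1)
    return (np.take_along_axis(dists, order, axis=1),
            np.take_along_axis(shortlist, order, axis=1))


# Made-up data: cheap 1D descriptors and a lookup table of "expensive" distances.
coarse = np.array([[0.0], [1.0]])
coarse_ = np.array([[0.1], [0.9], [5.0]])
fine_table = np.array([[0.3, 0.2, 9.0],
                       [0.7, 0.1, 9.0]])

distance_matrix, indices = two_stage_neighbours(
    2, coarse, coarse_, lambda i, j: fine_table[i, j])
print(distance_matrix)
print(indices)
```

Only n expensive comparisons are made per item instead of len(pdds_), at the cost of missing any true neighbour that the cheap distance ranks outside the shortlist.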
- amd.compare.AMD_mst(amds: Union[int, float, complex, str, bytes, numpy.generic, Sequence[Union[int, float, complex, str, bytes, numpy.generic]], Sequence[Sequence[Any]], numpy.typing._array_like._SupportsArray], k: Optional[int] = None, low_memory: bool = False, metric: str = 'chebyshev', **kwargs) List[Tuple[int, int, float]]
Return list of edges in a minimum spanning tree based on AMDs.
- Parameters
amds (ndarray or list of ndarrays) – An array/list of AMDs.
k (int, optional) – If None, compare whole AMDs (largest k). Set k to an int to compare for a specific k (less than the maximum).
metric (str or callable, optional) – Usually AMDs are compared with the Chebyshev/l-infinity distance. Can take any metric + kwargs accepted by scipy.spatial.distance.cdist.
- Returns
Each tuple (i,j,w) is an edge in the minimum spanning tree, where i and j are the indices of nodes and w is the AMD distance.
- Return type
list of tuples
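The edge-list format can be reproduced from any square distance matrix with scipy.sparse.csgraph.minimum_spanning_tree. The sketch below uses made-up vectors in place of real AMDs and is not the library's implementation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

# Made-up AMD-like vectors, one per row.
amds = np.array([[1.0, 1.5, 2.0],
                 [1.0, 1.2, 1.9],
                 [0.8, 1.4, 2.3],
                 [3.0, 3.5, 4.0]])

dm = squareform(pdist(amds, metric="chebyshev"))
tree = minimum_spanning_tree(dm).tocoo()  # sparse matrix holding the tree's edges

# Convert the sparse result into (i, j, w) tuples.
edges = [(int(i), int(j), float(w))
         for i, j, w in zip(tree.row, tree.col, tree.data)]
print(edges)  # 3 edges for 4 connected nodes
```

SciPy treats zero entries of a dense input as missing edges, so items at distance exactly 0 from each other would not be connected by this sketch.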
- amd.compare.PDD_mst(pdds: List[numpy.ndarray], amd_filter_cutoff: Optional[int] = None, k: Optional[int] = None, metric: str = 'chebyshev', verbose: bool = False, **kwargs) List[Tuple[int, int, float]]
Return list of edges in a minimum spanning tree based on PDDs.
- Parameters
pdds (list of ndarrays) – A list of PDDs.
amd_filter_cutoff (int, optional) – If specified, apply the AMD filter behaviour of filter(). This is the n passed to filter(), the number of neighbours to connect in the neighbourhood graph.
k (int, optional) – If None, compare whole PDDs (largest k). Set k to an int to compare for a specific k (less than the maximum).
metric (str or callable, optional) – Usually PDD rows are compared with the Chebyshev/l-infinity distance. Can take any metric + kwargs accepted by scipy.spatial.distance.cdist.
verbose (bool, optional) – Optionally print an ETA to terminal as large collections can take some time.
- Returns
Each tuple (i,j,w) is an edge in the minimum spanning tree, where i and j are the indices of nodes and w is the PDD distance.
- Return type
list of tuples
- amd.compare.neighbours_from_distance_matrix(n: int, dm: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray]
Given a distance matrix, find the n nearest neighbours of each item.
- Parameters
n (int) – Number of nearest neighbours to find for each item.
dm (ndarray) – 2D distance matrix or 1D condensed distance matrix.
- Returns
For item i, nn_dm[i][j] is the distance from item i to its j+1 st nearest neighbour, and inds[i][j] is the index of this neighbour (j+1 since index 0 is the first nearest neighbour).
- Return type
tuple of ndarrays (nn_dm, inds)
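A minimal sketch of this lookup with NumPy follows. The helper nearest_from_dm is hypothetical, and the choice to exclude an item from its own neighbours only when a condensed matrix is given is an assumption, not confirmed behaviour of the library.

```python
import numpy as np
from scipy.spatial.distance import squareform


def nearest_from_dm(n, dm):
    """Hypothetical helper: n nearest neighbours per item from a distance matrix."""
    dm = np.asarray(dm, dtype=float)
    if dm.ndim == 1:
        dm = squareform(dm)           # expand condensed form to a new square array
        np.fill_diagonal(dm, np.inf)  # assumption: an item is not its own neighbour
    inds = np.argsort(dm, axis=1)[:, :n]
    nn_dm = np.take_along_axis(dm, inds, axis=1)
    return nn_dm, inds


# Condensed distances for 3 items, ordered (0,1), (0,2), (1,2).
condensed = np.array([0.3, 0.5, 0.4])
nn_dm, inds = nearest_from_dm(1, condensed)
print(nn_dm)
print(inds)
```

With n = 1, column 0 of each output holds the first nearest neighbour, matching the indexing convention described above.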
- amd.compare.mst_from_distance_matrix(dm)