
- `scipy.spatial.distance.pdist(X, metric='euclidean', *, out=None, **kwargs)` [source]#

Pairwise distances between observations in n-dimensional space.

See Notes for common calling conventions.

- Parameters:
**X** : array_like
An m by n array of m original observations in an n-dimensional space.

**metric** : str or function, optional
The distance metric to use. The distance function can be ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.

**out** : ndarray, optional
The output array. If not None, the condensed distance matrix Y is stored in this array.

**\*\*kwargs** : dict, optional
Extra arguments to *metric*: refer to each metric's documentation for a list of all possible arguments. Some possible arguments:

p : scalar
The p-norm to apply for Minkowski, weighted and unweighted. Default: 2.

w : ndarray
The weight vector for metrics that support weights (e.g., Minkowski).

V : ndarray
The variance vector for standardized Euclidean. Default: var(X, axis=0, ddof=1)

VI : ndarray
The inverse of the covariance matrix for Mahalanobis. Default: inv(cov(X.T)).T

- Returns:
**Y** : ndarray
Returns a condensed distance matrix Y. For each \(i\) and \(j\) (where \(i < j < m\)), where m is the number of original observations, the metric `dist(u=X[i], v=X[j])` is computed and stored in entry `m * i + j - ((i + 2) * (i + 1)) // 2`.
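The indexing formula above can be checked numerically. The following sketch uses a small made-up point set (the points here are illustrative, not from the docs) and verifies that the formula locates the distance for a given pair \((i, j)\) in the condensed matrix:

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical 4-point example to illustrate the condensed indexing.
X = np.array([[0., 0.], [3., 4.], [1., 1.], [6., 8.]])
m = len(X)
Y = pdist(X)  # condensed distance matrix, length m*(m-1)//2

i, j = 1, 3  # any pair with i < j < m
idx = m * i + j - ((i + 2) * (i + 1)) // 2
# Y[idx] should equal the Euclidean distance between X[i] and X[j]
assert np.isclose(Y[idx], np.linalg.norm(X[i] - X[j]))
```

For this pair the distance is \(\sqrt{3^2 + 4^2} = 5\), stored at condensed index 4.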

See also

- squareform
Converts between condensed distance matrices and square distance matrices.

Notes

See `squareform` for information on how to calculate the index of this entry or to convert the condensed distance matrix to a redundant square matrix.

The following are common calling conventions:

`Y = pdist(X, 'euclidean')`

Computes the distance between m points using Euclidean distance (2-norm) as the distance metric between the points. The points are arranged as m n-dimensional row vectors in the matrix X.

`Y = pdist(X, 'minkowski', p=2.)`

Computes the distances using the Minkowski distance \(\|u-v\|_p\) (\(p\)-norm) where \(p > 0\) (note that this is only a quasi-metric if \(0 < p < 1\)).
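The Minkowski convention can be sanity-checked against a direct evaluation of the \(p\)-norm. This sketch uses the first two points from the Examples section below and a non-default \(p\):

```python
import numpy as np
from scipy.spatial.distance import pdist

X = np.array([[2., 0., 2.], [2., 2., 3.]])  # two points from the Examples section
p = 3.5
d = pdist(X, 'minkowski', p=p)[0]
# Direct evaluation of the p-norm of the difference vector
manual = (np.abs(X[0] - X[1]) ** p).sum() ** (1 / p)
assert np.isclose(d, manual)
```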

`Y = pdist(X, 'cityblock')`

Computes the city block or Manhattan distance between the points.

`Y = pdist(X, 'seuclidean', V=None)`

Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors `u` and `v` is

\[\sqrt{\sum {(u_i-v_i)^2 / V[x_i]}}\]

V is the variance vector; V[i] is the variance computed over all the i’th components of the points. If not passed, it is automatically computed.
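The default variance vector can be reproduced by hand, which makes the convention concrete. The following sketch (with three illustrative points) compares `pdist` against a direct evaluation of the formula above:

```python
import numpy as np
from scipy.spatial.distance import pdist

X = np.array([[2., 0., 2.], [2., 2., 3.], [-2., 4., 5.]])
V = np.var(X, axis=0, ddof=1)          # the documented default variance vector
d = pdist(X, 'seuclidean', V=V)[0]     # distance between X[0] and X[1]
# Direct evaluation of sqrt(sum((u_i - v_i)^2 / V[i]))
manual = np.sqrt((((X[0] - X[1]) ** 2) / V).sum())
assert np.isclose(d, manual)
```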

`Y = pdist(X, 'sqeuclidean')`

Computes the squared Euclidean distance \(\|u-v\|_2^2\) between the vectors.

`Y = pdist(X, 'cosine')`

Computes the cosine distance between vectors u and v,

\[1 - \frac{u \cdot v} {{\|u\|}_2 {\|v\|}_2}\]

where \(\|*\|_2\) is the 2-norm of its argument `*`, and \(u \cdot v\) is the dot product of `u` and `v`.

`Y = pdist(X, 'correlation')`

Computes the correlation distance between vectors u and v. This is

\[1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})} {{\|(u - \bar{u})\|}_2 {\|(v - \bar{v})\|}_2}\]

where \(\bar{v}\) is the mean of the elements of vector v, and \(x \cdot y\) is the dot product of \(x\) and \(y\).
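The correlation distance is just the cosine distance of the mean-centered vectors, so the formula above is easy to check directly (the vectors here are illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist

u = np.array([2., 0., 2., 1.])
v = np.array([2., 2., 3., 4.])
d = pdist(np.vstack([u, v]), 'correlation')[0]
# Mean-center both vectors, then apply the cosine-distance formula
uc, vc = u - u.mean(), v - v.mean()
manual = 1 - uc.dot(vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))
assert np.isclose(d, manual)
```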

`Y = pdist(X, 'hamming')`

Computes the normalized Hamming distance, or the proportion of those vector elements between two n-vectors `u` and `v` which disagree. To save memory, the matrix `X` can be of type boolean.

`Y = pdist(X, 'jaccard')`

Computes the Jaccard distance between the points. Given two vectors, `u` and `v`, the Jaccard distance is the proportion of those elements `u[i]` and `v[i]` that disagree.

`Y = pdist(X, 'jensenshannon')`

Computes the Jensen-Shannon distance between two probability arrays. Given two probability vectors, \(p\) and \(q\), the Jensen-Shannon distance is

\[\sqrt{\frac{D(p \parallel m) + D(q \parallel m)}{2}}\]

where \(m\) is the pointwise mean of \(p\) and \(q\) and \(D\) is the Kullback-Leibler divergence.
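The formula above can be evaluated by hand for two small probability vectors. This sketch assumes scipy's default natural-log base for the divergence, and uses strictly positive vectors to avoid log-of-zero edge cases (the vectors are illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])
d = pdist(np.vstack([p, q]), 'jensenshannon')[0]

m = (p + q) / 2  # pointwise mean

def kl(a, b):
    # Kullback-Leibler divergence D(a || b), natural log
    return (a * np.log(a / b)).sum()

manual = np.sqrt((kl(p, m) + kl(q, m)) / 2)
assert np.isclose(d, manual)
```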

`Y = pdist(X, 'chebyshev')`

Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors `u` and `v` is the maximum norm-1 distance between their respective elements. More precisely, the distance is given by

\[d(u,v) = \max_i {|u_i-v_i|}\]

`Y = pdist(X, 'canberra')`

Computes the Canberra distance between the points. The Canberra distance between two points `u` and `v` is

\[d(u,v) = \sum_i \frac{|u_i-v_i|} {|u_i|+|v_i|}\]

`Y = pdist(X, 'braycurtis')`

Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points `u` and `v` is

\[d(u,v) = \frac{\sum_i {|u_i-v_i|}} {\sum_i {|u_i+v_i|}}\]

`Y = pdist(X, 'mahalanobis', VI=None)`

Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points `u` and `v` is \(\sqrt{(u-v)(1/V)(u-v)^T}\) where \((1/V)\) (the `VI` variable) is the inverse covariance. If `VI` is not None, `VI` will be used as the inverse covariance matrix.

`Y = pdist(X, 'yule')`

Computes the Yule distance between each pair of boolean vectors. (see yule function documentation)

`Y = pdist(X, 'matching')`

Synonym for ‘hamming’.

`Y = pdist(X, 'dice')`

Computes the Dice distance between each pair of boolean vectors. (see dice function documentation)

`Y = pdist(X, 'kulczynski1')`

Computes the Kulczynski 1 distance between each pair of boolean vectors. (see kulczynski1 function documentation)

`Y = pdist(X, 'rogerstanimoto')`

Computes the Rogers-Tanimoto distance between each pair of boolean vectors. (see rogerstanimoto function documentation)

`Y = pdist(X, 'russellrao')`

Computes the Russell-Rao distance between each pair of boolean vectors. (see russellrao function documentation)

`Y = pdist(X, 'sokalmichener')`

Computes the Sokal-Michener distance between each pair of boolean vectors. (see sokalmichener function documentation)

`Y = pdist(X, 'sokalsneath')`

Computes the Sokal-Sneath distance between each pair of boolean vectors. (see sokalsneath function documentation)

`Y = pdist(X, f)`

Computes the distance between all pairs of vectors in X using the user supplied 2-arity function f. For example, Euclidean distance between the vectors could be computed as follows:

dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))

Note that you should avoid passing a reference to one of the distance functions defined in this library. For example:

dm = pdist(X, sokalsneath)

would calculate the pair-wise distances between the vectors in X using the Python function sokalsneath. This would result in sokalsneath being called \({n \choose 2}\) times, which is inefficient. Instead, the optimized C version is more efficient, and we call it using the following syntax:

dm = pdist(X, 'sokalsneath')
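Both calling styles produce the same condensed matrix; only the speed differs. The following sketch (with random illustrative data) confirms the equivalence for the Euclidean case:

```python
import numpy as np
from scipy.spatial.distance import pdist

# Illustrative random data: 6 points in 3 dimensions
X = np.random.default_rng(0).random((6, 3))
dm_callable = pdist(X, lambda u, v: np.sqrt(((u - v) ** 2).sum()))
dm_string = pdist(X, 'euclidean')  # optimized C implementation
assert np.allclose(dm_callable, dm_string)
```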

Examples

>>> import numpy as np
>>> from scipy.spatial.distance import pdist

`x` is an array of five points in three-dimensional space.

>>> x = np.array([[2, 0, 2], [2, 2, 3], [-2, 4, 5], [0, 1, 9], [2, 2, 4]])

`pdist(x)` with no additional arguments computes the 10 pairwise Euclidean distances:

>>> pdist(x)
array([2.23606798, 6.40312424, 7.34846923, 2.82842712, 4.89897949,
       6.40312424, 1.        , 5.38516481, 4.58257569, 5.47722558])

The following computes the pairwise Minkowski distances with `p = 3.5`:

>>> pdist(x, metric='minkowski', p=3.5)
array([2.04898923, 5.1154929 , 7.02700737, 2.43802731, 4.19042714,
       6.03956994, 1.        , 4.45128103, 4.10636143, 5.0619695 ])

The pairwise city block or Manhattan distances:

>>> pdist(x, metric='cityblock')
array([ 3., 11., 10.,  4.,  8.,  9.,  1.,  9.,  7.,  8.])
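As the See also section notes, `squareform` converts the condensed output to a redundant square matrix and back. A short round-trip sketch using the same `x` as above:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

x = np.array([[2, 0, 2], [2, 2, 3], [-2, 4, 5], [0, 1, 9], [2, 2, 4]])
Y = squareform(pdist(x))  # 5 x 5 redundant square matrix, zeros on the diagonal
assert Y.shape == (5, 5)
# Y[i, j] holds the distance between x[i] and x[j]
assert np.isclose(Y[0, 1], np.sqrt(5))  # matches the first condensed entry, 2.23606798
# Applying squareform again recovers the condensed form
assert np.allclose(squareform(Y), pdist(x))
```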