# Distance metrics

## Available distance metrics

If not specified explicitly, the default distance metric in Weaviate is `cosine`. It can be set in the `vectorIndexConfig` field as part of the schema (for example, when adding a class to the schema) to any of the following types:

In all cases, larger distance values indicate lower similarity. Conversely, smaller distance values indicate higher similarity.

| Name | Description | Definition | Range | Examples |
|---|---|---|---|---|
| `cosine` | Cosine (angular) distance. (See note 1 below.) | `1 - cosine_sim(a,b)` | `0 <= d <= 2` | `0`: identical vectors; `2`: opposing vectors |
| `dot` | A dot product-based indication of distance. More precisely, the negative dot product. (See note 2 below.) | `-dot(a,b)` | `-∞ < d < ∞` | `-3`: more similar than `-2`; `2`: more similar than `5` |
| `l2-squared` | The squared Euclidean distance between two vectors. | `sum((a_i - b_i)^2)` | `0 <= d < ∞` | `0`: identical vectors |
| `hamming` | The number of dimensions at which the two vectors differ. | `sum(\|a_i != b_i\|)` | `0 <= d <= dims` | `0`: identical vectors |
| `manhattan` | The distance between two vectors measured along axes at right angles. | `sum(\|a_i - b_i\|)` | `0 <= d < ∞` | `0`: identical vectors |
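As a rough illustration of where the metric is configured, the sketch below builds a class definition as a plain Python dictionary and serializes it to JSON. The class name `Article` is hypothetical, and the sketch assumes the key inside `vectorIndexConfig` is named `distance`; it does not call any client library.

```python
import json

# Hypothetical class definition; "Article" is an illustrative name only.
article_class = {
    "class": "Article",
    "vectorIndexConfig": {
        # Assumed key name. Expected values are the metric names from the
        # table: "cosine", "dot", "l2-squared", "hamming", "manhattan".
        "distance": "dot",
    },
}

# The JSON payload that would be sent when adding the class to the schema.
schema_payload = json.dumps(article_class, indent=2)
print(schema_payload)
```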

If you're missing your favorite distance type and would like to contribute it to Weaviate, we'd be happy to review your PR.

1. If `cosine` is chosen, all vectors are normalized to length 1 at import/read time and the dot product is used to calculate the distance for computational efficiency.
2. The dot product on its own is a similarity metric, not a distance metric. As a result, Weaviate returns the negative dot product, in keeping with the intuition that a smaller distance value indicates a more similar result and a larger distance value indicates a less similar result.
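The definitions in the table can be sketched in plain Python as follows. This is only a reference for the formulas, not Weaviate's implementation; the function names are made up for illustration, and zero-length vectors are not handled. The `cosine_distance` function mirrors the note above by normalizing both vectors first and then using the dot product.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product (a similarity, not a distance)."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to length 1 (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # After normalization, cosine similarity equals the dot product.
    return 1.0 - dot(normalize(a), normalize(b))


def dot_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Negated so that smaller values mean "more similar".
    return -dot(a, b)


def l2_squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hamming_distance(a: Sequence[float], b: Sequence[float]) -> int:
    # Counts the dimensions at which the two vectors differ.
    return sum(1 for x, y in zip(a, b) if x != y)


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))
```

For example, `cosine_distance([1.0, 0.0], [-1.0, 0.0])` evaluates to `2.0` (opposing vectors), while `dot_distance([1, 2], [3, 4])` evaluates to `-11`.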

### Distance implementations and optimizations

In a typical Weaviate use case, the largest portion of CPU time is spent calculating vector distances. Even with an approximate nearest neighbor index, which leads to far fewer calculations, the efficiency of distance calculations has a major impact on overall performance.

You can use the following overview to find the best possible combination of distance metric and CPU architecture / instruction set.

| Distance | `linux/amd64 AVX2` | `darwin/amd64 AVX2` | `linux/amd64 AVX512` | `linux/arm64` | `darwin/arm64` |
|---|---|---|---|---|---|
| `cosine` | optimized | optimized | no SIMD | no SIMD | no SIMD |
| `dot` | optimized | optimized | no SIMD | optimized (from `v1.21`) | optimized (from `v1.21`) |
| `l2-squared` | optimized | optimized | no SIMD | optimized (from `v1.21`) | optimized (from `v1.21`) |
| `hamming` | no SIMD | no SIMD | no SIMD | no SIMD | no SIMD |
| `manhattan` | no SIMD | no SIMD | no SIMD | no SIMD | no SIMD |

If you like dealing with Assembly programming, SIMD, and vector instruction sets, we would love to receive your contribution for one of the combinations that have not yet received a SIMD-specific optimization.

### Distance fields in the APIs

The `distance` is exposed in the APIs in two ways:

- Whenever a vector search is involved, the distance can be displayed as part of the results, for example using `_additional { distance }`.
- Whenever a vector search is involved, the distance can be specified as a limiting criterion, for example using `nearVector({distance: 1.5, vector: ... })`.
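To show both uses together, the sketch below assembles a GraphQL query string in Python that limits results by distance and also requests the distance in the output. The helper `build_near_vector_query`, the class name `Article`, and the property `title` are hypothetical; the code only builds the query text and does not send it anywhere.

```python
import json
from typing import List


def build_near_vector_query(
    class_name: str,
    properties: List[str],
    vector: List[float],
    max_distance: float,
) -> str:
    """Build a Get query with a nearVector distance limit (illustrative helper)."""
    props = "\n      ".join(properties)
    return (
        "{\n"
        "  Get {\n"
        # The distance here acts as the limiting criterion.
        f"    {class_name}(nearVector: {{vector: {json.dumps(vector)}, "
        f"distance: {max_distance}}}) {{\n"
        f"      {props}\n"
        # The distance here is returned with each result.
        "      _additional { distance }\n"
        "    }\n"
        "  }\n"
        "}"
    )


query = build_near_vector_query("Article", ["title"], [0.1, 0.2], 1.5)
print(query)
```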

Note: The `distance` field was introduced in `v1.14.0`. In previous versions, only `certainty` (see below) was available.

### Distance vs Certainty

Prior to version `v1.14`, only `certainty` was available in the APIs. The original idea behind certainty was to normalize the distance score into a value in the range `0 <= certainty <= 1`, where 1 would represent identical vectors and 0 would represent opposite vectors.

This concept is, however, unique to `cosine` distance. With other distance metrics, scores may be unbounded. As a result, the preferred way is to use `distance` rather than `certainty`.

For backward compatibility, `certainty` can still be used when the distance is `cosine`. If any other distance is selected, `certainty` cannot be used.
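The relationship between the two values for `cosine` can be sketched as a linear mapping between the ranges `0 <= d <= 2` and `0 <= certainty <= 1`. The formula below is an assumption chosen to match the endpoints described above (distance 0 maps to certainty 1, distance 2 maps to certainty 0), and the function names are illustrative only.

```python
def distance_to_certainty(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a certainty in [0, 1] (assumed linear)."""
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must be in the range 0 <= d <= 2")
    return 1.0 - distance / 2.0


def certainty_to_distance(certainty: float) -> float:
    """Inverse of the assumed mapping: certainty in [0, 1] to cosine distance."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be in the range 0 <= certainty <= 1")
    return 2.0 * (1.0 - certainty)
```

Because the mapping relies on the bounded range of `cosine`, it has no counterpart for unbounded metrics such as `dot` or `l2-squared`.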