# Distance metrics

## Available distance metrics

If not specified explicitly, the default distance metric in Weaviate is `cosine`. It can be set in the `vectorIndexConfig` field as part of the schema to any of the types listed in the table below.

In all cases, larger distance values indicate lower similarity. Conversely, smaller distance values indicate higher similarity.
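
As a rough illustration of where this setting lives, the sketch below builds a class definition as a plain Python dictionary. It assumes the key inside `vectorIndexConfig` is named `distance`; the class name `Article` and the helper `class_definition` are hypothetical and exist only for this example.

```python
import json

# Metric names taken from the table of available distance metrics.
SUPPORTED_DISTANCES = {"cosine", "dot", "l2-squared", "hamming", "manhattan"}


def class_definition(name: str, distance: str = "cosine") -> dict:
    """Build a schema class definition with an explicit distance metric.

    Hypothetical helper: the "distance" key name is an assumption.
    """
    if distance not in SUPPORTED_DISTANCES:
        raise ValueError(f"unsupported distance metric: {distance!r}")
    return {
        "class": name,
        "vectorIndexConfig": {"distance": distance},
    }


# "Article" is an illustrative class name only.
article_class = class_definition("Article", distance="dot")
print(json.dumps(article_class, indent=2))
```

Validating the metric name before sending the schema is a design convenience, so that a typo fails locally rather than at schema creation.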

Name | Description | Definition | Range | Examples |
---|---|---|---|---|
`cosine` | Cosine (angular) distance. (See note 1 below.) | `1 - cosine_sim(a,b)` | `0 <= d <= 2` | `0`: identical vectors; `2`: opposing vectors |
`dot` | A dot product-based indication of distance. More precisely, the negative dot product. (See note 2 below.) | `-dot(a,b)` | `-∞ < d < ∞` | `-3`: more similar than `-2`; `2`: more similar than `5` |
`l2-squared` | The squared Euclidean distance between two vectors. | `sum((a_i - b_i)^2)` | `0 <= d < ∞` | `0`: identical vectors |
`hamming` | The number of dimensions in which the two vectors differ. | `count(a_i != b_i)` | `0 <= d <= dims` | `0`: identical vectors |
`manhattan` | The distance between two vectors measured along axes at right angles. | `sum(abs(a_i - b_i))` | `0 <= d < ∞` | `0`: identical vectors |

If you're missing your favorite distance type and would like to contribute it to Weaviate, we'd be happy to review your PR.

- Note 1: If `cosine` is chosen, all vectors are normalized to length 1 at read time and dot product is used to calculate the distance for computational efficiency.
- Note 2: The dot product on its own is a similarity metric, not a distance metric. As a result, Weaviate returns the negative dot product to stick with the intuition that a smaller distance value indicates a more similar result and a larger distance value indicates a less similar result.
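
The definitions in the table can be sketched in a few lines of plain Python. This is only a reference illustration of the formulas, not Weaviate's implementation (which is optimized and not written in Python). All function names are chosen for this example, and the vectors are assumed to be non-zero and of equal length.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    """Scale a vector to length 1 (assumes a non-zero vector)."""
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]


def cosine_distance(a, b):
    # As in note 1: normalize both vectors, then a dot product suffices.
    return 1.0 - dot(normalize(a), normalize(b))


def dot_distance(a, b):
    # As in note 2: negate the similarity so that smaller means more similar.
    return -dot(a, b)


def l2_squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hamming_distance(a, b):
    # Counts the dimensions in which the two vectors differ.
    return sum(1 for x, y in zip(a, b) if x != y)


def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))   # opposing vectors
print(dot_distance([1.0, 2.0], [3.0, 4.0]))
print(l2_squared_distance([1.0, 2.0], [4.0, 6.0]))
```

Note that for vectors already normalized to length 1, `cosine_distance(a, b)` equals `1 + dot_distance(a, b)`, which is why the normalization step makes the cheaper dot product sufficient.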

### Distance implementations and optimizations

In a typical Weaviate use case, the largest portion of CPU time is spent calculating vector distances. Even with an approximate nearest neighbor index, which leads to far fewer calculations, the efficiency of distance calculations has a major impact on overall performance.

Weaviate uses SIMD (Single Instruction, Multiple Data) instructions for the following distance metrics and architectures. The available optimizations are resolved in the order shown (e.g. SVE -> Neon).

Distance | `arm64` | `amd64` |
---|---|---|
`cosine`, `dot`, `l2-squared` | SVE or Neon | Sapphire Rapids with AVX-512, or any CPU with AVX2 |
`hamming`, `manhattan` | No SIMD | No SIMD |

If you like dealing with Assembly programming, SIMD, and vector instruction sets, we would love to receive your contribution for one of the combinations that have not yet received a SIMD-specific optimization.

### Distance fields in the APIs

The `distance` is exposed in the APIs in two ways:

- Whenever a vector search is involved, the distance can be displayed as part of the results, for example using `_additional { distance }`
- Whenever a vector search is involved, the distance can be specified as a limiting criterion, for example using `nearVector({distance: 1.5, vector: ... })`
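
To show how the two usages can appear together, the sketch below assembles a GraphQL query string that limits results by distance and also returns the distance for each result. The surrounding `{ Get { ... } }` structure, the class name `Article`, the property `title`, and the helper `near_vector_query` are assumptions made for illustration; only `nearVector`, `distance`, and `_additional { distance }` come from the text above.

```python
import json


def near_vector_query(class_name, properties, vector, max_distance):
    """Build a query string with a distance limit and a distance result field.

    Hypothetical helper: the "{ Get { ... } }" wrapper is an assumption.
    """
    # Limiting criterion: only return results within max_distance.
    near = f"nearVector: {{vector: {json.dumps(vector)}, distance: {max_distance}}}"
    fields = " ".join(properties)
    return (
        "{ Get { "
        f"{class_name}({near}) "
        # Result field: ask for the distance of each returned object.
        f"{{ {fields} _additional {{ distance }} }} "
        "} }"
    )


# "Article" and "title" are illustrative names only.
query = near_vector_query("Article", ["title"], [0.1, 0.2, 0.3], 1.5)
print(query)
```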

Note: The `distance` field was introduced in `v1.14.0`. In previous versions, only `certainty` (see below) was available.

### Distance vs Certainty

Prior to version `v1.14`, only `certainty` was available in the APIs. The original idea behind certainty was to normalize the distance score into a value in the range `0 <= certainty <= 1`, where 1 would represent identical vectors and 0 would represent opposite vectors.

This concept is, however, unique to `cosine` distance. With other distance metrics, scores may be unbounded. As a result, the preferred approach is to use `distance` rather than `certainty`.

For backward compatibility, `certainty` can still be used when the distance is `cosine`. If any other distance is selected, `certainty` cannot be used.
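
The relationship between the two values for `cosine` can be illustrated with a small conversion. The formula below is an assumption: it is the linear mapping that matches the endpoints described above (distance `0` gives certainty `1`, distance `2` gives certainty `0`), and the function name is hypothetical.

```python
def certainty_from_cosine_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a certainty in [0, 1].

    Assumed linear mapping: certainty = 1 - distance / 2.
    """
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must be within 0 <= d <= 2")
    return 1.0 - distance / 2.0


for d in (0.0, 1.0, 2.0):
    print(d, certainty_from_cosine_distance(d))
```

No comparable conversion exists for unbounded metrics such as `dot` or `l2-squared`, which is the reason `certainty` is restricted to `cosine`.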

See also the `distance` and `certainty` `_additional {}` properties.

## Questions and feedback

If you have any questions or feedback, let us know in the user forum.