Classification Benchmarks


💡 You are looking at older or release candidate documentation. The current Weaviate version is v1.15.1

This page reports classification benchmark results for various released and unreleased contextionary versions. It is meant to give an initial impression of the strengths and weaknesses of each version.


Model Version Details

Each benchmark on this page refers to a specific model version. The details of each version are listed here:

| model version | training algo | input sources | input size | dimensions | window size | release status |
|---|---|---|---|---|---|---|
| en0.14.0 | GloVe | Wiki | ~14G | 600 | 15 | released |
| en0.16.0 | GloVe | Wiki, CommonCrawl | ~1000G | 300 | 15 | released |
| en0.17.0 | fasttext | Wiki | ~14G | 300 | 15 | not released (yet) |
| en0.17.1 | fasttext | Wiki | ~14G | 300 | 5 | not released (yet) |
| en0.17.2 | fasttext | Wiki, CommonCrawl | ~1000G | 300 | 15 | training aborted! |
| en0.17.3 | fasttext | Wiki, CommonCrawl | ~100G | 300 | 15 | not released (yet) |

Benchmarks (KNN)

Enron Emails (Subset kaminski-v)

  • Source Repo: semi-technologies/enron-email-classification
  • Current best: en0.14.0 at k=1
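| contextionary | dimensions | k=1 | k=3 | k=5 | k=8 | k=13 | k=21 |
|---|---|---|---|---|---|---|---|
| en0.14.0-v0.4.9 | 600 | 74% | 72% | 71% | 70% | 67% | 63% |
| en0.16.0-v0.4.9 | 300 | 72% | 70% | 69% | 69% | 65% | 64% |
| en0.17.0-v0.4.15 | 300 | 68% | 68% | 67% | 64% | 63% | 60% |
| en0.17.1-v0.4.15 | 300 | 70% | 68% | 68% | 66% | 64% | 62% |
| en0.17.3-v0.4.15 | 300 | 72% | 70% | 70% | 69% | 66% | 64% |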

20 Newsgroups

  • Size: 60 per category
  • Source Repo: semi-technologies/20news-classification

Main Category (6 Categories)

  • Current best: en0.17.3 at k=5
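| contextionary | dimensions | k=1 | k=3 | k=5 | k=8 | k=13 | k=21 |
|---|---|---|---|---|---|---|---|
| en0.14.0-v0.4.15 | 600 | 76% | 73% | 72% | 74% | 74% | 70% |
| en0.16.0-v0.4.15 | 300 | 83% | 82% | 80% | 82% | 82% | 82% |
| en0.17.0-v0.4.15 | 300 | 78% | 80% | 77% | 77% | 73% | 72% |
| en0.17.1-v0.4.15 | 300 | 77% | 77% | 78% | 77% | 73% | 73% |
| en0.17.3-v0.4.15 | 300 | 83% | 84% | 85% | 82% | 81% | 80% |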
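
The "Current best" entry for a table can be derived by taking the highest accuracy over all versions and all values of k. The following is a minimal sketch of that lookup, using the Main Category figures listed above. The names `K_VALUES`, `RESULTS`, and `current_best` are hypothetical and chosen for illustration; they are not part of the benchmark repository.

```python
# Accuracy (in percent) per contextionary version, one entry per k value.
# Figures are copied from the 20 Newsgroups Main Category KNN table.
K_VALUES = [1, 3, 5, 8, 13, 21]
RESULTS = {
    "en0.14.0": [76, 73, 72, 74, 74, 70],
    "en0.16.0": [83, 82, 80, 82, 82, 82],
    "en0.17.0": [78, 80, 77, 77, 73, 72],
    "en0.17.1": [77, 77, 78, 77, 73, 73],
    "en0.17.3": [83, 84, 85, 82, 81, 80],
}


def current_best(results, k_values):
    """Return (version, k, accuracy) for the highest accuracy in the table."""
    candidates = (
        (accuracy, version, k)
        for version, accuracies in results.items()
        for accuracy, k in zip(accuracies, k_values)
    )
    # Compare on accuracy only; on a tie the first candidate seen is kept.
    accuracy, version, k = max(candidates, key=lambda item: item[0])
    return version, k, accuracy


version, k, accuracy = current_best(RESULTS, K_VALUES)
print(f"Current best: {version} at k={k} ({accuracy}%)")
```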

Fine Category (20 Categories)

  • Current best: en0.17.3 at k=1
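| contextionary | dimensions | k=1 | k=3 | k=5 | k=8 | k=13 | k=21 |
|---|---|---|---|---|---|---|---|
| en0.14.0-v0.4.15 | 600 | 57% | 53% | 53% | 50% | 48% | 46% |
| en0.16.0-v0.4.15 | 300 | 62% | 60% | 57% | 60% | 61% | 59% |
| en0.17.0-v0.4.15 | 300 | 57% | 57% | 56% | 56% | 56% | 51% |
| en0.17.1-v0.4.15 | 300 | 56% | 54% | 55% | 58% | 54% | 53% |
| en0.17.3-v0.4.15 | 300 | 66% | 64% | 64% | 61% | 62% | 61% |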
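
The KNN tables above report accuracy for several values of k. This page does not describe how the benchmark repositories compute those figures, so the following is only a generic sketch of k-nearest-neighbour classification over vectors, assuming cosine similarity and a majority vote. The toy vectors, labels, and function names are hypothetical and are not taken from the contextionary or the benchmark code.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, training, k):
    """Predict a label by majority vote among the k most similar training items."""
    ranked = sorted(
        training,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    # On a tied vote, the label of the closest neighbour among the tied ones wins.
    return votes.most_common(1)[0][0]


def accuracy_at_k(test, training, k):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum(1 for vector, label in test if knn_predict(vector, training, k) == label)
    return correct / len(test)


# Toy 2-dimensional data (hypothetical); real contextionary vectors have 300 or 600 dimensions.
TRAIN = [
    ([1.0, 0.1], "sports"),
    ([0.9, 0.2], "sports"),
    ([0.1, 1.0], "politics"),
    ([0.2, 0.9], "politics"),
]
TEST = [([0.8, 0.3], "sports"), ([0.3, 0.8], "politics")]

for k in (1, 3):
    print(f"k={k}: {accuracy_at_k(TEST, TRAIN, k):.0%}")
```

Larger values of k smooth out individual noisy neighbours but can also pull in items from other classes, which is one plausible reason accuracy in the tables tends to drop as k grows.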

Benchmarks (contextual)

20 Newsgroups

  • Size: 60 per category
  • Source Repo: semi-technologies/20news-classification
  • Warning: Take these results (20-news contextual) with a grain of salt. They do not currently test the best possible hyper-parameters, only a specific configuration that worked well in the past. TODO: Improve the benchmark to test various hyper-parameters.

Main Category (6 Categories)

  • Current best: en0.14.0
| contextionary | dimensions | result |
|---|---|---|
| en0.14.0-v0.4.15 | 600 | 54% |
| en0.16.0-v0.4.15 | 300 | 50% |
| en0.17.0-v0.4.15 | 300 | 50% |
| en0.17.1-v0.4.15 | 300 | 50% |
| en0.17.3-v0.4.15 | 300 | 50% |

Fine Category (20 Categories)

  • Current best: en0.16.0
| contextionary | dimensions | result |
|---|---|---|
| en0.14.0-v0.4.15 | 600 | 44% |
| en0.16.0-v0.4.15 | 300 | 56% |
| en0.17.0-v0.4.15 | 300 | 44% |
| en0.17.0-v0.4.15 | 300 | 43% |
| en0.17.3-v0.4.15 | 300 | 50% |
