text-embedding-ada-002

  • Better: it outperforms prior OpenAI models on most benchmark tasks.

  • Simpler: a single model for both search and similarity tasks across both text and code.

  • Able to read 4x more: it can embed up to 8,191 tokens (roughly 10 pages of text) vs. 2,046 previously.
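
A minimal usage sketch, assuming the openai Python package with the pre-1.0 openai.Embedding.create call style (newer library versions expose the same endpoint through client.embeddings.create) and a placeholder API key; the same model embeds both texts, and cosine similarity compares them:

```python
import numpy as np
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def get_embedding(text: str) -> np.ndarray:
    # One request per text; the endpoint also accepts a list of inputs.
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text,
    )
    return np.array(response["data"][0]["embedding"])  # 1536-dimensional vector

a = get_embedding("The food was delicious and the service was excellent.")
b = get_embedding("Great meal, friendly staff.")

# Cosine similarity; ada-002 embeddings are normalized to length 1,
# so the plain dot product gives the same value.
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)
```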

| Model generation | Tokenizer   | Max input tokens | Knowledge cutoff |
|------------------|-------------|------------------|------------------|
| V2               | cl100k_base | 8191             | Sep 2021         |
| V1               | GPT-2/GPT-3 | 2046             | Aug 2020         |

| Model name             | Tokenizer   | Max input tokens | Output dimensions |
|------------------------|-------------|------------------|-------------------|
| text-embedding-ada-002 | cl100k_base | 8191             | 1536              |
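
Since the V2 generation uses the cl100k_base tokenizer, the tiktoken package can check whether a text fits within the 8,191-token input limit before it is sent for embedding. A minimal sketch (how to chunk a text that is too long is left to the caller):

```python
import tiktoken

MAX_INPUT_TOKENS = 8191  # input limit for text-embedding-ada-002

encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by the V2 generation

def fits_in_one_request(text: str) -> bool:
    # True if the text can be embedded without chunking or truncation.
    return len(encoding.encode(text)) <= MAX_INPUT_TOKENS

print(fits_in_one_request("OpenAI's new embedding model handles much longer inputs."))
```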