- Better: it outperforms prior OpenAI embedding models on most benchmark tasks.
- Simpler: a single model handles both search and similarity tasks, across both text and code.
- Able to read 4x more: it can embed up to 8,191 tokens (roughly 10 pages of text), up from 2,046 previously.
| MODEL GENERATION | TOKENIZER | MAX INPUT TOKENS | KNOWLEDGE CUTOFF |
|---|---|---|---|
| V2 | cl100k_base | 8191 | Sep 2021 |
| V1 | GPT-2/GPT-3 | 2046 | Aug 2020 |
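Because the V2 generation uses the cl100k_base tokenizer, you can count tokens locally to stay under the 8,191-token limit before sending a request. Here is a minimal sketch using the tiktoken package; the `truncate_to_limit` helper and the sample text are illustrative, not part of the announcement:

```python
import tiktoken

MAX_INPUT_TOKENS = 8191  # V2 embedding model input limit

# cl100k_base is the tokenizer listed for the V2 generation above.
enc = tiktoken.get_encoding("cl100k_base")

def truncate_to_limit(text: str, limit: int = MAX_INPUT_TOKENS) -> str:
    """Return text cut down to at most `limit` tokens."""
    tokens = enc.encode(text)
    if len(tokens) <= limit:
        return text
    return enc.decode(tokens[:limit])

sample = "OpenAI embeddings turn text into vectors. " * 1000
print(len(enc.encode(sample)))                      # token count before truncation
print(len(enc.encode(truncate_to_limit(sample))))   # <= 8191 after truncation
```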
| MODEL NAME | TOKENIZER | MAX INPUT TOKENS | OUTPUT DIMENSIONS |
|---|---|---|---|
| text-embedding-ada-002 | cl100k_base | 8191 | 1536 |
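For reference, here is a minimal call that embeds a string with text-embedding-ada-002 and confirms the 1,536-dimensional output. This is a sketch assuming the openai Python package's v1 client interface and an `OPENAI_API_KEY` set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="The food was delicious and the waiter was friendly.",
)

vector = response.data[0].embedding
print(len(vector))  # 1536, matching the output dimensions above
```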