To enable Glean Generative AI features, you need to select an LLM provider. We recommend using Glean's Azure OpenAI key, but we also provide the option of using your own OpenAI key or Google's PaLM 2 and Gemini 1.5 family of models.
This can be set up in Workspace Settings. To use your own LLM provider, please work with your Glean sales or technical services contact.
Option 1: Glean’s Azure OpenAI key (Recommended)
Glean has a signed agreement with Azure OpenAI that guarantees:
0-day retention: Your data will not be stored by Azure.
Data will not be used to train any custom large language models.
Data encryption: All data is encrypted in transit.
Compliance: Azure is compliant with a variety of industry standards. See details here.
The advantages of this approach are:
Guaranteed capacity: Scale to all your users
Performance: Low latency
Access to all the necessary models out of the box.
Note: 0-day retention isn’t the default on Azure OpenAI.
Option 2: Your own OpenAI or Azure OpenAI key
If you prefer to use your company’s own provisioned OpenAI or Azure OpenAI key, you should ensure your key has sufficient:
Access: We require access to the GPT-4, GPT-3.5-Turbo, and text-embedding-ada-002 models. We can support GPT-4-32k if configured. To get access on Azure, fill out the Azure OpenAI Service form.
Capacity: We suggest that you first talk to your Azure OpenAI sales representative for guidance on how to optimally request capacity (e.g. what regions are likely to have availability and how much) and then file a quota increase request.
For GPT-4, our capacity requirement is 10 RPM (requests per minute) and 40k TPM (tokens per minute) for every 500 users.
For text-embedding-ada-002, our capacity requirement is 75 RPM and 11.25k TPM for every 500 users.
Capacity Requirements
| Users  | GPT-4 RPM | GPT-4 TPM | text-embedding-ada-002 RPM | text-embedding-ada-002 TPM |
|--------|-----------|-----------|----------------------------|----------------------------|
| 500    | 10        | 40,000    | 75                         | 11,250                     |
| 1,000  | 20        | 80,000    | 150                        | 22,500                     |
| 5,000  | 100       | 400,000   | 750                        | 112,500                    |
| 10,000 | 175       | 550,000   | 1,300                      | 200,000                    |
| 20,000 | 250       | 800,000   | 1,875                      | 280,000                    |
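Note that the requirements above scale sub-linearly at larger deployments. As a rough sizing sketch (not official Glean tooling), you can estimate the quota to request for a user count that falls between table rows by interpolating between the published tiers:

```python
# Sketch: estimate Azure OpenAI quota needs from the capacity table above.
# Column order: GPT-4 RPM, GPT-4 TPM, text-embedding-ada-002 RPM, TPM.
CAPACITY_TABLE = {
    500:    (10,  40_000,  75,    11_250),
    1_000:  (20,  80_000,  150,   22_500),
    5_000:  (100, 400_000, 750,   112_500),
    10_000: (175, 550_000, 1_300, 200_000),
    20_000: (250, 800_000, 1_875, 280_000),
}

def estimate_quota(users: int) -> tuple:
    """Linearly interpolate between table rows; clamp outside the table's range."""
    tiers = sorted(CAPACITY_TABLE)
    if users <= tiers[0]:
        return CAPACITY_TABLE[tiers[0]]
    if users >= tiers[-1]:
        return CAPACITY_TABLE[tiers[-1]]
    for lo, hi in zip(tiers, tiers[1:]):
        if lo <= users <= hi:
            frac = (users - lo) / (hi - lo)
            return tuple(
                round(a + frac * (b - a))
                for a, b in zip(CAPACITY_TABLE[lo], CAPACITY_TABLE[hi])
            )

# e.g. a 2,000-user deployment sits a quarter of the way from 1,000 to 5,000
print(estimate_quota(2_000))  # (40, 160000, 300, 45000)
```

Treat the result as a starting point for your quota increase request, not a hard ceiling; your Azure OpenAI representative can advise on regional availability.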
Option 3: Google LLMs
Please reach out to the Glean team if you want to configure Google as your LLM provider. We currently support the PaLM 2 family of models (text-unicorn-001, text-bison-001, text-bison-32k, and gecko models) and Gemini 1.5 models. These models are called directly from your project, and the data does not leave your deployment.
Access: The Google models must run in the same GCP project as Glean, since we use Application Default Credentials.
Capacity: You will need to file a standard GCP quota request to ensure you have enough capacity.
For text-unicorn, our capacity requirement is 10 RPM (requests per minute) for every 500 users.
For gecko, our capacity requirement is 75 RPM for every 500 users.
Capacity Requirements
| Users  | text-unicorn RPM | gecko RPM |
|--------|------------------|-----------|
| 500    | 10               | 75        |
| 1,000  | 20               | 150       |
| 5,000  | 100              | 750       |
| 10,000 | 175              | 1,300     |
| 20,000 | 250              | 1,875     |
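For deployments up to about 5,000 users, the Google requirements scale linearly with the per-500-user rule above; beyond that, the table quotes sub-linear figures. A minimal sizing sketch (hypothetical helper, not Glean tooling) for the linear range:

```python
# Sketch: per-500-user linear sizing for the Google models.
# Assumes linear scaling, which matches the table up to 5,000 users;
# larger deployments should use the (smaller) published table values.
def google_rpm(users: int) -> dict:
    blocks = users / 500
    return {
        "text-unicorn RPM": round(10 * blocks),
        "gecko RPM": round(75 * blocks),
    }

print(google_rpm(1_500))  # {'text-unicorn RPM': 30, 'gecko RPM': 225}
```

Use the resulting numbers when filing your standard GCP quota request.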
Other LLMs
We also support Anthropic's Claude 3 Sonnet. We continually evaluate other language models, such as Cohere's models and various open-source models, and will evaluate Gemini Ultra when it becomes generally available.