To enable Glean's Generative AI features, you need to select an LLM provider. We recommend using Glean's Azure OpenAI key. We also support bringing your own LLM key for Azure OpenAI, Google Vertex AI (if your deployment is on GCP), or Amazon Bedrock (if your deployment is on AWS).
You can set this up in Admin console > Platform > Assistant. To use your own LLM provider, please work with your Glean sales or technical services contact.
Option 1: Glean’s Azure OpenAI key (Recommended)
Glean has a signed agreement for Azure OpenAI that guarantees:

- 0-day retention: Your data will not be stored by Azure.
- No model training: Your data will not be used to train any custom large language models.
- Data encryption: All data is encrypted in transit.
- Compliance: Azure is compliant with a variety of industry standards. See details here.
The advantages of this approach are:

- Guaranteed capacity: Scale to all your users
- Performance: Low latency
- Access to all the necessary models out of the box
Note that 0-day retention isn’t the default on Azure OpenAI.
Option 2: Your own OpenAI or Azure OpenAI key
We currently support using either GPT-4o or GPT-4-Turbo as the primary model for query planning and answer generation.
Fill out the Azure OpenAI Service form and request access to the following models:
| Model name | How Glean uses the model |
| --- | --- |
| GPT-4o or GPT-4-Turbo | Query planning and answer generation |
| GPT-4o-mini or GPT-3.5-Turbo | Tool selection and follow-up question generation |
| Ada Embeddings | Match sentences in the generated answer with the retrieved search results to generate citations |
Please see Azure OpenAI Service quotas and limits for the default quotas and instructions for requesting additional quota.
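Once your deployments are in place, a quick way to confirm access is a one-off request with the `openai` Python SDK. This is an illustrative sketch, not part of Glean's setup; the endpoint, API key, API version, and deployment name are placeholders for your own values.

```python
# Minimal smoke test for an Azure OpenAI deployment (pip install openai).
# All credentials and names below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # your Azure OpenAI resource
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

# On Azure, "model" is the *deployment* name you created for GPT-4o (or GPT-4-Turbo).
response = client.chat.completions.create(
    model="<your-gpt-4o-deployment>",
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
    max_tokens=10,
)
print(response.choices[0].message.content)
```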
Option 3: Anthropic Claude models on Amazon Bedrock (AWS only)
Log into the AWS Console with a user who has permissions to subscribe to Bedrock models within the AWS account.
If your AWS instance is in one of the following regions, we will send LLM traffic to Amazon Bedrock in that same region. If it is not, please contact the Glean team to configure your instance to send LLM traffic to the nearest region where the models are supported.
- us-east-1 (N. Virginia)
- us-west-2 (Oregon)
- ap-northeast-1 (Tokyo)
- eu-central-1 (Frankfurt)
Go to Amazon Bedrock > Model access, select the same region as your Glean AWS instance (or the nearest supported region), and request access to these models:
| Model name | How Glean uses the model |
| --- | --- |
| Anthropic > Claude 3.5 Sonnet (`anthropic.claude-3-5-sonnet-20240620-v1:0`) | Query planning and answer generation |
| Anthropic > Claude 3 Haiku (`anthropic.claude-3-haiku-20240307-v1:0`) | Tool selection and follow-up question generation |
| Amazon > Titan Embeddings G1 - Text (`amazon.titan-embed-text-v1`) | Match sentences in the generated answer with the retrieved search results to generate citations |
If you are asked about the use case for the models, you can enter:
> Generate answers to questions about internal company documents
Please see Amazon Bedrock Quotas for the default quotas for these models on pay-as-you-go. If you require additional quota, you will need to reach out to your AWS account manager. Bedrock does not provide a self-service method to increase quota at this time.
Glean does not currently support Provisioned Throughput on Bedrock, but we can work with you and the AWS team to enable this in the future.
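After access is granted, you can sanity-check it with a one-off `boto3` call from the same region. This is an illustrative sketch, not part of Glean's setup; the region and prompt are placeholders.

```python
# Quick check that Bedrock model access was granted in your region
# (pip install boto3). Region and prompt are placeholders.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude models on Bedrock use the Anthropic Messages API format.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 10,
    "messages": [{"role": "user", "content": "Reply with OK if you can read this."}],
})
response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    body=body,
)
print(json.loads(response["body"].read())["content"][0]["text"])

# The Titan embedding model can be checked the same way.
emb = client.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "hello"}),
)
print(len(json.loads(emb["body"].read())["embedding"]))  # expect 1536 dimensions
```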
Option 4: Anthropic Claude models on Google Vertex AI (GCP only)
Go to the Vertex AI Model Garden and make sure you have enabled access to the following foundation models from the GCP project that Glean is running in:
| Model name | How Glean uses the model |
| --- | --- |
| `claude-3-5-sonnet@20240620` | Query planning and answer generation |
| `claude-3-haiku@20240307` | Tool selection and follow-up question generation |
| `textembedding-gecko@002` | Match sentences in the generated answer with the retrieved search results to generate citations |
You will need to file a standard GCP quota request, which is expressed in Requests Per Minute (RPM) and Tokens Per Minute (TPM). Filter on “base_model:” for the model names (anthropic-claude-3-5-sonnet, anthropic-claude-3-haiku, textembedding-gecko) and on “region:” for the region your GCP project runs in.
Please note that the quota is not a guarantee of capacity, but is intended by Google to ensure fair use of the shared capacity, and your requests may not be served during peak periods. To obtain guaranteed capacity, please speak with your Google account team about purchasing Provisioned Throughput.
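One way to confirm that the models are enabled for your project is a one-off request with Anthropic's Vertex SDK (`pip install "anthropic[vertex]"`). This sketch is illustrative, not part of Glean's setup; the project ID and region are placeholders, and the model ID matches the table above.

```python
# Minimal access check for Claude on Vertex AI. Authenticates via your
# ambient GCP credentials (e.g. gcloud auth application-default login).
from anthropic import AnthropicVertex

# Placeholders: your GCP project ID and a region where Claude is available.
client = AnthropicVertex(project_id="<your-gcp-project>", region="<your-region>")

message = client.messages.create(
    model="claude-3-5-sonnet@20240620",
    max_tokens=10,
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(message.content[0].text)
```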
Option 5: Gemini models on Google Vertex AI (GCP only)
Go to the Vertex AI Model Garden and make sure you have enabled access to the following foundation models from the GCP project that Glean is running in:
| Model name | How Glean uses the model |
| --- | --- |
| `gemini-1.5-pro-002` | Query planning and answer generation |
| `gemini-1.5-flash-002` | Tool selection and follow-up question generation |
| `textembedding-gecko@002` | Match sentences in the generated answer with the retrieved search results to generate citations |
You will need to file a standard GCP quota request, which is expressed in Requests Per Minute (RPM) and Tokens Per Minute (TPM). Filter on “base_model:” for the model names (gemini-1.5-pro, gemini-1.5-flash, textembedding-gecko) and on “region:” for the region your GCP project runs in.
Please note that the quota is not a guarantee of capacity, but is intended by Google to ensure fair use of the shared capacity, and your requests may not be served during peak periods. To obtain guaranteed capacity, please speak with your Google account team about purchasing Provisioned Throughput.
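Similarly, you can confirm Gemini access with the Vertex AI SDK (`pip install google-cloud-aiplatform`). This sketch is illustrative, not part of Glean's setup; the project and location are placeholders.

```python
# Minimal access check for Gemini on Vertex AI, using ambient GCP credentials.
import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholders: your GCP project ID and the region your project runs in.
vertexai.init(project="<your-gcp-project>", location="<your-region>")

model = GenerativeModel("gemini-1.5-pro-002")
response = model.generate_content("Reply with OK if you can read this.")
print(response.text)
```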
Capacity Requirements
Query planning and answer generation
Here are the capacity requirements for GPT-4o, GPT-4 Turbo, Claude 3.5 Sonnet, or Gemini 1.5 Pro:
| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 20 | 40,000 |
| 1,000 | 40 | 80,000 |
| 2,500 | 100 | 200,000 |
| 5,000 | 200 | 400,000 |
| 10,000 | 350 | 700,000 |
| 20,000 | 500 | 1,000,000 |
Tool selection and follow-up question generation
Here are the capacity requirements for GPT-4o-mini, GPT-3.5-Turbo, Claude 3 Haiku, or Gemini 1.5 Flash:
| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 10 | 12,000 |
| 1,000 | 20 | 24,000 |
| 2,500 | 50 | 60,000 |
| 5,000 | 100 | 120,000 |
| 10,000 | 175 | 200,000 |
| 20,000 | 250 | 280,000 |
Citation generation
Here are the capacity requirements for OpenAI Ada Embeddings or Google Embeddings for Text:
| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 15 | 12,500 |
| 1,000 | 30 | 25,000 |
| 2,500 | 75 | 62,500 |
| 5,000 | 150 | 125,000 |
| 10,000 | 260 | 218,000 |
| 20,000 | 375 | 312,000 |
Here are the capacity requirements for Amazon > Titan Embeddings G1 - Text:
| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 500 | 12,500 |
| 1,000 | 1,000 | 25,000 |
| 2,500 | 2,500 | 62,500 |
| 5,000 | 5,000 | 125,000 |
| 10,000 | 8,750 | 218,000 |
| 20,000 | 12,500 | 312,000 |
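The tiers above scale roughly linearly with user count at small deployments and sub-linearly beyond 5,000 users, so when your user count falls between rows, request the next tier up. As a hypothetical illustration (not part of Glean's tooling), a lookup for the primary model might look like:

```python
# Illustrative helper that returns the RPM/TPM quota to request for the
# primary (answer-generation) model, by rounding a user count up to the
# next tier in the table above.
PRIMARY_MODEL_TIERS = [  # (users, RPM, TPM)
    (500, 20, 40_000),
    (1_000, 40, 80_000),
    (2_500, 100, 200_000),
    (5_000, 200, 400_000),
    (10_000, 350, 700_000),
    (20_000, 500, 1_000_000),
]

def primary_model_quota(users: int) -> tuple[int, int]:
    """Return (RPM, TPM) for the smallest tier covering `users`."""
    for tier_users, rpm, tpm in PRIMARY_MODEL_TIERS:
        if users <= tier_users:
            return rpm, tpm
    # Beyond 20,000 users, work with your Glean contact on a custom estimate.
    raise ValueError("Contact Glean for deployments above 20,000 users")

print(primary_model_quota(3_000))  # -> (200, 400000), the 5,000-user tier
```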