To enable Glean's Generative AI features, you need to select an LLM provider. We recommend using Glean's Azure OpenAI key. We also support bringing your own LLM key for Azure OpenAI, Google Vertex AI (if your deployment is on GCP), or Amazon Bedrock (if your deployment is on AWS).
You can set this up in Admin console > Platform > Assistant. To use your own LLM provider, please work with your Glean sales or technical services contact.
Option 1: Glean’s Azure OpenAI key (Recommended)
Glean has a signed agreement for Azure OpenAI that guarantees:

- 0-day retention: Your data will not be stored by Azure.
- No model training: Your data will not be used to train any custom large language models.
- Data encryption: All data is encrypted in transit.
- Compliance: Azure is compliant with a variety of industry standards. See details here.
The advantages of this approach are:

- Guaranteed capacity: Scale to all your users
- Performance: Low latency
- Access to all the necessary models out of the box
Note that 0-day retention isn’t the default on Azure OpenAI.
Option 2: Your own OpenAI or Azure OpenAI key
We currently support using either GPT-4o or GPT-4-Turbo as the primary model for query planning and answer generation.
Fill out the Azure OpenAI Service form and request access to the following models:
| Model name | How Glean uses the model |
| --- | --- |
| GPT-4o or GPT-4-Turbo | Query planning and answer generation |
| GPT-4o-mini or GPT-3.5-Turbo | Tool selection and follow-up question generation |
| Ada Embeddings | Match sentences in the generated answer with the retrieved search results to generate citations |
Please see Azure OpenAI Service quotas and limits for the default quotas and instructions for requesting additional quota.
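Once your deployments are in place, a quick way to confirm access is a one-off request with the `openai` Python SDK. This is an illustrative sketch, not part of Glean's setup; the endpoint, API key, API version, and deployment name are placeholders for your own values.

```python
# Minimal smoke test for an Azure OpenAI deployment (pip install openai).
# All credentials and names below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # your Azure OpenAI resource
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

# On Azure, "model" is the *deployment* name you created for GPT-4o (or GPT-4-Turbo).
response = client.chat.completions.create(
    model="<your-gpt-4o-deployment>",
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
    max_tokens=10,
)
print(response.choices[0].message.content)
```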
Option 3: Anthropic Claude models on Amazon Bedrock (AWS only)
Log into the AWS Console with a user who has permissions to subscribe to Bedrock models within the AWS account.
If your AWS instance is in one of the following regions, we will send LLM traffic to Amazon Bedrock in that same region. If it is not, please contact the Glean team to configure your instance to send LLM traffic to the nearest region where the models are supported.
- us-east-1 (N. Virginia)
- us-west-2 (Oregon)
- ap-northeast-1 (Tokyo)
- eu-central-1 (Frankfurt)
Go to Amazon Bedrock > Model access, select the same region as your Glean AWS instance (or the nearest supported region), and request access to these models:
| Model name | How Glean uses the model |
| --- | --- |
| Anthropic > Claude 3.5 Sonnet (`anthropic.claude-3-5-sonnet-20240620-v1:0`) | Query planning and answer generation |
| Anthropic > Claude 3 Haiku (`anthropic.claude-3-haiku-20240307-v1:0`) | Tool selection and follow-up question generation |
| Amazon > Titan Embeddings G1 - Text (`amazon.titan-embed-text-v1`) | Match sentences in the generated answer with the retrieved search results to generate citations |
If you are asked about the use case for the models, you can enter:
> Generate answers to questions about internal company documents
Please see Amazon Bedrock Quotas for the default quotas for these models on pay-as-you-go. If you require additional quota, you will need to reach out to your AWS account manager. Bedrock does not provide a self-service method to increase quota at this time.
Glean does not currently support Provisioned Throughput on Bedrock, but we can work with you and the AWS team to enable this in the future.
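After access is granted, you can sanity-check it with a one-off `boto3` call from the same region. This is an illustrative sketch, not part of Glean's setup; the region and prompt are placeholders.

```python
# Quick check that Bedrock model access was granted in your region
# (pip install boto3). Region and prompt are placeholders.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude models on Bedrock use the Anthropic Messages API format.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 10,
    "messages": [{"role": "user", "content": "Reply with OK if you can read this."}],
})
response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    body=body,
)
print(json.loads(response["body"].read())["content"][0]["text"])

# The Titan embedding model can be checked the same way.
emb = client.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "hello"}),
)
print(len(json.loads(emb["body"].read())["embedding"]))  # expect 1536 dimensions
```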
Option 4: Anthropic Claude models on Google Vertex AI (GCP only)
Go to the Vertex AI Model Garden and make sure you have enabled access to the following foundation models from the GCP project that Glean is running in:
| Model name | How Glean uses the model |
| --- | --- |
| `claude-3-5-sonnet@20240620` | Query planning and answer generation |
| `claude-3-haiku@20240307` | Tool selection and follow-up question generation |
| `textembedding-gecko@002` | Match sentences in the generated answer with the retrieved search results to generate citations |
You will need to file a standard GCP quota request, which is expressed in Requests Per Minute (RPM) and Tokens Per Minute (TPM). Filter on “base_model:” for the model names (anthropic-claude-3-5-sonnet, anthropic-claude-3-haiku, textembedding-gecko) and on “region:” for the region your GCP project runs in.
Please note that the quota is not a guarantee of capacity, but is intended by Google to ensure fair use of the shared capacity, and your requests may not be served during peak periods. To obtain guaranteed capacity, please speak with your Google account team about purchasing Provisioned Throughput.
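One way to confirm that the models are enabled for your project is a one-off request with Anthropic's Vertex SDK (`pip install "anthropic[vertex]"`). This sketch is illustrative, not part of Glean's setup; the project ID and region are placeholders, and the model ID matches the table above.

```python
# Minimal access check for Claude on Vertex AI. Authenticates via your
# ambient GCP credentials (e.g. gcloud auth application-default login).
from anthropic import AnthropicVertex

# Placeholders: your GCP project ID and a region where Claude is available.
client = AnthropicVertex(project_id="<your-gcp-project>", region="<your-region>")

message = client.messages.create(
    model="claude-3-5-sonnet@20240620",
    max_tokens=10,
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(message.content[0].text)
```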
Option 5: Gemini models on Google Vertex AI (GCP only)
Go to the Vertex AI Model Garden and make sure you have enabled access to the following foundation models from the GCP project that Glean is running in:
| Model name | How Glean uses the model |
| --- | --- |
| `gemini-1.5-pro-002` | Query planning and answer generation |
| `gemini-1.5-flash-002` | Tool selection and follow-up question generation |
| `textembedding-gecko@002` | Match sentences in the generated answer with the retrieved search results to generate citations |
You will need to file a standard GCP quota request, which is expressed in Requests Per Minute (RPM) and Tokens Per Minute (TPM). Filter on “base_model:” for the model names (gemini-1.5-pro, gemini-1.5-flash, textembedding-gecko) and on “region:” for the region your GCP project runs in.
Please note that the quota is not a guarantee of capacity, but is intended by Google to ensure fair use of the shared capacity, and your requests may not be served during peak periods. To obtain guaranteed capacity, please speak with your Google account team about purchasing Provisioned Throughput.
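Similarly, you can confirm Gemini access with the Vertex AI SDK (`pip install google-cloud-aiplatform`). This sketch is illustrative, not part of Glean's setup; the project and location are placeholders.

```python
# Minimal access check for Gemini on Vertex AI, using ambient GCP credentials.
import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholders: your GCP project ID and the region your project runs in.
vertexai.init(project="<your-gcp-project>", location="<your-region>")

model = GenerativeModel("gemini-1.5-pro-002")
response = model.generate_content("Reply with OK if you can read this.")
print(response.text)
```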
Capacity Requirements
Query planning and answer generation
Here are the capacity requirements for GPT-4o, GPT-4 Turbo, Claude 3.5 Sonnet, or Gemini 1.5 Pro:
| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 20 | 40,000 |
| 1,000 | 40 | 80,000 |
| 2,500 | 100 | 200,000 |
| 5,000 | 200 | 400,000 |
| 10,000 | 350 | 700,000 |
| 20,000 | 500 | 1,000,000 |
Tool selection and follow-up question generation
Here are the capacity requirements for GPT-4o-mini, GPT-3.5-Turbo, Claude 3 Haiku, or Gemini 1.5 Flash:
| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 10 | 12,000 |
| 1,000 | 20 | 24,000 |
| 2,500 | 50 | 60,000 |
| 5,000 | 100 | 120,000 |
| 10,000 | 175 | 200,000 |
| 20,000 | 250 | 280,000 |
Citation generation
Here are the capacity requirements for OpenAI Ada Embeddings or Google Embeddings for Text:
| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 15 | 12,500 |
| 1,000 | 30 | 25,000 |
| 2,500 | 75 | 62,500 |
| 5,000 | 150 | 125,000 |
| 10,000 | 260 | 218,000 |
| 20,000 | 375 | 312,000 |
Here are the capacity requirements for Amazon > Titan Embeddings G1 - Text:
| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 500 | 12,500 |
| 1,000 | 1,000 | 25,000 |
| 2,500 | 2,500 | 62,500 |
| 5,000 | 5,000 | 125,000 |
| 10,000 | 8,750 | 218,000 |
| 20,000 | 12,500 | 312,000 |
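The tiers above scale roughly linearly with user count at small deployments and sub-linearly beyond 5,000 users, so when your user count falls between rows, request the next tier up. As a hypothetical illustration (not part of Glean's tooling), a lookup for the primary model might look like:

```python
# Illustrative helper that returns the RPM/TPM quota to request for the
# primary (answer-generation) model, by rounding a user count up to the
# next tier in the table above.
PRIMARY_MODEL_TIERS = [  # (users, RPM, TPM)
    (500, 20, 40_000),
    (1_000, 40, 80_000),
    (2_500, 100, 200_000),
    (5_000, 200, 400_000),
    (10_000, 350, 700_000),
    (20_000, 500, 1_000_000),
]

def primary_model_quota(users: int) -> tuple[int, int]:
    """Return (RPM, TPM) for the smallest tier covering `users`."""
    for tier_users, rpm, tpm in PRIMARY_MODEL_TIERS:
        if users <= tier_users:
            return rpm, tpm
    # Beyond 20,000 users, work with your Glean contact on a custom estimate.
    raise ValueError("Contact Glean for deployments above 20,000 users")

print(primary_model_quota(3_000))  # -> (200, 400000), the 5,000-user tier
```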