Glean Assistant LLM Providers

Written by Cindy Chang

To enable Glean's Generative AI features, you need to select an LLM provider. We recommend using Glean's Azure OpenAI key. We also provide the option of using your own LLM key with Azure OpenAI, Google Vertex AI (if your deployment is on GCP), or Amazon Bedrock (if your deployment is on AWS).

This can be set up in Admin console > Platform > Assistant. To use your own LLM provider, please work with your Glean sales or technical services contact.

Option 1: Glean’s Azure OpenAI key (Recommended)

Glean has a signed agreement with Azure OpenAI that guarantees:

  • 0-day retention: Your data will not be stored by Azure.

  • No training: Your data will not be used to train any custom large language models.

  • Data encryption: All data is encrypted in transit.

  • Compliance: Azure is compliant with a variety of industry standards. See details here.

The advantages of this approach are:

  • Guaranteed capacity: Scale to all your users.

  • Performance: Low latency.

  • Access to all the necessary models out of the box.

  • 0-day retention, which isn't the default on Azure OpenAI.

Option 2: Your own OpenAI or Azure OpenAI key

We currently support using either GPT-4o or GPT-4-Turbo as the primary model for query planning and answer generation.

Fill out the Azure OpenAI Service form and request access to the following models:

| Model name | How Glean uses the model |
| --- | --- |
| GPT-4o or GPT-4-Turbo | Query planning and answer generation |
| GPT-4o-mini or GPT-3.5-Turbo | Tool selection and followup question generation |
| Ada Embeddings | Match sentences in the generated answer with the retrieved search results to generate citations |

Please see Azure OpenAI Service quotas and limits for the default quotas and instructions for requesting additional quota.
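
If you want to sanity-check your deployments before connecting them to Glean, a minimal sketch using the openai Python SDK follows. The deployment names and API version here are placeholders, not values prescribed by Glean; substitute whatever you named your Azure deployments.

```python
# Minimal sketch: verify that your Azure OpenAI deployments respond.
# Assumptions (not from this article): deployments named "gpt-4o" and
# "text-embedding-ada-002", and the `openai` Python SDK v1+.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://<resource>.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

# Chat model used for query planning and answer generation.
chat = client.chat.completions.create(
    model="gpt-4o",  # your deployment name, not the base model name
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5,
)
print(chat.choices[0].message.content)

# Embedding model used for citation generation.
emb = client.embeddings.create(model="text-embedding-ada-002", input=["ping"])
print(len(emb.data[0].embedding))
```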

Option 3: Anthropic Claude models on Amazon Bedrock (AWS only)

Log into the AWS Console with a user who has permissions to subscribe to Bedrock models within the AWS account.

If your AWS instance is in one of the following regions, we will send the LLM traffic to Amazon Bedrock in the same region as your instance. If it is not in one of these regions, please contact the Glean team to configure your instance to send the LLM traffic to the nearest region where the models are supported.

  • us-east-1 (N. Virginia)

  • us-west-2 (Oregon)

  • ap-northeast-1 (Tokyo)

  • eu-central-1 (Frankfurt)

Go to Amazon Bedrock > Model access, select the same region as your Glean AWS instance (or the nearest supported region), and request access to these models:

| Model name | Model ID | How Glean uses the model |
| --- | --- | --- |
| Anthropic > Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0 | Query planning and answer generation |
| Anthropic > Claude 3 Haiku | anthropic.claude-3-haiku-20240307-v1:0 | Tool selection and followup question generation |
| Amazon > Titan Embeddings G1 - Text | amazon.titan-embed-text-v1 | Match sentences in the generated answer with the retrieved search results to generate citations |

If you are asked about the use case for the models, you can enter:

Generate answers to questions about internal company documents

Please see Amazon Bedrock Quotas for the default quotas for these models on pay-as-you-go. If you require additional quota, you will need to reach out to your AWS account manager. Bedrock does not provide a self-service method to increase quota at this time.

Glean does not currently support Provisioned Throughput on Bedrock, but we can work with you and the AWS team to enable this in the future.
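
To verify that model access was granted, a quick smoke test from the same AWS account can help. This is a minimal sketch assuming boto3 with the Converse API; the region is an example and should match your Glean instance.

```python
# Minimal sketch: confirm Bedrock model access in your Glean region.
# Assumptions (not from this article): boto3 is installed and your AWS
# credentials can call bedrock-runtime in the chosen region.
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")  # match your Glean instance

# Claude 3.5 Sonnet: query planning and answer generation.
resp = runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "ping"}]}],
    inferenceConfig={"maxTokens": 5},
)
print(resp["output"]["message"]["content"][0]["text"])

# Titan Embeddings G1 - Text: citation generation.
emb = runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "ping"}),
)
print(len(json.loads(emb["body"].read())["embedding"]))
```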

Option 4: Anthropic Claude models on Google Vertex AI (GCP only)

Go to the Vertex AI Model Garden and make sure you have enabled access to the following foundation models from the GCP project that Glean is running in:

| Model name | How Glean uses the model |
| --- | --- |
| claude-3-5-sonnet@20240620 | Query planning and answer generation |
| claude-3-haiku@20240307 | Tool selection and followup question generation |
| textembedding-gecko@002 | Match sentences in the generated answer with the retrieved search results to generate citations |

You will need to file a standard GCP quota request, which is expressed in Requests Per Minute (RPM) and Tokens Per Minute (TPM). Filter for “base_model:” on the model names (anthropic-claude-3-5-sonnet, anthropic-claude-3-haiku, textembedding-gecko) and “region:” for the region that your GCP project is running in.

Please note that the quota is not a guarantee of capacity, but is intended by Google to ensure fair use of the shared capacity, and your requests may not be served during peak periods. To obtain guaranteed capacity, please speak with your Google account team about purchasing Provisioned Throughput.
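
Once access is enabled, you can smoke-test Claude on Vertex. This sketch assumes the anthropic Python package with its Vertex extra (anthropic[vertex]) and Application Default Credentials; the project ID and region below are hypothetical examples.

```python
# Minimal sketch: confirm Claude access on Vertex AI.
# Assumptions (not from this article): `pip install "anthropic[vertex]"`
# and Application Default Credentials for the project Glean runs in.
from anthropic import AnthropicVertex

client = AnthropicVertex(project_id="my-glean-project", region="us-east5")  # hypothetical values

msg = client.messages.create(
    model="claude-3-5-sonnet@20240620",  # query planning and answer generation
    max_tokens=5,
    messages=[{"role": "user", "content": "ping"}],
)
print(msg.content[0].text)
```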

Option 5: Gemini models on Google Vertex AI (GCP only)

Go to the Vertex AI Model Garden and make sure you have enabled access to the following foundation models from the GCP project that Glean is running in:

| Model name | How Glean uses the model |
| --- | --- |
| gemini-1.5-pro-002 | Query planning and answer generation |
| gemini-1.5-flash-002 | Tool selection and followup question generation |
| textembedding-gecko@002 | Match sentences in the generated answer with the retrieved search results to generate citations |

You will need to file a standard GCP quota request, which is expressed in Requests Per Minute (RPM) and Tokens Per Minute (TPM). Filter for “base_model:” on the model names (gemini-1.5-pro, gemini-1.5-flash, textembedding-gecko) and “region:” for the region that your GCP project is running in.

Please note that the quota is not a guarantee of capacity, but is intended by Google to ensure fair use of the shared capacity, and your requests may not be served during peak periods. To obtain guaranteed capacity, please speak with your Google account team about purchasing Provisioned Throughput.
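
As with the Claude option, a quick test call can confirm access. This sketch assumes the google-cloud-aiplatform (vertexai) SDK and Application Default Credentials; the project ID and region below are hypothetical examples.

```python
# Minimal sketch: confirm Gemini access on Vertex AI.
# Assumptions (not from this article): `pip install google-cloud-aiplatform`
# and Application Default Credentials for the project Glean runs in.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-glean-project", location="us-central1")  # hypothetical values

model = GenerativeModel("gemini-1.5-pro-002")  # query planning and answer generation
resp = model.generate_content("ping")
print(resp.text)
```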

Capacity Requirements

Query planning and answer generation

Here are the capacity requirements for GPT-4o, GPT-4 Turbo, Claude 3.5 Sonnet, or Gemini 1.5 Pro:

| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 20 | 40,000 |
| 1,000 | 40 | 80,000 |
| 2,500 | 100 | 200,000 |
| 5,000 | 200 | 400,000 |
| 10,000 | 350 | 700,000 |
| 20,000 | 500 | 1,000,000 |

Tool selection and followup question generation

Here are the capacity requirements for GPT-4o mini, GPT-3.5-Turbo, Claude 3 Haiku, or Gemini 1.5 Flash:

| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 10 | 12,000 |
| 1,000 | 20 | 24,000 |
| 2,500 | 50 | 60,000 |
| 5,000 | 100 | 120,000 |
| 10,000 | 175 | 200,000 |
| 20,000 | 250 | 280,000 |

Citation generation

Here are the capacity requirements for OpenAI Ada Embeddings or Google Embeddings for Text:

| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 15 | 12,500 |
| 1,000 | 30 | 25,000 |
| 2,500 | 75 | 62,500 |
| 5,000 | 150 | 125,000 |
| 10,000 | 260 | 218,000 |
| 20,000 | 375 | 312,000 |

Here are the capacity requirements for Amazon > Titan Embeddings G1 - Text:

| Users | RPM | TPM |
| --- | --- | --- |
| 500 | 500 | 12,500 |
| 1,000 | 1,000 | 25,000 |
| 2,500 | 2,500 | 62,500 |
| 5,000 | 5,000 | 125,000 |
| 10,000 | 8,750 | 218,000 |
| 20,000 | 12,500 | 312,000 |
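
If it helps to prepare quota requests, the tiers above can be encoded directly. Below is an illustrative Python helper (not part of Glean's tooling) that rounds a user count up to the next published tier of the query planning table; the other tables can be encoded the same way.

```python
# Illustration only: look up the quota tier for a given user count from
# the tables above (query planning and answer generation table shown).
import bisect

# (users, RPM, TPM) rows copied from the table above.
PRIMARY = [
    (500, 20, 40_000),
    (1_000, 40, 80_000),
    (2_500, 100, 200_000),
    (5_000, 200, 400_000),
    (10_000, 350, 700_000),
    (20_000, 500, 1_000_000),
]

def quota_for(users: int, table=PRIMARY):
    """Return the (RPM, TPM) of the smallest tier covering `users`."""
    tiers = [row[0] for row in table]
    i = bisect.bisect_left(tiers, users)
    if i == len(table):
        raise ValueError("above the largest published tier; contact Glean")
    _, rpm, tpm = table[i]
    return rpm, tpm

print(quota_for(3_000))  # -> (200, 400000): 3,000 users rounds up to the 5,000 tier
```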
