Glean Assistant LLM Providers
Written by Cindy Chang
Updated over a week ago

To enable Glean's generative AI features, you need to select an LLM provider. We recommend using Glean's Azure OpenAI key. You can also use your own OpenAI or Azure OpenAI key, or Google's PaLM 2 and Gemini 1.5 families of models.

This can be set up in Workspace Settings. To use your own LLM provider, please work with your Glean sales or technical services contact.

Option 1: Glean’s Azure OpenAI key (Recommended)

Glean has a signed agreement with Azure OpenAI promising:

  • 0-day retention: Your data will not be stored by Azure.

  • Data will not be used to train any custom large language models.

  • Data encryption: All data is encrypted in transit.

  • Compliance: Azure is compliant with a variety of industry standards. See details here

The advantages of this approach are:

  • Guaranteed capacity: Scale to all your users

  • Performance: Low latency

  • Access to all the necessary models out of the box.

  • 0-day retention, which isn’t the default on Azure OpenAI.

Option 2: Your own OpenAI or Azure OpenAI key

If you prefer to use your company’s own provisioned OpenAI or Azure OpenAI key, you should ensure your key has sufficient:

  • Access: We require access to the GPT-4, GPT-3.5-Turbo, and text-embedding-ada-002 models. We can support GPT-4-32k if configured. To get access on Azure, fill out the Azure OpenAI Service form.

  • Capacity: We suggest that you first talk to your Azure OpenAI sales representative for guidance on how to optimally request capacity (e.g. what regions are likely to have availability and how much) and then file a quota increase request.

    • For GPT-4, our capacity requirement is 10 RPM (requests per minute) and 40k TPM (tokens per minute) for every 500 users.

    • For text-embedding-ada-002, our capacity requirement is 75 RPM and 11.25k TPM for every 500 users.

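Capacity Requirements

| Users  | GPT-4 RPM | GPT-4 TPM | text-ada RPM | text-ada TPM |
|--------|-----------|-----------|--------------|--------------|
| 500    | 10        | 40,000    | 75           | 11,250       |
| 1,000  | 20        | 80,000    | 150          | 22,500       |
| 5,000  | 100       | 400,000   | 750          | 112,500      |
| 10,000 | 175       | 550,000   | 1,300        | 200,000      |
| 20,000 | 250       | 800,000   | 1,875        | 280,000      |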
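
When preparing a quota increase request, it can help to turn the capacity table above into a quick lookup. The following is a minimal sketch, not a Glean tool: the function and field names are hypothetical, and rounding a user count up to the next listed tier is an assumption made here for illustration. Confirm the final numbers with your Glean contact.

```python
import bisect

# Rows copied from the capacity table above:
# (users, GPT-4 RPM, GPT-4 TPM, embedding RPM, embedding TPM)
CAPACITY_TIERS = [
    (500, 10, 40_000, 75, 11_250),
    (1_000, 20, 80_000, 150, 22_500),
    (5_000, 100, 400_000, 750, 112_500),
    (10_000, 175, 550_000, 1_300, 200_000),
    (20_000, 250, 800_000, 1_875, 280_000),
]


def required_capacity(users: int) -> dict:
    """Return the table row for the smallest tier that covers `users`.

    Assumption: user counts between tiers are rounded up to the next tier.
    """
    if users < 1:
        raise ValueError("users must be a positive number")
    tier_sizes = [row[0] for row in CAPACITY_TIERS]
    index = bisect.bisect_left(tier_sizes, users)
    if index == len(CAPACITY_TIERS):
        raise ValueError("user count exceeds the largest tier in the table")
    tier, gpt4_rpm, gpt4_tpm, emb_rpm, emb_tpm = CAPACITY_TIERS[index]
    return {
        "tier_users": tier,
        "gpt4_rpm": gpt4_rpm,
        "gpt4_tpm": gpt4_tpm,
        "embedding_rpm": emb_rpm,
        "embedding_tpm": emb_tpm,
    }


def quota_is_sufficient(users: int, granted: dict) -> bool:
    """Check a granted quota (same keys as above) against the required tier."""
    needed = required_capacity(users)
    keys = ("gpt4_rpm", "gpt4_tpm", "embedding_rpm", "embedding_tpm")
    return all(granted.get(key, 0) >= needed[key] for key in keys)


if __name__ == "__main__":
    print(required_capacity(7_000))
```

For example, a 7,000-user deployment falls between two rows, so this sketch reports the 10,000-user row.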

Option 3: Google LLMs

Please reach out to the Glean team if you want to configure Google as your LLM provider. We currently support the PaLM 2 family of models (text-unicorn-001, text-bison-001, text-bison-32k, and gecko models) and Gemini 1.5 models. These models are called directly from your project and the data does not leave your deployment.

  • Access: The Google models must be running in the same GCP project as Glean, since we use Application Default Credentials.

  • Capacity: You will need to file a standard GCP quota request to ensure you have enough capacity.

    • For text-unicorn, our capacity requirement is 10 RPM (requests per minute) for every 500 users.

    • For gecko, our capacity requirement is 75 RPM for every 500 users.

Capacity Requirements

| Users  | text-unicorn RPM | gecko RPM |
|--------|------------------|-----------|
| 500    | 10               | 75        |
| 1,000  | 20               | 150       |
| 5,000  | 100              | 750       |
| 10,000 | 175              | 1,300     |
| 20,000 | 250              | 1,875     |
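
The per-500-user rule and the table above agree up to 5,000 users, while the table lists lower figures at 10,000 and 20,000 users. The sketch below works through that arithmetic so you can see both numbers before filing a GCP quota request. The function and constant names are hypothetical, and rounding partial blocks of 500 users up is an assumption of this example.

```python
import math

# Per-500-user requirements stated in the bullets above.
UNICORN_RPM_PER_500 = 10
GECKO_RPM_PER_500 = 75

# Rows copied from the table above: users -> (text-unicorn RPM, gecko RPM)
GOOGLE_TABLE = {
    500: (10, 75),
    1_000: (20, 150),
    5_000: (100, 750),
    10_000: (175, 1_300),
    20_000: (250, 1_875),
}


def linear_rpm_estimate(users: int) -> tuple:
    """Apply the per-500-user rule, rounding partial blocks up (an assumption)."""
    blocks = math.ceil(users / 500)
    return blocks * UNICORN_RPM_PER_500, blocks * GECKO_RPM_PER_500


if __name__ == "__main__":
    # Print the linear estimate next to the documented table value for each tier.
    for users, table_value in GOOGLE_TABLE.items():
        print(users, "linear:", linear_rpm_estimate(users), "table:", table_value)
```

Where the two differ, treat the gap as a question to raise with the Glean team rather than choosing one figure on your own.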

Other LLMs

We also support Anthropic's Claude 3 Sonnet. We continually evaluate other language models, such as those from Cohere and various open-source models. We will evaluate Gemini Ultra when it becomes generally available.
