Note: These instructions work only for on-prem GitHub instances that the Glean crawler running in GCP can reach. Contact Glean Support if any network configuration is required.
Overview
Glean must authenticate to your GitHub instance to fetch content.
Authentication uses a GitHub App that you create in your organization.
Glean reads each user's access permissions and enforces them at query time, so users never see results for content they cannot access in GitHub.
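Conceptually, query-time enforcement means each candidate result is checked against the querying user's GitHub access before it is returned. The sketch below is a hypothetical illustration, not Glean's implementation: the `SearchResult` type, its `repo` and `private` fields, and the per-user set of accessible repositories are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    # Hypothetical result record: the source repository and whether that repo is private.
    title: str
    repo: str
    private: bool


def filter_by_access(results: list[SearchResult], accessible_repos: set[str]) -> list[SearchResult]:
    """Keep results from non-private repos, or from private repos the user can access."""
    return [r for r in results if not r.private or r.repo in accessible_repos]


results = [
    SearchResult("Deploy guide", "acme/infra", private=True),
    SearchResult("README", "acme/public-sdk", private=False),
    SearchResult("Incident notes", "acme/secops", private=True),
]
visible = filter_by_access(results, accessible_repos={"acme/infra"})
print([r.title for r in visible])  # ['Deploy guide', 'README']
```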
All data is stored in the GCP project in the customer's cloud account; no data leaves the customer's environment.
Integration Features
For GitHub, Glean will capture the following content:
PR descriptions
PR conversations/comments
Issue threads
Commit messages on the main branch
Wikis
Additionally, Glean captures the following from the latest commit on the main branch:
Directory/file names
Full content of documentation files (.md and .txt only)
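The rule for which files get full-content indexing can be illustrated as a simple extension check. This is a hypothetical sketch: the `index_mode` helper and passing the extension set as a parameter are assumptions (the extension list is configurable after setup). By default only .md and .txt files qualify; every other file contributes only its path.

```python
from pathlib import PurePosixPath

# Default documentation extensions whose full content is indexed (per this guide).
DEFAULT_FULL_CONTENT_EXTENSIONS = frozenset({".md", ".txt"})


def index_mode(path: str, extensions: frozenset = DEFAULT_FULL_CONTENT_EXTENSIONS) -> str:
    """Return 'full_content' for documentation files, else 'name_only' (hypothetical helper)."""
    # PurePosixPath("Makefile").suffix is "", so extensionless files are name-only.
    suffix = PurePosixPath(path).suffix
    return "full_content" if suffix in extensions else "name_only"


for p in ["README.md", "docs/setup.txt", "src/main.go", "Makefile"]:
    print(p, index_mode(p))
```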
Glean does not currently support code search or GitHub Pages. Both on-prem and cloud GitHub instances are supported.
API Usage
Glean uses the standard GitHub API to ingest all data.
To capture changes as quickly as possible, Glean uses a webhook that sends push notifications to an endpoint deployed in the GCP project in your cloud infrastructure.
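When a webhook secret is configured, GitHub signs each delivery: it computes an HMAC-SHA256 of the raw request body using the secret and sends it in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. A receiving endpoint can verify it as sketched below. This illustrates GitHub's general signing scheme only; whether and how Glean's endpoint performs this check is not stated in this guide, and the secret shown is a placeholder.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: bytes, body: bytes, signature_header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


secret = b"example-webhook-secret"  # placeholder, not a real secret
body = b'{"action": "opened"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_github_signature(secret, body, header))         # True
print(verify_github_signature(secret, b"tampered", header))  # False
```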
The app requests read-only access to the following:
Repository permissions
Administration
Contents
Issues
Metadata
Pull requests
Commit statuses
Organization permissions
Members
It also subscribes to the following events:
Commit comment
Issue comment
Member
Organization
Pull request
Pull request review
Pull request review comment
Push
Repository
Team
Team add
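GitHub identifies the type of each webhook delivery in the `X-GitHub-Event` header. The events listed above correspond to the webhook event names below. The `route_event` function is a hypothetical sketch of how an endpoint might act only on subscribed events; the "recrawl" and "ignore" outcomes are illustrative labels, not Glean behavior. GitHub also sends a `ping` event when a webhook is first created.

```python
# Webhook event names (X-GitHub-Event header values) for the events listed above.
SUBSCRIBED_EVENTS = frozenset({
    "commit_comment",
    "issue_comment",
    "member",
    "organization",
    "pull_request",
    "pull_request_review",
    "pull_request_review_comment",
    "push",
    "repository",
    "team",
    "team_add",
})


def route_event(headers: dict) -> str:
    """Return a hypothetical routing decision for one webhook delivery.

    HTTP header names are case-insensitive; this sketch assumes the caller
    passes them with GitHub's canonical capitalization.
    """
    event = headers.get("X-GitHub-Event", "")
    if event == "ping":
        return "ack"      # sent once when the webhook is created
    if event in SUBSCRIBED_EVENTS:
        return "recrawl"  # hypothetical: schedule an incremental update
    return "ignore"


print(route_event({"X-GitHub-Event": "push"}))  # recrawl
```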
Setup
Prerequisites
User requirements
The user must be an organization administrator in GitHub.
Installation Process
Step 1. Create a GitHub App
This app will be used by Glean to crawl your GitHub instance.
Sign in to your GitHub instance.
Open your organization.
Click Settings.
Click Developer settings, then GitHub Apps.
Click New GitHub App.
Fill in the following fields:
Name: Glean
Homepage URL: https://app.glean.com
Identifying and authorizing users
User authorization callback URL: copy the generated URL from the Glean setup page
Request user authorization: unchecked
Post installation
Leave blank
Webhook
Permissions
Set only the following to Read-only and leave everything else at No access:
Repository permissions
Administration
Contents
Commit statuses
Issues
Metadata
Pull requests
Pages
Organization permissions
Members
User permissions (called Account permissions in newer GitHub versions)
Email addresses
Subscribe to events
Check only the following:
Commit comment
Issues
Issue comment
Member
Organization
Pull request
Pull request review
Pull request review comment
Push
Repository
Team
Team add
Where can this App be installed: Any account
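For review purposes, the form settings above can be summarized as a GitHub App manifest-style dictionary. This is an illustrative summary only; the setup above uses the manual form. The key names follow GitHub's App manifest and permission naming as we understand it and should be checked against GitHub's documentation before any real use. The callback URL is a placeholder for the URL copied from the Glean setup page, and the webhook settings are omitted because this guide does not specify them.

```python
# Illustrative manifest-style summary of the Step 1 form; verify key names against GitHub's docs.
GLEAN_APP_MANIFEST = {
    "name": "Glean",
    "url": "https://app.glean.com",
    "callback_urls": ["<callback URL copied from the Glean setup page>"],
    "request_oauth_on_install": False,  # "Request user authorization" unchecked
    "public": True,                     # "Where can this App be installed: Any account"
    "default_permissions": {
        # Repository permissions
        "administration": "read",
        "contents": "read",
        "statuses": "read",  # Commit statuses
        "issues": "read",
        "metadata": "read",
        "pull_requests": "read",
        "pages": "read",
        # Organization permissions
        "members": "read",
        # User (account) permissions
        "email_addresses": "read",
    },
    "default_events": [
        "commit_comment", "issues", "issue_comment", "member", "organization",
        "pull_request", "pull_request_review", "pull_request_review_comment",
        "push", "repository", "team", "team_add",
    ],
}

# Sanity checks: every permission is read-only, and no event is listed twice.
assert all(level == "read" for level in GLEAN_APP_MANIFEST["default_permissions"].values())
assert len(set(GLEAN_APP_MANIFEST["default_events"])) == len(GLEAN_APP_MANIFEST["default_events"])
```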
Step 2. Configure the GitHub App
Copy the following values into the corresponding fields in Glean:
App ID
Client ID
Client Secret (click "Generate a new client secret" if none exists yet)
At the bottom of the page, click "Generate a private key". GitHub downloads the private key (a .pem file) to your local machine. Upload this file into the corresponding field in Glean.
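The App ID and private key are what a GitHub App uses to authenticate as itself: it signs a short-lived JSON Web Token with RS256 whose `iss` claim is the App ID, with `iat` backdated about 60 seconds to allow for clock drift and `exp` no more than 10 minutes ahead, then exchanges that JWT for an installation access token. The sketch below builds only the unsigned header and payload using the standard library; producing the RS256 signature requires the private key and a cryptography library, which is deliberately left out. The App ID is a placeholder, and how Glean handles this internally is not described in this guide.

```python
import base64
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def github_app_jwt_claims(app_id: str, now: Optional[int] = None) -> dict:
    """Claims for a GitHub App JWT (unsigned)."""
    now = int(time.time()) if now is None else now
    return {
        "iat": now - 60,   # backdated 60 s to tolerate clock drift
        "exp": now + 600,  # GitHub rejects expirations more than 10 minutes out
        "iss": app_id,     # the App ID copied in Step 2
    }


def unsigned_jwt(app_id: str, now: Optional[int] = None) -> str:
    """Return '<header>.<payload>', the input that is then signed with RS256."""
    header = {"alg": "RS256", "typ": "JWT"}
    claims = github_app_jwt_claims(app_id, now)
    encode = lambda obj: b64url(json.dumps(obj, separators=(",", ":")).encode())
    return encode(header) + "." + encode(claims)


print(unsigned_jwt("123456", now=1_700_000_000))  # placeholder App ID
```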
Step 3. Install the GitHub App
Click Install App in the left menu, then click Install next to your organization.
Step 4. Configure additional settings in the Admin Console
Enter the following settings in Glean:
Git Domain
Organization Name
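The Git Domain determines where the GitHub API lives: for github.com the REST API is served at https://api.github.com, while GitHub Enterprise Server serves it under /api/v3 on the instance's own hostname. The helpers below are a hypothetical sketch of deriving API URLs from the two values entered in Glean; the function names, the example domain and organization, and the assumption that the on-prem instance uses HTTPS are illustrative, and how Glean itself builds these URLs is not documented here.

```python
def api_base_url(git_domain: str) -> str:
    """Map the Git Domain setting to a REST API base URL (assumes HTTPS)."""
    domain = git_domain.strip().removeprefix("https://").removeprefix("http://").rstrip("/")
    if domain in ("github.com", "www.github.com"):
        return "https://api.github.com"
    # GitHub Enterprise Server exposes the REST API under /api/v3.
    return f"https://{domain}/api/v3"


def org_repos_url(git_domain: str, organization: str) -> str:
    """URL that lists an organization's repositories (GET /orgs/{org}/repos)."""
    return f"{api_base_url(git_domain)}/orgs/{organization}/repos"


print(org_repos_url("github.example.com", "acme"))
print(org_repos_url("github.com", "acme"))
```

For the hypothetical values above, the first call yields https://github.example.com/api/v3/orgs/acme/repos and the second https://api.github.com/orgs/acme/repos.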
Post Setup
You can exclude (redlist) repositories and control which file extensions have their full content indexed.
Users are prompted to authenticate to GitHub via OAuth so that Glean can sync their user aliases. Until a user completes this flow, they cannot see data from private repositories. After authentication, the next entity crawl, which runs every hour, syncs the aliases.
For any questions or issues with this setup, please reach out to support@glean.com.