Glean must authenticate to your GitHub instance in order to fetch content.
You set up authentication by creating an application in GitHub.
Glean understands all user access permissions and strictly enforces them at query time, so users never see results that they do not have access to.
All data is stored in the GCP project in the customer's cloud account, and no data leaves the customer's environment.
For GitHub, Glean will capture the following content:
Commit messages for the main branch
Additionally, we will capture the following from the latest commit on the main branch:
Full content of documentation files only (.md and .txt)
We do not currently support code search or GitHub Pages. Both on-prem and cloud GitHub deployments are supported.
Glean uses the standard GitHub API to ingest all data.
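As a rough illustration of what reading commit messages through the API can look like, the sketch below builds a request URL for GitHub's REST "list commits" endpoint and pulls the messages out of a decoded response. It is not Glean's implementation. The function names, the default branch name, and the page size are assumptions made for the example.

```python
from urllib.parse import quote, urlencode

# Assumption: the public github.com REST base URL. An on-prem GitHub Enterprise
# Server instance typically serves the same API under https://<host>/api/v3.
DEFAULT_API_BASE = "https://api.github.com"


def commits_url(org, repo, branch="main", api_base=DEFAULT_API_BASE, per_page=100):
    """Build the 'list commits' URL for one branch of a repository."""
    query = urlencode({"sha": branch, "per_page": per_page})
    return f"{api_base}/repos/{quote(org)}/{quote(repo)}/commits?{query}"


def commit_messages(commits):
    """Extract commit messages from a decoded 'list commits' JSON response."""
    return [item["commit"]["message"] for item in commits]
```

For example, commits_url("my-org", "docs") targets the main branch of a hypothetical my-org/docs repository, and commit_messages() accepts the list returned by that endpoint once the JSON has been decoded.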
To capture changes as quickly as possible, Glean sets up a webhook that sends push notifications to an endpoint deployed in the GCP project (in your cloud infrastructure).
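An endpoint that receives GitHub webhook deliveries usually checks that each request really came from GitHub. When a webhook is configured with a secret, GitHub signs the request body with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header. The sketch below shows that check in isolation. Whether Glean's endpoint uses a secret is not stated here, so treat this as an illustration of the general mechanism. The function name is hypothetical.

```python
import hashlib
import hmac


def is_valid_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a webhook request body against its X-Hub-Signature-256 header."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Compare as bytes with a constant-time comparison so the check does not
    # reveal how many leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

The check must run on the raw request bytes, before any JSON parsing, because re-serializing the payload can change the bytes and invalidate the signature.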
The app requests access to the following with a read-only scope:
It also subscribes to the following events:
Pull request review
Pull request review comment
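GitHub names the event type of each webhook delivery in the X-GitHub-Event header, so a receiver can decide early whether a delivery is relevant. The sketch below filters on the two events named above, using the identifiers GitHub uses for them. The set and the function name are hypothetical, and the set contains only the events listed in this document.

```python
# Event identifiers as GitHub sends them in the X-GitHub-Event header.
# Only the two events named in this document are included.
SUBSCRIBED_EVENTS = {"pull_request_review", "pull_request_review_comment"}


def should_process(headers: dict) -> bool:
    """Return True if the delivery's event type is one we subscribed to."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-github-event") in SUBSCRIBED_EVENTS
```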
The user performing the setup must be an organization administrator in GitHub.
As listed above, you will install the Glean application in your GitHub account. We also need:
The organization name. (github.com/<org-name>)
Please use these instructions for installing on-prem versions.
You can redlist repositories and control which file extensions have their full content indexed.
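To make the two controls concrete, here is a minimal sketch of how they could combine, assuming that redlisting a repository excludes it from indexing entirely. The class and field names are hypothetical and do not reflect Glean's configuration format. The default extensions are the two documentation types mentioned earlier (.md and .txt).

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class IndexingRules:
    # Hypothetical settings for illustration, not Glean's configuration schema.
    redlisted_repos: set = field(default_factory=set)
    full_content_extensions: set = field(default_factory=lambda: {".md", ".txt"})

    def repo_is_indexed(self, repo: str) -> bool:
        """A redlisted repository is skipped entirely (assumed behavior)."""
        return repo not in self.redlisted_repos

    def indexes_full_content(self, repo: str, path: str) -> bool:
        """Full content is indexed only for allowed extensions in indexed repos."""
        if not self.repo_is_indexed(repo):
            return False
        return PurePosixPath(path).suffix.lower() in self.full_content_extensions
```

With rules = IndexingRules(redlisted_repos={"my-org/secrets"}), a file such as guide/README.md in my-org/docs qualifies for full-content indexing, while src/main.py in the same repository and any file in my-org/secrets do not.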
Users will be prompted to authenticate to GitHub via OAuth so that their user aliases can be synced. They will not be able to see data in private repositories until they complete the auth flow. Once authentication is complete, the next entity crawl syncs the aliases. Entity crawls run every hour.
For any questions or issues with this setup, please reach out to email@example.com.