GitHub Connector
Written by Cindy Chang
Updated over a week ago

Overview

  • Glean must authenticate to your GitHub instance in order to fetch content.

  • Authentication is done by creating an application in GitHub.

  • Glean understands all user access permissions and strictly enforces them at query time, so users never see results they do not have access to.

  • All data is stored in the GCP project in the customer's cloud account, and no data leaves the customer's environment.

Integration Features

For GitHub, Glean will capture the following content:

  • PR descriptions

  • PR conversations/comments

  • Issue threads

  • Commit messages on the main branch

  • Wikis

Additionally, Glean will capture the following from the latest commit on the main branch:

  • Directory/file names

  • Full content of documentation files only (.md and .txt)
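The rule above (names for every file, full content only for documentation files) can be sketched as a small filter. This is a minimal illustration, not Glean's implementation: the function name `index_plan` and the constant `FULL_CONTENT_EXTENSIONS` are hypothetical, and the sketch assumes matching is done on a case-insensitive file extension.

```python
from pathlib import PurePosixPath

# Assumption: the extensions that get full-content indexing.
# The article names .md and .txt; the constant name is hypothetical.
FULL_CONTENT_EXTENSIONS = frozenset({".md", ".txt"})


def index_plan(path: str, extensions=FULL_CONTENT_EXTENSIONS) -> str:
    """Return 'full_content' for documentation files, 'name_only' otherwise."""
    suffix = PurePosixPath(path).suffix.lower()
    return "full_content" if suffix in extensions else "name_only"


if __name__ == "__main__":
    for p in ("docs/README.md", "src/main.py", "NOTES.TXT", "Makefile"):
        print(p, "->", index_plan(p))
```

Passing a different `extensions` set models a customized list of fully indexed file types.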

Glean does not currently support code search or GitHub Pages. Both on-prem and cloud deployments of GitHub are supported.

API Usage

Glean uses the standard GitHub API to ingest all data.

To capture changes as quickly as possible, Glean deploys a webhook that sends event notifications to an endpoint running in the GCP project (in your cloud infrastructure).
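For readers unfamiliar with how a webhook endpoint typically trusts incoming deliveries, here is a minimal sketch. When a webhook secret is configured, GitHub signs each delivery with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. Whether Glean's endpoint performs exactly this check is an assumption; the function name `is_valid_signature` and the sample secret are hypothetical.

```python
import hashlib
import hmac


def is_valid_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a delivery body against its X-Hub-Signature-256 header value."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = signature_header or ""
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


if __name__ == "__main__":
    secret = b"example-secret"        # hypothetical value
    body = b'{"zen": "Keep it simple."}'
    header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(is_valid_signature(secret, body, header))          # matching signature
    print(is_valid_signature(secret, body + b"x", header))   # tampered body
```

The check runs on the raw request bytes, before any JSON parsing, because re-serializing the payload can change the bytes and invalidate the signature.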

The app requests read-only access to the following:

  • Repository permissions

    • Administration

    • Contents

    • Issues

    • Metadata

    • Pull requests

    • Commit statuses

  • Organization permissions

    • Members

It also subscribes to the following events:

  • Commit comment

  • Issue comment

  • Member

  • Organization

  • Pull request

  • Pull request review

  • Pull request review comment

  • Push

  • Repository

  • Team

  • Team add
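The permissions and events listed above can be written down using the identifiers GitHub uses in app manifests and in the `X-GitHub-Event` delivery header. This is a sketch for orientation only, not Glean's actual app definition; the Python names `READ_ONLY_PERMISSIONS`, `SUBSCRIBED_EVENTS`, and `is_subscribed` are hypothetical.

```python
# Hypothetical illustration: maps the article's permission list onto
# GitHub's permission identifiers, all at the read-only level.
READ_ONLY_PERMISSIONS = {
    # Repository permissions
    "administration": "read",
    "contents": "read",
    "issues": "read",
    "metadata": "read",
    "pull_requests": "read",
    "statuses": "read",  # "Commit statuses"
    # Organization permissions
    "members": "read",
}

# Webhook event names as they appear in the X-GitHub-Event header.
SUBSCRIBED_EVENTS = (
    "commit_comment",
    "issue_comment",
    "member",
    "organization",
    "pull_request",
    "pull_request_review",
    "pull_request_review_comment",
    "push",
    "repository",
    "team",
    "team_add",
)


def is_subscribed(event_name: str) -> bool:
    """Return True if a delivery with this X-GitHub-Event value is expected."""
    return event_name in SUBSCRIBED_EVENTS


if __name__ == "__main__":
    print(is_subscribed("push"))
    print(is_subscribed("star"))
```

A receiver can use a check like `is_subscribed` to ignore deliveries for event types it never asked for.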

Setup

Prerequisites

User requirements

  • The user must be an organization administrator in GitHub.
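One way to confirm this prerequisite before starting is GitHub's REST endpoint for organization membership, `GET /orgs/{org}/memberships/{username}`, whose response includes `state` and `role` fields. The sketch below only builds the URL and interprets a response that has already been fetched; the helper names are hypothetical, and the `api_base` default assumes GitHub cloud (an on-prem instance uses a different base URL).

```python
from urllib.parse import quote


def membership_url(org: str, username: str,
                   api_base: str = "https://api.github.com") -> str:
    """Build the URL for the organization membership lookup."""
    return f"{api_base}/orgs/{quote(org, safe='')}/memberships/{quote(username, safe='')}"


def is_org_admin(membership: dict) -> bool:
    """Interpret a membership response: active membership with the admin role."""
    return membership.get("state") == "active" and membership.get("role") == "admin"


if __name__ == "__main__":
    # Hypothetical organization and user names.
    print(membership_url("example-org", "octocat"))
    print(is_org_admin({"state": "active", "role": "admin"}))
    print(is_org_admin({"state": "active", "role": "member"}))
```

The request itself must be made with credentials that are allowed to read the organization's memberships; that part is left out of the sketch.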

Installation Process

Cloud version

As noted in the Overview, you will install the Glean application in your GitHub account. Glean also needs:

  • The organization name (github.com/<org-name>).

On-prem version

Post Setup

  • You can redlist repositories and control which file extensions have their full content indexed.

  • Users will be prompted to authenticate with GitHub through OAuth so that Glean can sync their user aliases. Until a user completes this flow, they will not be able to see data in private repositories. Once authentication is complete, the next entity crawl syncs the aliases; entity crawls run every hour.

For any questions or issues with this setup, please reach out to support@glean.com.
