Connect to GitLab Server
The instructions below work only for on-prem instances that the Glean Crawler running on GCP can access. Contact Glean Support for help with any required network configuration.
Step 1. Determine API access token scopes
To authorize our API calls, we need a personal access token from a GitLab admin account. This account must have access to every project you'd like Glean to crawl. If you're willing to grant the token the api scope, we can create webhooks programmatically during setup. If you want to restrict the token to read-only access, you will need to create webhooks manually for every project you want crawled.
Step 2. Create a personal access token
Sign in to your GitLab admin account.
Click your user icon in the upper-right corner and select "Preferences".
Select "Access Tokens" in the left-side menu.
Add a personal access token.
Name: Glean Token
Scopes:
If you're granting write privileges: api
If you're granting only read privileges: read_user, read_api, read_repository, admin_mode
Note: admin_mode is required for the crawler to be able to list user emails (ref).
Leave "Expires at" empty.
Copy the personal access token into the corresponding field in Glean.
Check the box in Glean if the token has write privileges (the api scope).
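Before saving, you can confirm that the token carries the scopes listed above. The sketch below is one way to do that, assuming your GitLab version supports the GET /api/v4/personal_access_tokens/self endpoint (available in recent releases). The helper names are hypothetical and are not part of Glean or GitLab.

```python
import json
import urllib.request

# Scopes from Step 2 for each option.
WRITE_SCOPES = {"api"}
READ_ONLY_SCOPES = {"read_user", "read_api", "read_repository", "admin_mode"}


def missing_scopes(granted, write_access):
    """Return the Step 2 scopes that the token lacks, in sorted order."""
    required = WRITE_SCOPES if write_access else READ_ONLY_SCOPES
    return sorted(required - set(granted))


def fetch_token_scopes(base_url, token):
    """Ask GitLab which scopes the token itself carries.

    Assumes GET /api/v4/personal_access_tokens/self exists on your instance.
    """
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v4/personal_access_tokens/self",
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["scopes"]
```

For example, missing_scopes(fetch_token_scopes("https://gitlab.company.com", token), write_access=False) returning an empty list means the token matches the read-only option.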
Step 3. Provide information about your Gitlab server instance
Your GitLab instance domain name (e.g., https://gitlab.company.com)
Your GitLab server IP address
If you granted the token the api scope, click Save and you're done! Otherwise, continue to the next step.
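As an optional sanity check on the two values above, the following sketch extracts the hostname from the domain you enter and checks whether it resolves to the server IP you provide. It runs from your own machine, so it only confirms your local DNS view; it cannot tell you whether the Glean Crawler on GCP can reach the server. The function names are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def gitlab_host(instance_url):
    """Extract the hostname from an instance URL such as https://gitlab.company.com."""
    parts = urlsplit(instance_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected a URL like https://gitlab.company.com, got {instance_url!r}")
    return parts.hostname  # urlsplit lowercases the hostname and drops any port


def resolves_to(instance_url, server_ip):
    """Return True if the hostname resolves to server_ip from this machine."""
    host = gitlab_host(instance_url)
    # getaddrinfo yields (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    addresses = {info[4][0] for info in socket.getaddrinfo(host, None)}
    return server_ip in addresses
```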
Step 4. Create Webhooks Manually (if the token has only read access)
Sign in to GitLab with an admin account to create the webhooks manually.
First, create the project webhooks. For each project, perform the following steps:
Navigate to the project's page in GitLab.
In the left-side menu, go to Settings → Webhooks.
Create a webhook with the following properties:
Lastly, we’ll create a System Hook:
Navigate to Admin Area → System Hooks.
Create a hook with the following properties:
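If you have many projects, an admin can script these steps with the GitLab REST API instead of the UI: POST /api/v4/projects/:id/hooks creates a project webhook, and POST /api/v4/hooks creates a system hook. This is a sketch under assumptions: HOOK_URL, SECRET_TOKEN, and the EVENTS selection are hypothetical placeholders to replace with the properties Glean gives you. The script also needs the admin's own token with the api scope, which does not have to be the read-only token you give Glean.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the hook properties Glean provides.
HOOK_URL = "https://glean-webhook.example.com/gitlab"
SECRET_TOKEN = "replace-me"
EVENTS = {"push_events": True, "merge_requests_events": True}


def hook_request(base_url, admin_token, project_id=None):
    """Build a POST that creates a project hook, or a system hook when project_id is None."""
    api = f"{base_url.rstrip('/')}/api/v4"
    path = "/hooks" if project_id is None else f"/projects/{project_id}/hooks"
    body = {"url": HOOK_URL, "token": SECRET_TOKEN, **EVENTS}
    return urllib.request.Request(
        api + path,
        data=json.dumps(body).encode(),
        headers={"PRIVATE-TOKEN": admin_token, "Content-Type": "application/json"},
        method="POST",
    )


def create_hooks(base_url, admin_token, project_ids):
    """Create one hook per project, then the system hook last."""
    for pid in [*project_ids, None]:
        with urllib.request.urlopen(hook_request(base_url, admin_token, pid), timeout=10) as resp:
            print("system" if pid is None else pid, resp.status)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before running create_hooks against a real instance.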
Step 5. Upload User Mapping to GCS
GitLab doesn't return a user's email through the API unless that user has explicitly made their email public. To crawl GitLab permissions correctly, we need to map each user ID to the user's company email.
Please create a CSV with two columns: GitLab user ID and email. The CSV doesn't need column headers, but the columns must appear in the order (user ID, email).
Note that the user ID is NOT the username: the user ID contains only digits and corresponds to the id field in the example response of the /members API. Example of a correct row:
12345,user1@glean.com
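A quick way to catch the most common mistake (usernames in place of numeric IDs) before sending the file is to validate each row. The sketch below assumes the file has no header row and uses a deliberately loose email pattern; validate_mapping is a hypothetical helper name.

```python
import csv
import io
import re

# Loose check: something@something.tld with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_mapping(csv_text):
    """Return (row_number, problem) pairs for rows that break the (user ID, email) format."""
    problems = []
    for lineno, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        if len(row) != 2:
            problems.append((lineno, f"expected 2 columns, got {len(row)}"))
            continue
        user_id, email = (field.strip() for field in row)
        if not user_id.isdecimal():
            problems.append((lineno, f"user ID {user_id!r} is not numeric (is it a username?)"))
        if not EMAIL_RE.match(email):
            problems.append((lineno, f"{email!r} does not look like an email"))
    return problems
```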
Provide this CSV to your Glean support team so that each user ID can be mapped to the user's primary SSO email.
To list all GitLab user IDs, you can use the GitLab API. Company emails can usually be queried from an identity system such as Okta, Google Workspace (formerly G Suite), or any other internal source you have.
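One possible way to build the CSV is sketched below. It pages through GET /api/v4/users (which lists all users when called with an admin token) by following the X-Next-Page response header, then joins each user to an email. The join assumes GitLab usernames match a key in your identity system export: email_by_username is a hypothetical dictionary you would build from Okta, Google Workspace, or another source, and users without a match are skipped.

```python
import csv
import json
import urllib.request


def fetch_all_users(base_url, token):
    """Yield every user from GET /api/v4/users, following offset pagination."""
    page = "1"
    while page:
        req = urllib.request.Request(
            f"{base_url.rstrip('/')}/api/v4/users?per_page=100&page={page}",
            headers={"PRIVATE-TOKEN": token},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            yield from json.load(resp)
            page = resp.headers.get("X-Next-Page", "")  # empty on the last page


def mapping_rows(users, email_by_username):
    """Pair each numeric GitLab user ID with a company email; skip users with no match."""
    rows = []
    for user in users:
        email = email_by_username.get(user["username"])
        if email:
            rows.append((user["id"], email))
    return rows


def write_mapping(rows, path):
    """Write (user ID, email) rows with no header row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```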
Appendix
If you want to route webhooks through your local network, you first need to allow webhooks and services to send requests to the local network. Do this before creating any webhooks:
Navigate to Admin Area → Settings → Network.
Expand Outbound requests.
Paste the local proxy IP into the text box under "Local IP addresses and domain names that hooks and services may access." (Glean Support can help you find this IP.)
Click Save changes.
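The same setting can be changed through the GitLab application settings API, which may help if you manage GitLab configuration as code. This sketch assumes an admin token with the api scope and uses the outbound_local_requests_whitelist attribute of PUT /api/v4/application/settings. Because the PUT replaces the whole list, it reads the current list first and appends the proxy IP. The function names are hypothetical.

```python
import json
import urllib.request


def merged_allowlist(current, proxy_ip):
    """Append proxy_ip to the existing allowlist without dropping any entries."""
    return current if proxy_ip in current else [*current, proxy_ip]


def allow_local_proxy(base_url, admin_token, proxy_ip):
    """Add proxy_ip to GitLab's outbound local-request allowlist and return the new list."""
    settings_url = f"{base_url.rstrip('/')}/api/v4/application/settings"
    headers = {"PRIVATE-TOKEN": admin_token}
    # Read the current settings so the PUT does not overwrite existing entries.
    with urllib.request.urlopen(urllib.request.Request(settings_url, headers=headers), timeout=10) as resp:
        current = json.load(resp).get("outbound_local_requests_whitelist") or []
    body = json.dumps({"outbound_local_requests_whitelist": merged_allowlist(current, proxy_ip)})
    req = urllib.request.Request(
        settings_url,
        data=body.encode(),
        headers={**headers, "Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["outbound_local_requests_whitelist"]
```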