Glean captures the following content from Google Drive:
Native file types such as Docs, Sheets, and Slides
Content from personal and shared drives
Supported files in GDrive - listed here
Glean uses Google Drive’s standard API to ingest all data.
Setting up the GDrive connector requires the creation of a service account with super admin privileges. Alternatively, an admin custom role can be created.
Glean automatically respects all permissions. Users only see search results for content they have access to. When a user clicks a search result, they are taken to the Google Drive web application, which enforces permissions exactly as it would if the user had gone to Google Drive directly.
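To make the permission model concrete, here is a minimal sketch of permission-aware result filtering. It is an illustration only, not Glean's implementation: the `Document` class, the `visible_results` function, and the sample users and groups are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical stand-in for an indexed Drive file and its sharing settings.
    title: str
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)


def visible_results(results, user, user_groups):
    """Keep only documents shared with the user directly or via one of their groups."""
    return [
        doc
        for doc in results
        if user in doc.allowed_users or doc.allowed_groups & user_groups
    ]


results = [
    Document("Roadmap", allowed_users={"ana@example.com"}),
    Document("Payroll", allowed_groups={"finance"}),
    Document("Handbook", allowed_groups={"everyone"}),
]

# Ana belongs only to the "everyone" group, so "Payroll" is filtered out.
titles = [d.title for d in visible_results(results, "ana@example.com", {"everyone"})]
print(titles)
```

In this sketch the filter runs at query time, so a document that is never returned cannot leak through its title or snippet.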
Identity (permissions changes) is crawled every 10 minutes
Activity reports (adds, updates, permissions changes, etc.) are crawled every minute
Incremental crawl every 3 hours (added reliability on top of the activity reports)
Full crawl of the corpus every 28 days
People data is crawled every hour
People data is indexed after an additional hour
Configuration parameters also control the full and incremental crawl rates (the full set of options can be shared on request).
Content permission changes (e.g., fileA is shared or no longer shared with a user or group) are picked up by the activity crawl, which runs every minute. Changes to a group's membership are picked up by the identity crawl.
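The crawl schedule above can be summarized as a small lookup that estimates how long a change may wait before the relevant crawl picks it up. This is a rough sketch under stated assumptions: the intervals come from the list above, but the change-type names and the `worst_case_pickup_minutes` function are hypothetical, and the estimate ignores how long a crawl itself takes to run.

```python
# Crawl intervals in minutes, taken from the schedule described above.
CRAWL_INTERVAL_MINUTES = {
    "identity": 10,
    "activity": 1,
    "incremental": 3 * 60,
    "full": 28 * 24 * 60,
    "people": 60,
}

# Hypothetical change-type names mapped to the crawl that detects them.
CHANGE_TO_CRAWL = {
    "content_added_or_updated": "activity",
    "content_permission_change": "activity",
    "group_membership_change": "identity",
    "people_data_change": "people",
}

# People data is indexed one additional hour after it is crawled.
PEOPLE_INDEXING_DELAY_MINUTES = 60


def worst_case_pickup_minutes(change_type):
    """Upper bound on scheduling delay: one full interval of the responsible crawl."""
    crawl = CHANGE_TO_CRAWL[change_type]
    delay = CRAWL_INTERVAL_MINUTES[crawl]
    if crawl == "people":
        delay += PEOPLE_INDEXING_DELAY_MINUTES
    return delay


for change in CHANGE_TO_CRAWL:
    print(change, worst_case_pickup_minutes(change))
```

Under these assumptions, a file-sharing change waits at most about a minute, a group membership change at most about 10 minutes, and a people data change at most about two hours before it is indexed.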
Controls to Redlist/Greenlist Content
Admins can redlist or greenlist shared drives, user drives, and folders.
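As an illustration of how such lists could interact, the sketch below decides whether a path should be crawled. The semantics are an assumption, not documented Glean behavior: a redlisted entry always excludes content, and a non-empty greenlist restricts crawling to the listed entries. The `is_crawlable` function and the path strings are hypothetical.

```python
def is_crawlable(path, redlist, greenlist):
    """Assumed semantics: redlist wins; a non-empty greenlist limits crawling to its entries."""

    def under(prefixes):
        # Match the entry itself or anything nested beneath it,
        # without treating "shared/hrx" as part of "shared/hr".
        return any(
            path == p.rstrip("/") or path.startswith(p.rstrip("/") + "/")
            for p in prefixes
        )

    if under(redlist):
        return False
    if greenlist:
        return under(greenlist)
    return True


print(is_crawlable("shared/eng/design.doc", redlist=["shared/hr"], greenlist=[]))
print(is_crawlable("shared/hr/salaries.sheet", redlist=["shared/hr"], greenlist=[]))
print(is_crawlable("users/bob/notes", redlist=[], greenlist=["shared/eng"]))
```

Checking the redlist first means an explicit exclusion cannot be overridden by a broader greenlist entry, which is the safer default when the two lists overlap.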
Here are the in-product instructions:
Connect to Glean
Enter the email of your Google Drive super admin into Glean.
Add API scopes
Visit the Google Admin Console to Manage OAuth Clients. You’ll need to be signed in as an admin.
Click Add new and paste the 21-digit Unique ID from below into the Client ID field.
Note: if you have already connected Google Tools (Google Calendar and Gmail) with this same Client ID, you should instead click ‘Edit’ on the existing API client and then add the additional scopes below.
Copy and paste the following into the OAuth scopes (comma-delimited) field and then click Authorize:
For any questions or issues with this setup, please reach out to email@example.com.