Overview
File Upload is available for Assistant, Public Knowledge, and Apps. We also provide an API for uploading files to Glean Assistant. This feature allows users to ask about uploaded local files, including summarizing, analyzing, and generating content from them. Limits for file upload depend on the model used in your Glean instance.
The content of files is stored in users' chat sessions for 24 hours. File metadata is retained within the chat session where the files were uploaded for up to 30 days and then deleted. The files are not added to your corpus and are not accessible to other users.
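The two retention windows described above can be sketched as simple date arithmetic. This is only an illustration, not Glean's implementation: the function name and dictionary keys are hypothetical, and it assumes the 24-hour clock starts at upload time and the 30-day clock starts when the chat session begins.

```python
from datetime import datetime, timedelta, timezone

# Retention windows taken from the policy described above.
CONTENT_RETENTION = timedelta(hours=24)
METADATA_RETENTION = timedelta(days=30)


def retention_deadlines(session_started_at, uploaded_at):
    """Return the latest times by which content and metadata would be deleted.

    Assumption (not confirmed by the source): content retention is measured
    from the upload time, metadata retention from the session start time.
    """
    return {
        "content_deleted_by": uploaded_at + CONTENT_RETENTION,
        "metadata_deleted_by": session_started_at + METADATA_RETENTION,
    }


if __name__ == "__main__":
    session_start = datetime(2024, 9, 24, 12, 0, tzinfo=timezone.utc)
    upload_time = session_start + timedelta(hours=1)
    for name, deadline in retention_deadlines(session_start, upload_time).items():
        print(name, deadline.isoformat())
```

Deleting the chat session earlier removes the files sooner, so these values are upper bounds rather than exact deletion times.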
Supported File Formats
The feature supports a variety of file types, categorized as follows:
Document Files: pdf, doc, docx, pages
Spreadsheet Files: xls, xlsx, numbers
Presentation Files: ppt, pptx, key
Text Files: csv, json, xml, txt, rtf
Web Files: html, css
Code Files: java, py, js, ts, cpp, c, ipynb, sql, sh, go, yaml, log
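As a rough client-side pre-check, the list above can be expressed as a lookup from file extension to category. This is a minimal sketch: the names SUPPORTED_EXTENSIONS and file_category are hypothetical and not part of any Glean API, and it assumes that support is decided by file extension alone.

```python
from pathlib import Path

# Extensions copied from the supported formats list above.
SUPPORTED_EXTENSIONS = {
    "document": {"pdf", "doc", "docx", "pages"},
    "spreadsheet": {"xls", "xlsx", "numbers"},
    "presentation": {"ppt", "pptx", "key"},
    "text": {"csv", "json", "xml", "txt", "rtf"},
    "web": {"html", "css"},
    "code": {"java", "py", "js", "ts", "cpp", "c", "ipynb",
             "sql", "sh", "go", "yaml", "log"},
}


def file_category(filename):
    """Return the category for a filename, or None if its extension is not listed."""
    # Only the final suffix is considered, compared case-insensitively.
    extension = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if extension in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ["report.PDF", "budget.xlsx", "photo.png"]:
        print(name, file_category(name))
```

A check like this only filters obvious mismatches before upload; the service itself remains the authority on what it accepts.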
Key Features
File Upload: Users can upload multiple files (up to 5 files, each with a maximum size of 10 MB, depending on the model; see Limits) directly from their local computer.
Real-Time Querying: Users can query the text content of the uploaded files immediately after upload.
Document Metadata: The chat UI will display document metadata, such as title and file type.
File Deletion:
Users can delete uploaded files before submitting their first query. Once a query is submitted, the files cannot be deleted directly from the chat session, but they are removed when the chat session history is deleted.
The content of all uploaded files is subject to a default 24-hour retention policy.
Consistent with our chat retention policy, all file content and metadata are deleted 30 days after a chat session is started.
API Support: Customers who use our developer platform can also upload files via our API. See the developer platform documentation for details.
Security: Files are parsed and scanned for malware before being stored within the cloud project. Any file in which malware is detected fails to upload with an error.
Privacy: Uploaded files are accessible only to the user who uploaded them.
Limits:
The minimum file size for upload is 1 KB
The maximum file size and number of files for upload in one session is determined by the token limit for your model
128K Models: 5 files and 10 MB
32K Models: 4 files and 5 MB
8K Models: 2 files and 2 MB
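The limits above can be checked before upload with a small validation helper. This is a sketch under stated assumptions: the names UploadLimit, LIMITS_BY_CONTEXT, and check_upload are hypothetical, the megabyte figure is treated as a per-file maximum, and 1 KB and 1 MB are taken as 1024 and 1024 * 1024 bytes respectively.

```python
from dataclasses import dataclass

MIN_FILE_BYTES = 1024   # 1 KB minimum (assumption: 1 KB = 1024 bytes)
MB = 1024 * 1024        # assumption: 1 MB = 1024 * 1024 bytes


@dataclass(frozen=True)
class UploadLimit:
    max_files: int
    max_file_bytes: int  # assumption: the MB figure applies to each file


# Limits per model context window, copied from the list above.
LIMITS_BY_CONTEXT = {
    "128K": UploadLimit(max_files=5, max_file_bytes=10 * MB),
    "32K": UploadLimit(max_files=4, max_file_bytes=5 * MB),
    "8K": UploadLimit(max_files=2, max_file_bytes=2 * MB),
}


def check_upload(context_window, file_sizes):
    """Return a list of problems; an empty list means the upload is within limits."""
    limit = LIMITS_BY_CONTEXT[context_window]
    problems = []
    if len(file_sizes) > limit.max_files:
        problems.append(f"too many files: {len(file_sizes)} > {limit.max_files}")
    for index, size in enumerate(file_sizes):
        if size < MIN_FILE_BYTES:
            problems.append(f"file {index} is below the 1 KB minimum")
        elif size > limit.max_file_bytes:
            problems.append(f"file {index} exceeds {limit.max_file_bytes // MB} MB")
    return problems


if __name__ == "__main__":
    print(check_upload("32K", [2048, 6 * MB]))
```

Returning a list of problems rather than raising on the first one lets a caller report every violation in a single pass.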
Known Limitations
Multimedia support: We do not support uploading image, audio, or video files, or any other file types outside of the list above.
Custom data retention policies: We do not support data retention policies beyond the default 24-hour policy for file content and 30-day policy for metadata described above. You can ask users to disable chat session history or to manually delete chat sessions if you would like metadata to be deleted sooner.
Optical Character Recognition (OCR) must be enabled for your org for scanned PDFs to work: Please contact Glean if you run into issues with uploaded PDFs not working, particularly scanned ones.
How do I enable file upload?
Admins can turn on the feature for all of their users via the Settings tab in the Assistant section of the workspace. File Upload is off by default for existing customers (as of GA on 9/24) and on by default for new customers.
Future Work
Support for larger file sizes
More robust support for PDFs
Data Analysis for Spreadsheets and CSV Files
Multimedia file types:
Video files: mp4, mov, avi
Image files: jpg, png, gif
Audio files: mp3