# Storage Adapters
OpenCodeHub supports a pluggable storage system, allowing you to store repository data (git objects, LFS files, artifacts) on various backends.
## Supported Adapters

- Local Filesystem: Default, stores data on the server’s disk.
- S3 Compatible: AWS S3, MinIO, Cloudflare R2, DigitalOcean Spaces.
- Google Drive: Ideal for personal/low-cost deployments.
- Azure Blob Storage: Microsoft Azure storage.
## 📂 Local Storage (Default)

Data is stored in the `data/` directory relative to the application root.
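Under the hood, a filesystem adapter only needs to map storage keys to paths beneath that directory. A minimal sketch of the idea (the class and method names here are illustrative, not OpenCodeHub's actual API):

```python
import tempfile
from pathlib import Path

class LocalStorage:
    """Toy local-filesystem adapter: keys map to paths under a root dir."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        # Allow nested keys such as "objects/ab/cdef"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

# Demo against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    store = LocalStorage(tmp)
    store.put("objects/ab/cdef", b"blob data")
    print(store.get("objects/ab/cdef"))  # b'blob data'
```

Because every read is a local disk access, this is the fastest backend by a wide margin (see the latency comparison below).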
Configuration:
```shell
STORAGE_TYPE=local
STORAGE_PATH=./data/storage # Optional, default is ./data
```
## ☁️ S3 Compatible Storage

Store data in any S3-compatible bucket. This is recommended for production scalability.
Configuration:
```shell
STORAGE_TYPE=s3
STORAGE_BUCKET=my-opencodehub-bucket
STORAGE_REGION=us-east-1 # or auto
STORAGE_ENDPOINT=https://s3.amazonaws.com # or your custom endpoint
S3_ACCESS_KEY=your-access-key
S3_SECRET_KEY=your-secret-key
```
### Examples

MinIO (Self-hosted):
```shell
STORAGE_ENDPOINT=http://minio:9000
STORAGE_REGION=us-east-1
S3_FORCE_PATH_STYLE=true
```

Cloudflare R2:
```shell
STORAGE_ENDPOINT=https://<account-id>.r2.cloudflarestorage.com
STORAGE_REGION=auto
```
## 🚗 Google Drive Stack

This stack is ideal for cost-effective, serverless-style deployments where you want to minimize persistent volume usage.
### Prerequisites

- Google Cloud Project: Enable the Google Drive API.
- OAuth Credentials: Create “Web Application” credentials.
- Refresh Token: Obtain a long-lived refresh token (e.g., via OAuth Playground).
### Configuration

```shell
STORAGE_TYPE=gdrive
GOOGLE_CLIENT_ID=your-client-id
GOOGLE_CLIENT_SECRET=your-client-secret
GOOGLE_REFRESH_TOKEN=your-refresh-token
GOOGLE_FOLDER_ID=your-folder-id
```

How to get Credentials:
1. Go to the Google Cloud Console.
2. Create a project -> Enable the Google Drive API.
3. Create an OAuth Client ID.
4. Get a refresh token via the OAuth Playground with scope `https://www.googleapis.com/auth/drive.file`.
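At runtime, the adapter must trade that long-lived refresh token for a short-lived access token. The exchange follows the standard OAuth 2.0 refresh-token grant against Google's token endpoint; a stdlib-only sketch (this is an illustration of the flow, not OpenCodeHub's internal code, and error handling is omitted):

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"

def build_refresh_request(client_id: str, client_secret: str,
                          refresh_token: str) -> bytes:
    """Form-encoded body for the OAuth 2.0 refresh-token grant."""
    return urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode()

def fetch_access_token(client_id: str, client_secret: str,
                       refresh_token: str) -> str:
    """POST the grant to Google's token endpoint; returns the access token."""
    body = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(urllib.request.Request(TOKEN_URL, data=body)) as resp:
        return json.load(resp)["access_token"]
```

The returned access token expires (typically after an hour), so the adapter must re-run this exchange periodically.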
## 🔷 Azure Blob Storage

Configuration:
```shell
STORAGE_TYPE=azure
AZURE_STORAGE_CONNECTION_STRING=DefaultEndpointsProtocol=https;AccountName=...
AZURE_CONTAINER_NAME=opencodehub
```
## ⚠️ Performance Considerations

### Latency Comparison

| Operation | Local SSD | S3/R2 Object Storage |
|---|---|---|
| Repository page load | ~50-100ms | 5-30 seconds |
| Clone (small repo) | ~100ms | 2-10 seconds |
| Push (10 files) | ~200ms | 1-5 seconds |
| File browser navigation | ~30ms | 500ms-2s |
### Why Does This Happen?

When using object storage, every Git operation requires multiple HTTP requests:
- List objects - 1 API call
- Download each file - 40+ API calls for a typical repo
- Each request has ~50-100ms latency (network + S3 processing)
A repository page load might download:
- `HEAD`, `config`, `refs/heads/*` (branch info)
- `objects/pack/*.idx`, `*.pack` (git objects)
- All hook files, info files, etc.
40 files × 100ms = 4+ seconds minimum
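That back-of-the-envelope estimate is easy to reproduce, and it also shows why fetching objects concurrently (see the roadmap below) helps so much. The function is purely illustrative:

```python
import math

def page_load_seconds(num_objects: int, latency_ms: float,
                      parallelism: int = 1) -> float:
    """Rough lower bound for object-storage reads: each object costs one
    round trip, and requests in flight at once divide the sequential total."""
    return math.ceil(num_objects / parallelism) * latency_ms / 1000

print(page_load_seconds(40, 100))                 # 4.0 -> the "4+ seconds" above
print(page_load_seconds(40, 100, parallelism=8))  # 0.5 -> why parallel fetches help
```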
### Recommendations by Use Case

#### 🏢 Production (Self-Hosted)

Best: Local SSD storage + S3 for backups only
```shell
STORAGE_TYPE=local
STORAGE_PATH=/var/lib/opencodehub/repos
```

Use a cron job or background process to sync to S3 for disaster recovery.
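A sketch of the sync step such a job could run (the bucket name and paths are placeholders; this assumes the AWS CLI is installed, and is not an OpenCodeHub built-in):

```python
import subprocess

def s3_sync_command(src: str, bucket: str) -> list[str]:
    """Build the `aws s3 sync` invocation for a disaster-recovery copy.
    --delete mirrors removals so the backup matches the live tree."""
    return ["aws", "s3", "sync", src, f"s3://{bucket}/repos/", "--delete"]

cmd = s3_sync_command("/var/lib/opencodehub/repos", "my-backup-bucket")
# The cron job itself would execute it:
# subprocess.run(cmd, check=True)
print(" ".join(cmd))
```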
#### ☁️ Serverless (Vercel, Cloudflare Workers)

Trade-off: Accept latency or implement caching
```shell
STORAGE_TYPE=s3
# Enable local cache (coming soon)
ENABLE_REPO_CACHE=true
CACHE_TTL_SECONDS=300
```
#### 💰 Budget-Conscious (Google Drive)

Trade-off: Slower but very cheap
```shell
STORAGE_TYPE=gdrive
```

Google Drive has similar latency characteristics to S3.
### How GitHub/GitLab Solve This

| Company | Approach |
|---|---|
| GitHub | Repos on local NVMe clusters, S3 for backups only |
| GitLab | Gitaly (dedicated Git servers with local storage) |
| Bitbucket | Sharded storage servers, repos cached in memory |
Key insight: Major Git hosts never serve live traffic from object storage. They use local disk + memory caching for hot repos.
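The hot-repo caching idea can be sketched in a few lines: keep recently fetched objects close by and only fall through to object storage on a miss or after the entry expires. The names below are illustrative (an in-memory stand-in for the local-disk cache on the roadmap):

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry, standing in for a
    local layer in front of slow object storage."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries: dict[str, tuple[float, bytes]] = {}

    def get(self, key: str, fetch):
        """Return cached bytes, calling `fetch()` (the slow S3 path) on a
        miss or when the cached entry has expired."""
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]
        data = fetch()
        self._entries[key] = (now, data)
        return data

cache = TTLCache(ttl_seconds=300)
calls = []
def slow_fetch():
    calls.append(1)          # stands in for a ~100ms S3 GET
    return b"packfile bytes"

cache.get("objects/pack/x.pack", slow_fetch)
cache.get("objects/pack/x.pack", slow_fetch)
print(len(calls))  # 1 -> second read served from cache
```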
## Future Improvements (Roadmap)

We’re working on:
- Local disk cache with TTL-based invalidation
- Parallel S3 downloads (70% faster initial loads)
- Redis metadata cache (instant branch/tree info)
- Gitaly-like architecture for enterprise deployments
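Parallel downloads, for instance, amount to fanning the per-object GETs out over a worker pool instead of issuing them sequentially. A sketch with a stubbed fetch function (not the actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_object(key: str) -> bytes:
    """Stub for a single object-storage GET (~50-100ms each in practice)."""
    return f"contents of {key}".encode()

def fetch_all(keys: list[str], workers: int = 8) -> dict[str, bytes]:
    """Issue up to `workers` GETs concurrently; wall time drops from
    n * latency to roughly ceil(n / workers) * latency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(keys, pool.map(fetch_object, keys)))

objects = fetch_all([f"objects/{i:02x}" for i in range(40)])
print(len(objects))  # 40
```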
## 🔄 Migrating Between Storage Types

To migrate from local to S3:
```shell
# 1. Backup current repos
tar -czf repos-backup.tar.gz data/repos/

# 2. Upload to S3
aws s3 sync data/repos/ s3://your-bucket/repos/

# 3. Update .env
STORAGE_TYPE=s3
STORAGE_BUCKET=your-bucket

# 4. Clear local cache
rm -rf .tmp/repos/
```

To migrate from S3 to local:
```shell
# 1. Download from S3
aws s3 sync s3://your-bucket/repos/ data/repos/

# 2. Update .env
STORAGE_TYPE=local
STORAGE_PATH=./data/repos
```
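After either migration it is worth verifying that nothing was dropped. One way, sketched here rather than provided by OpenCodeHub, is to compare the two trees by relative path and file size (run it against the old copy and a fresh download of the new one):

```python
from pathlib import Path

def tree_manifest(root: str) -> dict[str, int]:
    """Map each file's path (relative to root) to its size in bytes."""
    base = Path(root)
    return {str(p.relative_to(base)): p.stat().st_size
            for p in base.rglob("*") if p.is_file()}

def diff_trees(a: str, b: str) -> set[str]:
    """Relative paths missing from one side or differing in size."""
    ma, mb = tree_manifest(a), tree_manifest(b)
    return {k for k in ma.keys() | mb.keys() if ma.get(k) != mb.get(k)}
```

An empty result means every file exists on both sides with the same size; for stronger guarantees, compare checksums instead.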