Heimdall: An Open Source Salesforce Backup That Actually Feels Enterprise

Salesforce data loss is rarely dramatic. It's usually accidental deletes, a bad Flow deployment, a dodgy integration, or someone mass-updating the wrong field. Then you discover it days later, when the audit trail is already cold.

Salesforce gives you Weekly Export, but it's manual and only weekly. Commercial tools like OwnBackup and GRAX exist, but per-user pricing adds up fast — for an org with 200 users, you're looking at $600–1,000/month.

Heimdall is an open source alternative built by Johan Karlsteen, who spent years running his own bash-and-Python backup scripts in production before rewriting the whole thing properly in Java 21 with Spring Boot and Spring Batch. It's been running in production on his org for years, and he recently open-sourced it under the AGPL-3.0 license.

Original write-up (with screenshots): johan.karlsteen.com

GitHub repo: github.com/devrandom-se/heimdall

What Heimdall Does

Heimdall backs up your Salesforce org on a schedule and stores record version history and metadata in PostgreSQL, with raw CSV exports and file binaries in S3 (or any S3-compatible storage like MinIO or Backblaze B2).

From the project README, the core features are:

  • Complete version history for every record
  • Deleted record browsing (restore API is not yet implemented — the UI buttons are stubs for now)
  • Record archiving with optional cleanup in Salesforce
  • ContentVersion file backup with deduplication
  • A web GUI for browsing, searching, and restoring data

Two Modes: Backup Engine and Restore GUI

Heimdall runs in two modes.

Batch mode is the backup engine. It runs as a scheduled ECS Fargate task, a Kubernetes cron job, or just java -jar on a server. It queries every object in your org via the Salesforce Bulk API, stores record metadata in PostgreSQL, and uploads full CSV data to S3. If a run dies halfway through, it resumes from checkpoints rather than starting over.

Web mode is the restore GUI, built with the Salesforce Lightning Design System. You can search for any record by ID, see its complete version history across all backup periods, view field-by-field diffs between versions, browse deleted and archived records, and see related child records and files.

The Practical Bits That Make It Worth Looking At

A few implementation choices stand out because they map to real production pain.

Checkpoint resume that survives failure. Backups fail. Containers restart. Jobs get OOM-killed. Every few files, Heimdall writes a checkpoint to PostgreSQL with the last processed record ID and timestamp. The next run picks up exactly where it left off.
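The resume logic can be sketched roughly like this. This is a simplified illustration, not Heimdall's actual code: an in-memory map stands in for the PostgreSQL checkpoint table, and the class and method names are invented for the example.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of checkpoint-based resume. A real implementation would persist
// checkpoints to PostgreSQL; a map stands in for that table here.
public class CheckpointResume {
    record Checkpoint(String lastRecordId, Instant at) {}

    static final Map<String, Checkpoint> store = new HashMap<>(); // stand-in for PostgreSQL

    // Process records after the last checkpoint, persisting progress
    // every `interval` records so a crashed run can pick up mid-stream.
    static List<String> run(String job, List<String> recordIds, int interval) {
        if (recordIds.isEmpty()) return List.of();
        Checkpoint cp = store.get(job);
        int start = cp == null ? 0 : recordIds.indexOf(cp.lastRecordId()) + 1;
        List<String> processed = new ArrayList<>();
        for (int i = start; i < recordIds.size(); i++) {
            processed.add(recordIds.get(i));   // the actual upload/persist step goes here
            if ((i - start + 1) % interval == 0) {
                store.put(job, new Checkpoint(recordIds.get(i), Instant.now()));
            }
        }
        store.put(job, new Checkpoint(recordIds.getLast(), Instant.now()));
        return processed;
    }
}
```

A run that dies between checkpoints re-does at most `interval` records; a finished run leaves a checkpoint at the last record, so a retry is a no-op.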

ContentVersion deduplication. Salesforce stores files with a checksum, and Heimdall uses content-addressable storage — each file is stored once by its checksum, regardless of how many records reference it. According to Johan's write-up, on one org with 3.5 million ContentVersions, this saved 320 GB out of 830 GB total — a 38% reduction.
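Content-addressable storage reduces to a simple idea: key each blob by its checksum and only write unseen checksums. A minimal sketch, with a map standing in for S3 and the checksum supplied by the caller (Salesforce already stores one per ContentVersion); the class and key layout are illustrative, not Heimdall's actual scheme:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of content-addressable file storage: identical files (same
// checksum) are stored once, no matter how many records reference them.
public class DedupStore {
    private final Map<String, byte[]> blobs = new HashMap<>(); // checksum -> bytes (stand-in for S3)
    private long bytesSaved = 0;

    // Store a file; returns true if this checksum was already present (deduplicated).
    public boolean put(String checksum, byte[] content) {
        if (blobs.containsKey(checksum)) {
            bytesSaved += content.length;   // duplicate: nothing new is written
            return true;
        }
        blobs.put(checksum, content);
        return false;
    }

    public long bytesSaved() { return bytesSaved; }
    public int uniqueBlobs() { return blobs.size(); }
}
```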

Dynamic batch sizing. The Salesforce Bulk API has a sweet spot: batches that are too small waste time on API overhead, while batches that are too large risk timeouts. Heimdall starts at 50K records and adapts based on response times, scaling between 50K and 200K.
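The adaptation loop might look something like the following. The thresholds and step factors here are made up for illustration; only the 50K starting point and the 50K–200K bounds come from the write-up.

```java
// Sketch of response-time-driven batch sizing between 50K and 200K records.
public class BatchSizer {
    static final int MIN = 50_000, MAX = 200_000;
    private int size = MIN;   // start conservatively at 50K

    // Adjust the next batch size based on how long the last batch took.
    public int next(long lastBatchMillis) {
        if (lastBatchMillis < 30_000) {          // fast response: grow by 50%
            size = Math.min(MAX, size + size / 2);
        } else if (lastBatchMillis > 120_000) {  // creeping toward timeout: halve
            size = Math.max(MIN, size / 2);
        }
        return size;                             // in between: hold steady
    }
}
```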

API limit protection. Heimdall can monitor your org's API usage and stop the backup gracefully when a configurable percentage of the daily limit is reached. It uses an absolute threshold against the org's actual usage, so it's safe across restarts.
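The key detail is that the check is against the org's reported usage (Salesforce returns it with every REST response in the Sforce-Limit-Info header, e.g. api-usage=91000/100000), not a counter the backup process keeps itself, so a restart can't reset it. A minimal sketch of that guard, with invented names:

```java
// Sketch of the API-limit guard: stop gracefully once the org's absolute
// daily usage crosses a configured percentage of the daily limit.
public class ApiLimitGuard {
    private final int stopAtPercent;

    public ApiLimitGuard(int stopAtPercent) { this.stopAtPercent = stopAtPercent; }

    // used/max are the org-wide figures Salesforce reports, not a local counter,
    // so the decision survives process restarts.
    public boolean shouldStop(long used, long max) {
        return used * 100 >= (long) stopAtPercent * max;
    }
}
```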

Period partitioning for GDPR and retention. Instead of "incremental forever," Heimdall partitions data by monthly periods (YYMM format). Need to purge data older than 12 months for GDPR? Delete the period. Clean, surgical, and fast.
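To make the retention mechanics concrete, here is a sketch of YYMM period keys and an expiry check. The helper names are invented; only the YYMM format and the "delete whole periods" idea come from the project.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Sketch of YYMM period partitioning: every backup lands in a monthly
// partition, and retention means dropping whole periods.
public class Periods {
    static final DateTimeFormatter YYMM = DateTimeFormatter.ofPattern("yyMM");

    static String periodFor(YearMonth month) { return month.format(YYMM); }

    // True if this period falls outside a retention window of
    // `retentionMonths` periods ending at `now` (inclusive).
    static boolean expired(String period, YearMonth now, int retentionMonths) {
        YearMonth p = YearMonth.parse(period, YYMM);
        return p.isBefore(now.minusMonths(retentionMonths - 1));
    }
}
```

With a 12-month policy, a monthly cleanup job lists periods, filters with `expired`, and deletes each matching partition and its S3 prefix in one shot.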

RDS on-demand lifecycle. The PostgreSQL database only needs to run during backups and when someone's using the GUI — maybe 2–3 hours a day. Heimdall can automatically start the RDS instance before it's needed and stop it when done, dropping costs from roughly $50/month to $3–5/month.
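The decision behind that lifecycle is simple: the database should be up only during the backup window or while a GUI session is active. A sketch of that logic, with an illustrative backup window; the actual start/stop calls would go through the AWS RDS API (StartDBInstance / StopDBInstance), which is omitted here:

```java
import java.time.LocalTime;

// Sketch of the on-demand lifecycle decision: run the database only
// during the nightly backup window or while someone is using the GUI.
public class DbLifecycle {
    static final LocalTime BACKUP_START = LocalTime.of(2, 0);  // illustrative window
    static final LocalTime BACKUP_END = LocalTime.of(4, 0);

    static boolean shouldRun(LocalTime now, boolean guiSessionActive) {
        boolean inBackupWindow = !now.isBefore(BACKUP_START) && now.isBefore(BACKUP_END);
        return inBackupWindow || guiSessionActive;
    }
}
```

A scheduler evaluating this every few minutes, starting the instance when it flips to true and stopping it after a grace period when it flips to false, is what turns a ~$50/month always-on instance into a few dollars.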

S3-compatible storage. It's not locked to AWS S3. The README explicitly lists compatibility with MinIO and Backblaze B2.

What It Costs

Running Heimdall for a mid-size org on AWS:

  • S3 storage: A few dollars/month (auto-transitions to Infrequent Access after 30 days)
  • RDS PostgreSQL: $3–5/month on-demand
  • ECS Fargate: Pay per backup run, pennies per execution
  • Typical total: Roughly $5–15/month

Compare that to commercial solutions charging $3–5 per user per month.

How to Try It

Start with the README and prerequisites, then run locally first: github.com/devrandom-se/heimdall

If you want to see what Johan plans next, the roadmap includes pluggable storage backends (Azure Blob, GCS, local filesystem), retention policies with GDPR-compliant automatic cleanup, Salesforce OAuth login for the GUI, and a Lightning Web Component for viewing backup data directly in Salesforce: ROADMAP.md

The Bottom Line

Heimdall is not "yet another export script." It's a proper backup and recovery system with a UI, version history, file support, API limit protection, and serious cost control built in. The restore functionality is still maturing — deleted record restore buttons are stubs for now — but the backup, browsing, and diff capabilities are production-tested.

If your current plan is weekly exports and hope, or if your backup vendor bill is painful, Heimdall is worth a serious look.

Build better Salesforce systems. If you care about resilient architecture, clean automation, and real-world Salesforce strategy, explore more at consultantcloud.io.

By Ciarán Fitzgerald