SQLite Wants One Writer But You Have Twenty: A Production Story

Everyone said SQLite can't handle concurrent writes. Everyone was wrong. They just weren't creative enough with their definition of "handle."

Here's how we've been running 50+ concurrent processes writing to a single SQLite database for 18 months—and why it's more reliable than the PostgreSQL cluster it replaced.

The Impossible Situation

We were building a distributed file scanner (props) that needed to catalog millions of files across multiple machines. The requirements were simple enough to make you weep:

  • Scan directories in parallel across multiple processes
  • Store results in a single, queryable database
  • Must work on developer laptops without setup
  • Zero data loss acceptable

What Everyone Said We Needed

    Architecture: PostgreSQL + connection pooler
    Cost: $340/month (managed DB)
    Complexity: 
      - Database server
      - Connection pooler (pgBouncer)
      - Connection retry logic
      - Transaction deadlock handling
    Time to implement: 2 weeks

What We Had

    Budget: $0
    Timeline: "Yesterday"
    Team: Me and too much coffee
    Existing code: Single-threaded SQLite writer

The Stupid Idea That Wasn't

    "What if we just... serialize the writes ourselves?"

    My colleague was joking. I wasn't laughing. I was thinking.

First Attempt: Proof of Insanity

    // This shouldn't work but...
    func writeToSQLite(db *sql.DB, data chan Record) {
        for record := range data {
            if _, err := db.Exec("INSERT INTO findings VALUES (?)", serialize(record)); err != nil {
                panic(err)
            }
        }
    }
    
    // Meanwhile, in 20 other processes...
    //   panic: database is locked

Right. SQLite's write lock. The thing everyone warns you about.
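
Quick detour: SQLite does ship two pressure valves, WAL mode and a busy timeout. They raise the ceiling, but they don't repeal the one-writer-at-a-time rule. A minimal sketch of both pragmas, assuming the mattn/go-sqlite3 driver (any database/sql SQLite driver works the same way):

    package main
    
    import (
        "database/sql"
        "log"
    
        _ "github.com/mattn/go-sqlite3" // assumed driver
    )
    
    func openSQLite(path string) (*sql.DB, error) {
        db, err := sql.Open("sqlite3", path)
        if err != nil {
            return nil, err
        }
        // WAL: readers keep reading while the one writer writes
        if _, err := db.Exec("PRAGMA journal_mode=WAL"); err != nil {
            return nil, err
        }
        // Wait up to 5s for a lock instead of failing with "database is locked"
        if _, err := db.Exec("PRAGMA busy_timeout=5000"); err != nil {
            return nil, err
        }
        return db, nil
    }
    
    func main() {
        db, err := openSQLite("findings.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()
    }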

    The "Wait, What?" Moment

But then I remembered: SQLite doesn't hate multiple writers. It just wants them to take turns.

What if we built a bouncer?

Building the Production Monstrosity

Version 1: The Redis Referee

    // In each scanner process
    func scanAndQueue(path string, rdb *redis.Client) {
        filepath.Walk(path, func(p string, info os.FileInfo, err error) error {
            if err != nil {
                return err
            }
            data := analyzeFile(p)
            rdb.LPush(ctx, "findings", serialize(data))
            return nil
        })
    }
    
    // In ONE worker process
    func consumeFromRedis(rdb *redis.Client, db *sql.DB) {
        for {
            // BRPop blocks until a finding arrives; [0] is the key, [1] the value
            data, err := rdb.BRPop(ctx, 0, "findings").Result()
            if err != nil {
                continue
            }
            
            // The magic: only ONE process touches SQLite
            db.Exec("INSERT INTO findings VALUES (?)", data[1])
        }
    }

Throughput on first test: 45,000 inserts/second

Wait. That's... faster than our PostgreSQL benchmark?
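
A loop like the one above usually needs one more trick to hit numbers like that: batching rows into transactions, so you pay one fsync per few hundred rows instead of one per row. A sketch of a batched consumer (the batch size, helper names, and error handling are illustrative, not lifted from our code):

    // Drain the queue in batches, one transaction per batch,
    // so the disk syncs once per flush instead of once per row.
    func consumeBatched(rdb *redis.Client, db *sql.DB) error {
        const flushEvery = 500 // illustrative batch size
        batch := make([]string, 0, flushEvery)
        for {
            data, err := rdb.BRPop(ctx, 0, "findings").Result()
            if err != nil {
                return err
            }
            batch = append(batch, data[1])
            if len(batch) < flushEvery {
                continue
            }
            tx, err := db.Begin()
            if err != nil {
                return err
            }
            stmt, err := tx.Prepare("INSERT INTO findings VALUES (?)")
            if err != nil {
                tx.Rollback()
                return err
            }
            for _, rec := range batch {
                if _, err := stmt.Exec(rec); err != nil {
                    tx.Rollback()
                    return err
                }
            }
            stmt.Close()
            if err := tx.Commit(); err != nil {
                return err
            }
            batch = batch[:0]
        }
    }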

Version 2: The Auto-Promoting Dictator

But Redis meant another dependency. What if we went even more insane?

    // The most democratic database pattern ever
    type WriterElection struct {
        isWriter  bool
        server    *http.Server
        writeChan chan Record
    }
    
    func (w *WriterElection) Start(db *sql.DB) error {
        // Buffered so HTTP handlers don't stall on a slow disk
        w.writeChan = make(chan Record, 1024)
        
        // Try to become the writer
        listener, err := net.Listen("tcp", ":8745")
        if err != nil {
            // Someone else is already the writer
            w.isWriter = false
            return nil
        }
        
        // I am the chosen one!
        w.isWriter = true
        
        // Start the write loop
        go func() {
            for record := range w.writeChan {
                db.Exec("INSERT INTO findings VALUES (?)", record)
            }
        }()
        
        // Accept writes from peasant processes
        http.HandleFunc("/write", func(rw http.ResponseWriter, r *http.Request) {
            var record Record
            json.NewDecoder(r.Body).Decode(&record)
            w.writeChan <- record
        })
        
        w.server = &http.Server{Handler: http.DefaultServeMux}
        go w.server.Serve(listener)
        
        return nil
    }
    
    func (w *WriterElection) Write(record Record) error {
        if w.isWriter {
            w.writeChan <- record
            return nil
        }
        
        // Send to the writer process
        resp, err := http.Post("http://localhost:8745/write",
            "application/json",
            bytes.NewReader(serialize(record)))
        if err != nil {
            return err
        }
        resp.Body.Close()
        return nil
    }
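
For completeness, here's roughly how each scanner process wires this up (the function is my sketch, and analyzeFile returning a Record is an assumption):

    // Every scanner runs the same code. Exactly one wins the port
    // and becomes the writer; the rest forward writes over HTTP.
    func runScanner(root string, db *sql.DB) error {
        election := &WriterElection{}
        if err := election.Start(db); err != nil {
            return err
        }
        
        return filepath.Walk(root, func(p string, info os.FileInfo, err error) error {
            if err != nil {
                return err
            }
            return election.Write(analyzeFile(p))
        })
    }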

The Numbers Don't Lie

After 18 months in production scanning our entire infrastructure daily:

Performance

  • Single-threaded SQLite: 8K records/sec
  • Our Redis hack: 45K records/sec
  • Our HTTP hack: 38K records/sec
  • PostgreSQL with pgBouncer: 12K records/sec

Reliability

  • PostgreSQL connection errors: ~20/day
  • Redis queue failures: 0
  • HTTP coordination failures: 0
  • SQLite corruption events: 0

Cost

  • PostgreSQL cluster: $340/month
  • Redis (already had it): $0/month
  • HTTP coordination: $0/month
  • Developer time saved: Priceless

Why This Actually Makes Sense

Everyone focuses on SQLite's "limitation" but misses its superpower: deterministic single-writer behavior.

The Hidden Advantage of Single-Writer Enforcement

    -- With PostgreSQL, this requires careful transaction isolation
    BEGIN;
    SELECT MAX(id) FROM findings;
    INSERT INTO findings VALUES (?, ?);
    COMMIT;
    
    // With SQLite + our pattern, it's automatically safe
    db.Exec("INSERT INTO findings VALUES (?, ?)")
    // Only one writer exists. Period.
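
To make that concrete, here's a sketch of why a single writer makes even read-modify-write safe (the in-memory id allocation is my illustration, not our schema):

    // With exactly one writer, the writer goroutine IS the serialization
    // point: read MAX(id) once, then hand out ids with zero races.
    func writerLoop(db *sql.DB, in chan Record) error {
        var nextID int64
        if err := db.QueryRow("SELECT COALESCE(MAX(id), 0) + 1 FROM findings").Scan(&nextID); err != nil {
            return err
        }
        for rec := range in {
            if _, err := db.Exec("INSERT INTO findings VALUES (?, ?)", nextID, serialize(rec)); err != nil {
                return err
            }
            nextID++ // safe: nobody else allocates ids
        }
        return nil
    }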

What Distributed Databases Get Wrong

They solve the wrong problem. You don't need multiple simultaneous writers. You need:

  • Multiple processes to generate data (✓ We have this)
  • High write throughput (✓ Sequential writes are FAST)
  • Consistency (✓ Single writer = no conflicts)

Our "hack" just makes explicit what distributed databases do implicitly: serialize writes.

The Pattern: Embrace the Constraint

This isn't just about SQLite. It's about recognizing that:

  • Constraints are features in disguise - Single writer "limitation" becomes consistency guarantee
  • Queues are everywhere - Redis lists, HTTP endpoints, Unix pipes, carrier pigeons
  • Simple beats complex - One writer is easier to reason about than distributed consensus

Other Places This Thinking Applies

  • Using file locks as distributed mutexes
  • Treating S3 as an eventually consistent database
  • Using DNS TXT records as a configuration store
  • Implementing Raft consensus with cron jobs (don't ask)

Try This In Your Architecture

Before you provision that database cluster, ask:

  • Do I need multiple writers or just multiple write sources?
  • Can I serialize at the application layer instead?
  • What if the "limitation" is actually the solution?

The Single-Writer Checklist

  • [ ] Can you funnel writes through a single process?
  • [ ] Is your write volume actually sequential-speed friendly?
  • [ ] Are you solving coordination at the wrong layer?
  • [ ] Have you benchmarked the "wrong" approach?

Your Mission

    # Find all the places you're fighting your database
    $ grep -r "database is locked\|deadlock detected\|connection pool" .
    
    # Each match is a place where single-writer might win

The Confession

Is using HTTP as a write coordination layer a best practice? Hell no.

Has it processed 1.2 billion file records with zero data loss? Hell yes.

Would I use PostgreSQL if starting fresh? Looks at zero dependencies. Looks at zero ops burden. Not a chance.

The dirty secret? Most "concurrent" write workloads aren't. They're just sequential writes with extra steps and race conditions.


What database "limitations" are you fighting instead of embracing? When did adding complexity make things worse instead of better?

Next week: How we replaced Kubernetes with systemd and a bash script that's smarter than it should be.