note №.006 · 2026 · 04 · 28 · 9 min - or two cups of filter coffee

Search, on top of object storage.

Where the math actually works out, what you trade for the cheaper bill, and the strange small joy of a query plan that ends in s3.GetObject.

For most of the last fifteen years, the rule was simple: if you wanted search, you ran Elasticsearch, or one of its descendants, on a fleet of beefy machines with fast local disks. The index lived next to the CPU. Queries were fast. Bills were high. Nobody loved the bills.

Then somebody noticed that S3 had quietly become absurdly cheap, and the rest of object storage was getting that way too. Quickwit landed. Tantivy got serious. People started asking an obvious-in-retrospect question: what if the index just lived in S3?

The shape of the idea.

The compromise looks like this. Keep the index in object storage. Keep the query engine stateless. Pay only for the queries you actually run. What you give up: sub-millisecond p99 latency, a bit of operator comfort. What you get back: storage at $0.023/GB/month instead of NVMe-attached EBS at $0.10/GB/month, independent scaling, painless backups, and a system you can mostly reason about on one whiteboard.

Show me the math.

For a real workload I've watched closely (about 800GB of indexed logs, 50 queries/second at peak, hot but not blazing), the bills look roughly like this[2]:

# old setup: Elasticsearch, NVMe disks, 3x replication
3x r6i.2xlarge        $360/mo   compute
800GB x 3 replicas    $240/mo   storage
total                 $600/mo

# new setup: Quickwit-style, object storage
1x m6i.large          $60/mo    compute
800GB x 1 in S3 std   $18/mo    storage
10M GetObject/mo      $400/mo   requests   <- the interesting line
total                 $478/mo

Object storage isn't free: the bytes are cheap, but every request is billed. If your queries are tight and your splits are well designed, you stay comfortably under a million requests a day. If they aren't, you can light an embarrassing amount of money on fire by GetObject-ing the same byte ranges repeatedly[1].
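The arithmetic behind the two bills, and the request volume at which the saving disappears, can be sketched like this. Everything is taken from the line items above; the per-request rate is simply the table's $400 divided by its 10M requests, so treat it as an effective figure for this one workload, not a published price. The function names are mine.

```rust
/// Line items from the note, in dollars per month.
const OLD_TOTAL: f64 = 360.0 + 240.0; // compute + 3x replicated storage
const NEW_FIXED: f64 = 60.0 + 18.0; // compute + S3 storage
const NEW_REQUESTS: f64 = 10_000_000.0; // GetObject calls per month
const NEW_REQUEST_COST: f64 = 400.0; // dollars billed for those calls

/// Effective dollars per request implied by the table (derived, not a list price).
fn implied_cost_per_request() -> f64 {
    NEW_REQUEST_COST / NEW_REQUESTS
}

/// Monthly bill of the object-storage setup for a given request volume.
fn new_setup_bill(requests_per_month: f64) -> f64 {
    NEW_FIXED + requests_per_month * implied_cost_per_request()
}

/// Request volume at which the new setup costs as much as the old one.
fn break_even_requests() -> f64 {
    (OLD_TOTAL - NEW_FIXED) / implied_cost_per_request()
}
```

With these inputs, `new_setup_bill(10_000_000.0)` reproduces the $478 total, and `break_even_requests()` lands near 13 million requests a month. In other words, about 30% more requests than the table assumes is enough to erase the saving, which is why the request line deserves the attention.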

What "designed your splits well" actually means.

A naive implementation does this:

async fn search(query: Query) -> Result<Results> {
    let segments = list_segments().await?;        // bad: LIST call on every query
    let hits = futures::future::try_join_all(
        segments.iter().map(|s| s.search(&query)) // bad: probes every segment
    ).await?;
    Ok(aggregate(hits))
}

A better one looks more like this:

async fn search(query: Query) -> Result<Results> {
    // 1. consult a small in-memory metadata layer (no object-storage call)
    let candidates = metastore.candidate_splits(&query);

    // 2. fan out only to candidates, using range reads
    let hits = futures::future::try_join_all(
        candidates.iter().map(|s| s.search_with_range_reads(&query))
    ).await?;

    Ok(aggregate(hits))
}

Two changes. A tiny metastore that prunes splits before any object-storage call. And range reads instead of whole-object reads: a 4MB read from a 1GB split costs the same as a 4MB read from a 4MB file. The size of the split doesn't matter; the size of what you actually fetch does.
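To make those two changes concrete, here is a minimal, synchronous sketch using only the standard library. It assumes the metastore prunes by time range, which is one common choice for logs but not something the note specifies; `SplitMeta`, `Metastore`, and `range_header` are illustrative names, not any real engine's API. The only external fact relied on is the HTTP `Range` header syntax, whose byte offsets are inclusive at both ends.

```rust
use std::ops::RangeInclusive;

/// Hypothetical per-split metadata held in memory by the metastore.
#[derive(Debug, Clone, PartialEq)]
struct SplitMeta {
    key: String,                     // object key of the split (illustrative)
    time_range: RangeInclusive<u64>, // min..=max event timestamp in the split
}

/// Hypothetical in-memory metastore: just a list of split descriptors.
struct Metastore {
    splits: Vec<SplitMeta>,
}

impl Metastore {
    /// Return only the splits whose time range overlaps the query window.
    /// No object-storage call happens here.
    fn candidate_splits(&self, window: &RangeInclusive<u64>) -> Vec<&SplitMeta> {
        self.splits
            .iter()
            .filter(|s| {
                s.time_range.start() <= window.end() && window.start() <= s.time_range.end()
            })
            .collect()
    }
}

/// Build an HTTP Range header value for `len` bytes starting at `offset`.
/// Both ends are inclusive, hence the `- 1`.
fn range_header(offset: u64, len: u64) -> String {
    assert!(len > 0, "range must cover at least one byte");
    format!("bytes={}-{}", offset, offset + len - 1)
}
```

A query window that overlaps two of three splits yields two candidates, so only those two are ever touched. `range_header(0, 4 * 1024 * 1024)` asks for the first 4MB of an object regardless of how large the object is.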

The index doesn't have to live near the CPU. It just has to be reachable, cheap, and addressable in pieces.

Where it falls apart.

  • Latency-critical search (autocomplete, type-ahead, anything user-facing under 50ms). Don't. Keep a hot in-memory layer; let object storage do the long tail.
  • Tiny, high-frequency queries. A 5ms query plus 80ms of network is now an 85ms query. The math doesn't work for a UI search box.
  • Vector search, sort of. The architectural shape is right; the latencies aren't quite there without a warm cache, but the gap is closing fast.
  • Cold-start fan-out. A query against 500 cold splits is 500 independent S3 reads. Plan your warm cache, or watch your p99 spike at 9am every Monday.
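The warm cache mentioned in the last bullet can start very small. Below is a rough sketch of a read-through cache keyed by object key and byte range, with a counter for how often it actually had to go to object storage. `RangeCache`, `get_or_fetch`, and the `fetch` callback (a stand-in for a ranged GetObject) are all hypothetical, and the cache never evicts, so a real one would need a size bound such as LRU.

```rust
use std::collections::HashMap;

/// Cache key: (object key, byte offset, length). Illustrative only.
type RangeKey = (String, u64, u64);

/// A deliberately tiny read-through cache for byte ranges.
/// It never evicts; a production cache would bound its size.
struct RangeCache {
    entries: HashMap<RangeKey, Vec<u8>>,
    backend_fetches: u64, // how many times we really hit object storage
}

impl RangeCache {
    fn new() -> Self {
        RangeCache {
            entries: HashMap::new(),
            backend_fetches: 0,
        }
    }

    /// Return the cached bytes, calling `fetch` only on a miss.
    /// `fetch` stands in for a ranged GetObject call.
    fn get_or_fetch<F>(&mut self, key: &str, offset: u64, len: u64, fetch: F) -> &[u8]
    where
        F: FnOnce(&str, u64, u64) -> Vec<u8>,
    {
        let cache_key = (key.to_string(), offset, len);
        if !self.entries.contains_key(&cache_key) {
            self.backend_fetches += 1;
            let bytes = fetch(key, offset, len);
            self.entries.insert(cache_key.clone(), bytes);
        }
        &self.entries[&cache_key]
    }
}
```

Asking twice for the same range of the same split costs one backend fetch, not two. Exact-match keys are the simplest possible design; overlapping ranges would each be fetched separately, which is a tradeoff worth revisiting once the access pattern is known.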

What I keep coming back to.

The moment that made me a believer was watching a 300GB analytics query plan resolve in a meeting room. The whole plan fit on one screen. The plan ended in s3.GetObject. That was it. No mystery cluster. No "is the hot node hot?" No standby replica nobody could remember the password to.

For a certain shape of workload (large, mostly cold, query-on-demand), this is just better. The math is better. The operations are better. The mental model is better. It's not for everything. But it's for a lot more than people currently use it for, and I think the next two years will prove that out.

Future notes: warm caching layers, vector search on this same architecture, and a longer field report on the metastore tradeoffs. If you've built one of these and disagree with any of the above, please write me.

- end of note -
  1. I once watched a team triple their bill in a week by un-tuning a result cache. Object storage doesn't make you faster; it just makes the price of slow obvious.
  2. All numbers are from a real 2026 workload. AWS pricing changes; do your own math; the shape will be similar.
filed under → infrastructure · search · building