How Was AWS S3 Built?

This video, a conversation with AWS VP of Data and Analytics Mai-Lan Tomsen Bukovec, explores the massive scale, design principles, and constantly evolving technology behind AWS S3, the world's largest cloud storage service. It covers S3's staggering scale of 500+ trillion objects and hundreds of exabytes of data serving over a quadrillion requests annually, the engineering challenges of moving from eventual consistency to strong consistency, and advanced practices such as formal methods and designing for constant failure. It also reveals how S3 evolves with new data primitives like Tables and Vectors to meet customer needs, all managed through the core principle of simplicity.


1. S3's Overwhelming Scale and Origins

Mai-Lan began by describing S3's current scale. S3 holds over 500 trillion objects and hundreds of exabytes of data, handling hundreds of millions of requests per second, which adds up to over a quadrillion requests annually. Underlying S3 are tens of millions of hard drives and millions of servers distributed across 120 availability zones in 38 regions. Mai-Lan illustrated the scale by saying that if you stacked all the drives, they would reach the International Space Station and nearly come back.

S3 development began in 2005 and launched in 2006 as the first AWS service. Amazon engineers needed a place to store unstructured data like PDFs, images, and backups economically, without worrying about capacity planning as storage grew. S3 was designed from the start with eventual consistency: when data was stored, the system confirmed it had the data, but the object might not appear in listings immediately.

2. The Rise of Data Warehouses and S3's Evolution

As the Apache Hadoop community emerged in 2006, frontier data customers like Netflix and Pinterest combined Hadoop with S3's economics, unlimited capacity, and solid performance to build data lakes, expanding S3 beyond unstructured data into tabular data. Around 2020, Parquet adoption took off notably, as customers began managing Parquet data directly on S3.

In 2019-2020, Apache Iceberg rose to prominence as an open-source table format for large-scale analytics workflows, enabling decentralized analytics architectures. AWS responded by launching S3 Tables in December 2024 and previewing S3 Vectors in July 2025, with general availability arriving the week after this conversation (recorded in January 2026).

3. S3's Core Architecture and the Secret to Low Pricing

S3's developer experience was built around simplicity from the start. Its fundamental operations are put (store data) and get (retrieve data); performing these two operations efficiently at massive scale is S3's essence. Over time, the API expanded to include delete, list, copy, and, more recently, conditional put and conditional delete.
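The shape of that API can be sketched with a toy in-memory store. This is purely illustrative: the method names and the `if_none_match` flag mirror the idea of a conditional put (refuse to overwrite an existing object), not the actual S3 SDK surface.

```python
class MiniObjectStore:
    """Toy in-memory sketch of S3's core verbs: put, get, delete,
    list, copy, and a conditional put. Names are illustrative only."""

    def __init__(self):
        self._objects = {}  # key -> bytes

    def put(self, key, data, if_none_match=False):
        # Conditional put: fail if the key already exists, so two
        # concurrent writers cannot silently overwrite each other.
        if if_none_match and key in self._objects:
            raise KeyError(f"precondition failed: {key!r} already exists")
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)

    def list(self, prefix=""):
        return sorted(k for k in self._objects if k.startswith(prefix))

    def copy(self, src, dst):
        self._objects[dst] = self._objects[src]


store = MiniObjectStore()
store.put("logs/a.txt", b"hello")
store.copy("logs/a.txt", "logs/b.txt")
print(store.list("logs/"))  # ['logs/a.txt', 'logs/b.txt']
```

The conditional variants matter because they turn a plain blob store into a building block for coordination: a writer can claim a key atomically instead of racing another writer.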

S3 launched at a groundbreaking 15 cents per GB per month in 2006, now down to 2-3 cents through continuous price reductions. This is achieved through:

  • Continuous price reductions on storage and feature costs
  • Total Cost of Ownership (TCO) management through tiering and archiving
  • Intelligent-Tiering (launched 2018): Automatically applies up to 40% discounts when data isn't accessed for a month

AWS maximizes efficiency at every layer, from the hardware up through the software stack. For data that doesn't need immediate access, Amazon Glacier stores it at just 1 cent per GB per month.
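The per-GB figures quoted above can be turned into a quick cost comparison. The prices below are the ones mentioned in this article (the 2006 launch price, today's rough standard price, the up-to-40% Intelligent-Tiering discount, and Glacier); real S3 pricing varies by region, tier, and volume.

```python
# Rough monthly-cost comparison using the per-GB prices quoted in the
# article; actual S3 pricing differs by region and storage class.
GB_PER_TB = 1024

def monthly_cost(tb, price_per_gb):
    return tb * GB_PER_TB * price_per_gb

data_tb = 100
launch_2006 = monthly_cost(data_tb, 0.15)    # 15 cents/GB at launch
standard_now = monthly_cost(data_tb, 0.023)  # ~2-3 cents/GB today
tiered = standard_now * 0.60                 # up to 40% off via Intelligent-Tiering
glacier = monthly_cost(data_tb, 0.01)        # ~1 cent/GB in Glacier

for name, cost in [("2006 launch", launch_2006),
                   ("Standard today", standard_now),
                   ("Intelligent-Tiering (cold)", tiered),
                   ("Glacier", glacier)]:
    print(f"{name:>28}: ${cost:,.2f}/month for {data_tb} TB")
```

Even with these back-of-the-envelope numbers, the same 100 TB costs roughly a tenth of its 2006 price in Standard, and far less again once cold data moves down the tiers.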

4. From Eventual Consistency to Strong Consistency and Formal Verification

S3 initially provided eventual consistency, prioritizing high availability. But as customers used S3 for data lakes, demand for strong consistency grew. To implement this, S3 developed a replicated journal — a new distributed data structure that links nodes sequentially so write operations flow through them in order.
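The journal idea can be sketched as a chain of nodes: a write enters at the head, is forwarded node to node, and is acknowledged only once every replica has applied it in order. This is an assumed simplification for illustration, not S3's actual implementation.

```python
# Minimal sketch (assumed, not S3's real design) of a replicated
# journal: writes flow through a chain of nodes in order, so every
# replica ends up with an identical, identically-ordered log.
class JournalNode:
    def __init__(self, name, successor=None):
        self.name = name
        self.successor = successor
        self.log = []  # ordered sequence of applied entries

    def append(self, entry):
        self.log.append(entry)
        if self.successor:          # forward down the chain...
            self.successor.append(entry)
        # ...the call only returns (the "ack") after the tail applied it


tail = JournalNode("tail")
mid = JournalNode("mid", successor=tail)
head = JournalNode("head", successor=mid)

head.append({"op": "PUT", "key": "photos/cat.jpg"})
head.append({"op": "DELETE", "key": "tmp/scratch"})

# Every replica saw the same writes in the same order.
assert head.log == mid.log == tail.log
```

Because acknowledgement only happens after the whole chain has applied the write, any node can serve a read that reflects every acknowledged write, which is exactly the property strong consistency requires.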

Remarkably, AWS applied strong consistency to all S3 requests at no additional cost. Mai-Lan emphasized that while "there's nothing free in engineering," making it available as a core building block without cost concerns was important.

Equally important was knowing the consistency was correct. At S3's scale, manually verifying all edge cases is impossible, so S3 uses automated reasoning (formal methods). This is applied to:

  • Consistency verification: Building proofs that the strong consistency model is correct
  • Cross Region Replication verification: Proving data replication succeeded
  • API correctness verification: Proving correct API behavior

"At S3 scale, we couldn't just say we provide strong consistency. We didn't actually know we provided strong consistency. That's exactly why we used automated reasoning."

5. Durability, Availability, and Failure Management at Scale

S3 promises an astounding eleven 9s (99.999999999%) of durability, meaning that if you store 10 million objects, you can expect to lose a single object about once every 10,000 years. Over 200 microservices and auditor systems continuously inspect every byte, with repair systems that activate automatically when needed.
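The "one object per 10,000 years" framing follows directly from the eleven-9s figure, treating it as an annual per-object durability rate:

```python
# Back-of-the-envelope check of the eleven 9s claim: at 99.999999999%
# annual durability, storing 10 million objects yields an expected
# loss of one object roughly every 10,000 years.
durability = 0.99999999999
annual_loss_rate = 1 - durability   # probability one object is lost in a year
objects = 10_000_000

expected_losses_per_year = objects * annual_loss_rate
years_per_single_loss = 1 / expected_losses_per_year

print(f"expected losses/year: {expected_losses_per_year:.6f}")
print(f"one object lost every ~{years_per_single_loss:,.0f} years")
```

The arithmetic is trivial, but it shows why auditors and repair systems matter: durability at this level is a statistical promise that only holds if silent corruption is continuously detected and fixed.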

S3 treats server failure as a routine event. Mai-Lan noted servers are failing even during their conversation — the question isn't "when will it fail" but "how often."

Correlated failure — multiple nodes failing simultaneously from the same cause — is a serious threat. S3 mitigates this through data replication across availability zones and physical infrastructure distribution. Crash consistency ensures the system always recovers to a consistent state after any failure.
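A toy calculation shows why correlated failure dominates the risk model. The probabilities below are made-up illustrative numbers, not AWS figures:

```python
# Toy numbers (assumed, not AWS data) showing why correlated failure
# matters: independent replica failures multiply, but a shared failure
# domain collapses three replicas into one failure event.
p_node = 0.001                 # chance a given replica is unavailable

independent = p_node ** 3      # three replicas in independent failure domains
correlated = p_node            # three replicas behind one shared rack/power domain

print(f"independent: ~{independent:.1e}, correlated: ~{correlated:.1e}")
```

Spreading replicas across availability zones is what keeps the failure probabilities multiplying instead of collapsing, which is why placement across physical infrastructure is treated as seriously as the replication itself.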

6. S3's Evolution and New Data Primitives: S3 Vectors

S3 continues evolving as a "living, breathing organism." S3 Tables enables native storage and management of tabular data like Parquet files on S3, queryable with SQL.

"SQL is the lingua franca of data, and LLMs worldwide have been trained on decades of SQL. So now you can interact with data in S3 through SQL."

S3 Vectors is S3's newest data primitive, designed for storing and retrieving the embedding data that is exploding in the AI era. To tame the cost of finding nearest neighbors in high-dimensional vector space, S3 introduced vector neighborhoods: similar vectors are pre-clustered asynchronously, offline. This achieves sub-100ms query times while storing vectors on the massive S3 disk fleet rather than in memory.
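The neighborhood idea can be illustrated with plain k-means: cluster vectors offline, then answer a query by scanning only the nearest cluster instead of every vector. This is a simplified sketch of the general technique; real systems probe several clusters and use far more sophisticated approximate indexes.

```python
# Illustrative "vector neighborhood" sketch: pre-cluster vectors
# offline (k-means), then scan only the nearest non-empty cluster
# at query time. Simplified; not S3 Vectors' actual algorithm.
import math
import random

def dist(a, b):
    return math.dist(a, b)

def assign(vectors, centroids):
    """Offline step: bucket each vector under its nearest centroid."""
    buckets = [[] for _ in centroids]
    for v in vectors:
        i = min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))
        buckets[i].append(v)
    return buckets

def build_neighborhoods(vectors, k, iters=10):
    """Plain k-means standing in for neighborhood construction."""
    centroids = random.sample(vectors, k)
    for _ in range(iters):
        buckets = assign(vectors, centroids)
        centroids = [tuple(sum(x) / len(b) for x in zip(*b)) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids, assign(vectors, centroids)

def query(q, centroids, buckets):
    """Online step: exhaustively scan only the nearest non-empty neighborhood."""
    for i in sorted(range(len(centroids)), key=lambda c: dist(q, centroids[c])):
        if buckets[i]:
            return min(buckets[i], key=lambda v: dist(q, v))


random.seed(0)
vectors = [(random.random(), random.random()) for _ in range(1000)]
centroids, buckets = build_neighborhoods(vectors, k=8)
nearest = query((0.5, 0.5), centroids, buckets)
```

The trade-off is the classic one in approximate nearest-neighbor search: the query touches a small fraction of the data, at the cost of occasionally missing the true nearest vector when it sits just across a cluster boundary.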

S3 Vectors supports up to 2 billion vectors per index and up to 20 trillion vectors per S3 vector bucket.

7. Design Principles: Simplicity and Technical Fearlessness

S3 engineers work in the tension between Amazon's engineering creed of "respect what came before" and "be technically fearless." The core principle managing this complex system is simplicity:

  • User model simplicity: S3 API simplicity, SQL access to S3 data, understanding data through AI embedding models
  • Internal implementation simplicity: Each of the many microservices does one or two things very well, and every engineering meeting asks "What's the simplest possible way to implement this?"

Mai-Lan advised mid-career software engineers to have "relentless curiosity" — because working in large-scale distributed systems like S3 isn't about working within defined lines but redrawing them as needed.

8. Conclusion

AWS S3 goes beyond simple cloud storage: it is a living system that stores over 500 trillion objects and hundreds of exabytes while constantly evolving. Under the principle that "scale is to your advantage," S3 maximizes efficiency at every layer, from the hardware up through the software stack, keeping prices low while overcoming engineering challenges like the transition from eventual to strong consistency. Through formal methods, planning for correlated failure, crash consistency, and new data primitives like S3 Tables and S3 Vectors, S3 is establishing itself as a platform for semantically understanding and using data, guided by simplicity as a core value and by engineers with a deep sense of ownership and commitment to customer data.
