This presentation explains how Netflix chose and evolved Java as its main backend technology as of 2025, and how it has been rapidly adapting to recent changes in the Java ecosystem. It covers real-world experience and key case studies: Java version upgrades, large-scale microservice management, cutting-edge garbage collectors, the Spring Boot transition, and the evolution from REST to GraphQL and gRPC -- capturing all the practical know-how of operating services at massive scale.
1. Introduction and Netflix's Pragmatic Java Perspective
Presenter Paul Bakker begins by noting that he has covered the topic of how Netflix uses Java multiple times over the past several years. However, because the architecture, technology, and organization keep changing every year, he emphasizes that "even if you've seen a talk with the same title before, this time it will be completely different."
"Every year, the architecture keeps changing, the technology we use keeps changing, and we keep learning new things."
He mentions that Netflix's use of Java recently became an interesting topic on social media, with several misconceptions arising. Most notably, in response to questions about Oracle licensing costs:
"We don't pay Oracle a single penny in licensing fees. We use OpenJDK."
This answer drew laughter. And to those asking why not Rust:
"Many people asked, 'Why not Rust?' But there are several reasons why Java is more suitable for us."
He adds that technology choices are not simply a matter of trends or image.
2. Netflix's Service Architecture and Backend Structure
Streaming and Enterprise: Two Representative Scenarios
Netflix is not just a streaming app. The areas where Java is primarily used on the backend can be divided into two categories:
a. Streaming Services
- They must handle an enormous number of requests per second (high RPS)
- "With millions of users connected simultaneously, the backend must always withstand very high traffic."
- Because it's a global service, it's distributed across four AWS regions to ensure low-latency delivery.
- Most failure scenarios are handled through retries, and even if some data is missing, the overall user experience is designed to remain unaffected.
- For performance reasons, relational databases are rarely used; instead, in-memory data stores and distributed caches are utilized.
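The "retry, and tolerate missing data" approach in the list above can be sketched as follows. This is a minimal illustration using only the JDK, not Netflix's actual client code; the class name `RetryWithFallback`, the method `callWithRetry`, and the flaky dependency in `main` are all hypothetical.

```java
import java.util.function.Supplier;

public class RetryWithFallback {

    /**
     * Calls the supplier up to maxAttempts times. If every attempt throws,
     * returns the fallback value instead of failing the whole request.
     */
    public static <T> T callWithRetry(Supplier<T> call, int maxAttempts, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                // Swallow and retry; a real service would log and back off here.
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical flaky dependency: fails twice, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "recommendations";
        }, 3, "");
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Returning a fallback (here an empty string) rather than propagating the error mirrors the idea that a missing piece of data should not break the overall page.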
b. Enterprise/Studio (Internal/Production Support) Apps
- There are numerous business-critical apps that manage film production personnel, equipment, locations, and schedules.
- Traffic is comparatively low, and most of these apps can operate in a single region.
- "In these apps, not a single piece of data can be lost." Transactional stability takes priority over retries.
- Relational databases are primarily used.
Common GraphQL-Based Backend Architecture
Interestingly, whether streaming or enterprise, the backend service architecture is nearly identical.
- An API Gateway sits at the front, receiving GraphQL queries.
- This API Gateway implements GraphQL Federation through integration with multiple DGS (Domain Graph Services).
- DGS applications are Java applications written in Spring Boot, and internally they fan out further to lower-level services (gRPC and various data stores, etc.).
"All key backend services are unified on DGS (Spring Boot + DGS framework) and GraphQL."
gRPC is primarily used for internal service-to-service communication (Java to Java) due to its high-performance advantages.
3. Open Connect, Java's Role, and the Multi-Language Environment
Netflix's actual video delivery (the streaming itself) is handled by a separate infrastructure, Open Connect.
- Open Connect servers (dedicated boxes) are placed with ISPs worldwide so that streaming traffic originates as close to the viewer, in network terms, as possible.
- The management software for this area is also mostly written in Java.
- Media encoding pipelines, real-time data processing, and some databases are also Java-based.
That said, other languages such as Go and Python are freely used for some platform and machine learning components.
"Of course we also use Go, Python, and various other languages, but across the backend as a whole, Java is overwhelmingly dominant."
4. The JDK and Framework Upgrade Journey: From JDK 8 to 21/23
Netflix openly admits that, just a few years ago, all of its Java services were stuck on the aging JDK 8 and on outdated, internally built frameworks.
"At the time, even though we wanted to move to the latest JDK, we couldn't muster the courage because of old internal libraries and incompatible external libraries."
So they undertook a company-wide overhaul:
- Directly patching abandoned/incompatible libraries
  - "When we actually got into it, there was very little to fix. It looks complex, but when you just do it, it's manageable."
- Converting all service frameworks to Spring Boot
  - Including internally-developed automated code conversion tooling
- As a result, approximately 3,000 Java services were fully migrated
Currently, nearly all services run on Spring Boot + JDK 17 or higher. Key high-traffic services are on JDK 20/21/23, actively leveraging the new GC (Generational ZGC) and new features.
5. JDK Upgrade Results and Real-World GC and Virtual Thread Experience
JDK 17: 20% Less GC CPU Time with G1!
The biggest change from the JDK 8 -> JDK 17 upgrade was the dramatic performance improvement of the G1 GC.
"Just moving to JDK 17 reduced GC CPU consumption by 20%. It was the biggest performance gain for the least effort."
JDK 21: Generational ZGC and Dramatic Error Reduction
- The existing ZGC was non-generational, which caused adverse effects with large heaps.
- But JDK 21's Generational ZGC delivered "overwhelmingly good results across all services!"
- Stop-the-world GC pauses of 1-1.5 seconds disappeared completely, and service errors and timeouts dropped noticeably.
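To check which collector a service is actually running and how much time it spends collecting, the standard `java.lang.management` API can be queried. The sketch below is an illustration, not something shown in the talk; the class name `GcReport` is hypothetical. On JDK 21, Generational ZGC is selected with the JVM flags `-XX:+UseZGC -XX:+ZGenerational`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcReport {

    /** Returns one line per garbage collector registered in this JVM. */
    public static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated collection time in milliseconds.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Run with, for example: java -XX:+UseZGC -XX:+ZGenerational GcReport.java (JDK 21)
        describeCollectors().forEach(System.out::println);
    }
}
```

The collector names printed depend on the JVM version and flags, so comparing the output before and after a flag change is a quick way to confirm the switch took effect.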
"Service stalls of over 1 second due to GC pauses, which caused lots of retries, immediately dropped to zero as soon as we switched to ZGC."
Virtual Threads: Changing the Game for Complex Concurrency
- In GraphQL and similar scenarios where lightweight parallel processing is essential, the old approach of manually wiring up Executors and CompletableFutures was cumbersome.
- Virtual Threads made parallel execution the default, dramatically reducing complexity and debugging difficulty.
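As a rough sketch of what this looks like in plain Java (JDK 21 or later), each blocking call can be handed to its own virtual thread through `Executors.newVirtualThreadPerTaskExecutor()`. This is not the DGS framework's internal code; `ParallelFetch`, `slowLookup`, and the keys used are hypothetical stand-ins for blocking DB or gRPC calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFetch {

    /** Hypothetical slow lookup standing in for a blocking DB or gRPC call. */
    static String slowLookup(String key) throws InterruptedException {
        Thread.sleep(200);
        return key + "-value";
    }

    /** Runs one blocking lookup per key, each on its own virtual thread. */
    public static List<String> fetchAll(List<String> keys) {
        // Requires JDK 21+. close() waits for all submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (String key : keys) {
                Callable<String> task = () -> slowLookup(key);
                futures.add(executor.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    results.add(future.get()); // waits for this task, others keep running
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> values = fetchAll(List.of("title", "cast", "artwork"));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(values + " in ~" + elapsedMs + " ms");
    }
}
```

Because the three lookups overlap, the total time should be close to one lookup rather than the sum of all three, and the calling code stays in an ordinary blocking style.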
"Without developers having to worry about it, slow DB calls run in parallel automatically, and as a result, response times were dramatically reduced."
Proceed with Caution! Real-World Debugging Experience
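- Mixing Virtual Threads with synchronized blocks and reentrant locks led to critical deadlocks on JDK versions up to and including 23, because a virtual thread that blocks inside a synchronized block is pinned to its carrier thread.
- The pinning issue was resolved in JDK 24 (JEP 491), and full-scale adoption is planned to resume.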
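For code that must run on JDK 21 through 23, a commonly suggested mitigation is to guard blocking critical sections with `java.util.concurrent.locks.ReentrantLock` instead of `synchronized`, since a virtual thread waiting on such a lock can unmount from its carrier. The sketch below illustrates that pattern only; the talk does not say this is what Netflix did, and `LockedCounter` is a hypothetical class.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count;

    /** Increments under a ReentrantLock while simulating a short blocking call. */
    public void incrementSlowly() {
        lock.lock();
        try {
            Thread.sleep(1); // stand-in for blocking I/O inside the critical section
            count++;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
    }

    public int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    /** Runs the given number of increments, each on its own virtual thread (JDK 21+). */
    public static int runWithVirtualThreads(int tasks) {
        LockedCounter counter = new LockedCounter();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.execute(counter::incrementSlowly);
            }
        } // close() waits for all submitted tasks to finish
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWithVirtualThreads(100)); // prints 100
    }
}
```

On JDK 24 and later, the same code written with `synchronized` no longer pins the carrier thread, so this kind of rewrite becomes a matter of preference rather than necessity.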
6. Spring Boot, Internal Platform, and Tooling Sophistication
Netflix's Spring Boot Strategy
- Spring Boot + custom modules (authentication/authorization, service mesh, gRPC, Observability, Dynamic Config, etc.)
- "The actual development experience is almost indistinguishable from open-source Spring Boot, and most extensions are handled through AutoConfiguration, just like Spring Data."
- Deployment targets AWS instances or containers (Titus, Netflix's own K8s platform), in all cases using an exploded JAR with embedded Tomcat
Native Image (Build-Time Compilation): Approached Cautiously
- "We haven't adopted native images yet due to degraded build/debugging environments, developer experience issues, and stability concerns."
- Instead, they're closely watching next-generation technologies like Project Leyden and Spring's AOT
Moving Away from Reactive; WebFlux Is Discouraged
- The company was once an evangelist for RxJava-based Reactive, but code complexity and debugging pain became too great, so they've now fully standardized on WebMVC.
"Right now, nobody wants to touch Reactive code. Virtual Threads and Structured Concurrency will ultimately replace Reactive."
Spring Boot 3 Upgrade and the Jakarta Namespace Issue
- "Most of it is just find/replace, but libraries are more complex, so Netflix uses a Gradle Transform plugin to handle automatic conversion at the bytecode level."
- The related tooling has also been open-sourced for community use!
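The source-level "find/replace" half of this migration can be sketched as a small import rewriter. This is an illustration only and is not the Gradle plugin mentioned above, which works on library bytecode; `JakartaImportRewriter` and its prefix list are hypothetical, and the list is deliberately a small, non-exhaustive subset.

```java
import java.util.List;

public class JakartaImportRewriter {

    // Illustrative subset of packages that moved to the jakarta namespace.
    // Packages such as javax.sql and javax.crypto belong to the JDK and must stay as they are.
    private static final List<String> MOVED_PREFIXES = List.of(
            "javax.servlet", "javax.persistence", "javax.validation",
            "javax.inject", "javax.ws.rs");

    /** Rewrites a single source line if it imports from a moved package. */
    public static String rewriteLine(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("import ")) {
            return line;
        }
        for (String prefix : MOVED_PREFIXES) {
            // Handles both "import javax.servlet..." and "import static javax.servlet...".
            if (trimmed.startsWith("import " + prefix + ".")
                    || trimmed.startsWith("import static " + prefix + ".")) {
                return line.replaceFirst("javax\\.", "jakarta.");
            }
        }
        return line;
    }

    public static void main(String[] args) {
        // prints: import jakarta.servlet.http.HttpServletRequest;
        System.out.println(rewriteLine("import javax.servlet.http.HttpServletRequest;"));
        // prints: import javax.sql.DataSource;  (JDK package, left untouched)
        System.out.println(rewriteLine("import javax.sql.DataSource;"));
    }
}
```

The allow-list matters: a blanket replacement of every `javax.` prefix would break imports of packages that still ship with the JDK.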
7. GraphQL & DGS Framework: Practical Large-Scale Adoption and Collaboration
- Netflix built the DGS framework for its internal GraphQL usage and then open-sourced it (released 2020)
- In the Java ecosystem, GraphQL Java is very low-level, so the DGS framework provides a much higher-productivity model built on Spring Boot
- Significant investment in testing tools, annotation-based programming models, and other development/operational conveniences
"We're collaborating with Spring for GraphQL so that both Netflix DGS and the official Spring GraphQL can be used harmoniously together."
8. GraphQL/gRPC Instead of REST!
Paul Bakker clearly states that at this point, GraphQL is optimal for UI-to-backend communication and gRPC is optimal for service-to-service communication.
- GraphQL:
  - "Flexible API schema tailored to the UI is essential."
  - "Because the UI and backend share the same schema, collaboration efficiency is high."
- gRPC:
  - Ideal for high-performance, strongly-typed, method-based remote calls between servers
- REST is no longer recommended!
  - "With REST, you end up dumping data indiscriminately and shifting the complex filtering responsibility to UI developers."
  - "Only OK for limited rapid prototyping."
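The difference between returning everything and returning only what the UI asked for can be shown with a toy projection. This is not GraphQL itself and uses no GraphQL library; it only illustrates the field-selection idea, and `FieldSelection` and the sample record are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldSelection {

    /** Returns only the requested fields, in the order the caller listed them. */
    public static Map<String, Object> select(Map<String, Object> full, List<String> fields) {
        Map<String, Object> selected = new LinkedHashMap<>();
        for (String field : fields) {
            if (full.containsKey(field)) {
                selected.put(field, full.get(field));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // What a REST-style endpoint might return in full (hypothetical record).
        Map<String, Object> title = new LinkedHashMap<>();
        title.put("id", 42);
        title.put("title", "Example Show");
        title.put("synopsis", "A long description the list view never displays.");
        title.put("runtimeMinutes", 52);

        // What a list view actually needs, expressed as a field selection.
        System.out.println(select(title, List.of("title", "runtimeMinutes")));
    }
}
```

In GraphQL the selection is part of the query and is validated against the shared schema, so this filtering happens on the server rather than in UI code.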
In Closing
Netflix was once trapped in technical debt, but through aggressive legacy modernization, company-wide upgrades, and pioneering experience applying the latest Java and Spring paradigms at massive scale, they've built substantial expertise. In particular, the latest JDK GC, Virtual Threads, and GraphQL adoption are key points where tangible performance improvements and developer experience were achieved simultaneously. Paul Bakker's talk isn't a simple introduction to technology trends -- it vividly conveys how a real organization solves various real-world problems and how it pragmatically embraces changes in the technology ecosystem. This was a presentation that showed Netflix has completely shed the prejudice that Java is "slow and heavy"!
"If you have any questions, feel free to find me in the hallway today!"
