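JDK 21 will be released next month, and Virtual Threads become a final (non-preview) feature with it. Virtual threads will probably change the way Java projects are architected from here on.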

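  1. Thread implementations in programming languages
    1. Based on kernel threads: 1:1
    2. Implemented in user space: N:1
    3. Hybrid: N:M
  2. Virtual threads
    1. Java's smallest unit of concurrency
    2. thread per request
    3. Asynchronous programming
      1. The essence of async frameworks
    4. Virtual threads: keeping thread per request
      1. How virtual threads are implemented
    5. Back to the roots
    6. Performance
      1. When not to use virtual threads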
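  • Threads implemented in user space, also called green threads, map N:1 onto OS threads;
  • Threads implemented in kernel space map one-to-one onto OS threads;
  • User-space threads can also be run on top of several kernel threads (the hybrid N:M model);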





  • You don't have to schedule threads yourself; the OS takes care of it;


  • Low efficiency: every thread operation (creation, synchronization, destruction) requires a system call;


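  1. Scheduling threads in the kernel is relatively expensive: a thread switch has to save and restore the execution context, which involves registers, caches and so on;
  2. Every thread is a kernel thread and needs its own space in the kernel (16KB), so you cannot create too many threads without running the risk of exhausting kernel memory;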

On 64-bit Linux, HotSpot additionally reserves a 1MB stack for every thread it creates by default.
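The 1MB default corresponds to the `-Xss` setting. As a small sketch using the JDK 21 `Thread.ofPlatform()` builder, a platform thread can request a different stack size per thread; the JVM treats the value only as a hint, and the 256 KB figure and the class name `PlatformThreadDemo` are arbitrary choices for illustration. Each such thread is still backed by exactly one OS thread.

```java
// A platform thread is a thin wrapper around one OS thread (1:1).
class PlatformThreadDemo {

    // Starts one platform thread with a requested 256 KB stack and
    // returns whether that thread reports itself as virtual.
    static boolean runOnPlatformThread() throws InterruptedException {
        boolean[] virtual = new boolean[1];
        Thread t = Thread.ofPlatform()
                .name("platform-worker")
                .stackSize(256 * 1024)   // a hint only; the JVM may round or ignore it
                .start(() -> virtual[0] = Thread.currentThread().isVirtual());
        t.join();                        // join() makes the write to virtual[0] visible here
        return virtual[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("virtual? " + runOnPlatformThread()); // prints "virtual? false"
    }
}
```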





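  • Thread switches are fast, with no trip through the operating system;
  • It works whether or not the operating system supports threads;
  • It supports a much larger number of threads;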


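  • No help from the operating system: every thread operation has to be implemented in user space, which makes it complex for a language to implement;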
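To make the N:1 model concrete, here is a toy cooperative scheduler, purely illustrative and not how any JVM implemented green threads: several tasks share one OS thread, a "switch" is just returning to the scheduler loop, and each task must keep its own state in fields because there is no separate stack to save. The names `GreenScheduler` and `Task` are hypothetical. It also shows the drawback: everything runs on a single OS thread, so only one core is ever used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy N:1 model: many tasks multiplexed on the single calling OS thread.
class GreenScheduler {

    // A task performs one step per call and returns true while it has more work.
    interface Task { boolean step(); }

    static List<String> runAll(int tasks, int stepsPerTask) {
        List<String> log = new ArrayList<>();
        Queue<Task> runQueue = new ArrayDeque<>();
        for (int id = 0; id < tasks; id++) {
            final int taskId = id;
            runQueue.add(new Task() {
                int done = 0; // the task's saved state, kept by hand
                public boolean step() {
                    log.add("t" + taskId + ":" + done);
                    return ++done < stepsPerTask;
                }
            });
        }
        // Round-robin: run one step, then requeue the task if it is not finished.
        // "Switching" is a plain method return; no system call is involved.
        while (!runQueue.isEmpty()) {
            Task t = runQueue.poll();
            if (t.step()) runQueue.add(t);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runAll(2, 2)); // prints [t0:0, t1:0, t0:1, t1:1]
    }
}
```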




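So although user-space threads go by the name green threads, they are not necessarily efficient. Java 1.1 implemented threads in user space, but compared with OS threads it could not get past the resource limitations described above, so the approach was later abandoned. The main reason was poor performance on multi-core processors: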

The first advantage is performance on multiprocessor (MP) machines. In green threads all Java threads execute within one operating system lightweight process (LWP), and thus UnixWare has no ability to distribute the execution of Java threads among the extra processors in an MP machine. But in the native threads model, each Java thread is mapped to a UnixWare threads library multiplexed thread, and the threads library will indeed map those threads to different LWPs as they are available. Furthermore, under native threads the Java virtual machine will expand the number of LWPs available to the threads library, one for each additional processor in the MP configuration.






Here I'll simply use JEP 444 to introduce virtual threads; its Motivation section is beautifully written!



Every statement in every method is executed inside a thread and, since Java is multithreaded, multiple threads of execution happen at once. The thread is Java’s unit of concurrency: a piece of sequential code that runs concurrently with — and largely independently of — other such units. Each thread provides a stack to store local variables and coordinate method calls, as well as context when things go wrong: Exceptions are thrown and caught by methods in the same thread, so developers can use a thread’s stack trace to find out what happened. Threads are also a central concept for tools: Debuggers step through the statements in a thread’s methods, and profilers visualize the behavior of multiple threads to help understand their performance.


thread per request


Server applications generally handle concurrent user requests that are independent of each other, so it makes sense for an application to handle a request by dedicating a thread to that request for its entire duration. This thread-per-request style is easy to understand, easy to program, and easy to debug and profile because it uses the platform’s unit of concurrency to represent the application’s unit of concurrency.
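A minimal sketch of the thread-per-request style with today's platform threads follows. The handler `handle` and the 10 ms sleep standing in for a blocking database or HTTP call are hypothetical; the point is that each request gets its own thread and the handler is ordinary sequential, blocking code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ThreadPerRequest {

    // Hypothetical request handler written in plain blocking style.
    static String handle(int requestId) throws InterruptedException {
        Thread.sleep(10); // stands in for a blocking DB or HTTP call
        return "response-" + requestId;
    }

    // One dedicated platform thread per request, for the request's whole lifetime.
    static Map<Integer, String> serve(int requests) throws InterruptedException {
        Map<Integer, String> results = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            int id = i;
            threads.add(Thread.ofPlatform().start(() -> {
                try {
                    results.put(id, handle(id));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) t.join();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(serve(3).get(2)); // prints "response-2"
    }
}
```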


Unfortunately, the number of available threads is limited because the JDK implements threads as wrappers around operating system (OS) threads. OS threads are costly, so we cannot have too many of them, which makes the implementation ill-suited to the thread-per-request style. If each request consumes a thread, and thus an OS thread, for its duration, then the number of threads often becomes the limiting factor long before other resources, such as CPU or network connections, are exhausted.


The JDK’s current implementation of threads caps the application’s throughput to a level well below what the hardware can support.
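The cap is easy to observe with a fixed pool of platform threads, the usual workaround. In this sketch (class name `PoolCap` and the 20 ms sleep are illustrative assumptions), every task blocks on simulated I/O, so no matter how many requests are queued, the number in flight never exceeds the pool size, even though the CPU is mostly idle.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolCap {

    // Runs `requests` blocking tasks on a fixed pool and returns the peak
    // number of tasks that were in flight at the same time.
    static int peakConcurrency(int poolSize, int requests) throws InterruptedException {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // blocking "I/O": the pool thread just sits and waits
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Never more than 4, however many requests are queued.
        System.out.println(peakConcurrency(4, 100));
    }
}
```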



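Backed into a corner, and wanting to make full use of the hardware, developers gave up the thread-per-request style and turned to thread pools, trying to share threads: a request holds a thread while it computes and releases it while it waits for I/O: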

Some developers wishing to utilize hardware to its fullest have given up the thread-per-request style in favor of a thread-sharing style. Instead of handling a request on one thread from start to finish, request-handling code returns its thread to a pool when it waits for another I/O operation to complete so that the thread can service other requests. This fine-grained sharing of threads — in which code holds on to a thread only while it performs calculations, not while it waits for I/O — allows a high number of concurrent operations without consuming a high number of threads.

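Resource utilization did go up, and so did everyone's blood pressure: code now has to be written in an asynchronous style, using a separate set of non-blocking I/O methods, with callbacks to handle the results: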

While it removes the limitation on throughput imposed by the scarcity of OS threads, it comes at a high price: It requires what is known as an asynchronous programming style, employing a separate set of I/O methods that do not wait for I/O operations to complete but rather, later on, signal their completion to a callback.



Without a dedicated thread, developers must break down their request-handling logic into small stages, typically written as lambda expressions, and then compose them into a sequential pipeline with an API (see CompletableFuture, for example, or so-called “reactive” frameworks). They thus forsake the language’s basic sequential composition operators, such as loops and try/catch blocks.
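A small sketch of that style with `CompletableFuture` (the stage methods `fetchUserId` and `fetchProfile` are hypothetical stand-ins for non-blocking calls). The request logic is no longer a sequence of statements: it is a chain of lambdas glued together by the API, and `exceptionally` takes the place of a try/catch block.

```java
import java.util.concurrent.CompletableFuture;

class AsyncPipeline {

    // Hypothetical async stages: each returns a future instead of blocking.
    static CompletableFuture<Integer> fetchUserId(String token) {
        return CompletableFuture.supplyAsync(() -> token.length());
    }

    static CompletableFuture<String> fetchProfile(int userId) {
        return CompletableFuture.supplyAsync(() -> "user#" + userId);
    }

    // The "sequential" logic, expressed as composed lambdas.
    static CompletableFuture<String> handle(String token) {
        return fetchUserId(token)
                .thenCompose(AsyncPipeline::fetchProfile) // feed stage 1's output into stage 2
                .thenApply(String::toUpperCase)           // plain transformation stage
                .exceptionally(ex -> "error: " + ex);     // replaces try/catch
    }

    public static void main(String[] args) {
        System.out.println(handle("abc").join()); // prints "USER#3"
    }
}
```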


In the asynchronous style, each stage of a request might execute on a different thread, and every thread runs stages belonging to different requests in an interleaved fashion. This has deep implications for understanding program behavior: Stack traces provide no usable context, debuggers cannot step through request-handling logic, and profilers cannot associate an operation’s cost with its caller. Composing lambda expressions is manageable when using Java’s stream API to process data in a short pipeline but problematic when all of the request-handling code in an application must be written in this way.


This programming style is at odds with the Java Platform because the application’s unit of concurrency — the asynchronous pipeline — is no longer the platform’s unit of concurrency.


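Examples of async frameworks are CompletableFuture in the JDK and reactive programming libraries.

To keep blocking code from tying up threads (and thus leaving CPUs idle), programmers split their code into small chunks. Each chunk has an input and an output and is written as a lambda, and an async framework assembles them. What the async framework does is: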

  1. Call each lambda with the right input
  2. Take its output
  3. Pass that output as the input of the next lambda
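Stripped to its core, those three steps fit in a loop. The `MiniPipeline` class below is a hypothetical toy, not any real framework's API: it runs every stage in order on the calling thread, whereas real async frameworks also hop between threads and handle errors, which is exactly where the readability problems described next come from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model of what an async framework does with the lambdas it is given.
class MiniPipeline {
    private final List<Function<Object, Object>> stages = new ArrayList<>();

    // Register the next stage. The unchecked cast is the price of keeping
    // stages of different types in one list; real frameworks track the types with generics.
    @SuppressWarnings("unchecked")
    <A, B> MiniPipeline then(Function<A, B> stage) {
        stages.add((Function<Object, Object>) (Function<?, ?>) stage);
        return this;
    }

    Object run(Object input) {
        Object value = input;
        for (Function<Object, Object> stage : stages) {
            // 1. call the lambda with the right input, 2. take its output,
            // 3. the output becomes the next lambda's input on the next iteration
            value = stage.apply(value);
        }
        return value;
    }

    public static void main(String[] args) {
        Object result = new MiniPipeline()
                .then((String s) -> s.trim())
                .then((String s) -> s.length())
                .then((Integer n) -> n * 2)
                .run("  hello ");
        System.out.println(result); // prints 10
    }
}
```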


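Asynchronous programming can greatly improve CPU utilization. The downside: lambdas everywhere, invoked everywhere, each forwarding its result to yet another lambda. Look at a stack trace and you will see almost no business code, only the framework invoking lambdas.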

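This style makes debugging, exception handling, testing and maintenance a real pain. Suppose the bug is not in the business code itself: the business code returns a null, which later triggers an NPE inside framework code, and everything blows up. The stack trace tells you nothing, and you have no idea where the null came from.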
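The effect is easy to reproduce. In this sketch the business stage `lookupName` (hypothetical) returns null by mistake; nothing fails there, and the NPE is thrown one stage later when `String::length` is applied. By then `lookupName` has already returned, so it does not appear anywhere in the exception's stack trace, which points only at `CompletableFuture` internals and the failing stage.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class LostNull {

    // Hypothetical business stage that returns null by mistake.
    static String lookupName(int id) {
        return id == 42 ? "alice" : null;
    }

    // Returns the cause of the pipeline failure, or null if it succeeded.
    static Throwable failure(int id) {
        try {
            CompletableFuture.supplyAsync(() -> id)
                    .thenApply(LostNull::lookupName) // returns null, no error yet
                    .thenApply(String::length)       // the NPE is thrown here, in a later stage
                    .join();
            return null;
        } catch (CompletionException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        // Inspect the trace: lookupName, where the null came from, is not on it.
        failure(7).printStackTrace();
    }
}
```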

Virtual threads: keeping thread per request

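To solve the problem, we still want programmers to write thread-per-request code! And the fix is conceptually simple: the threads the JDK implements just have to become more efficient.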

To enable applications to scale while remaining harmonious with the platform, we should strive to preserve the thread-per-request style. We can do this by implementing threads more efficiently, so they can be more plentiful.


Operating systems cannot implement OS threads more efficiently because different languages and runtimes use the thread stack in different ways. It is possible, however, for a Java runtime to implement Java threads in a way that severs their one-to-one correspondence to OS threads. Just as operating systems give the illusion of plentiful memory by mapping a large virtual address space to a limited amount of physical RAM, a Java runtime can give the illusion of plentiful threads by mapping a large number of virtual threads to a small number of OS threads.


A virtual thread is an instance of java.lang.Thread that is not tied to a particular OS thread. A platform thread, by contrast, is an instance of java.lang.Thread implemented in the traditional way, as a thin wrapper around an OS thread.
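Both kinds are plain `java.lang.Thread` objects and are created through the same builder API in JDK 21; only `isVirtual()` tells them apart. The class name and thread names below are arbitrary.

```java
import java.util.Arrays;

class VirtualVsPlatform {

    // Returns {isVirtual() seen inside a virtual thread, isVirtual() seen inside a platform thread}.
    static boolean[] kinds() throws InterruptedException {
        boolean[] result = new boolean[2];
        Thread v = Thread.ofVirtual().name("virtual-1")
                .start(() -> result[0] = Thread.currentThread().isVirtual());
        Thread p = Thread.ofPlatform().name("platform-1")
                .start(() -> result[1] = Thread.currentThread().isVirtual());
        v.join();
        p.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(kinds())); // prints [true, false]
    }
}
```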

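Thread-per-request code can run on virtual threads, with each request bound to its own virtual thread, yet a virtual thread occupies an OS thread only while it is computing on the CPU. The result is that such code scales just as well as asynchronous code. The difference is handled entirely inside the JDK: when the program performs blocking I/O, the JDK turns it into a non-blocking OS operation and automatically suspends the virtual thread.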

Application code in the thread-per-request style can run in a virtual thread for the entire duration of a request, but the virtual thread consumes an OS thread only while it performs calculations on the CPU. The result is the same scalability as the asynchronous style, except it is achieved transparently: When code running in a virtual thread calls a blocking I/O operation in the java.* API, the runtime performs a non-blocking OS call and automatically suspends the virtual thread until it can be resumed later.
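In code, "transparently" means the blocking handler does not change at all; only the thread that runs it does. In this sketch (the names `SameCodeVirtual`, `handle` and `serve` are illustrative), the caller passes a `Thread.Builder`, and with `Thread.ofVirtual()` ten thousand concurrently blocked requests are no problem, since each sleeping virtual thread is unmounted instead of holding an OS thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SameCodeVirtual {

    // Unchanged blocking handler: nothing here knows about virtual threads.
    static void handle(int requestId, AtomicInteger done) throws InterruptedException {
        Thread.sleep(10); // blocking call; on a virtual thread it unmounts instead of pinning an OS thread
        done.incrementAndGet();
    }

    // Thread-per-request, with the kind of thread chosen by the caller.
    static int serve(int requests, Thread.Builder builder) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(requests);
        for (int i = 0; i < requests; i++) {
            int id = i;
            threads.add(builder.start(() -> {
                try {
                    handle(id, done);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) t.join();
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(serve(10_000, Thread.ofVirtual())); // prints 10000
    }
}
```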


To Java developers, virtual threads are simply threads that are cheap to create and almost infinitely plentiful. Hardware utilization is close to optimal, allowing a high level of concurrency and, as a result, high throughput, while the application remains harmonious with the multithreaded design of the Java Platform and its tooling.


Virtual threads are cheap and plentiful, and thus should never be pooled: A new virtual thread should be created for every application task. Most virtual threads will thus be short-lived and have shallow call stacks, performing as little as a single HTTP client call or a single JDBC query. Platform threads, by contrast, are heavyweight and expensive, and thus often must be pooled. They tend to be long-lived, have deep call stacks, and be shared among many tasks.
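The JDK 21 helper for "one new virtual thread per task" is `Executors.newVirtualThreadPerTaskExecutor()`. It looks like an executor, but it pools nothing: every `submit` starts a fresh virtual thread, and `close()` (via try-with-resources) waits for all of them. The 100 ms sleep standing in for one HTTP call or one JDBC query, and the class name, are illustrative.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerTaskExecutor {

    // Submits `tasks` blocking tasks, each on its own new virtual thread, and sums their results.
    static long run(int tasks) throws Exception {
        List<Future<Integer>> futures = new ArrayList<>(tasks);
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                int id = i;
                futures.add(executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(100)); // stands in for one HTTP call or JDBC query
                    return id;
                }));
            }
        } // close() waits until every task has finished
        long sum = 0;
        for (Future<Integer> f : futures) {
            sum += f.get();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        // 10,000 concurrent sleeping tasks, with no pool size to tune anywhere.
        System.out.println(run(10_000)); // prints 49995000
    }
}
```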



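Under the hood, OS threads are still needed to do the work, so the JDK maintains a dedicated, specially configured ForkJoinPool whose threads are all OS (platform) threads; they act as carriers for virtual threads. A virtual thread is mounted onto one of the pool's OS threads: the task is submitted through the virtual thread but ultimately executed by that OS thread. If a virtual thread prints Thread.currentThread(), the output ends with something like @ForkJoinPool-1-worker-1, meaning the virtual thread is currently mounted on worker 1 of ForkJoinPool 1. The pool is small: by default its parallelism equals the number of available processors.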
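A quick way to see the mounting is to print the current thread from inside a virtual thread. On JDK 21 the string typically has the form `VirtualThread[#<id>]/runnable@ForkJoinPool-1-worker-<n>`, where the part after `@` is the carrier; the exact format is an implementation detail, not a specified API. The scheduler's parallelism can be changed with the `jdk.virtualThreadScheduler.parallelism` system property.

```java
class CarrierName {

    // Returns Thread.currentThread().toString() as seen inside a virtual thread.
    static String describeCurrentThread() throws InterruptedException {
        String[] text = new String[1];
        Thread vt = Thread.ofVirtual().start(() -> text[0] = Thread.currentThread().toString());
        vt.join();
        return text[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // The suffix after '@' names the carrier, e.g. a ForkJoinPool worker thread.
        System.out.println(describeCurrentThread());
        // Default number of carriers the scheduler aims for.
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
```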


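When a virtual thread hits a blocking operation, it unmounts itself from the OS thread (Continuation#yield) and saves its stack to the heap. Most of the JDK's blocking operations (socket I/O, Thread.sleep, the locks in java.util.concurrent, and so on) have been reworked to call Continuation#yield when they block on a virtual thread. When the data becomes available, a handler in the JDK that watches for it fires a signal and unparks the virtual thread, which puts it back into the ForkJoinPool's work queue; once one of the pool's OS threads picks it up, Continuation#run restores the virtual thread's saved context and it carries on.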
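`jdk.internal.vm.Continuation` is an internal class that is not exported, so this sketch uses the public `LockSupport` API instead, which on a virtual thread goes through the same park/unpark path: `park()` yields the continuation and frees the carrier, and `unpark()` resubmits the virtual thread so a carrier can resume it. While unmounted, the virtual thread reports `WAITING`. The class name and the `released` flag are illustrative.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class ParkUnpark {

    // Returns {state observed while parked, state after join}.
    static Thread.State[] parkThenResume() throws InterruptedException {
        AtomicBoolean released = new AtomicBoolean();
        Thread vt = Thread.ofVirtual().start(() -> {
            // park() on a virtual thread unmounts it from its carrier;
            // the loop guards against spurious wake-ups.
            while (!released.get()) {
                LockSupport.park();
            }
        });
        // Wait until the virtual thread is actually parked (unmounted).
        Thread.State seen;
        while ((seen = vt.getState()) != Thread.State.WAITING) {
            Thread.onSpinWait();
        }
        released.set(true);
        LockSupport.unpark(vt); // schedule it again; a carrier will resume the continuation
        vt.join();
        return new Thread.State[] { seen, vt.getState() };
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(parkThenResume())); // prints [WAITING, TERMINATED]
    }
}
```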

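This monitoring handler is the central scheduler of the whole mechanism: it decides when a suspended virtual thread is ready to run again.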



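Take `Thread.sleep` in JDK 21 as an example:

    // Thread.sleep(long) from the JDK 21 sources; beforeSleep/afterSleep/sleep0 are private helpers in Thread
    public static void sleep(long millis) throws InterruptedException {
        if (millis < 0) {
            throw new IllegalArgumentException("timeout value is negative");
        }
        long nanos = MILLISECONDS.toNanos(millis);
        ThreadSleepEvent event = beforeSleep(nanos);   // JFR bookkeeping
        try {
            if (currentThread() instanceof VirtualThread vthread) {
                vthread.sleepNanos(nanos);   // virtual thread: the JVM unmounts it, freeing the carrier
            } else {
                sleep0(nanos);               // platform thread: native sleep, the OS blocks the thread
            }
        } finally {
            afterSleep(event);               // JFR bookkeeping
        }
    }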


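  • For a virtual thread, the JVM "suspends" it by itself, that is, it unmounts it;
  • For an OS thread, it calls native code (sleep0) just as earlier JDKs did, and the operating system suspends the OS thread;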



Using virtual threads does not require learning new concepts, though it may require unlearning habits developed to cope with today’s high cost of threads. Virtual threads will not only help application developers — they will also help framework designers provide easy-to-use APIs that are compatible with the platform’s design without compromising on scalability.



To put it another way, virtual threads can significantly improve application throughput when

  • The number of concurrent tasks is high (more than a few thousand), and
  • The workload is not CPU-bound, since having many more threads than processor cores cannot improve throughput in that case.
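Why the first condition matters can be seen from Little's law, which relates the average number of requests in flight $L$, the throughput $\lambda$, and the average time each request spends in the system $W$. The numbers below are hypothetical: 50 ms per request, mostly spent waiting on I/O, and a CPU that is not yet saturated.

$$
L = \lambda W \quad\Longrightarrow\quad \lambda = \frac{L}{W}
$$

$$
\frac{200}{0.05\,\text{s}} = 4{,}000\ \text{req/s}
\qquad\qquad
\frac{10{,}000}{0.05\,\text{s}} = 200{,}000\ \text{req/s}
$$

With a pool of 200 platform threads, $L$ is capped at 200; with one virtual thread per request, $L$ can grow with the number of concurrent requests. If the work were CPU-bound instead, adding threads would only make $W$ grow along with $L$, and $\lambda$ would not improve, which is the second condition.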

Whatever a platform thread can run, a virtual thread can run too, and thread-local variables are supported as well:

A virtual thread can run any code that a platform thread can run. In particular, virtual threads support thread-local variables and thread interruption, just like platform threads. This means that existing Java code that processes requests will easily run in a virtual thread. Many server frameworks will choose to do this automatically, starting a new virtual thread for every incoming request and running the application’s business logic in it.
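Both features behave as they do on platform threads, as this sketch shows: each virtual thread sees its own `ThreadLocal` value, and `interrupt()` wakes a virtual thread blocked in `sleep` with an `InterruptedException`. The `REQUEST_ID` thread local and the class name are illustrative.

```java
class VirtualThreadFeatures {

    private static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    // Sets and reads a ThreadLocal inside a virtual thread.
    static String threadLocalValue(String id) throws InterruptedException {
        String[] seen = new String[1];
        Thread vt = Thread.ofVirtual().start(() -> {
            REQUEST_ID.set(id);
            seen[0] = REQUEST_ID.get();
        });
        vt.join();
        return seen[0];
    }

    // interrupt() ends a blocking sleep in a virtual thread, as with platform threads.
    static boolean interruptWakesSleep() throws InterruptedException {
        boolean[] interrupted = new boolean[1];
        Thread vt = Thread.ofVirtual().start(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                interrupted[0] = true; // also thrown at once if the interrupt arrived before sleep started
            }
        });
        vt.interrupt();
        vt.join();
        return interrupted[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(threadLocalValue("req-42")); // prints req-42
        System.out.println(interruptWakesSleep());      // prints true
    }
}
```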





I'm tearing up! Can't wait for Java 21 on the 19th of next month! The moment it ships, I'm benchmarking it!

This post is licensed by the author under CC BY 4.0.