<h1>TornadoVM 2.0 Brings Automatic GPU Acceleration and LLM Support to Java</h1>
<p><em>Published 2025-12-17</em></p>
<p>The TornadoVM project recently reached version 2.0, a major milestone for the <a href="https://github.com/beehive-lab/TornadoVM" rel="nofollow noopener" target="_blank">open-source project</a> that aims to provide a heterogeneous hardware runtime for Java. This release is likely to be of particular interest to teams developing LLM solutions on the JVM.</p>
<p>The project automatically accelerates Java programs on multi-core CPUs, GPUs, and FPGAs. It does not replace existing JVMs; instead, it adds the capability to offload Java code to hardware backends, to manage memory transfers between Java and the accelerators, and to run the resulting compute kernels. This capability is a key component of modern cloud and ML workloads.</p>
<p>InfoQ has previously covered the project in <a href="https://www.infoq.com/articles/tornadovm-java-gpu-fpga/" rel="nofollow noopener" target="_blank">2020</a> and <a href="https://www.infoq.com/articles/java-performance-tornadovm/" rel="nofollow noopener" target="_blank">2022</a>.</p>
<p>TornadoVM compiles Java bytecode at runtime (acting as a JIT compiler) to one of three backends: OpenCL C, NVIDIA CUDA PTX, or SPIR-V binary. Developers can choose which backends to install and run depending on their specific systems.</p>
<p>Note that not every Java computation is amenable to being offloaded to TornadoVM. Workloads built around for-loops with no dependencies between iterations are very good candidates, as their iterations can execute in parallel.</p>
<p>In particular, matrix-based applications such as machine learning and deep learning are good candidates. Other examples of this pattern include physics simulations (e.g., N-body particle computation), financial applications such as Black-Scholes, and a range of applications in computer vision, computational photography, natural language processing, and signal processing.</p>
<p>TornadoVM offers two complementary ways to express parallelism: the Loop Parallel API, which uses Java annotations such as <code>@Parallel</code> and <code>@Reduce</code> to parallelize loops, and the Kernel API, which uses a <code>KernelContext</code> for explicit GPU-style programming (with concepts such as thread IDs, local memory, and barriers available), similar to CUDA/OpenCL/SYCL.</p>
<p>The Loop Parallel API can be as simple as annotating a loop variable:</p>
<pre><code>public static void vectorMul(FloatArray a, FloatArray b, FloatArray result) {
    for (@Parallel int i = 0; i &lt; result.getSize(); i++) {
        result.set(i, a.get(i) * b.get(i));
    }
}</code></pre>
<p>In both styles, the kernel is then wired into a TaskGraph, a Java object that explicitly describes the data transfers and tasks, and run through an execution plan:</p>
<pre><code>var taskGraph = new TaskGraph("multiply")
      .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
      .task("vectorMul", Example::vectorMul, a, b, result)
      .transferToHost(DataTransferMode.EVERY_EXECUTION, result);

var snapshot = taskGraph.snapshot();
new TornadoExecutionPlan(snapshot).execute();</code></pre>
<p>The team is also shipping a <a href="https://github.com/beehive-lab/GPULlama3.java" rel="nofollow noopener" target="_blank">complete LLM inference library</a> built with TornadoVM, providing LLM inference on GPUs in pure Java, without external dependencies.</p>
<p>The just-shipped release v0.3.0 of GPULlama3.java brings significant performance and usability improvements:</p>
<ul>
<li>~30% performance boost on NVIDIA GPUs (tokens/sec)</li>
<li>Optimized FP16 and Q8 kernel generation</li>
<li>Easier setup thanks to the new TornadoVM SDKs, with no complex GPU configuration</li>
<li>Runs across NVIDIA PTX and OpenCL backends, with early Apple Silicon support</li>
<li>Enhanced <a href="https://docs.quarkiverse.io/quarkus-langchain4j/dev/gpullama3-chat-model.html" rel="nofollow noopener" target="_blank">Quarkus support</a></li>
<li><a href="https://docs.langchain4j.dev/integrations/language-models/gpullama3-java" rel="nofollow noopener" target="_blank">Integration with LangChain4j</a></li>
</ul>
<p>GPULlama3.java currently supports several FP16 (16-bit floating-point) and 8-bit quantized models, in the single-digit billions of parameters range:</p>
<ul>
<li>Llama 3.2 (1B) &ndash; FP16</li>
<li>Llama 3.2 (3B) &ndash; FP16</li>
<li>Llama 3 (8B) &ndash; FP16</li>
<li>Mistral (7B) &ndash; FP16</li>
<li>Qwen3 (0.6B) &ndash; FP16</li>
<li>Qwen3 (1.7B) &ndash; FP16</li>
<li>Qwen3 (4B) &ndash; FP16</li>
<li>Qwen3 (8B) &ndash; FP16</li>
<li>Phi-3-mini-4k &ndash; FP16</li>
<li>Qwen2.5 (0.5B)</li>
<li>Qwen2.5 (1.5B)</li>
<li>DeepSeek-R1-Distill-Qwen (1.5B)</li>
</ul>
<p>Depending on the selected model, a different execution plan is built, corresponding to the relevant model architecture.</p>
<p>The project is led by the Beehive Lab, part of the Advanced Processor Technologies Group at the University of Manchester, which specializes in the co-design of combined hardware/software solutions.</p>
<p>The team has also developed TornadoInsight, <a href="https://github.com/beehive-lab/tornado-insight" rel="nofollow noopener" target="_blank">a plugin for IntelliJ IDEA</a> that enhances the <a href="https://www.tornadovm.org/post/introducing-tornadoinsight-unleashing-the-power-of-tornadovm-in-intellij-idea" rel="nofollow noopener" target="_blank">developer experience</a> when working with TornadoVM.</p>
<p>Future work on the roadmap includes making TornadoVM <a href="https://sdkman.io/sdks/tornadovm/" rel="nofollow noopener" target="_blank">available on SDKMAN!</a> and migrating the JNI components in the codebase to the new Foreign Function &amp; Memory (FFM) API.</p>
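<p>As a closing aside, the "no dependencies between iterations" property that makes a loop a good offloading candidate can be sketched in plain Java. The example below deliberately uses standard arrays and <code>java.util.stream.IntStream</code> rather than TornadoVM's <code>FloatArray</code> and <code>@Parallel</code>, purely to illustrate why the element-wise multiply shown earlier is safe to run in parallel; the class name <code>VectorMulSketch</code> is invented for this sketch.</p>

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class VectorMulSketch {

    // Element-wise multiply, mirroring the article's vectorMul example:
    // iteration i reads only a[i] and b[i] and writes only result[i],
    // so no iteration depends on another. That independence is what lets
    // a runtime execute every iteration concurrently.
    public static void vectorMul(float[] a, float[] b, float[] result) {
        IntStream.range(0, result.length)
                 .parallel()   // safe precisely because iterations are independent
                 .forEach(i -> result[i] = a[i] * b[i]);
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        float[] result = new float[3];
        vectorMul(a, b, result);
        System.out.println(Arrays.toString(result)); // [4.0, 10.0, 18.0]
    }
}
```

<p>A loop that, say, accumulated <code>result[i] += result[i - 1]</code> would break this pattern: its iterations form a chain, and neither <code>parallel()</code> here nor TornadoVM's <code>@Parallel</code> could safely distribute it.</p>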