{"id":366401,"date":"2026-03-30T20:29:10","date_gmt":"2026-03-30T20:29:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/il\/366401\/"},"modified":"2026-03-30T20:29:10","modified_gmt":"2026-03-30T20:29:10","slug":"running-ai-on-a-pi-in-under-5-minutes-virtualization-review","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/il\/366401\/","title":{"rendered":"Running AI on a Pi in Under 5 minutes &#8212; Virtualization Review"},"content":{"rendered":"\n<p id=\"ph_pcontent1_0_KickerText\" class=\"kicker\">How-To<\/p>\n<p>Running AI on a Raspberry Pi, Part 2: Running AI on a Pi in Under 5 Minutes<\/p>\n<p>In a recent <a href=\"https:\/\/virtualizationreview.com\/Articles\/2026\/03\/30\/Running-AI-on-a-Raspberry-Pi-Part-1-Overview.aspx\" target=\"_blank\" rel=\"nofollow noopener\">article<\/a>, I discussed running AI locally on a relatively low-powered system, specifically a Raspberry Pi 500+. That article covered the major software components of a local AI stack, including LLMs and retrieval-augmented generation (RAG): what they are and how they are used. I also explained why I think the Pi, with its Arm processor, 16 GB of RAM, and NVMe drive, can handle the load that AI requires. Since the Pi 500+ is based on the more popular Pi 5, I use the two models&#8217; names interchangeably in this article, as I did in the previous one.<\/p>\n<p>In this article, I will run a local large language model (LLM) on a Raspberry Pi. This was a great way for me to dip my toe into AI, and the best part was that I could do it in less than 5 minutes!<\/p>\n<p>Which LLM on a Pi?<\/p>\n<p>Recent advances in model architecture and aggressive quantization have made it possible to run AI models on extremely small devices, such as the Raspberry Pi. 
People have reported that LLMs in the 1&#8211;4 billion parameter range can now deliver impressive performance for tasks such as text generation, reasoning, coding, tool calling, and even vision understanding, all without requiring GPUs, cloud resources, or heavy infrastructure.<\/p>\n<p>Using tools like Ollama and quantized model formats, it is now practical to experiment with local AI on low-power hardware, opening the door to affordable, private, and portable AI deployments that fit literally in the palm of your hand (Pi 5) or, in my case, inside a keyboard (Pi 500+).<\/p>\n<p> <a href=\"https:\/\/virtualizationreview.com\/articles\/2026\/03\/30\/~\/media\/ECG\/virtualizationreview\/Images\/2026\/03\/VR_Running_AI_on_Pi_P2_Ollama_html_23b467bb.ashx\" target=\"_blank\" rel=\"nofollow noopener\"><img loading=\"lazy\" decoding=\"async\" alt=\"Picture 1\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/03\/VR_Running_AI_on_Pi_P2_Ollama_html_23b467bb_s.ashx.jpeg\" width=\"300\" height=\"94\"\/> <\/a><br \/>\n [Click on image for larger view.]<\/p>\n<p>Before starting this project, I reviewed many popular tiny models, including several from the Qwen3 family, EXAONE 4.0, Ministral 3, Jamba Reasoning, IBM&#8217;s Granite Micro, and Microsoft&#8217;s Phi-4 Mini. Each had different strengths in areas such as long-context processing, reasoning, multimodal understanding, and agentic capabilities. In the end, I narrowed it down to three small LLMs to work with.<\/p>\n<p>Ollama<\/p>\n<p>Ollama is an open-source AI platform that lets you run LLMs locally, without relying on cloud-hosted AI services. 
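<\/p>\n<p>Getting started really is a five-minute job. The session below is a sketch, not a definitive recipe: it assumes a 64-bit OS with network access, uses Ollama&#8217;s official install script, and picks qwen2.5 purely as an example model tag. The runner is echo-only so the sketch is safe to copy; remove it to execute for real.<\/p>

```shell
# Dry-run sketch of a first Ollama session (remove RUN=echo to execute for real).
RUN=echo                                                     # echo-only safety runner
$RUN sh -c "curl -fsSL https://ollama.com/install.sh | sh"   # Ollama's official installer
$RUN ollama pull qwen2.5                                     # download the model for offline use
$RUN ollama run qwen2.5 "Hello from my Pi"                   # one-shot prompt; /bye exits a chat
```

<p>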
It allows you to download, manage, and run models such as Llama, Mistral, Gemma, and others directly on your own machines, whether they&#8217;re laptops, desktops, servers, or even Pi systems.<\/p>\n<p>Ollama abstracts away much of the complexity involved in model setup, dependency management, and hardware acceleration. It provides a clean command-line interface and an API that developers can easily integrate into their workflows. Ollama is gaining widespread adoption among developers, researchers, IT professionals, and organizations experimenting with private AI deployments, local RAG pipelines, and edge-based inference.<\/p>\n<p> <a href=\"https:\/\/virtualizationreview.com\/articles\/2026\/03\/30\/~\/media\/ECG\/virtualizationreview\/Images\/2026\/03\/VR_Running_AI_on_Pi_P2_Ollama_html_b7be4f5b.ashx\" target=\"_blank\" rel=\"nofollow noopener\"><img loading=\"lazy\" decoding=\"async\" alt=\"Picture 2\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/03\/VR_Running_AI_on_Pi_P2_Ollama_html_b7be4f5b_s.ashx.jpeg\" width=\"593\" height=\"593\"\/> <\/a><br \/>\n [Click on image for larger view.]<\/p>\n<p>Ollama emerged in response to a movement toward greater access to AI and the decentralization of inference workloads. It started to gain traction in 2023 and accelerated rapidly through 2024 and 2025 as local model performance improved dramatically. As quantization techniques, optimized runtimes, and efficient architectures made it possible to run capable models on consumer-grade hardware, Ollama quickly positioned itself as an easy way for those with limited resources to get started with AI. 
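<\/p>\n<p>Because Ollama can also expose a local API server (started with ollama serve, listening on port 11434 by default), other programs on the Pi can query a model over HTTP. The sketch below only builds and prints the request payload for the \/api\/generate endpoint; the curl line is left commented out because it assumes a running server and an already-pulled qwen2.5 model, both example choices here.<\/p>

```shell
# Build a JSON payload for Ollama's /api/generate endpoint (example values).
PAYLOAD=$(printf '{"model":"%s","prompt":"%s","stream":false}' qwen2.5 "Why use a Pi?")
echo "$PAYLOAD"
# With `ollama serve` running locally, send it like this:
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```

<p>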
Its rapid adoption has been fueled by its simplicity, its active open-source community, and a growing ecosystem of supported models and integrations.<\/p>\n<p>Ollama Commands<\/p>\n<p>Here are a few of the commands I used with Ollama. These examples use llama3 as the LLM, but you can substitute other models.<\/p>\n<table>\n<tr><th>Command<\/th><th>Example<\/th><th>What It Does<\/th><\/tr>\n<tr><td>ollama pull<\/td><td>ollama pull llama3<\/td><td>Downloads a model from the Ollama registry and stores it locally for offline use.<\/td><\/tr>\n<tr><td>ollama run<\/td><td>ollama run llama3<\/td><td>Launches an interactive chat session with a model, downloading the model first if needed.<\/td><\/tr>\n<tr><td>ollama list<\/td><td>ollama list<\/td><td>Lists all models currently installed on the local system.<\/td><\/tr>\n<tr><td>ollama rm<\/td><td>ollama rm llama3<\/td><td>Deletes a locally stored model to free disk space.<\/td><\/tr>\n<tr><td>ollama show<\/td><td>ollama show llama3<\/td><td>Displays detailed information about a model, including parameters and configuration.<\/td><\/tr>\n<tr><td>ollama serve<\/td><td>ollama serve<\/td><td>Starts the local Ollama API server for programmatic access and integrations.<\/td><\/tr>\n<tr><td>ollama ps<\/td><td>ollama ps<\/td><td>Shows currently running models and active inference processes.<\/td><\/tr>\n<tr><td>ollama stop<\/td><td>ollama stop llama3<\/td><td>Stops a currently running model session.<\/td><\/tr>\n<tr><td>ollama create<\/td><td>ollama create my-model -f Modelfile<\/td><td>Builds a custom model from a Modelfile configuration.<\/td><\/tr>\n<tr><td>ollama cp<\/td><td>ollama cp llama3 my-model<\/td><td>Copies an existing model locally, often as a base for customization.<\/td><\/tr>\n<\/table>\n<p>Tom&#8217;s Tip: Use \/bye to exit an interactive ollama session.<\/p>\n<p>Installing a Local LLM on Raspberry Pi<\/p>\n<p>Installing and running an LLM on a Raspberry Pi is relatively simple using Ollama.<\/p>\n<p>Deciding which LLM to run is more difficult. The &#8220;religious&#8221; wars over LLM selection are intense and make past IT wars, such as Windows vs. Mac or file vs. block storage, seem minor by comparison. After far too much research, I decided to start with qwen2.5.<\/p>\n<p>Qwen is an interesting LLM and made quite a ruckus when it was first released, as it comes from the Chinese company Alibaba. The Qwen family of LLMs was designed to deliver strong performance across reasoning, coding, multilingual understanding, and general-purpose text generation.<\/p>\n","protected":false},"excerpt":{"rendered":"How-To Running AI on a Raspberry Pi, Part 2: Running AI on a Pi in Under 5 
minutes&hellip;\n","protected":false},"author":2,"featured_media":366402,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[75429,5130,20721,2934,15083,176553,176559,85,46,18529,134,176556,176555,140,176557,35362,125,21652,176558,58642,176554],"class_list":{"0":"post-366401","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-technology","8":"tag-backup","9":"tag-cloud","10":"tag-containers","11":"tag-edge","12":"tag-hybrid-cloud","13":"tag-hyper-v","14":"tag-hyperconverged","15":"tag-il","16":"tag-israel","17":"tag-kubernetes","18":"tag-microsoft","19":"tag-sd-wan","20":"tag-sdn","21":"tag-security","22":"tag-serverless","23":"tag-storage","24":"tag-technology","25":"tag-veeam","26":"tag-virtual-desktops","27":"tag-vmware","28":"tag-vsphere"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/366401","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/comments?post=366401"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/366401\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media\/366402"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media?parent=366401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/categories?post=366401"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/tags?post=366401"}],"curies":[{"name":"wp","href":
"https:\/\/api.w.org\/{rel}","templated":true}]}}