
By Canonical on 23 October 2025

Introducing silicon-optimized inference snaps


Install a well-known model like DeepSeek R1 or Qwen 2.5 VL with a single command, and get the silicon-optimized AI engine automatically.

London, October 23 – Canonical today announced optimized inference snaps, a new way to deploy AI models on Ubuntu devices, with automatic selection of optimized engines, quantizations and architectures based on the specific silicon of the device. Canonical is working with a wide range of silicon providers to deliver their optimized builds of well-known LLMs to developers and devices.

A single well-known model like Qwen 2.5 VL or DeepSeek R1 has many different sizes and setup configurations, each of which is optimized for specific silicon. It can be difficult for an end-user to know which model size and runtime to use on their device. Now, a single command gets you the best combination, automatically. Canonical is working with silicon partners to integrate their optimizations. As new partners publish their optimizations, the models will become more efficient on more devices.

This enables developers to integrate well-known AI capabilities seamlessly into their applications and have them run optimally across desktops, servers, and edge devices.

A snap package can dynamically load components. We fetch the recommended build for the host system, simplifying dependency management while improving latency. The public beta includes Intel- and Ampere®-optimized DeepSeek R1 and Qwen 2.5 VL as examples, and open-sources the framework by which these are built.

“We are making silicon-optimized AI models available for everyone. When enabled by the user, they will be deeply integrated down to the silicon level,” said Jon Seager, VP Engineering at Canonical. “I’m excited to work with silicon partners to ensure that their silicon-optimized models ‘just work.’ Developers and end-users no longer need to worry about the complex matrix of engines, builds and quantizations. Instead, they can reliably integrate a local version of the model that is as efficient as possible and continuously improves.”

The silicon ecosystem invests heavily in AI performance optimizations, but developer environments are complex, with no simple way to assemble all the components of a complete runtime stack. On Ubuntu, the community can now distribute their optimized stacks straight to end users. Canonical worked closely with Intel and Ampere to deliver hardware-tuned inference snaps that maximize performance.

“By working with Canonical to package and distribute large language models optimized for Ampere hardware through our AIO software, developers can simply get our recommended builds by default, already tuned for Ampere processors in their servers,” said Jeff Wittich, Chief Product Officer at Ampere. “This brings Ampere’s high performance and efficiency to end users right out of the box. Together, we’re enabling enterprises to rapidly deploy and scale their preferred AI models on Ampere systems with Ubuntu’s AI-ready ecosystem.”

“Intel optimizes for AI workloads from silicon to high-level software libraries. Until now, a developer has needed the skills and knowledge to select which model variants and optimizations may be best for their client system,” said Jim Johnson, Senior VP and GM of the Client Computing Group at Intel. “Canonical’s approach to packaging and distributing AI models overcomes this challenge, enabling developers to extract the performance and cost benefits of Intel hardware with ease. One command detects the hardware and uses OpenVINO, our open source toolkit for accelerating AI inference, to deploy a recommended model variant, with recommended parameters, onto the most suitable device.”

Get started today 

Get started and run silicon-optimized models on Ubuntu with the following commands:

sudo snap install qwen-vl --beta

sudo snap install deepseek-r1 --beta
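
Once installed, you can inspect what the snap selected for your hardware. This is a minimal sketch using standard snap commands; the exact channels, components, and service names these snaps expose are assumptions until you check them yourself:

# Show the installed revision, tracking channel, and description
snap info qwen-vl

# If the snap runs a background inference service, list its status
snap services qwen-vl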

Developers can begin experimenting with the local, standards-based inference endpoints these models expose to power AI capabilities in their end-user applications.
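
For example, if a model snap exposes an OpenAI-compatible chat completions endpoint on localhost, a first request might look like the following. The port, path, and model name here are assumptions for illustration; check each snap’s documentation for the actual endpoint:

# Send a chat completion request to an assumed local endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-r1",
        "messages": [{"role": "user", "content": "What do inference snaps do?"}]
      }'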

Learn more and provide feedback

About Canonical 

Canonical, the publisher of Ubuntu, provides open source security, support and services. Our portfolio covers critical systems, from the smallest devices to the largest clouds, from the kernel to containers, from databases to AI. With customers that include top tech brands, emerging startups, governments and home users, Canonical delivers trusted open source for everyone.

Learn more at https://canonical.com/
