AIM Media House

Gateworks' New M.2 Card Brings GPU-Level AI to Industrial Hardware

The California company's new M.2 module adds dedicated AI processing to existing embedded systems

Gateworks, a San Luis Obispo, California-based industrial single-board computer maker founded in 1998, and NXP Semiconductors announced the GW16168 AI Acceleration Card at Embedded World in Nuremberg, Germany on March 10, 2026.

NXP, headquartered in Eindhoven, Netherlands, posted $12.27 billion in revenue in 2025 and has made edge AI a growing focus, with its Industrial and IoT segment growing 24% year-on-year in Q4 2025. The company accelerated that push in October 2025 when it acquired Kinara, a discrete NPU hardware maker, for $307 million.

The announcement comes roughly a year after Gateworks joined NXP's Gold Partner program in March 2025, a relationship that underpins the collaboration behind the product. The GW16168 is built around NXP's Ara240 Discrete NPU and is designed to add dedicated AI processing to existing embedded hardware without a full hardware redesign.

Adding AI acceleration to embedded systems has historically forced engineers into a difficult choice. Repurposed GPUs demanded complete system overhauls, while running inference directly on embedded processors pushed thermal and latency limits.

Earlier M.2 accelerators offered more flexibility but delivered limited compute and memory, leaving developers with an expensive balancing act between performance, power, and flexibility.

What the GW16168 Delivers

The card uses an M.2 2280 M-Key form factor and carries NXP's passively cooled Ara240 Discrete NPU alongside 16GB of LPDDR4 memory. It draws a typical 6.6 watts and delivers up to 40 TOPS. Gateworks designed, tested, and assembled the card in the USA to industrial-grade standards, and projects a 10-year module lifespan.

The card connects via M.2 to existing platforms such as the NXP i.MX 8M Plus or i.MX 95, offloading inference workloads from the host CPU and freeing it for system logic and I/O tasks. Gateworks says the 16GB of onboard memory also eliminates the out-of-memory errors that commonly occur when running vision transformers or large language models on standard edge modules.

The passive cooling design allows the card to operate in sealed, fanless environments.

On the software side, the GW16168 is backed by the Ara240 SDK, which supports TensorFlow, PyTorch, and ONNX, and includes integrated model-conversion utilities. Gateworks' CTO described the SDK as middleware between high-level AI frameworks and the NXP Ara hardware, handling model conversion, quantization, and graph optimization.
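To make the quantization step concrete: converting a float model for an integer NPU typically means mapping each weight tensor to 8-bit integers plus a scale factor. The sketch below is a generic illustration of symmetric per-tensor int8 quantization, not the Ara240 SDK's actual API (which NXP has not detailed publicly); all function names here are hypothetical.

```python
# Generic sketch of symmetric int8 post-training quantization --
# the kind of step an NPU SDK performs during model conversion.
# Illustration only; not the Ara240 SDK API.

def quantize_int8(weights):
    """Map float weights to int8 values plus a per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values, e.g. for accuracy checks."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.98, -0.04]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Real toolchains add calibration data, per-channel scales, and graph-level rewrites on top of this basic mapping, which is why the SDK bundles conversion and optimization together rather than leaving them to the developer.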

Ravi Annavajjhala, vice president and general manager of Neural Processing Units at NXP Semiconductors, said the card demonstrates how customers can scale AI performance without redesigning their entire hardware platform, adding that the approach brings "flexibility, longevity and cost efficiency to real-world AI deployments."

The GW16168 and an accompanying development kit will be available through DigiKey, Braemac, RoundSolutions, and Avnet, with shipping expected in late May 2026.

Gateworks made the announcement alongside a second product reveal at the same show: the Catalina SBC family, built on the NXP i.MX 95 processor, which lists the GW16168 as a compatible accelerator module.