// L02 — RUNTIME · THE CUDA OF EDGE

EdgeMatrix.
The CUDA of edge AI.

EdgeMatrix has two layers: CORE (compiler + runtime engine) and EdgeFlow (the inference engine where models actually execute). Together they make any model run on any silicon. It's to edge AI what CUDA is to NVIDIA, but hardware-agnostic by design. +73% throughput vs vLLM on L40s. 193 model architectures pre-tuned. Runs across five third-party silicon platforms — plus Krsna, our co-designed in-house SoC.

Throughput vs vLLM · +73%
Cost saving · up to 40%
Models supported · 193
Hardware targets · 6

CUDA, but unlocked.

CUDA earned its name by doing five things — and by locking developers into one silicon vendor as the price of getting them. EdgeMatrix does the same five things, without the lock. The comparison is earned, point by point. Here's what each claim means.

// CLAIM 01 · UNIFIED PROGRAMMING MODEL

Like CUDA: one programming abstraction across the silicon.

CUDA gave developers one mental model that compiled down to every NVIDIA GPU generation. EdgeMatrix gives developers one programming abstraction that compiles down to NVIDIA, AMD, Intel, ARM, and Qualcomm — plus Krsna, the co-designed in-house SoC that targets a deliberate model-family subset.

Proof: 193 models · 5 third-party silicon · 1 in-house · one binary
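
What that could look like in code: a minimal sketch. The edgematrix module, its functions, and the device names below are illustrative assumptions, not the published API.

    # Hypothetical bindings, for illustration only; not the real EdgeMatrix API.
    import edgematrix as emx

    model = emx.load("llama-3.1-8b")      # one model definition
    binary = emx.compile(model)           # one hardware-agnostic binary
    for device in emx.devices():          # NVIDIA, AMD, Intel, ARM, Qualcomm, Krsna
        out = binary.run("Hello, edge.", device=device)
        print(device.name, out.text)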

// CLAIM 02 · COMPILER + RUNTIME + LIBS

Like CUDA: the full toolchain, not just a library.

CUDA is compiler (nvcc) + runtime (driver) + libraries (cuBLAS, cuDNN). EdgeMatrix is CORE (compiler + runtime engine) + EdgeFlow (inference engine) + op libraries. Same architectural shape, deliberately.

Proof: L03 platform layer · L01 silicon co-design
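
A sketch of how the three layers could compose. Every name below (emx_core, edgeflow, Session, Engine) is an assumption made for illustration, not EdgeMatrix's actual module layout.

    # Hypothetical layering, for illustration; names are assumptions.
    from emx_core import compile_model      # CORE: compiler
    from emx_core.runtime import Session    # CORE: runtime engine
    import edgeflow                         # EdgeFlow: inference engine

    artifact = compile_model("qwen2.5-7b")  # compiler lowers the model graph
    session = Session(artifact)             # runtime engine owns device state
    engine = edgeflow.Engine(session)       # inference engine calls the tuned
    print(engine.generate("ping").text)     # op libraries underneath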

// CLAIM 03 · KERNEL OPTIMIZATION

Like CUDA: hand-tuned kernels for the silicon.

CUDA's value isn't the language — it's the decade of hand-tuned kernels in cuBLAS, cuDNN, cuSPARSE. EdgeMatrix has the equivalent for each silicon target: hybrid KV-cache, cache-aware scheduling, dynamic dispatch. The +73% vs vLLM is the receipt.

Proof: +73% on L40s · +29% on A100 · vs vLLM 0.10.2

// CLAIM 04 · MODEL ZOO BREADTH

Like CUDA: ships with the model architectures already supported.

CUDA shipped with the operators you needed. EdgeMatrix ships with 193 model architectures pre-tuned — across Transformers, SSMs (Mamba), RWKV, LFMs, CNNs, MoE, VLMs, and diffusion. Bring your own model or pick from the zoo.

Proof: CORE dispatches 8 architecture families × 5 third-party silicon platforms
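
A minimal sketch of what family-level dispatch could look like, assuming a simple registry. The family keys echo the list above; everything else is illustrative.

    # Illustrative registry dispatch across architecture families.
    FAMILIES = {}

    def register(family: str):
        def wrap(runner):
            FAMILIES[family] = runner
            return runner
        return wrap

    @register("transformer")
    def run_transformer(model, tokens): ...

    @register("ssm")        # Mamba-style state-space models
    def run_ssm(model, tokens): ...

    @register("moe")        # mixture-of-experts
    def run_moe(model, tokens): ...

    def dispatch(model, tokens):
        return FAMILIES[model.family](model, tokens)   # route by family tag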

// CLAIM 05 · THE DIFFERENCE THAT MATTERS

Unlike CUDA: hardware-agnostic by design.

CUDA's biggest feature is its biggest constraint — it only runs on NVIDIA. That's NVIDIA's moat, and the developer's lock. EdgeMatrix optimizes natively for NVIDIA *and* AMD *and* Intel *and* ARM *and* Qualcomm — five third-party silicon platforms — plus Krsna, our co-designed in-house SoC. When the workload moves, the runtime moves with it. That's the only respect in which we don't want to be CUDA.

Proof: the compatibility matrix on /any-ai

// THROUGHPUT BENCHMARKS

Faster than every open framework.

Benchmarks run on NVIDIA A100 (80 GB) and L40s. EdgeMatrix v0.0.4 vs vLLM v0.10.2 vs TensorRT-LLM v1.0.0. Numbers vary by model family and workload — see research for full disclosure.
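
For reference, the headline percentages are relative lifts over the vLLM baseline. A minimal sketch of the arithmetic, with placeholder numbers rather than measured data:

    # Relative lift over a baseline; both tokens/sec values are placeholders.
    vllm_tps = 1000.0           # baseline throughput (placeholder)
    edgematrix_tps = 1730.0     # candidate throughput (placeholder)
    lift = edgematrix_tps / vllm_tps - 1.0
    print(f"{lift:+.0%}")       # -> +73%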

+29% · avg. throughput on A100
+73% · avg. throughput on L40s
23-40% · cost savings range

NVIDIA L40s · tokens/sec

EdgeMatrix vs leading runtimes

EdgeMatrix v0.0.4 · hybrid KV-cache reuse · +73%
TensorRT-LLM v1.0.0 · NVIDIA reference · +8%
SGLang · open-source · +4%
vLLM v0.10.2 · baseline · 0%
Relative throughput vs vLLM 0.10.2 baseline. Top-25 enterprise SLMs, NVIDIA L40s, 3-run avg, FP16.
NVIDIA A100 80GB · tokens/sec

Same engine on enterprise hardware

EdgeMatrix v0.0.4 · cache-aware scheduling · +29%
TensorRT-LLM v1.0.0 · +10%
SGLang · +5%
vLLM v0.10.2 · baseline · 0%
Relative throughput vs vLLM 0.10.2 baseline on NVIDIA A100 80GB.

Write once. Run anywhere.

EdgeMatrix is the connective tissue. Any of the 193 supported model architectures — Llama, Qwen, Mistral, Shakti, Phi, DeepSeek, Gemma, and more — runs through one binary on any target: NVIDIA, AMD, Intel, ARM, NPUs, or Krsna SoC. No re-quantization. No vendor lock-in. No per-target kernel team.

ANY MODEL
Shakti family · 6 in-house · 2 in flight
Llama · 12 architectures
Qwen · 18 architectures · incl. VL
Mistral · Mixtral · 9 architectures
DeepSeek · 11 architectures · V2/V3/R1
Phi · Gemma · more · +137 across the long tail
  ↓
EdgeMatrix · hardware-agnostic runtime · ONE BINARY
  ↓
ANY HARDWARE
NVIDIA · A100 · H100 · L40s · L4
AMD · MI300 · ROCm stack
Intel · Gaudi · Xeon · Arc
ARM CPUs · server + edge devices
NPUs · Qualcomm · Apple · custom
Krsna SoC · in-house silicon
193 model architectures across 6 hardware targets (5 third-party silicon platforms plus the in-house Krsna SoC), through one binary. CORE dispatches the architectures; EdgeFlow accelerates the inference. Krsna implements a deliberately scoped subset — see /krsna for chip-specific coverage.

Built for the production reality.

Hybrid KV cache reuse

Combines prefix-level and entity-level cache reuse to cut recomputation. Lifts tokens/sec by 29-73% over the vLLM 0.10.2 baseline across the Top-25 enterprise SLMs, and clears TensorRT-LLM 1.0.0 on the same runs.

Dynamic compiler optimization

Adapts to model and hardware at runtime. No static configuration. Just-in-time kernel selection based on batch shape, sequence length, and target device.
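
A minimal sketch of what just-in-time kernel selection could look like, keyed on device and shape bucket. The device strings and kernel names are illustrative assumptions, not EdgeMatrix internals.

    # Illustrative JIT kernel selection, cached per (device, shape bucket).
    from functools import lru_cache

    def bucket(n: int) -> int:
        return 1 << max(0, n - 1).bit_length()   # round up to a power of two

    @lru_cache(maxsize=None)
    def select_kernel(device: str, batch_bucket: int, seq_bucket: int) -> str:
        # A real runtime would consult per-device tuned tables; this just
        # sketches the shape of the decision.
        if device.startswith("nvidia") and seq_bucket >= 4096:
            return "long_context_attention"
        if batch_bucket == 1:
            return "latency_optimized_decode"
        return "throughput_batched_decode"

    kernel = select_kernel("nvidia-l40s", bucket(8), bucket(3000))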

Hardware-agnostic acceleration

Optimized for NVIDIA, AMD, Intel, ARM, and Qualcomm. Ready for new NPUs, GPUs, and FPGAs without re-architecting. Modular runtime — extend to new hardware in days, not quarters.

Quantization without quality loss

INT8 and INT4 quantization across model architectures with no noticeable accuracy degradation. Fits 70B-class models into 24GB device footprints.
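
The basic mechanics, shown as textbook symmetric INT8 quantization in Python. This is a generic illustration, not a claim about EdgeMatrix's exact scheme.

    # Generic symmetric per-tensor INT8 quantization, for illustration.
    import numpy as np

    def quantize_int8(w: np.ndarray):
        scale = float(np.abs(w).max()) / 127.0          # max magnitude -> 127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(1024, 1024).astype(np.float32)
    q, s = quantize_int8(w)                             # 4x smaller than FP32
    max_err = float(np.abs(dequantize(q, s) - w).max()) # bounded by ~scale/2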

Cache-aware scheduling

Maximizes GPU/NPU utilization by routing requests based on cache locality. Higher concurrency, lower latency, no engineering effort from the model team.
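
A minimal sketch of cache-locality routing, assuming requests are keyed by a hash of their prompt prefix and pinned to the replica that already holds it. The scheduler shape here is an assumption.

    # Illustrative cache-affinity router: same prefix -> same replica.
    import hashlib
    from dataclasses import dataclass

    @dataclass
    class Replica:
        name: str
        load: int = 0

    class Router:
        def __init__(self, replicas: list[Replica]):
            self.replicas = replicas
            self.affinity: dict[str, int] = {}     # prefix hash -> replica index

        def route(self, prompt: str, prefix_len: int = 256) -> Replica:
            key = hashlib.sha256(prompt[:prefix_len].encode()).hexdigest()
            if key not in self.affinity:           # first sighting: least loaded
                self.affinity[key] = min(
                    range(len(self.replicas)), key=lambda i: self.replicas[i].load
                )
            r = self.replicas[self.affinity[key]]
            r.load += 1                            # repeat prefixes hit warm caches
            return r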

Native VLM, MoE, multi-modal

Where vLLM and TensorRT-LLM still leave gaps in VLM coverage, EdgeMatrix runs Shakti-VLM, Qwen-VL, and frontier multi-modal models out of the box.

// HYBRID KV CACHE

How the +73% lift actually works.

INCOMING · request · prompt + context
  ↓
CACHE TIER 01 — PREFIX · prefix-level KV cache · shared prompt prefixes · system messages · hit = skip prefill of that prefix entirely
  ↓
CACHE TIER 02 — ENTITY · entity-level KV cache · named entities · retrieved chunks · tool outputs · hit = stitch entity KVs without re-encoding
  ↓
MODEL · Shakti or any LLM · decode only on miss · populates both caches on the way through
  ↓
OUTPUT · response · tokens
HIT — skip model · MISS — decode
Two cache tiers sit between every request and the model. Prefix-level for shared prompts, entity-level for retrieved chunks. Cache hits skip the model entirely; cache misses populate both tiers on the way through.
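
A minimal sketch of the two-tier lookup in Python. The cache keys, and the model's prefill/encode/stitch/decode hooks, are illustrative assumptions rather than engine internals.

    # Illustrative two-tier KV cache: prefix tier, then entity tier, then decode.
    import hashlib

    def h(text: str) -> str:
        return hashlib.sha256(text.encode()).hexdigest()

    prefix_cache: dict[str, object] = {}    # prefix hash -> prefix KV blocks
    entity_cache: dict[str, object] = {}    # entity hash -> entity KV blocks

    def run(prefix: str, entities: list[str], model):
        kv = prefix_cache.get(h(prefix))
        if kv is None:                       # tier-1 miss: prefill the prefix
            kv = model.prefill(prefix)
            prefix_cache[h(prefix)] = kv
        for ent in entities:                 # tier 2: stitch entity KVs
            ekv = entity_cache.get(h(ent))
            if ekv is None:                  # miss: encode once, reuse after
                ekv = model.encode(ent)
                entity_cache[h(ent)] = ekv
            kv = model.stitch(kv, ekv)
        return model.decode(kv)              # decode only what the caches missed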

193 models and counting.

EdgeMatrix's modular runtime ships pre-tuned for the 193 most-used model architectures in enterprise agentic AI — text, VLM, MoE, and multi-modal. New models added in less than a week.

Shakti · 6 · Shakti-2.5B, Shakti-VLM-1B, Shakti-VLM-4B
Llama · 12 · Llama 3, Llama 3.1, Llama 4
Qwen · 18 · Qwen 2, Qwen 2.5, Qwen2-VL
Mistral · 9 · Mistral 7B, Mixtral 8x7B, Codestral
Phi · 6 · Phi-3, Phi-3-Vision, Phi-3.5
DeepSeek · 11 · DeepSeek V2, V3, R1, OCR
Gemma · 7 · Gemma 2, Gemma 3
And many more · 124 · Granite, Cohere, Yi, GLM, Falcon...

// LET'S BUILD

Replace your inference stack — in a week.