ExSLerate is SandLogic's NPU IP for the edge, FPGA-validated and available for licensing today. Proprietary, patented hardware-software co-design cuts DRAM traffic by up to 50%, lossless at 8-bit precision. It supports inference of language, speech, and vision models across four configurations, from M64 (always-on wearables) to M4096 (robotics and heavy edge), with native INT4 and FP8 (E4M3) precision. The software stack is built on IREE, the open MLIR-based compiler and runtime.

What does an ExSLerate IP license include?

Four deliverables: (1) IEEE 1735 encrypted Verilog RTL for the licensed configuration, ready for Cadence Xcelium, Synopsys VCS, and AMD Vivado flows; (2) the full software stack (IREE compiler with ExSLerate extensions, runtime, HAL drivers, and PyTorch / JAX / TensorFlow frontends); (3) a working FPGA bitstream for the validated ZCU106 and Kria targets; (4) the UVM verification environment, regression suite, and scripts that SandLogic uses internally to sign off the IP.

Krsna is the planned prototype AI SoC built around the ExSLerate NPU IP — the reference integration that demonstrates the full stack on a single die. Krsna is in the design phase; the IP itself is licensable today and validated on FPGA. Krsna is the proof point. The IP is the product.

What model families does ExSLerate support?

ExSLerate covers four families of model that show up in real products today: LLM / SLM (Llama, Shakti, Qwen, Gemma — text generation), Speech AI (Moonshine, Whisper — STT and TTS), Computer Vision (ResNet, YOLO, VGG — CNN inference), and State Space (Mamba, Jamba — linear recurrence). All four run through the same compiler and runtime stack down to the silicon.

Is the ExSLerate software stack proprietary?

No. The ExSLerate SDK is built on IREE, the open MLIR-based compiler runtime. Standard MLIR dialects (Linalg, TOSA) compile to .vmfb (Virtual Machine FlatBuffer) artifacts. PyTorch, TensorFlow, and JAX are first-class frontends. Anything that targets IREE today targets ExSLerate tomorrow. ExSLerate-specific extensions are three custom passes on top of stock IREE: graph optimization tuned to the IP, proprietary encoding, and quantization for INT4 / FP8 native paths.

How does FP8 accuracy on ExSLerate compare to BF16 on A100?

On Llama 3.1 8B, FP8 (E4M3) on ExSLerate IP tracks BF16 on NVIDIA A100 within rounding across ten benchmarks — MMLU, SST-2, GSM8K, COT, PIQA, HellaSwag, WinoGrande, BoolQ, Lambada, and ARC-C. Detailed throughput, latency, and power numbers are released under NDA on an engagement basis.

// EXSLERATE · NPU IP · FPGA PROTOTYPE VALIDATED · LICENSABLE TODAY

ExSLerate.
NPU IP for the Talk-to-Chip era.

ExSLerate is a system-focused NPU IP — validated on FPGA and available for licensing today. Built around one thesis: for inference at the edge, the memory wall is the only wall that matters. Proprietary, patented hardware-software co-design cuts DRAM traffic by up to 50%, lossless at 8-bit precision. Supports inference of computer-vision, language, and speech models. Krsna is the planned prototype AI SoC built around the IP.

DRAM traffic reduction · 8-bit baseline

50%

Prototype validated

FPGA

IP configurations · M64 → M4096

Native precision

INT4 · FP8

Validated on silicon-class FPGA

ExSLerate runs today on AMD Xilinx ZCU106 and Kria KR260 SOM. The full software stack — IREE compiler, runtime, and host driver — executes on the on-board ARM Cortex application processor with the NPU IP in programmable logic. The system runs end-to-end inference of the supported model families.

// 02 · RECOGNITION

Jury-validated since 2019.

ExSLerate did not arrive in a vacuum. Four institutional milestones — across India's flagship semiconductor programs and one of the chip industry's defining names — mark the path that brought the IP to where it is today.

2019

India Microprocessor Challenge.

ExSLerate V1 ranked #1 of 30 finalists in MeitY's India Microprocessor Challenge. Foundational silicon recognition that seeded the IP family.

2023

Aegis Graham Bell + MeitY C2S.

Aegis Graham Bell Award for the chip program. Selected into MeitY C2S — 1 of 13 companies in India's flagship semiconductor program.

2024

Qualcomm QSMP.

Selected into Qualcomm QSMP as 1 of 2 cohort companies. Industry-partner validation from the chip leader.

2025

Brandworks co-development.

Co-development partnership with Brandworks Technologies announced. First wave of co-developed AI hardware planned for 2026.

// 03 · IP CONFIGURATIONS

One IP. Four configurations.

ExSLerate NPU IP ships in four configurations, all built from the same modular, scalable engines (compute tiles, scheduler, and data pipeline) and the same compiler toolchain. The configurations differ in MAC count and on-die memory budget, sized for different thermal and product envelopes. License the configuration that fits your design.

Apex

M4096

Talk to chip.

Real-time conversational AI for robotics and heavy edge applications. Speech-to-text (STT), text-to-text (TTT), and text-to-speech (TTS) in one inference pipeline. Sized for service robots, automotive HMIs, and industrial control surfaces where latency is the contract.

MAC count: 4096
Target: Robotics · Automotive · Industrial

Surge

M1024

Edge in flight.

Light edge AI for drones and platforms where every gram and milliwatt counts. Object detection, classification, and on-board SLMs in the same envelope. The variant that goes where a fan cannot.

MAC count: 1024
Target: Drones · Aerial · Light edge

Pulse

M256

Pocket inference.

Tuned for the audio-and-display class of consumer devices. Smartwatches with on-device NLU, smart speakers, and any product where the model is a feature shipping in the BOM, not a fallback to the cloud.

MAC count: 256
Target: Smartwatch · Smart speaker

Lite

M64

Always on.

The lowest-power inference target in the family. Built for wearables and hearables where the model never sleeps because the battery cannot afford the wake-up cost. Always-on is the feature.

MAC count: 64
Target: Always-on wearables · Hearables
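
One rough way to compare the four configurations on a common scale is theoretical peak throughput, counting each MAC as two operations per cycle. The sketch below uses only the MAC counts listed above. The 1 GHz clock is a placeholder assumption, not a SandLogic figure, and achieved throughput will depend on utilization and memory bandwidth.

```python
# MAC counts come from the configuration cards above.
MAC_COUNTS = {"M64": 64, "M256": 256, "M1024": 1024, "M4096": 4096}

# ASSUMPTION: illustrative clock only; the source does not publish clock rates.
ASSUMED_CLOCK_HZ = 1_000_000_000


def peak_tops(macs: int, clock_hz: float = ASSUMED_CLOCK_HZ) -> float:
    """Theoretical peak in tera-ops/s, counting one MAC as 2 ops (multiply + add)."""
    return 2 * macs * clock_hz / 1e12


if __name__ == "__main__":
    for name, macs in MAC_COUNTS.items():
        print(f"{name}: {macs} MACs -> {peak_tops(macs):.3f} TOPS at the assumed clock")
```

Because the MAC count steps by 4× between adjacent configurations, the theoretical peak scales by the same factor at any fixed clock.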

Memory interface

AXI4 master

Control interface

AXI4-Lite slave

Configurable bus widths per IP configuration. Drops into a standard AXI fabric.

// 04 · FULL STACK VIEW / REFERENCE INTEGRATION

One IP. The full stack around it.

A reference integration view of ExSLerate inside a customer SoC, with the SandLogic software stack riding on top. From production-deployed foundation models, through the IREE open compiler and runtime, down to the silicon blocks and the AXI fabric that ties them together.

Three layers, one stack. Foundation models on top — SandLogic's own production-deployed Shakti, Sruthi and Svara alongside the open model ecosystem. The IREE compiler and runtime in the middle — open and MLIR-based, with the ExSLerate compiler extensions plugged in. And the silicon below: ExSLerate as the NPU accelerator inside a reference SoC, with tightly-coupled SRAM, standard AXI4 to the rest of the system, and DRAM off-chip.

// 05 · WHAT YOU LICENSE

IP deliverables.

ExSLerate ships as a complete IP package: the RTL you integrate, the software stack that drives it, the FPGA bitstream you can stand it up on, and the verification environment we sign it off against ourselves.

RTL

Encrypted RTL.

The IP ships as IEEE 1735 encrypted Verilog, ready for standard simulator and synthesis flows — Cadence Xcelium, Synopsys VCS, AMD Vivado. What you get is the configuration you license: the M4096 RTL is a different deliverable from the M64 RTL, sized accordingly.

STACK

Software stack.

Everything the IP needs to actually run a model on your SoC. The IREE compiler with our extensions, the runtime, the HAL drivers, and frontends for PyTorch, JAX, and TensorFlow. We use it ourselves in the FPGA flow — so what we ship is what we run.

BITSTREAM

FPGA bitstream.

A working bitstream for the validated ZCU106 and Kria targets, so you can stand the IP up against your own models on day one rather than spending a quarter on integration before you see any inference running.

UVM

Verification environment.

UVM testbench, the regression suite we use internally, and the scripts that wire it together. It is the same environment the IP signs off against on our end — not a stripped-down version we hand over.

// 06 · WHY EXSLERATE

Two outcomes that matter at the edge.

ExSLerate NPU IP delivers two outcomes that determine how well a modern model runs at the edge: how much of the model fits in the available memory, and how faithfully the math for the expensive non-linear activations is executed on chip. Both come from proprietary, patented hardware-software co-design.

/ OUTCOME 01

Memory wall, solved.

Up to 50% less DRAM traffic

The dominant cost in edge LLM inference is moving tensors across the memory bus. ExSLerate cuts that traffic by up to 50% at peak context, lossless at 8-bit precision, through proprietary, patented co-design, so the model that runs on chip is the same one the compiler toolchain emitted. The benefit is longer context, lower power, or both.

Lossless · Patented · 8-bit precision · Long context · Low power

* Comparison is against an 8-bit baseline without proprietary algorithms.
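
To see why bus traffic dominates, consider a simplified memory-bound model of token generation in which every generated token streams the full set of weights from DRAM once. The sketch below is an upper-bound estimate under that assumption. The 50 GB/s bandwidth figure is hypothetical, KV-cache and activation traffic are ignored, and treating the 50% reduction as applying uniformly to weight traffic is an assumption the source does not state.

```python
def decode_tokens_per_second(
    param_count: float,
    bytes_per_param: float,
    dram_bandwidth_bytes_s: float,
    traffic_reduction: float = 0.0,
) -> float:
    """Upper bound on decode rate when DRAM bandwidth is the only limit.

    traffic_reduction is the fraction of bytes that no longer cross the bus
    (0.5 corresponds to the "up to 50%" figure quoted above).
    """
    bytes_per_token = param_count * bytes_per_param * (1.0 - traffic_reduction)
    return dram_bandwidth_bytes_s / bytes_per_token


if __name__ == "__main__":
    PARAMS = 8e9            # 8B-class model, as in the text
    BYTES_PER_PARAM = 1     # 8-bit baseline
    ASSUMED_BW = 50e9       # ASSUMPTION: hypothetical 50 GB/s DRAM interface

    base = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, ASSUMED_BW)
    reduced = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, ASSUMED_BW, 0.5)
    print(f"baseline bound: {base:.2f} tok/s, with 50% less traffic: {reduced:.2f} tok/s")
```

In this model, halving the bytes per token either doubles the bandwidth-limited rate or allows the same rate at half the bus activity, which is where the "longer context, lower power, or both" trade-off comes from.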

/ OUTCOME 02

High-fidelity math, no offload.

Inline non-linear activation

The non-linear functions — the GeLUs, SiLUs, and Softmaxes — execute inline on the ExSLerate datapath. No area cost on die for special-function units. No increase in latency for offload round-trips to a host CPU. FP8 compute units; output precision tracks BF16 on silicon within rounding, across speech, vision, and language workloads.

SiLU · GeLU · Softmax · No CPU offload

// 07 · FUNCTIONALITY

Four model families. One stack, to the silicon.

ExSLerate covers the four families of model that show up in real products today. Language, speech, vision, and state-space models are supported with the same compiler and runtime stack all the way down to the silicon.

LLM / SLM

Text generation

Llama · Shakti · Qwen · Gemma

Speech AI

STT & TTS

Moonshine · Whisper

Computer vision

CNN inference

ResNet · YOLO · VGG

State space

Linear recurrence

Mamba · Jamba

Performance data is shared separately. Throughput, latency, power, and per-configuration benchmarks are released under NDA on an engagement basis. For the full performance dossier and FPGA prototype numbers, write to sales@sandlogic.com.

// 08 · MEMORY WALL

How 8 GB of RAM
holds 128K tokens.

LLM serving on edge hardware fails primarily because weights and KV cache do not fit in system memory. ExSLerate's system-focused design turns that math around: less data crosses the memory bus, more of the model fits in RAM, and compute operates on full tensors.

Same compute, half the bus. Standard NPU on the left, ExSLerate IP on the right — drawn the same way so the differentiation is the IP as a whole. Up to 50% less data crosses the DRAM bus, lossless at 8-bit precision.

* Comparison is against an 8-bit baseline without proprietary algorithms.

50%

DRAM traffic reduction

Up to 50% less data crosses the bus at peak context, lossless at 8-bit precision.

128K

Tokens on an 8 GB endpoint

Llama 3 8B with RAM + SSD swap, where the baseline runs out of memory immediately. More model in the same memory budget.

// CONTEXT EXPANSION ON AN 8 GB ENDPOINT

FP8 PRECISION · RAM + SSD SWAP

Model	Standard RAM (baseline)	ExSLerate IP (in RAM)	SSD extension	Total max context
Llama 3 · 8B	0 (OOM)	40k tokens	+88k tokens	128k tokens
Shakti · 2.5B	45.4k tokens	92k tokens	+36k tokens	128k tokens
Shakti · 500M	32k tokens	32k tokens	Fits in RAM	32k tokens
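
A back-of-the-envelope check helps put the table in context. At FP8, an 8B-parameter model needs roughly 8 GB for weights alone, which is consistent with the baseline column showing OOM on an 8 GB endpoint. The sketch below estimates uncompressed KV-cache size per token and checks that each row's in-RAM and SSD figures add up to the stated total. The layer count, KV-head count, and head dimension are assumptions taken from the publicly released Llama 3 8B configuration, not from this page, and how ExSLerate reaches 40k tokens in RAM is not modeled here.

```python
# ASSUMPTION: architecture values from the publicly released Llama 3 8B
# configuration (32 layers, 8 key/value heads, head dimension 128).
LLAMA3_8B = {"layers": 32, "kv_heads": 8, "head_dim": 128}


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 1) -> int:
    """Bytes of KV cache added per token: keys plus values for every layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def kv_cache_gib(tokens: int, per_token_bytes: int) -> float:
    """Total KV-cache size in GiB for a given context length."""
    return tokens * per_token_bytes / 2**30


# Rows from the table above, in thousands of tokens: (in RAM, SSD extension, total).
CONTEXT_TABLE = {
    "Llama 3 · 8B": (40, 88, 128),
    "Shakti · 2.5B": (92, 36, 128),
    "Shakti · 500M": (32, 0, 32),
}

if __name__ == "__main__":
    per_token = kv_bytes_per_token(**LLAMA3_8B)  # FP8: one byte per value
    print(f"KV cache per token at FP8: {per_token} bytes")
    print(f"Uncompressed KV cache at 128,000 tokens: {kv_cache_gib(128_000, per_token):.4f} GiB")
    for model, (in_ram, ssd, total) in CONTEXT_TABLE.items():
        status = "ok" if in_ram + ssd == total else "mismatch"
        print(f"{model}: {in_ram}k + {ssd}k = {total}k ({status})")
```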

// 09 · ACCURACY VERIFICATION

FP8 accuracy. Within rounding of BF16.

BF16 baseline on NVIDIA A100 versus FP8 (E4M3) on ExSLerate IP. The table below lists the scores for both configurations side by side.

Config	MMLU	SST-2	GSM8K	COT	PIQA	HellaSwag	WinoGrande	BoolQ	Lambada	ARC-C
Llama 3.1 8B · A100 · BF16 baseline	65.68%	94.00%	55.00%	82.00%	55.00%	79.00%	78.00%	69.00%	52.00%	69.88%
Llama 3.1 8B · Krsna · ExSLerate V2 · FP8 (E4M3)	62.91%	93.00%	44.00%	84.00%	53.00%	78.00%	80.00%	65.00%	51.00%	67.87%
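
The per-benchmark gap can be computed directly from the two rows. The sketch below uses only the scores in the table and reports FP8 minus BF16 in percentage points; the function and variable names are illustrative.

```python
# Scores copied from the table above (percent). Benchmark names follow the
# table, with HellaSwag / WinoGrande / Lambada written out in full.
BENCHMARKS = ["MMLU", "SST-2", "GSM8K", "COT", "PIQA",
              "HellaSwag", "WinoGrande", "BoolQ", "Lambada", "ARC-C"]
BF16_A100 = [65.68, 94.00, 55.00, 82.00, 55.00, 79.00, 78.00, 69.00, 52.00, 69.88]
FP8_EXSLERATE = [62.91, 93.00, 44.00, 84.00, 53.00, 78.00, 80.00, 65.00, 51.00, 67.87]


def score_deltas(baseline, candidate, names):
    """Return {benchmark: candidate - baseline} in percentage points."""
    return {n: round(c - b, 2) for n, b, c in zip(names, baseline, candidate)}


DELTAS = score_deltas(BF16_A100, FP8_EXSLERATE, BENCHMARKS)
MEAN_DELTA = round(sum(DELTAS.values()) / len(DELTAS), 3)

if __name__ == "__main__":
    for name, delta in DELTAS.items():
        print(f"{name:<10} {delta:+.2f} pp")
    print(f"{'mean':<10} {MEAN_DELTA:+.3f} pp")
```

On these numbers the gap ranges from +2.00 to -11.00 percentage points, with GSM8K showing the largest drop and COT and WinoGrande scoring higher at FP8.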

// 10 · SOFTWARE STACK

Built on IREE. Open from frontend to silicon.

Hardware is half the product. The ExSLerate SDK is founded on IREE, the open MLIR-based compiler runtime. Standard dialects in, .vmfb out. No proprietary frontend, no vendor lock-in, no rewrite of your model.

No vendor lock-in

Models enter the toolchain through standard MLIR dialects, Linalg and TOSA. Anything that targets IREE today will target ExSLerate tomorrow. Your existing toolchain stays put.

Broad frontend compatibility

PyTorch, TensorFlow, and JAX are first-class. The frontend you ship in is the frontend you stay in. No re-export, no rewrite, no parallel model branch.

Flexible deployment

IREE decouples the model graph from the hardware executable. Update one without rebuilding the other. The HAL handles scheduling and runtime; the FlatBuffer carries the deployable.

ExSLerate extensions

Three custom passes ride on top of stock IREE: graph optimization tuned to the IP, proprietary encoding passes injected at the right edges, and quantization for INT4 and FP8 native paths.

// COMPILATION FLOW

User Model

PyTorch · JAX · TensorFlow

↓ Import to MLIR (Linalg / TOSA)

IREE Compiler + ExSLerate Extensions

Graph optimization · Proprietary encoding · Quantization

↓ .vmfb (Virtual Machine FlatBuffer)

IREE-Based ExSLerate Runtime

HAL (Hardware Abstraction Layer) · Scheduling

↓ PCIe / AXI

ExSLerate IP

M64 · M256 · M1024 · M4096

Supported precision

Native datapath formats

INT4 · FP8 (E4M3)
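
For readers unfamiliar with the E4M3 layout, the sketch below decodes an 8-bit code into its real value and rounds an arbitrary number to the nearest representable one. It assumes the common E4M3 convention from the OCP 8-bit floating point specification (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, one NaN pattern per sign). The source only says "FP8 (E4M3)", so the exact variant and rounding mode used by the ExSLerate datapath are not confirmed here.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one E4M3 code (0..255) to a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return float("nan")                       # the only NaN pattern per sign
    if exponent == 0:
        return sign * (mantissa / 8.0) * 2.0 ** -6    # subnormal range
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# Every finite representable value (+0.0 and -0.0 collapse to one entry).
FINITE_VALUES = sorted(
    {decode_e4m3(b) for b in range(256) if not math.isnan(decode_e4m3(b))}
)


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest finite E4M3 value, saturating at +/-448.

    Brute-force search for clarity; ties go to the smaller value, which is
    simpler than (and not identical to) round-to-nearest-even.
    """
    return min(FINITE_VALUES, key=lambda v: abs(v - x))


if __name__ == "__main__":
    for x in (0.3, 1.0, 100.0, 1000.0):
        q = quantize_e4m3(x)
        print(f"{x} -> {q} (abs error {abs(q - x):.6f})")
```

With only three mantissa bits, the spacing between neighboring values grows with magnitude, which is why FP8 paths usually rely on per-tensor or per-channel scaling to keep values in the well-resolved range.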

// 11 · TARGET INTEGRATIONS

Where ExSLerate lives.

Four market segments and use cases, four product envelopes. License the configuration that fits your design.

M4096 · Apex

Robotics & heavy edge AI.

→ Service and receptionist robots
→ Hospital and elderly-care assistants
→ Automotive HMIs and cockpit voice agents
→ Industrial control surfaces with NLU
→ Real-time speech-to-speech translation

M1024 · Surge

Drones & light edge AI.

→ Aerial inspection and survey drones
→ Delivery and logistics drones
→ IoT and security cameras
→ Industrial vision and anomaly detection
→ On-board SLM for autonomous platforms

M256 · Pulse

Smartwatch & smart speaker.

→ Smartwatches with on-device NLU
→ Smart speakers with local intent
→ Home hubs and voice appliances
→ In-ear and audio-first devices
→ Display-bearing wearables

M64 · Lite

Always-on wearables.

→ Hearables and earbuds
→ Fitness and health bands
→ Continuous biometric monitors
→ Always-listening keyword and wake detect
→ Low-power sensor fusion endpoints

// 12 · THE PLAN AHEAD

Krsna prototype SoC. The IP, in silicon.

ExSLerate is FPGA-validated and shipping as licensable IP today. The next phase is silicon. Krsna is the prototype SoC we are building around the IP — the reference integration that demonstrates the full stack on a single die.

In market

Phase 01

IP available

ExSLerate IP

FPGA prototype validated. Available for licensing across the four configurations. Customer engagements active.

Underway

Phase 02

Krsna design

Reference SoC

Krsna prototype SoC under design. Demonstrates the full ExSLerate stack on a single die, end to end.

Planned

Phase 03

First silicon

Tapeout

Krsna goes to silicon. First samples back, brought up against the full ExSLerate compiler and runtime stack.

Beyond Krsna

Phase 04

Customer SoCs

Multiple integrations

Customer-defined SoCs built around licensed ExSLerate IP, in parallel. The IP is the product; Krsna is the proof.

// 13 · IP ROADMAP

ExSLerate, beyond Gen 1.

ExSLerate Gen 1 is the IP available today, targeted at endpoint and robotics-class designs. Future generations push the same IP family into SOHO server and data-center envelopes, with architectural enhancements for inter-die and intra-die compute clusters and larger memory capacity and bandwidth.

Available today

Gen 1

Endpoint & robotics

ExSLerate

Run 8B-class models on edge devices. Up to 50% DRAM traffic reduction at 8-bit precision. Four IP configurations from M64 to M4096.

Next gen

Gen 2

SOHO server

ExSLerate · server-class

Targets local 27B-class inference for enterprise RAG. Engineered to land on cost-effective hardware (24 GB GDDR6, 128-bit bus) instead of the 48 GB / 384-bit alternative.

Future

Gen 3

Data center

ExSLerate · DC-class

A100-class throughput envelope. Built for full-rack deployment in sovereign and private clouds.

// ARCHITECTURE EFFICIENCY · 27B MODEL TARGET · GEN 2 SOHO SERVER VS STANDARD

Specification	Standard requirement	ExSLerate Gen 2
Required RAM	48 GB GDDR6	24 GB GDDR6 (2× smaller)
Memory bus width	384-bit (expensive)	128-bit (optimized)
Target application	SOHO / local privacy	SOHO server · local RAG
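
To make the bus-width row concrete, peak memory bandwidth scales linearly with bus width at a fixed per-pin data rate. The sketch below works that out for the two widths in the table. The 16 Gbps per-pin rate is an assumed, typical GDDR6 figure and does not come from this page.

```python
# ASSUMPTION: illustrative GDDR6 per-pin data rate; not stated in the source.
ASSUMED_GBPS_PER_PIN = 16.0


def peak_bandwidth_gb_s(bus_width_bits: int,
                        gbps_per_pin: float = ASSUMED_GBPS_PER_PIN) -> float:
    """Peak bandwidth in GB/s: one data pin per bus bit, 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


if __name__ == "__main__":
    for label, width in (("Standard requirement", 384), ("ExSLerate Gen 2", 128)):
        print(f"{label}: {width}-bit -> {peak_bandwidth_gb_s(width):.0f} GB/s peak")
```

Under this assumption the 128-bit configuration has one third of the raw bandwidth of the 384-bit one, so the Gen 2 target depends on moving correspondingly less data per token.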

Notes. ExSLerate is FPGA-validated NPU IP available for licensing. Krsna is the planned prototype SoC built around the IP. Detailed throughput, latency, power, and per-configuration benchmarks are released under NDA on an engagement basis. For the full performance dossier, contact sales@sandlogic.com.

// LET'S BUILD

License ExSLerate IP. Talk to us about your design.
