// EXSLERATE · NPU IP · FPGA PROTOTYPE VALIDATED · LICENSABLE TODAY

ExSLerate.
NPU IP for the Talk-to-Chip era.

ExSLerate is a system-focused NPU IP, validated on FPGA and available for licensing today. It is built around one thesis: for inference at the edge, the memory wall is the only wall that matters. Proprietary, patented hardware-software co-design cuts DRAM traffic by up to 50%, lossless at 8-bit precision. ExSLerate supports inference of computer-vision, language, and speech models. Krsna is the planned prototype AI SoC built around the IP.

DRAM traffic reduction, vs. 8-bit baseline: up to 50%
Prototype validated: FPGA
IP configurations, M64 → M4096: 4
Native precision: INT4 · FP8
Validated on silicon-class FPGA

ExSLerate runs today on the AMD Xilinx ZCU106 and the Kria KR260 (K26 SOM). The full software stack (IREE compiler, runtime, and host driver) executes on the board's Arm Cortex application processor, with the NPU IP in programmable logic. The system runs end-to-end inference across the supported model families.

Jury-validated since 2019.

ExSLerate did not arrive in a vacuum. Four institutional milestones — across India's flagship semiconductor programs and one of the chip industry's defining names — mark the path that brought the IP to where it is today.

2019

India Microprocessor Challenge.

ExSLerate V1 ranked #1 of 30 finalists in MeitY's India Microprocessor Challenge. Foundational silicon recognition that seeded the IP family.

2023

Aegis Graham Bell + MeitY C2S.

Aegis Graham Bell Award for the chip program. Selected into MeitY C2S as 1 of 13 companies in India's flagship semiconductor program.

2024

Qualcomm QSMP.

Selected into Qualcomm QSMP as 1 of 2 cohort companies. Industry-partner validation from the chip leader.

2025

Brandworks co-development.

Co-development partnership with Brandworks Technologies announced. First wave of co-developed AI hardware planned for 2026.

One IP. Four configurations.

ExSLerate NPU IP ships in four configurations, all built from the same modular, scalable engines for compute tiles, scheduling, and the data pipeline, and all supported by the same compiler toolchain. The configurations differ in MAC count and on-die memory budget, sized for different thermal and product envelopes. License the configuration that fits your design.

Apex
M4096
Talk to chip.

Real-time conversational AI for robotics and heavy edge applications. STT, TTT, and TTS in one inference pipeline. Sized for service robots, automotive HMIs, and industrial control surfaces where latency is the contract.

MAC count
4096
Target
Robotics · Automotive · Industrial
Surge
M1024
Edge in flight.

Light edge AI for drones and platforms where every gram and milliwatt counts. Object detection, classification, and on-board SLMs in the same envelope. The variant that goes where a fan cannot.

MAC count
1024
Target
Drones · Aerial · Light edge
Pulse
M256
Pocket inference.

Tuned for the audio-and-display class of consumer devices. Smartwatches with on-device NLU, smart speakers, and any product where the model is a feature shipping in the BOM, not a fallback to the cloud.

MAC count
256
Target
Smartwatch · Smart speaker
Lite
M64
Always on.

The lowest-power inference target in the family. Built for wearables and hearables where the model never sleeps because the battery cannot afford the wake-up cost. Always-on is the feature.

MAC count
64
Target
Always-on wearables · Hearables
Memory interface
AXI4 master
Control interface
AXI4 Lite slave

Configurable bus widths per IP configuration. Drops into a standard AXI fabric.
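As a rough sizing aid, the four configurations can be captured in a small catalog. The sketch below is a minimal Python illustration, assuming one MAC counts as two operations per cycle and treating the NPU clock as a hypothetical input (`clock_mhz`), since no clock frequency is published here. `NpuConfig`, `peak_gops`, and `smallest_config` are illustrative names, not part of any SandLogic deliverable, and peak figures ignore utilization, so they are an upper bound rather than a benchmark.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NpuConfig:
    name: str    # product name (Lite, Pulse, Surge, Apex)
    part: str    # configuration ID (M64 ... M4096)
    macs: int    # MAC count, as listed for each configuration
    target: str  # target segment, as listed for each configuration

CONFIGS = [
    NpuConfig("Lite", "M64", 64, "Always-on wearables · Hearables"),
    NpuConfig("Pulse", "M256", 256, "Smartwatch · Smart speaker"),
    NpuConfig("Surge", "M1024", 1024, "Drones · Aerial · Light edge"),
    NpuConfig("Apex", "M4096", 4096, "Robotics · Automotive · Industrial"),
]

def peak_gops(cfg: NpuConfig, clock_mhz: float) -> float:
    """Theoretical peak in GOPS, counting 2 ops per MAC per cycle.

    clock_mhz is a HYPOTHETICAL input; it is not a published figure.
    """
    return 2 * cfg.macs * clock_mhz * 1e6 / 1e9

def smallest_config(required_gops: float, clock_mhz: float) -> Optional[NpuConfig]:
    """Smallest configuration whose theoretical peak meets required_gops."""
    for cfg in sorted(CONFIGS, key=lambda c: c.macs):
        if peak_gops(cfg, clock_mhz) >= required_gops:
            return cfg
    return None  # no configuration is large enough
```

For example, at an assumed 500 MHz, `smallest_config(300, 500)` skips Pulse (256 GOPS peak) and returns Surge.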

One IP. The full stack around it.

A reference integration view of ExSLerate inside a customer SoC, with the SandLogic software stack riding on top. From production-deployed foundation models, through the IREE open compiler and runtime, down to the silicon blocks and the AXI fabric that ties them together.

[Figure: reference integration of ExSLerate inside a customer SoC]
LAYER 01 · FOUNDATION MODELS (production deployed): Language (TTT): Shakti 2.5B + open models; Speech (STT): Sruthi + open models; Speech (TTS): Svara + open models; Vision: open models (YOLO, ResNet, etc.). Imported as MLIR (Linalg / TOSA).
LAYER 02 · IREE SOFTWARE STACK (open, MLIR-based): Frontends (PyTorch · JAX · TensorFlow); IREE Compiler + ExSLerate Extensions (graph optimization · quantization · proprietary encoding); IREE Runtime + HAL (.vmfb · scheduling · driver). The runtime drives the IP through AXI.
LAYER 03 · REFERENCE SOC INTEGRATION (example SoC): CPU (host / control) and PCIe (host / NIC) on an AXI4 interconnect; ExSLerate IP NPU accelerator (M64 · M256 · M1024 · M4096) with AXI4 Lite control and AXI4 data ports; dedicated, tightly coupled SRAM; off-chip DRAM (LPDDR / GDDR) on the AXI4 memory bus.

Three layers, one stack. On top sit the foundation models: SandLogic's own production-deployed Shakti, Sruthi, and Svara alongside the open model ecosystem. In the middle is the IREE compiler and runtime, open and MLIR-based, with the ExSLerate compiler extensions plugged in. Below is the silicon: ExSLerate as the NPU accelerator inside a reference SoC, with tightly coupled SRAM, standard AXI4 to the rest of the system, and off-chip DRAM.

IP deliverables.

ExSLerate ships as a complete IP package: the RTL you integrate, the software stack that drives it, the FPGA bitstream to stand it up on, and the verification environment we use for our own sign-off.

RTL

Encrypted RTL.

The IP ships as IEEE 1735 encrypted Verilog, ready for standard simulator and synthesis flows — Cadence Xcelium, Synopsys VCS, AMD Vivado. What you get is the configuration you license: the M4096 RTL is a different deliverable from the M64 RTL, sized accordingly.

STACK

Software stack.

Everything the IP needs to actually run a model on your SoC. The IREE compiler with our extensions, the runtime, the HAL drivers, and frontends for PyTorch, JAX, and TensorFlow. We use it ourselves in the FPGA flow — so what we ship is what we run.

BITSTREAM

FPGA bitstream.

A working bitstream for the validated ZCU106 and Kria targets, so you can stand the IP up against your own models on day one rather than spending a quarter integrating before you see any inference.

UVM

Verification environment.

UVM testbench, the regression suite we use internally, and the scripts that wire them together. It is the same environment we sign the IP off against, not a stripped-down version prepared for hand-over.

Two outcomes that matter at the edge.

ExSLerate NPU IP delivers two outcomes that determine how a modern model performs at the edge: how much of the model fits in the available memory, and whether the math for expensive non-linear activations executes on the NPU datapath instead of being offloaded. Both come from proprietary, patented hardware-software co-design.

/ OUTCOME 01

Memory wall, solved.

Up to 50% less DRAM traffic

The dominant cost in edge LLM inference is moving tensors across the memory bus. ExSLerate cuts that traffic by up to 50% at peak context through proprietary, patented co-design, and the scheme is lossless at 8-bit precision: the model executed on chip is the same model the compiler toolchain produced. The benefit is longer context, lower power, or both.

Lossless · Patented · 8-bit precision · Long context · Low power

* Comparison is against an 8-bit baseline without proprietary algorithms.
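To make the memory-wall argument concrete, the sketch below estimates the bytes read from DRAM to generate one token with a decoder-only LLM. It uses the public Llama 3 8B shape (32 layers, 8 KV heads, head dimension 128, roughly 8 billion parameters) and a deliberately simple model: every weight is streamed once per token and the whole KV cache is read once. The `reduction` parameter is a hypothetical uniform fraction; setting it to 0.5 applies the best-case "up to 50%" figure across the board, which is an assumption for illustration, not a description of how the patented scheme behaves. All names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecoderShape:
    params: float  # total parameter count
    layers: int    # transformer layers
    kv_heads: int  # key/value heads (grouped-query attention)
    head_dim: int  # dimension per head

# Public Llama 3 8B shape; the parameter count is rounded to 8e9.
LLAMA3_8B = DecoderShape(params=8.0e9, layers=32, kv_heads=8, head_dim=128)

def kv_bytes_per_token(m: DecoderShape, bytes_per_elem: float = 1.0) -> float:
    """KV-cache bytes added per token: one K and one V vector per layer and KV head."""
    return 2 * m.layers * m.kv_heads * m.head_dim * bytes_per_elem

def dram_bytes_per_token(m: DecoderShape, context: int,
                         bytes_per_elem: float = 1.0,
                         reduction: float = 0.0) -> float:
    """Approximate DRAM bytes read to decode one token at a given context length.

    Simplified model: all weights streamed once, full KV cache read once.
    `reduction` is a HYPOTHETICAL uniform traffic-reduction fraction.
    """
    weights = m.params * bytes_per_elem
    kv_cache = kv_bytes_per_token(m, bytes_per_elem) * context
    return (weights + kv_cache) * (1.0 - reduction)

baseline = dram_bytes_per_token(LLAMA3_8B, context=131_072)
reduced = dram_bytes_per_token(LLAMA3_8B, context=131_072, reduction=0.5)
```

At 8-bit precision the KV cache grows by 64 KiB per token, so at a 128K context it adds 8 GiB of reads for every generated token, on the same order as the weights themselves. That is why traffic at peak context dominates.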

/ OUTCOME 02

High-fidelity math, no offload.

Inline non-linear activation

The non-linear functions (GeLU, SiLU, and Softmax) execute inline on the ExSLerate datapath. No die area is spent on dedicated special-function units, and no latency is added by offload round-trips to a host CPU. The compute units are FP8; output precision tracks BF16 on silicon within rounding across speech, vision, and language workloads.

SiLU · GeLU · Softmax · No CPU offload
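For reference, these are the standard mathematical definitions the datapath has to reproduce. The sketch below is a plain FP64 Python golden model, useful for checking NPU outputs element by element; it says nothing about how ExSLerate evaluates these functions in hardware, which is proprietary. Both the exact (erf-based) GeLU and the widely used tanh approximation are shown, since models differ in which one they were trained with.

```python
import math
from typing import List, Sequence

def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid exp() overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so e is in (0, 1)
    return x * e / (1.0 + e)

def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation of GeLU used by many transformer models."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Subtracting the maximum in `softmax` leaves the result mathematically unchanged but keeps every exponent at or below zero, so large logits cannot overflow.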

Four model families. One stack, to the silicon.

ExSLerate covers the four families of model that show up in real products today. Language, speech, vision, and state-space models are supported with the same compiler and runtime stack all the way down to the silicon.

LLM / SLM
Text generation
Llama · Shakti · Qwen · Gemma
Speech AI
STT & TTS
Moonshine · Whisper
Computer vision
CNN inference
ResNet · YOLO · VGG
State space
Linear recurrence
Mamba · Jamba

Performance data is shared separately. Throughput, latency, power, and per-configuration benchmarks are released under NDA on an engagement basis. For the full performance dossier and FPGA prototype numbers, write to sales@sandlogic.com.

// 08 · MEMORY WALL

How 8 GB of RAM
holds 128K tokens.

LLM serving fails on edge hardware primarily because the weights and KV cache do not fit in the memory the system provides. ExSLerate's system-focused design turns that math around: less data crosses the memory bus, more of the model fits in RAM, and compute operates on full tensors.

[Figure: Standard NPU vs. ExSLerate IP. Left: standard NPU moving full 8-bit tensors between DRAM and compute, 100% DRAM traffic at max context (8-bit baseline). Right: ExSLerate IP moving encoded tensors between DRAM and compute, ~50% DRAM traffic, lossless 8-bit precision at max context.]

Same compute, half the bus. The standard NPU is on the left and ExSLerate IP on the right, drawn identically so that the only difference is the IP itself. Up to 50% less data crosses the DRAM bus, lossless at 8-bit precision.

* Comparison is against an 8-bit baseline without proprietary algorithms.

50%

DRAM traffic reduction

Up to 50% less data crosses the bus at peak context, lossless at 8-bit precision.

128K

Tokens on an 8 GB endpoint

Llama 3 8B with RAM + SSD swap, where the baseline runs out of memory immediately. More model in the same memory budget.

// CONTEXT EXPANSION ON AN 8 GB ENDPOINT

FP8 PRECISION · RAM + SSD SWAP

Model | Standard RAM (baseline) | ExSLerate IP (in RAM) | SSD extension | Total max context
Llama 3 · 8B | 0 (OOM) | 40k tokens | +88k tokens | 128k tokens
Shakti · 2.5B | 45.4k tokens | 92k tokens | +36k tokens | 128k tokens
Shakti · 500M | 32k tokens | 32k tokens | Fits in RAM | 32k tokens
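The baseline OOM entry for Llama 3 8B follows from simple sizing arithmetic. The sketch below computes the largest context whose KV cache fits in RAM beside the weights, assuming 1 byte per element (FP8), the public Llama 3 8B shape, and a hypothetical `reserved_bytes` set aside for the OS, runtime, and activations. It models a standard NPU only and does not attempt to reproduce the ExSLerate IP or SSD extension columns; `max_context_tokens` is an illustrative name.

```python
GIB = 1024 ** 3

def max_context_tokens(params: float, layers: int, kv_heads: int, head_dim: int,
                       budget_bytes: float, reserved_bytes: float,
                       bytes_per_elem: float = 1.0) -> int:
    """Largest context whose KV cache fits beside the weights; 0 means OOM."""
    weights = params * bytes_per_elem
    kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
    free = budget_bytes - reserved_bytes - weights
    if free <= 0:
        return 0  # the weights alone do not fit
    return int(free // kv_per_token)

# Llama 3 8B (about 8e9 params, 32 layers, 8 KV heads, head_dim 128) on an
# 8 GiB endpoint, with a HYPOTHETICAL 1.5 GiB reserved for OS and runtime.
llama3_8b_ctx = max_context_tokens(8.0e9, 32, 8, 128,
                                   budget_bytes=8 * GIB, reserved_bytes=1.5 * GIB)
```

With roughly 8 GB of FP8 weights and an 8 GiB budget, nothing is left for the KV cache, which is consistent with the 0 (OOM) baseline in the table.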

FP8 accuracy. Within rounding of BF16.

BF16 baseline on an NVIDIA A100 versus FP8 (E4M3) on ExSLerate IP. The table below lists per-benchmark accuracy for both, so the gap can be read directly.

Config | MMLU | SST-2 | GSM8K | COT | PIQA | HELLA | WINO | BoolQ | Lamb | ARC-C
Llama 3.1 8B · A100 · BF16 baseline | 65.68% | 94.00% | 55.00% | 82.00% | 55.00% | 79.00% | 78.00% | 69.00% | 52.00% | 69.88%
Llama 3.1 8B · Krsna · ExSLerate V2 · FP8 (E4M3) | 62.91% | 93.00% | 44.00% | 84.00% | 53.00% | 78.00% | 80.00% | 65.00% | 51.00% | 67.87%
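The per-benchmark gap between the two rows is a simple subtraction. The short sketch below does it with the values copied from the accuracy table; positive numbers mean the FP8 run scored higher than the BF16 baseline. Variable and function names are illustrative.

```python
BENCHMARKS = ["MMLU", "SST-2", "GSM8K", "COT", "PIQA",
              "HELLA", "WINO", "BoolQ", "Lamb", "ARC-C"]

# Accuracy (%) for Llama 3.1 8B, copied from the accuracy table.
A100_BF16 = [65.68, 94.00, 55.00, 82.00, 55.00, 79.00, 78.00, 69.00, 52.00, 69.88]
EXSLERATE_FP8 = [62.91, 93.00, 44.00, 84.00, 53.00, 78.00, 80.00, 65.00, 51.00, 67.87]

def deltas(baseline, candidate):
    """Percentage-point change (candidate minus baseline), rounded to 0.01."""
    return {name: round(c - b, 2)
            for name, b, c in zip(BENCHMARKS, baseline, candidate)}

per_benchmark = deltas(A100_BF16, EXSLERATE_FP8)
mean_delta = round(sum(per_benchmark.values()) / len(per_benchmark), 2)
```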

Built on IREE. Open from frontend to silicon.

Hardware is half the product. The ExSLerate SDK is built on IREE, the open, MLIR-based compiler and runtime. Standard dialects in, .vmfb out. No proprietary frontend, no vendor lock-in, no rewrite of your model.

No vendor lock-in

Models enter the toolchain through standard MLIR dialects, Linalg and TOSA. Anything that targets IREE today will target ExSLerate tomorrow. Your existing toolchain stays put.

Broad frontend compatibility

PyTorch, TensorFlow, and JAX are first-class. The frontend you ship in is the frontend you stay in. No re-export, no rewrite, no parallel model branch.

Flexible deployment

IREE decouples the model graph from the hardware executable, so either can be updated without rebuilding the other. The runtime and its HAL handle scheduling and device dispatch; the .vmfb FlatBuffer carries the deployable module.

ExSLerate extensions

Three custom passes ride on top of stock IREE: graph optimization tuned to the IP, proprietary encoding passes injected at the right edges, and quantization for INT4 and FP8 native paths.
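The idea of custom passes layered on a stock pipeline can be illustrated with a toy model. The sketch below is NOT IREE's pass infrastructure (IREE passes are C++ MLIR passes run by a pass manager); it is a hypothetical Python miniature in which a graph is a list of ops and each pass is a function from graph to graph. `fuse_matmul_activation` stands in for IP-tuned graph optimization (folding an activation into the preceding matmul, in the spirit of inline non-linear execution) and `quantize_to` stands in for the quantization pass. The proprietary encoding pass is not modeled because it is not documented.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

@dataclass(frozen=True)
class Op:
    kind: str            # e.g. "matmul", "gelu", "fused_matmul_gelu"
    dtype: str = "bf16"  # element-type annotation

Graph = List[Op]
Pass = Callable[[Graph], Graph]

ACTIVATIONS = {"gelu", "silu"}  # hypothetical set of fusable activations

def fuse_matmul_activation(g: Graph) -> Graph:
    """Toy graph optimization: fold an activation into the preceding matmul."""
    out: Graph = []
    for op in g:
        if op.kind in ACTIVATIONS and out and out[-1].kind == "matmul":
            prev = out.pop()
            out.append(replace(prev, kind=f"fused_matmul_{op.kind}"))
        else:
            out.append(op)
    return out

def quantize_to(dtype: str) -> Pass:
    """Toy quantization pass: retag every op with a native datapath type."""
    def run(g: Graph) -> Graph:
        return [replace(op, dtype=dtype) for op in g]
    return run

def run_pipeline(g: Graph, passes: List[Pass]) -> Graph:
    """Apply passes in order; each returns a new graph and leaves its input intact."""
    for p in passes:
        g = p(g)
    return g
```

Keeping each pass a pure graph-to-graph function is what lets extra passes slot in before or after stock ones without either side knowing about the other.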

// COMPILATION FLOW
User Model
PyTorch · JAX · TensorFlow
↓  Import to MLIR (Linalg / TOSA)
IREE Compiler + ExSLerate Extensions
Graph optimization · Proprietary encoding · Quantization
↓  .vmfb (Virtual Machine FlatBuffer)
IREE-Based ExSLerate Runtime
HAL (Hardware Abstraction Layer) · Scheduling
↓  PCIe / AXI
ExSLerate IP
M64 · M256 · M1024 · M4096

Supported precision

Native datapath formats

INT4 · FP8 (E4M3)
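To see what the two native formats mean numerically, the sketch below simulates them in Python. FP8 E4M3 follows the OCP FP8 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, largest finite value 448); saturating out-of-range values to ±448 and rounding to nearest even are assumptions about the conversion mode. INT4 uses symmetric per-tensor scaling into [-8, 7], a common scheme chosen here only for illustration; the scale granularity and rounding that ExSLerate's quantization pass actually uses are not stated. Inputs are assumed finite, and the function names are illustrative.

```python
import math
from typing import List, Tuple

E4M3_MAX = 448.0        # largest finite E4M3 magnitude
E4M3_MIN_EXP = -6       # exponent of the smallest normal value, 2**-6
E4M3_MANTISSA_BITS = 3

def quantize_e4m3(x: float) -> float:
    """Round a finite float to the nearest E4M3 value (ties to even), saturating."""
    if x == 0.0:
        return 0.0
    a = abs(x)
    _, e = math.frexp(a)             # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)   # floor(log2(a)), clamped so subnormals share 2**-6
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # spacing of representable values
    q = round(a / step) * step       # Python's round() is round-half-to-even
    q = min(q, E4M3_MAX)             # saturate instead of overflowing
    return math.copysign(q, x)

def quantize_int4(xs: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT4: integers in [-8, 7] plus one float scale."""
    amax = max(abs(v) for v in xs)
    scale = amax / 7.0 if amax > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in xs]
    return q, scale
```

For example, 0.1 lands on 0.1015625 in E4M3, and 1.0625 (exactly halfway between 1.0 and 1.125) rounds to the even neighbor 1.0.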
// 11 · TARGET INTEGRATIONS

Where ExSLerate lives.

Four market segments, four product envelopes. License the configuration that fits your design.

M4096 · Apex

Robotics & heavy edge AI.

  • Service and receptionist robots
  • Hospital and elderly-care assistants
  • Automotive HMIs and cockpit voice agents
  • Industrial control surfaces with NLU
  • Real-time speech-to-speech translation
M1024 · Surge

Drones & light edge AI.

  • Aerial inspection and survey drones
  • Delivery and logistics drones
  • IoT and security cameras
  • Industrial vision and anomaly detection
  • On-board SLM for autonomous platforms
M256 · Pulse

Smartwatch & smart speaker.

  • Smartwatches with on-device NLU
  • Smart speakers with local intent
  • Home hubs and voice appliances
  • In-ear and audio-first devices
  • Display-bearing wearables
M64 · Lite

Always-on wearables.

  • Hearables and earbuds
  • Fitness and health bands
  • Continuous biometric monitors
  • Always-listening keyword and wake detect
  • Low-power sensor fusion endpoints

Krsna prototype SoC. The IP, in silicon.

ExSLerate is FPGA-validated and shipping as licensable IP today. The next phase is silicon. Krsna is the prototype SoC we are building around the IP — the reference integration that demonstrates the full stack on a single die.

In market

Phase 01

IP available

ExSLerate IP

FPGA prototype validated. Available for licensing across the four configurations. Customer engagements active.

Underway

Phase 02

Krsna design

Reference SoC

Krsna prototype SoC under design. Demonstrates the full ExSLerate stack on a single die, end to end.

Planned

Phase 03

First silicon

Tapeout

Krsna goes to silicon. First samples back, brought up against the full ExSLerate compiler and runtime stack.

Beyond Krsna

Phase 04

Customer SoCs

Multiple integrations

Customer-defined SoCs built around licensed ExSLerate IP, in parallel. The IP is the product; Krsna is the proof.

ExSLerate, beyond Gen 1.

ExSLerate Gen 1 is the IP available today, targeted at endpoint and robotics-class designs. Future generations push the same IP family into SOHO server and data-center envelopes, with architectural enhancements for inter-die and intra-die compute clusters, plus larger memory capacity and bandwidth.

Available today

Gen 1

Endpoint & robotics

ExSLerate

Run 8B-class models on edge devices. Up to 50% DRAM traffic reduction at 8-bit precision. Four IP configurations from M64 to M4096.

Next gen

Gen 2

SOHO server

ExSLerate · server-class

Targets local 27B-class inference for enterprise RAG. Engineered to land on cost-effective hardware (24 GB GDDR6, 128-bit bus) instead of the 48 GB / 384-bit alternative.

Future

Gen 3

Data center

ExSLerate · DC-class

A100-class throughput envelope. Built for full-rack deployment in sovereign and private clouds.

// ARCHITECTURE EFFICIENCY · 27B MODEL TARGET · GEN 2 SOHO SERVER VS STANDARD
Specification | Standard requirement | ExSLerate Gen 2
Required RAM | 48 GB GDDR6 | 24 GB GDDR6 (2× smaller)
Memory bus width | 384-bit (expensive) | 128-bit (optimized)
Target application | SOHO / local privacy | SOHO server · local RAG
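The Gen 2 trade-off comes down to two pieces of arithmetic: how many bytes the weights occupy and how much bandwidth the memory bus provides. The sketch below computes both, plus the usual upper bound for memory-bound decoding (bandwidth divided by bytes read per token). The 16 Gbps per-pin GDDR6 data rate is a hypothetical input, and counting only weight bytes as per-token traffic ignores the KV cache; the function names are illustrative.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

def bus_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: data pins x per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8

def decode_tokens_per_s_bound(bandwidth_gbs: float, gb_read_per_token: float) -> float:
    """Upper bound for memory-bound decoding when each token reads this many GB."""
    return bandwidth_gbs / gb_read_per_token

PIN_RATE_GBPS = 16.0  # HYPOTHETICAL GDDR6 per-pin data rate

wide_bus = bus_bandwidth_gbs(384, PIN_RATE_GBPS)    # standard 384-bit bus
narrow_bus = bus_bandwidth_gbs(128, PIN_RATE_GBPS)  # Gen 2 target 128-bit bus
weights_8bit = weight_gb(27e9, 8)                   # 27B-class model at 8 bits
```

At the same pin rate, the 128-bit bus offers one third of the 384-bit bandwidth, and 27B weights at 8 bits (27 GB) already exceed 24 GB on their own. Reducing the bytes that must be stored and moved is therefore what the smaller memory system depends on.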

Notes. ExSLerate is FPGA-validated NPU IP available for licensing. Krsna is the planned prototype SoC built around the IP. Detailed throughput, latency, power, and per-configuration benchmarks are released under NDA on an engagement basis. For the full performance dossier, contact sales@sandlogic.com.

// LET'S BUILD

License ExSLerate IP. Talk to us about your design.