// WHY SANDLOGIC · ANY AI

One chip. Any AI.

Any AI is not a runtime claim — it's a chip-level claim. The Krsna SoC architecturally runs any AI workload because the silicon was co-designed with CORE, the compiler + runtime engine that maps any model family down to the operator set. Most AI chips lock you to LLMs or CNNs. Krsna runs both today, and the architectures that emerge next.

Production model families: 4
Krsna configurations: 4
Compiler + runtime engine: CORE
Silicon IP: ExSLerate V2

Pick a chip. Pick an architecture. Pick wrong.

The AI silicon market today asks device-makers to commit, at design time, to which model family the chip will run. NPUs optimized for CNN inference are awkward on transformers. AI accelerators built for LLMs treat vision workloads as second-class. State-space models and the architectures that haven't shipped yet are nowhere on most roadmaps.

// PROBLEM 01

CNN-only OR LLM-only — never both.

The BOM decision today forces device-makers to pick the workload family up front. A smart-TV chip that does vision can't do conversational AI. An LLM accelerator can't run YOLO at any useful throughput.

// PROBLEM 02

Today's architecture isn't tomorrow's.

Mamba broke transformer dominance in late 2023. RWKV-7 dropped in 2025. Liquid Foundation Models are emerging through 2026. Each new architecture demands different kernels — and most chips can't absorb them.

// PROBLEM 03

Silicon design cycles vs AI architecture cycles.

Silicon takes 18–36 months to design and tape out. AI architectures shift every 6–12 months. A chip designed in 2024 to run "what's hot now" is a chip that can't run what ships in 2027.

Silicon + compiler, co-designed.

"Any AI" is the property that falls out when you co-design the chip with the compiler-runtime layer that targets it. ExSLerate V2 (silicon) and CORE (compiler + runtime engine inside EdgeMatrix) are engineered together. The silicon ships the operator-set superset that current and emerging architectures need; CORE handles the dispatch.

// SILICON

ExSLerate V2 inside Krsna SoC

Four chip configurations (Lite to Apex). MAC arrays, on-die memory, and operator support engineered as the union of what current and emerging AI architectures need. Two patented engines inside: Dynamic Neural Compression and the Infinite Series Engine (non-linear math in-datapath).

Krsna SoC architecture →

// COMPILER + RUNTIME ENGINE

CORE (inside EdgeMatrix)

The compiler + runtime engine that recognizes the incoming model architecture (transformer / SSM / RNN / CNN), selects the appropriate kernel sequence, and emits silicon-ready code. Built on IREE / MLIR — open frontends, no vendor lock. Same toolchain whether you target Krsna or third-party silicon.

EdgeMatrix · CORE layer →

Four model families. All four, on one chip.

What the chip runs in real products today. Built for what ships, not for theoretical coverage — a discipline that makes the claim defensible and the integration straightforward.

PRODUCTION

LLM / SLM

Llama · Shakti · Qwen · Gemma

Transformer-class language models. The architecture that dominates current enterprise AI workloads. Production support across the variant family.

PRODUCTION

Speech AI

Sruthi · Svara · Moonshine · Whisper

STT and TTS pipelines. The architecture that voice agents, dictation, and translation workloads depend on. End-to-end on-chip support.

PRODUCTION

Computer Vision

ResNet · YOLO · VGG · MobileNet

CNN inference. The architecture every camera-class device runs. Most LLM accelerators treat CNNs as an afterthought — Krsna treats both as first-class.

PRODUCTION

State Space Models

Mamba · Jamba · Mamba-2

Linear-time recurrence. The architecture that broke the transformer monopoly in 2023. SambaASR — our Mamba-based speech model — already proves the chip's SSM dispatch.

What CORE dispatches, where.

CORE is the compiler + runtime engine that translates any model family down to silicon. Eight architecture families × five silicon platforms = forty dispatch paths. We won't claim every path is production-ready — but most are, and the rest are honestly labeled.

[Dispatch-maturity matrix]
Silicon platforms: NVIDIA (A100 · H100 · L40s) · AMD (MI300 · ROCm) · Intel (Gaudi · Xeon · Arc) · ARM (Server + edge) · Qualcomm (QDC stack)
Architecture families: Transformers · VLMs · State Space (Mamba) · RWKV · Liquid (LFM) · CNNs · MoE · Diffusion
Maturity legend: Production (live) · Beta (pilots) · Supported (runtime ready) · Research · Roadmap
Each cell encodes CORE's dispatch maturity. Production = live customer deployments. Beta = customer pilots. Supported = CORE handles the architecture, no customer scenarios yet. Research = engineering validation only. Roadmap = planned.
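
Read as data, the matrix is a maturity grid keyed by (architecture family, silicon platform). The sketch below is illustrative: the enum mirrors the legend above, and the only cells filled in are the ones this page states outright (diffusion is roadmap on ARM and Qualcomm, per the footnote that follows); nothing else is guessed.

```python
from enum import Enum
from typing import Optional

class Maturity(Enum):
    PRODUCTION = "live customer deployments"
    BETA = "customer pilots"
    SUPPORTED = "CORE handles the architecture, no customer scenarios yet"
    RESEARCH = "engineering validation only"
    ROADMAP = "planned"

FAMILIES = ["Transformers", "VLMs", "State Space (Mamba)", "RWKV",
            "Liquid (LFM)", "CNNs", "MoE", "Diffusion"]
PLATFORMS = ["NVIDIA", "AMD", "Intel", "ARM", "Qualcomm"]

# Only the cells this page states explicitly; the rest of the 8 x 5 grid
# lives in the product matrix and is not guessed here.
DISPATCH_MATURITY = {
    ("Diffusion", "ARM"): Maturity.ROADMAP,       # per the footnote below
    ("Diffusion", "Qualcomm"): Maturity.ROADMAP,  # per the footnote below
}

def maturity(family: str, platform: str) -> Optional[Maturity]:
    """Look up one dispatch path; None means 'not stated on this page'."""
    return DISPATCH_MATURITY.get((family, platform))

print(maturity("Diffusion", "ARM"))   # Maturity.ROADMAP
```
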
// FOOTNOTE · DIFFUSION ON EDGE

Diffusion is marked roadmap on ARM and Qualcomm — not because CORE can't dispatch the workload, but because the preprocessing pipeline (T5-XXL / CLIP text encoders) carries a memory footprint that exceeds edge silicon budgets. Architectural mismatch at the silicon layer, not a CORE gap.
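
The budget argument in rough numbers, assuming the commonly cited ~4.7B-parameter T5-XXL text encoder and FP16 weights, set against the 8 GB endpoint class referenced on /krsna. Illustrative arithmetic, not a measured figure.

```python
# Rough arithmetic behind the footnote, not a measured result.
# Assumption: ~4.7B-parameter T5-XXL text encoder (the encoder used in
# common diffusion preprocessing pipelines), FP16 weights (2 bytes/param).

t5_xxl_encoder_params = 4.7e9          # assumed parameter count
bytes_per_param_fp16 = 2
encoder_weights_gb = t5_xxl_encoder_params * bytes_per_param_fp16 / 1e9

edge_endpoint_budget_gb = 8            # the 8 GB endpoint class cited on /krsna

print(f"T5-XXL encoder weights alone: ~{encoder_weights_gb:.1f} GB")
print(f"Edge endpoint budget:          {edge_endpoint_budget_gb} GB")
# ~9.4 GB of encoder weights before the denoiser, VAE, or activations are
# loaded: the preprocessor by itself exceeds the edge memory budget.
```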

// FOOTNOTE · "SUPPORTED" vs "PRODUCTION"

Cells marked supported mean CORE handles the model × silicon combination today, but we don't yet have named customer deployments on that combination. Editorial discipline: "production" requires a customer scenario.

Eight architecture families. One compiler-runtime.

Each architecture family wants different things from CORE — KV cache for transformers, scan kernels for SSMs, recurrent loops for RWKV, ODE solvers for LFMs, conv passes for CNNs. CORE handles the dispatch end to end.
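
That sentence is, in effect, a dispatch table. A minimal sketch of the mapping, with descriptive labels rather than CORE's internal kernel names:

```python
# Illustrative only: the mapping restates the paragraph above; the
# primitive names are descriptive labels, not CORE's internal kernel IDs.

KERNEL_NEEDS = {
    "transformer": ["attention", "kv_cache"],   # KV cache for transformers
    "ssm":         ["selective_scan"],          # scan kernels for SSMs
    "rwkv":        ["recurrent_loop"],          # recurrent loops for RWKV
    "lfm":         ["ode_solver"],              # ODE solvers for LFMs
    "cnn":         ["conv2d"],                  # conv passes for CNNs
}

def kernel_sequence(family: str) -> list:
    """Pick the kernel primitives a model family needs on the target."""
    try:
        return KERNEL_NEEDS[family]
    except KeyError:
        raise ValueError(f"no dispatch path registered for '{family}'")

print(kernel_sequence("ssm"))   # ['selective_scan']
```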

Transformers

PRODUCTION

Llama · Qwen · Mistral · Shakti · GPT-class · DeepSeek

The dominant LLM architecture today. CORE supports the full attention-mechanism family with hybrid KV-cache reuse, the optimization that drives EdgeMatrix's +73% throughput lift on L40s.

Vision-Language Models (VLM)

PRODUCTION

Shakti-VLM · Qwen2-VL · LLaVA · MiniCPM-V

Multimodal architectures combining vision encoders with transformer decoders. CORE handles VLM workloads natively, where vLLM and TensorRT-LLM still require workarounds.

State Space Models

PRODUCTION

Mamba · Mamba-2 · Jamba (hybrid)

Linear-time alternatives to attention. SambaASR, our Mamba-based speech model, runs natively on CORE with 4× throughput vs Whisper. Jamba (Mamba+Transformer hybrid) handles long-context workloads.

Linear Attention (RWKV)

PRODUCTION

RWKV-4 · RWKV-5 (Eagle) · RWKV-6 (Finch) · RWKV-7 (Goose)

RNN-style models with constant memory and linear time complexity. CORE supports the RWKV family for ultra-long-context and on-device workloads where transformer memory cost is prohibitive.

Liquid Foundation Models

BETA

LFM-1.3B · LFM-3B · LFM-40B (Liquid AI)

Continuous-time neural network architectures. CORE has end-to-end support for the LFM family, which matters for time-series, control, and reasoning workloads where Liquid AI is pushing the state of the art.

Convolutional Networks (CNN)

PRODUCTION

ResNet · EfficientNet · YOLO · MobileNet · ConvNeXt

Classical vision architectures. Most AI chips serve EITHER CNNs OR LLMs; CORE serves both on the same runtime. Critical for device-makers who need vision today and LLMs tomorrow without replacing silicon.

Mixture of Experts (MoE)

PRODUCTION

Mixtral 8×7B · DeepSeek-V3 · DBRX

Sparse routing architectures where only a fraction of parameters activate per token. CORE handles expert routing and KV-cache across MoE layouts without bespoke per-model engineering.

Diffusion & Image Generation

SUPPORTED

Stable Diffusion · SDXL · Flux · custom UNet

Iterative denoising architectures for image and video generation. CORE's runtime supports diffusion, but customer deployments are still pending. Architectural note: preprocessing pipelines (T5-XXL, CLIP encoders) carry large memory footprints; diffusion suits data-center silicon (NVIDIA, AMD, Intel) but not true-edge processors, where the preprocessor alone exceeds the memory budget.

// EXTENSIBILITY

Built for the next architecture.

The four production families above are what ships on the chip today. The architectural promise of "Any AI" is what lets the chip absorb what ships tomorrow, without redesigning the silicon. Four mechanisms, all engineered in.

// MECHANISM 01

Architecture-aware compilation

CORE recognizes the model architecture at compile time — transformer attention vs SSM scan vs CNN convolution vs RNN recurrence — and emits the appropriate kernel sequence for the silicon. New architectures slot in as new compilation paths, not as silicon rework.
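
A toy version of that recognition step: look at which operators the imported graph contains and route to a compilation path. The operator names and the string-matching classifier are illustrative; the real work happens on compiler IR, not strings.

```python
# Toy classifier, illustrative only: actual recognition operates on the
# compiler IR (MLIR), not on operator-name strings.

def recognize_family(graph_ops: set) -> str:
    """Map the operators present in an imported graph to a model family."""
    if "selective_scan" in graph_ops:
        return "ssm"              # Mamba-style scan
    if "wkv_recurrence" in graph_ops:
        return "rwkv"             # RNN-style linear attention
    if "ode_solve" in graph_ops:
        return "lfm"              # continuous-time networks
    if "attention" in graph_ops:
        return "transformer"      # attention-dominated graph
    if "conv2d" in graph_ops:
        return "cnn"              # convolution-dominated graph
    raise ValueError("unrecognized architecture family")

# A new architecture becomes a new branch (a new compilation path),
# not a silicon change.
print(recognize_family({"conv2d", "layernorm", "activation"}))   # cnn
print(recognize_family({"attention", "matmul", "layernorm"}))    # transformer
```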

// MECHANISM 02

Operator-set discipline

The chip's operator set is engineered to be the union of what current architectures need (matmul, attention, conv, layernorm, activation) — and the primitives future architectures will need (scan, recurrence, gating). The cost of supporting the next architecture is on the compiler side, not the silicon side.
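
The same discipline expressed as a coverage check: the operator set is fixed at tape-out, so absorbing a new family reduces to asking whether its required ops are already in that set. The operator lists restate the paragraph above; the "next architecture" example is hypothetical.

```python
# Operator lists restate the paragraph above; the example "next_arch"
# requirement set is hypothetical.

CURRENT_OPS = {"matmul", "attention", "conv", "layernorm", "activation"}
FUTURE_OPS = {"scan", "recurrence", "gating"}

CHIP_OPERATOR_SET = CURRENT_OPS | FUTURE_OPS    # fixed at tape-out

def absorbable_in_compiler(required_ops: set) -> bool:
    """True if a new architecture maps onto ops the silicon already has,
    i.e. support is a compiler-side change, not a silicon respin."""
    return required_ops <= CHIP_OPERATOR_SET

# Hypothetical next architecture: gated recurrence plus matmul.
next_arch = {"matmul", "recurrence", "gating"}
print(absorbable_in_compiler(next_arch))    # True -> compiler-side work only
print(next_arch - CHIP_OPERATOR_SET)        # set() -> nothing missing in silicon
```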

// MECHANISM 03

Co-designed with EdgeFlow

When the inference layer (EdgeFlow) absorbs a new model family — RWKV, Liquid Foundation Models, the next thing — the same dispatch model extends down through CORE to the Krsna chip. Co-design means new model coverage propagates without silicon redesign.

// MECHANISM 04

Disciplined production scope

The four production families above are what customers run today, not theoretical coverage. We say four production today; we say extensible for tomorrow. We do not conflate the two. That discipline is itself a feature of the program.

What we mean by "Any AI."

"Any AI" is a property of the chip's architecture + CORE — the compiler-runtime engine that dispatches model families to silicon. The matrix above is CORE's dispatch coverage. It is not a claim about inference throughput or token economics (that's EdgeFlow's domain). It is not a claim that every model runs at peak performance on every chip variant (Lite obviously won't run a 70B-class model).

Production scope (four families on chip today) and architectural promise (CORE dispatches the architectures that emerge tomorrow) are kept distinct by design. Discipline at the claim layer is what lets the chip-level claim stand.

// RELATED SURFACES

Where "Any AI" connects to the rest of the stack.

  • /krsna — the SoC product. Four chip configurations, two engines (DNC + Infinite Series Engine), 128k tokens on 8GB endpoint.
  • /exslerate — the licensable IP behind Krsna. ARM-style licensing for AI silicon.
  • /edgematrix — the umbrella product. CORE (compiler + runtime engine) + EdgeFlow (inference engine).
  • /edgeflow — the inference engine. 193 models pre-tuned, multi-silicon coverage, token optimization at runtime.
  • /token-economy — the business outcome. How chip + CORE + EdgeFlow + HaluMon + LingoForge together prevent ~23% token leakage and unlock 30–40% structural cost reduction.
// LET'S BUILD

One chip. Every AI workload.