Engineered first.
Published second.
We publish what we build. Three research papers on arXiv, fifteen filed patents (ten at PCT stage), and 47 internally documented patentable innovations across the platform. Research as accountability — not marketing.
Papers, preprints, engineering notes.
Shakti-2.5B: A Sovereign Small Language Model
Architecture and training methodology for a 2.5B-parameter Indic-optimized SLM matching frontier models 3× its size.
Shakti-VLM: Vision-Language Models for the Edge
A 1B- and 4B-parameter VLM family with QK-normalization and hybrid normalization, surpassing Qwen2VL-7B on multimodal benchmarks.
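For readers unfamiliar with the term: QK-normalization normalizes the query and key projections before the attention logits are computed, which keeps the logits bounded and training stable. The sketch below is a generic single-head PyTorch illustration of that idea, not the Shakti-VLM implementation; the use of LayerNorm and the tensor shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Single-head self-attention with QK-normalization (generic sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        # Queries and keys are normalized before the dot product, so the
        # attention logits cannot grow with the scale of the activations.
        self.q_norm = nn.LayerNorm(dim)
        self.k_norm = nn.LayerNorm(dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q = self.q_norm(self.q_proj(x))
        k = self.k_norm(self.k_proj(x))
        v = self.v_proj(x)
        attn = F.softmax((q @ k.transpose(-2, -1)) * self.scale, dim=-1)
        return attn @ v
```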
Edge Fine-tuning: Adapting Foundation Models in Resource-Constrained Environments
A practical methodology for fine-tuning sub-4B models on edge hardware without quality degradation.
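The paper's recipe is not reproduced here, but as a rough sketch of the parameter-efficient style of adaptation such a methodology implies, the snippet below freezes a base model and trains low-rank adapters with the Hugging Face peft library. The checkpoint name, rank, and target module names are placeholders, not values from the paper.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; substitute any sub-4B causal LM.
base = AutoModelForCausalLM.from_pretrained("your-org/your-3b-model")

# Low-rank adapters keep the trainable footprint small enough for
# memory-constrained edge hardware; the frozen base weights stay untouched.
lora = LoraConfig(
    r=8,                                   # adapter rank (placeholder)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # module names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()         # typically well under 1% of total
```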
Benchmarking LMCache vs EdgeMatrix: Why Caching Alone Is Not Enough
Why hybrid KV-cache reuse beats prefix-only caching in multi-tenant inference workloads.
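To make the distinction concrete (an illustrative sketch, not the EdgeMatrix or LMCache implementation): a prefix-only cache can reuse a KV block only while the entire leading prompt matches a previously seen one, while a hybrid scheme also indexes interior segments, so a shared system prompt or retrieved document can still hit after the prefixes diverge. The block size and hashing scheme below are assumptions.

```python
from hashlib import sha256

BLOCK = 64  # tokens per KV block (illustrative)

def _key(tokens) -> str:
    """Stable identifier for a span of token IDs."""
    return sha256(repr(list(tokens)).encode()).hexdigest()

def index_request(cache: dict, tokens: list) -> None:
    """Index both full prefixes and standalone blocks for later reuse."""
    for i in range(len(tokens) // BLOCK):
        cache[_key(tokens[: (i + 1) * BLOCK])] = True          # prefix key
        cache[_key(tokens[i * BLOCK:(i + 1) * BLOCK])] = True  # segment key

def prefix_only_hits(cache: dict, tokens: list) -> int:
    """Blocks reusable under prefix-only caching: reuse stops at the first
    block whose leading context has not been seen before."""
    hits = 0
    for i in range(len(tokens) // BLOCK):
        if _key(tokens[: (i + 1) * BLOCK]) not in cache:
            break
        hits += 1
    return hits

def hybrid_hits(cache: dict, tokens: list) -> int:
    """Blocks reusable under hybrid reuse: once the shared prefix ends,
    blocks indexed by their own content (e.g. a system prompt shared by
    many tenants) can still hit.  A real system must then correct the
    reused entries for their new positions, which prefix-only caching
    sidesteps by construction."""
    hits = prefix_only_hits(cache, tokens)
    for i in range(hits, len(tokens) // BLOCK):
        if _key(tokens[i * BLOCK:(i + 1) * BLOCK]) in cache:
            hits += 1
    return hits
```

In a multi-tenant workload where many requests share the same system prompt but differ in the first user turn, the prefix-only count collapses to zero while the hybrid count still recovers the shared blocks.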
47 innovations.
Filed and rising.
We undertook a structured IP-mining exercise in 2024. Fifteen patents filed. Ten already at PCT phase. The remaining five are within the standard 12-month window to enter PCT. Filing cadence is accelerating.
From idea → filing → PCT.
Optimized Real-Time NLP on Edge Devices in Resource-Constrained Environments
Tensulator + Tensor Codec — Dynamic Neural Compression in ExSLerate V2
Hybrid KV-Cache Reuse Architecture for LLM Inference
Spatial Programming Compiler for AI Accelerators
HaluMon — Hallucination Detection via Multi-Metric Scoring
Every layer of the stack protected.
The five representative patents above span the full stack, from silicon-aware compiler design (L01) to hallucination detection at the platform layer (L03). The remaining ten filed patents (and 32 in the pipeline) are distributed across all five layers.
Reproducible. By design.
Every performance number we publish includes the model version, hardware, batch size, sequence length, and methodology. Comparisons are run on public datasets with open-source evaluation harnesses (DeepEval, lm-evaluation-harness, OpenCompass).
Hardware reference
NVIDIA A100 (80 GB), L40S, H100 · AMD MI300 · Intel Gaudi · Krsna simulator
Frameworks compared
vLLM 0.10.2, TensorRT-LLM 1.0.0, SGLang, FlashInfer, llama.cpp
Benchmark suites
MMLU, GSM8K, MATH, HumanEval, MedQA, MMMU, DocVQA, OCRBench, ChartQA
Methodology
Q4_K_M quantization · 5-shot for aggregate scores · 0-shot for chain-of-thought · 3-run average
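As one way to make that concrete, the sketch below records a run manifest and drives it through lm-evaluation-harness's Python entry point. The checkpoint path, hardware string, and parameter values are placeholders rather than the settings behind any published figure, and the task list is trimmed for brevity.

```python
import json
import lm_eval  # lm-evaluation-harness

# Everything a published number needs in order to be reproduced,
# recorded up front.  All values below are placeholders.
manifest = {
    "model": "your-org/shakti-2.5b",       # placeholder checkpoint
    "revision": "main",
    "hardware": "1x NVIDIA A100 80GB",
    "batch_size": 8,
    "max_seq_len": 4096,
    "tasks": ["mmlu", "gsm8k"],
    "num_fewshot": 5,
    "runs": 3,                             # scores averaged over 3 runs
}

runs = []
for _ in range(manifest["runs"]):
    out = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={manifest['model']},revision={manifest['revision']}",
        tasks=manifest["tasks"],
        num_fewshot=manifest["num_fewshot"],
        batch_size=manifest["batch_size"],
    )
    runs.append(out["results"])

# Publish the manifest alongside the raw per-run results.
print(json.dumps({"manifest": manifest, "runs": runs}, indent=2, default=str))
```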