Abstract: Hybrid language models (HLMs) are inference-time architectures that combine the low-latency efficiency of small language models (SLMs) on clients (edge devices) with the high accuracy of ...