Abstract: The rapid growth in demand for large language models (LLMs) has strained cloud-edge infrastructure. While edge nodes offer low latency and the cloud provides vast resources, scheduling LLM requests ...
Abstract: This paper addresses the problem of data-locality-aware task assignment and scheduling for distributed job execution. Our goal is to minimize job completion times without prior knowledge of ...