Past event
Computer Science PGR Seminar: Jiawei Luo
Jiawei Luo will present "Multi-DNN Inference of Sparse Models on Edge SoCs".
Abstract: Modern edge applications increasingly require the concurrent execution of multiple deep neural networks (DNNs) on heterogeneous system-on-chips (SoCs). These systems must dynamically balance accuracy and latency under diverse and evolving service-level objectives (SLOs), while efficiently utilizing heterogeneous processors such as CPUs, GPUs, and NPUs. However, existing multi-DNN inference systems typically rely on a limited set of model variants per task, which restricts their ability to adapt to varying SLO requirements and often results in high SLO violation rates and low inference throughput.
In this work, we present SparseLoom, a multi-DNN inference system that expands the design space of model variants through a training-free technique called model stitching. By recombining subgraphs from different sparse variants derived from the same base model, SparseLoom constructs a significantly richer set of candidate models without incurring additional training cost. This enlarged variant space enables more flexible adaptation to diverse SLO configurations.
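To make the combinatorial effect of model stitching concrete, here is a minimal sketch (not SparseLoom's actual implementation; all profile numbers are invented): a base model split into three sequential subgraphs, where each sparse variant supplies one implementation per subgraph. Recombining subgraphs across variants multiplies the candidate space at zero training cost.

```python
from itertools import product

# Hypothetical per-variant, per-stage profiles: (latency_ms, accuracy_drop).
# Three variants of the same base model, each split into three subgraphs.
variants = {
    "dense":    [(4.0, 0.00), (6.0, 0.00), (5.0, 0.00)],
    "sparse50": [(2.5, 0.01), (3.8, 0.02), (3.1, 0.01)],
    "sparse75": [(1.6, 0.03), (2.4, 0.05), (2.0, 0.04)],
}

def stitched_variants(variants):
    """Enumerate every cross-variant recombination of subgraphs and
    estimate its cost by summing per-subgraph profiles."""
    names = list(variants)
    n_stages = len(next(iter(variants.values())))
    for combo in product(names, repeat=n_stages):
        latency = sum(variants[v][s][0] for s, v in enumerate(combo))
        acc_drop = sum(variants[v][s][1] for s, v in enumerate(combo))
        yield combo, latency, acc_drop

space = list(stitched_variants(variants))
# 3 variants ** 3 subgraphs = 27 stitched candidates instead of 3.
print(len(space))

# Example use of the enlarged space: the most accurate candidate
# that still meets a 10 ms latency SLO.
feasible = [c for c in space if c[1] <= 10.0]
best = min(feasible, key=lambda c: c[2])
```

The point of the sketch is only the combinatorics: three trained variants yield 27 stitched candidates, giving the scheduler many more accuracy/latency operating points per task.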
To make this approach practical, SparseLoom integrates three key components—a performance profiler, a joint optimizer for variant selection and processor placement, and a memory-aware subgraph preloader—to efficiently manage the expanded stitched-variant space. As a result, SparseLoom achieves lower SLO violation rates and higher inference throughput than state-of-the-art multi-DNN inference systems.
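The joint optimization of variant selection and processor placement can be illustrated with a toy exhaustive search (a hedged sketch, not SparseLoom's algorithm; processor speed factors, task profiles, and SLOs below are all assumptions): each concurrent DNN task picks a model variant and a processor, and the objective is to minimize SLO violations under contention.

```python
from itertools import product

# Invented latency scale factors for heterogeneous processors.
PROCS = {"cpu": 1.0, "gpu": 0.4, "npu": 0.3}

# Invented tasks: candidate variant latencies (ms) and a latency SLO.
TASKS = {
    "detect":   {"variants": {"v_dense": 20.0, "v_sparse": 12.0}, "slo": 15.0},
    "classify": {"variants": {"v_dense": 10.0, "v_sparse":  6.0}, "slo":  8.0},
}

def violations(assign):
    """Count SLO misses for a joint assignment {task: (variant, proc)}.
    Co-located tasks contend, so latency scales with the number of
    tasks sharing a processor (a crude contention model)."""
    load = {}
    for _, (_, proc) in assign.items():
        load[proc] = load.get(proc, 0) + 1
    misses = 0
    for task, (variant, proc) in assign.items():
        lat = TASKS[task]["variants"][variant] * PROCS[proc] * load[proc]
        if lat > TASKS[task]["slo"]:
            misses += 1
    return misses

# Exhaustive search over the joint (variant, processor) space.
names = list(TASKS)
choices = [[(v, p) for v in TASKS[t]["variants"] for p in PROCS]
           for t in names]
best = min(product(*choices),
           key=lambda cs: violations(dict(zip(names, cs))))
best_assign = dict(zip(names, best))
```

At this toy scale brute force suffices; the seminar's interest is precisely that the stitched-variant space makes this joint search much larger, which is why SparseLoom pairs it with a profiler and a memory-aware preloader.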
Bio: Jiawei Luo is a second-year PhD student under the supervision of Prof Blesson Varghese and Prof Simon Dobson. His research interests lie in edge computing and machine learning systems.