Efficient LLM offloading

Pipeline scheduling for mobile GPUs

  • Role
    First author
  • Timeline
    2024
  • Stack
    CUDA, PyTorch, FlexGen, TensorRT

Overview

Research findings

Impact