We investigate scale-out context memory architectures for large language models, treating context as a first-class system resource rather than a static data object. Our research spans memory systems, storage, networking, runtime scheduling, and power management to enable efficient management of massive context workloads across heterogeneous resources. The project aims to establish the foundation for future hyperscale AI inference platforms supporting long-context reasoning, personalization, and agent-based AI services.
We are developing an Ethernet-based network fabric system for GPU clusters, together with optimization technologies that maximize network efficiency in large-scale GPU cluster environments. This three-year project is carried out in collaboration with Acryl Co., Ltd., Yonsei University, and Sungkyunkwan University.
Key research directions include: