Projects

Ongoing Projects

Efficient Hyperscale LLM Inference System based on Scale-out Context Memory (Basic Research Laboratory [2026.6~2029.5])

We investigate scale-out context memory architectures for large language models, treating context as a first-class system resource rather than a static data object. Our research spans memory systems, storage, networking, runtime scheduling, and power management to enable efficient management of massive context workloads across heterogeneous resources. The project aims to establish the foundation for future hyperscale AI inference platforms supporting long-context reasoning, personalization, and agent-based AI services.

Development of Ethernet Based GPU Cluster Network Fabric Systems and Optimization Technologies for Maximizing Network Efficiency in Large Scale Environments (IITP [2026.4~2029.3])

We are developing an Ethernet-based GPU cluster network fabric system and optimization technologies to maximize network efficiency in large-scale GPU cluster environments. This three-year project is carried out in collaboration with Acryl Co., Ltd., Yonsei University, and Sungkyunkwan University.

Key research directions include:

Network fabric design for large-scale GPU clusters
Communication and transport optimization for distributed AI workloads
Performance analysis, monitoring, and evaluation of cluster network efficiency

Inference-over-Fabrics: A Kernel-Integrated Architecture for Remote AI Inference (NRF Mid-Career Researcher Program [2025.9~2028.8])

Artificial intelligence services increasingly rely on large language models (LLMs) and heterogeneous AI accelerators distributed across cloud and data center environments. This project proposes Inference-over-Fabrics (IoF), a kernel-integrated architecture that enables remote AI inference resources to be accessed as first-class system resources, analogous to the way NVMe-over-Fabrics virtualizes remote storage. By moving inference management into the operating system kernel, IoF aims to eliminate unnecessary user-kernel transitions, reduce communication overhead, and provide a unified, low-latency interface for remote AI accelerators. The project investigates kernel-level resource abstraction, high-performance request/response communication over RDMA and TCP, and lightweight remote inference servers, ultimately establishing a scalable foundation for next-generation cloud and edge AI infrastructure.