HCM 2023

Program

Opening Remarks 8:00 am - 8:10 am ET

Section 1 8:10 am - 9:50 am ET

Abstract:

Deep learning-based recommendation systems are resource-intensive and require large amounts of memory space to achieve high accuracy. To meet these demands, hyperscalers have scaled up their recommendation models to consume tens of terabytes of memory space. Additionally, these models must be fault-tolerant and trained for long periods without accuracy degradation. In this talk, we present TrainingCXL, an innovative solution that leverages CXL 3.0 to efficiently process large-scale RMs in disaggregated memory while ensuring training is failure-tolerant with low overhead. By integrating persistent memory (PMEM) and GPU as Type-2 devices in a cache-coherent domain, we enable direct access to PMEM without software intervention. TrainingCXL employs computing and checkpointing logic near the CXL controller to manage persistency actively and efficiently. To ensure fault tolerance, we use the unique characteristics of RMs to take checkpointing off the critical path of their training. We also employ an advanced checkpointing technique that relaxes the updating sequence of embeddings across training batches. The evaluation shows that TrainingCXL achieves significant performance improvements, including a 5.2x speedup and 72.6% energy savings compared to modern PMEM-based recommendation systems.

Speaker Bio:

Junhyeok Jang is a highly accomplished Ph.D. candidate under the supervision of Prof. Myoungsoo Jung at CAMELab of KAIST. His research expertise is focused on the cutting-edge field of hardware and software co-design for large-scale machine learning applications, with a particular emphasis on recommendation systems and graph neural networks (GNNs). Mr. Jang's extensive work in this field has led to numerous breakthroughs, including his pioneering research in developing TrainingCXL, a highly efficient and failure-tolerant system for processing large-scale recommendation models in disaggregated memory pools.

Abstract: As memory systems are becoming increasingly heterogeneous with different latencies, bandwidths, and device characteristics, we are met with the question: where to place data and when to move data? In this talk, I will discuss the problems with using traditional caching for data tiering, and I will show how this technique is inappropriate for heterogeneous memory systems. Further, I will discuss some recent and ongoing work from the UC Davis Computer Architecture Research Group (DArchR) that uses software hints to enable both transparent and application-specific data movement showing large benefits over traditional hardware caching.

Speaker Bio:

Jason Lowe-Power is an Assistant Professor at University of California, Davis where he leads the Davis Computer Architecture Research Lab (DArchR). His research interests include optimizing data movement in heterogeneous systems, hardware support for security, and simulation infrastructure. Professor Lowe-Power is also the Chair of the Project Management Committee for the gem5 open-source simulation infrastructure. He received his PhD in 2017 from the University of Wisconsin, Madison, and received an NSF CAREER Award and a Google Research Scholar Award.

Abstract: Cloud providers seek to deploy CXL-based memory pools to reduce fragmentation as well as the footprint of their embedded carbon. However, the design space of CXL-based memory systems is large. Key questions center around the size, reach, topology, and cost of the memory pool. Pooling also requires navigating complex design constraints around performance, virtualization, and management. This talk discusses why cloud providers are working to deploy CXL memory pools, key design constraints, and observations in designing towards practical deployment.

Speaker Bio:

Daniel S. Berger is a Senior Researcher at Microsoft Azure Systems Research and an Affiliate Assistant Professor at the University of Washington. His research focuses on improving memory efficiency, sustainability, and robustness in public clouds. He is the recipient of the 2018-2019 Mark Stehlik Postdoctoral Fellowship at Carnegie Mellon University and the 2021 ACM SOSP Best Paper Award.

Abstract:

Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage devices, organized as independent failure-isolated components interconnected by a high-bandwidth network. A critical challenge, however, is the high performance penalty of accessing data from a remote memory module over the network. Addressing this challenge is difficult as disaggregated systems have high runtime variability in network latencies/bandwidth, and page migration can significantly delay critical path cache line accesses in other pages. In this talk, we present a characterization analysis on different data movement strategies in fully disaggregated systems and discuss their performance overheads in a variety of workloads. Then, we describe our new adaptive and software-transparent mechanism that can significantly alleviate data movement overheads in fully disaggregated memory systems. We demonstrate how our proposed hardware mechanism achieves high system performance and robustness in data movement across a wide variety of emerging workloads at different network and architecture configurations. We conclude by providing future research directions on designing intelligent architectures and adaptive approaches for modern computing systems.

Speaker Bio:

Christina Giannoula is a Postdoctoral Researcher at the University of Toronto working with Prof. Gennady Pekhimenko and the EcoSystem research group. She is also working with the SAFARI research group, which is led by Prof. Onur Mutlu. She received her Ph.D. in October 2022 from School of Electrical and Computer Engineering (ECE) at the National Technical University of Athens (NTUA) advised by Prof. Georgios Goumas, Prof. Nectarios Koziris and Prof. Onur Mutlu. Her research interests lie in the intersection of computer architecture, computer systems and high-performance computing. Specifically, her research focuses on the hardware/software co-design of emerging applications, including graph processing, pointer-chasing data structures, machine learning workloads, and sparse linear algebra, with modern computing paradigms, such as large-scale multicore systems, disaggregated memory systems and near-data processing architectures. She has several publications and awards for her research on these topics.

Abstract:

CXL is a dynamic multi-protocol interconnect technology designed to support accelerators and memory devices. CXL provides a rich set of protocols that include I/O semantics similar to PCIe (i.e., CXL.io), caching protocol semantics (i.e., CXL.cache), and memory access semantics (i.e., CXL.mem) over PCIe PHY. CXL 2.0 specification enabled additional usage models beyond CXL 1.1, while being fully backwards compatible with CXL 1.1 (and CXL 1.0). CXL 2.0 enables dynamic resource allocation including memory and accelerator dis-aggregation across multiple domains. It enables switching, managed hot-plug, security enhancements, persistence memory support, memory error reporting, and telemetry. CXL 3.0 adds new fabric capabilities to build large scale-out systems while doubling the bandwidth with full backwards compatibility to CXL 1.0 and CXL 2.0. The availability of commercial IP blocks, Verification IPs, and industry standard internal interfaces enables CXL to be widely deployed across the industry. These along with a well-defined compliance program will ensure smooth interoperability across CXL devices in the industry.

Speaker Bio:

Dr. Debendra Das Sharma is an Intel Senior Fellow in the Data Platforms and Artificial Intelligence Group and chief architect of the I/O Technology and Standards Group at Intel Corporation. He drives PCI Express, Compute Express Link (CXL), Intel’s Coherency interconnect, and multichip package interconnect. He is a member of the Board of Directors of PCI-SIG and a lead contributor to PCIe specifications since its inception. He is a co-inventor and founding member of the CXL consortium and co-leads the CXL Technical Task Force. He is also the co-inventor of Universal Chiplet Interconnect Express (UCIe) and chairs the 100+ member UCIe consortium. Dr. Das Sharma holds 160+ US patents and is a frequent keynote speaker, distinguished lecturer, invited speaker, and panelist at the Hot Interconnects, PCI-SIG Developers Conference, CXL consortium events, Open Server Summit, Open Fabrics Alliance, Flash Memory Summit, SNIA SDC, and Intel Developer Forum. He has a B.Tech in Computer Science and Engineering from the Indian Institute of Technology, Kharagpur and a Ph.D. in Computer Engineering from the University of Massachusetts, Amherst. He has been awarded the Distinguished Alumnus Award from Indian Institute of Technology, Kharagpur in 2019, the IEEE Region 6 Outstanding Engineer Award in 2021, the first PCI-SIG Lifetime Contribution Award in 2022, and the IEEE Circuits and Systems Industrial Pioneer Award in 2022.

Abstract: CXL-based memory expansion decouples CPU and memory within a single server and enables flexible server design with different generations and types of memory technologies. It can balance the fleet-wide resource utilization and address the memory bandwidth and capacity scaling challenges in hyperscale datacenters. Without efficient memory management, however, such systems can significantly degrade application-level performance. We propose a novel OS-level application-transparent page placement mechanism (TPP) for efficient CXL-memory management. TPP employs lightweight mechanisms to identify and place hot and cold pages to appropriate memory tiers. It enables page allocation to work independently from page reclamation logic which is tightly-coupled in today's Linux kernel. At the same time, TPP can promptly promote performance-critical hot pages trapped in the slow memory tiers to the fast tier node. Both promotion and demotion mechanisms work transparently without prior knowledge of an application's memory access behavior. TPP improves Linux's performance by up to 18% and outperforms state-of-the-art solutions for tiered memory by 10–17%. TPP has been actively being used in Meta datacenter for over a year and parts of it have been merged to the Linux kernel since v5.18.

Coffee Break 10:05 am - 10:20 am ET

Section 2 10:20 am - 12:20 pm ET

Abstract:

Memory is one of the most expensive, yet over-provisioned and underutilized resources in current data centers. Systems for remote memory aim to improve memory utilization allowing memory pooling across a rack of hosts. In this talk, I will provide a high-level overview of the research in this space, and discuss a taxonomy of remote and disaggregated memory systems architected at different levels of the computing stack. I will discuss the benefits that these systems can provide for emerging ML applications and cloud operators, as well as tradeoffs between the various approaches.

Speaker Bio:

Irina Calciu is a co-founder at Graft, a cloud-native startup that makes the AI of the 1% accessible to the 99%. Irina is broadly interested in machine learning systems, as well as parallel and distributed systems, with a focus in algorithms and systems for rack-scale computing. Before Graft, Irina was a Sr. Researcher at VMware Research, working on novel software-hardware co-design solutions for memory disaggregation. Irina completed her PhD at Brown University, working with Maurice Herlihy and Justin Gottschlich (Intel Labs) on algorithms for non-uniform memory access (NUMA) architectures and hybrid transactional memory. Irina has co-authored papers at top conferences, obtaining Best Paper awards at ASPLOS and TRANSACT, and holds more than 15 issued patents. She served as a program co-chair for ATC 2021 and on numerous program committees for top systems conferences, including OSDI, ASPLOS and ATC.

Panelists Bio:

Manoj Wadekar is a Hardware Systems Technologist driving storage and memory technology and roadmaps at Meta. Manoj has been designing and building servers, storage, and network solutions for over 30 years. He is leading the Composable Memory Systems group in OCP. Manoj has evangelized Memory and Storage Disaggregation, NVMe over Fabric, Lossless Ethernet (DCB/CEE) in the industry conferences. Before joining Meta, he held engineering positions at eBay, QLogic and Intel.

Michele Gazzetti is a Research Software Engineer part of IBM Research Europe. His research interests include control plane software management for composable systems and performance evaluation of workloads leveraging composable resources. Michele is also an active member in the OpenFabrics Management Framework Working Group part of the OpenFabrics Alliance.

Jason Lowe-Power is an Assistant Professor at University of California, Davis where he leads the Davis Computer Architecture Research Lab (DArchR). His research interests include optimizing data movement in heterogeneous systems, hardware support for security, and simulation infrastructure. Professor Lowe-Power is also the Chair of the Project Management Committee for the gem5 open-source simulation infrastructure. He received his PhD in 2017 from the University of Wisconsin, Madison, and received an NSF CAREER Award and a Google Research Scholar Award.

Daniel S. Berger is a Senior Researcher at Microsoft Azure Systems Research and an Affiliate Assistant Professor at the University of Washington. His research focuses on improving memory efficiency, sustainability, and robustness in public clouds. He is the recipient of the 2018-2019 Mark Stehlik Postdoctoral Fellowship at Carnegie Mellon University and the 2021 ACM SOSP Best Paper Award.

Workshop for Heterogeneous and Composable Memory

February 26th, Montreal, Co-located with HPCA 2023

Welcome to the First Workshop for Heterogeneous and Composable Memory (HCM 2023)!

The workshop on HCM aims to deepen our knowledge on HCM and bring together researchers in academia and industry to share early discoveries, successful examples, and opinions on opportunities and challenges regarding HCM.

Program

Opening Remarks 8:00 am - 8:10 am ET

Section 1 8:10 am - 9:50 am ET

Invited talk: Junhyeok Jang, KAIST - Training Resilience with Persistent Memory Pooling using CXL Technology

Extended Abstract

Extended Abstract

Invited Talk: Jason Lowe-power, UC Davis - Managing Memory Smartly

Invited Talk: Invited Talk: Daniel S. Berger, Microsoft - Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms

Invited Talk: Christina Giannoula, University of Toronto - Efficient Data Movement in Fully Disaggregated Memory Systems

Extended Abstract

Extended Abstract

Invited Talk: Debendra Das Sharma, Intel - Compute Express Link (CXL): A game-changer in the Data Center

Invited Talk: Hasan Al Maruf, University of Michigan - TPP: Transparent Page Placement for CXL-Enabled Tiered Memory

Coffee Break 10:05 am - 10:20 am ET

Section 2 10:20 am - 12:20 pm ET

Keynote: Irina Calciu, Graft - A Trip Down (Heterogeneous and Disaggregated) Memory Lane

Panel: Research Challenges in Heterogeneous and Composable Memory Systems

Panelists: Jason Lowe-power (UC Davis), Daniel S. Berger (Microsoft), Michele Gazzetti (IBM), Manoj Wadekar (Meta), Debendra Das Sharma (Intel), Hasan Al Maruf (University of Michigan)

Attending

Venue:

Hotel Bonaventure Montreal

900 Rue De La Gauchetière O, H5A 1E4, Quebec, Montreal, Canada

Date:

Feb 26 (Sunday) morning

Call for Papers

The workshop papers will be related with but not limited to the following topics:

Submission

SUBMIT YOUR WORK

Important Dates

Submission deadline: Dec 19, 2022 (the deadline has been extended)

Review Assigned: Dec 20, 2022

Notification: Jan 25, 2023

Workshop date: Feb 26, 2023

Organizers

Dong Li, University of California Merced

Hyeran Jeon, University of California Merced

Jie Ren, College of William and Mary

Program Committee

Welcome to the First Workshop for Heterogeneous and Composable Memory (HCM 2023)!

The workshop on HCM aims to deepen our knowledge on HCM and bring together researchers in academia and industry to share early discoveries, successful examples, and opinions on opportunities and challenges regarding HCM.

Program

Opening Remarks 8:00 am - 8:10 am ET

Section 1 8:10 am - 9:50 am ET

Invited talk: Junhyeok Jang, KAIST - Training Resilience with Persistent Memory Pooling using CXL Technology Extended Abstract

Extended Abstract

Invited Talk: Jason Lowe-power, UC Davis - Managing Memory Smartly

Invited Talk: Invited Talk: Daniel S. Berger, Microsoft - Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms

Invited Talk: Christina Giannoula, University of Toronto - Efficient Data Movement in Fully Disaggregated Memory Systems Extended Abstract

Extended Abstract

Invited Talk: Debendra Das Sharma, Intel - Compute Express Link (CXL): A game-changer in the Data Center

Invited Talk: Hasan Al Maruf, University of Michigan - TPP: Transparent Page Placement for CXL-Enabled Tiered Memory

Coffee Break 10:05 am - 10:20 am ET

Section 2 10:20 am - 12:20 pm ET

Keynote: Irina Calciu, Graft - A Trip Down (Heterogeneous and Disaggregated) Memory Lane

Panel: Research Challenges in Heterogeneous and Composable Memory Systems

Panelists: Jason Lowe-power (UC Davis), Daniel S. Berger (Microsoft), Michele Gazzetti (IBM), Manoj Wadekar (Meta), Debendra Das Sharma (Intel), Hasan Al Maruf (University of Michigan)

Attending

Venue:

Hotel Bonaventure Montreal

900 Rue De La Gauchetière O, H5A 1E4, Quebec, Montreal, Canada

Date:

Feb 26 (Sunday) morning

Call for Papers

The workshop papers will be related with but not limited to the following topics:

Submission

SUBMIT YOUR WORK

Important Dates

Submission deadline: Dec 19, 2022 (the deadline has been extended) Review Assigned: Dec 20, 2022 Notification: Jan 25, 2023 Workshop date: Feb 26, 2023

Organizers

Dong Li, University of California Merced Hyeran Jeon, University of California Merced Jie Ren, College of William and Mary

Program Committee

Invited talk: Junhyeok Jang, KAIST - Training Resilience with Persistent Memory Pooling using CXL Technology

Extended Abstract

Invited Talk: Christina Giannoula, University of Toronto - Efficient Data Movement in Fully Disaggregated Memory Systems

Extended Abstract

Submission deadline: Dec 19, 2022 (the deadline has been extended)

Review Assigned: Dec 20, 2022

Notification: Jan 25, 2023

Workshop date: Feb 26, 2023

Dong Li, University of California Merced

Hyeran Jeon, University of California Merced

Jie Ren, College of William and Mary