CS Seminar: Dr. Sarp Oral, August 4 2010, 13:40, FENS G035
Lessons Learned in Deploying the World’s Largest Scale Lustre File System
Dr. Sarp Oral
The Oak Ridge National Laboratory
Abstract: The Spider system at the Oak Ridge National Laboratory’s Leadership
Computing Facility (OLCF) is the world’s largest scale Lustre parallel
file system. Envisioned as a shared parallel file system capable of
delivering both the bandwidth and capacity requirements of the OLCF’s
diverse computational environment, the project had a number of
ambitious goals. To support the workloads of the OLCF’s diverse
computational platforms, the aggregate performance and storage capacity
of Spider exceed those of OLCF’s previously deployed systems by factors
of 6 (240 GB/sec) and 17 (10 petabytes), respectively. Furthermore,
Spider supports over 26,000 clients concurrently accessing the file
system, which exceeds OLCF’s previously deployed systems by nearly 4x.
In addition to these scalability challenges, moving to a center-wide
shared file system required dramatically improved resiliency and
fault-tolerance mechanisms. Through a phased approach of research and
development, prototyping, deployment, and transition to operations,
this work has resulted in a number of insights into large-scale
parallel file system architectures. This presentation details our
efforts in designing, deploying, and operating Spider and particularly
focuses on reducing parallel file system journaling overheads.
Journaling is a widely used technique to increase file system robustness
against metadata and/or data corruption. While the overhead of journaling can
be masked by the page cache for small-scale, local file systems, we
found that Lustre's use of journaling for the object store
significantly impacted the overall performance of our large-scale
center-wide parallel file system. By requiring that each write request
wait for a journal transaction to commit, Lustre introduced
serialization to the client request stream and imposed additional
latency due to disk head movement (seeks) for each request.
This work provides a head-to-head comparison of two significantly
different approaches to increasing the overall efficiency of the Lustre
file system. First, we present a hardware solution that uses external
journaling devices to eliminate the latencies incurred by the extra disk
head seeks due to journaling. Second, we introduce a software-based
optimization that removes the synchronous commit for each write request,
side-stepping additional latency and amortizing the journal seeks across
a much larger number of requests. Both solutions have been implemented and
experimentally tested on the Spider storage system. Tests show both
methods considerably improve the write performance, in some cases by up to
93%. Testing with a real-world scientific application showed a 37%
decrease in the number of journal updates, each with an associated seek,
which translated into an average I/O bandwidth improvement of 56.3%.
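To make the amortization argument concrete, here is a minimal toy cost model of synchronous versus batched journal commits. It is only a sketch under stated assumptions: the function names, the 8 ms seek cost, the 1 ms transfer cost, and the batch size are hypothetical values chosen for illustration, not Lustre internals or measurements from Spider.

```python
# Toy cost model: every journal commit is assumed to cost one disk seek.
# All names and numbers here are hypothetical, for illustration only.

def journal_commits(num_writes, batch_size=1):
    """Commits needed when `batch_size` write requests share one commit."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return -(-num_writes // batch_size)  # ceiling division


def total_time_ms(num_writes, batch_size=1, seek_ms=8.0, transfer_ms=1.0):
    """Data transfer time plus one assumed seek per journal commit."""
    commits = journal_commits(num_writes, batch_size)
    return num_writes * transfer_ms + commits * seek_ms


# Synchronous commit: one journal seek per write request.
sync_ms = total_time_ms(1000, batch_size=1)

# Batched commit: 64 write requests share one journal seek.
batched_ms = total_time_ms(1000, batch_size=64)

print(sync_ms, batched_ms)
```

In this model the seek term shrinks in proportion to the batch size while the transfer term stays fixed, which is the effect the software-based optimization aims for. The model ignores the serialization of the client request stream, so it understates the cost of the synchronous case.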
This presentation also covers our solutions to issues such as
network congestion, performance baselining and evaluation, and high
availability in a system with tens of thousands of components. Areas of
continued challenge, such as stressed metadata performance and the
need for file system quality of service, will also be discussed, along
with efforts to address them. Finally, operational aspects of
managing a system of this scale are discussed, along with real-world
data and observations.
Short Bio: Dr. Sarp Oral is a Research Staff member at the Oak Ridge
National Laboratory (ORNL). He is a member of the Technology Integration
Group at the National Center for Computational Sciences (NCCS) at ORNL.
His research and professional interests focus upon computer
benchmarking, parallel I/O and file systems, high-performance computing
and networking, system and storage area networks, computer
architecture, fault-tolerance, storage technologies, and networked
storage.
Prior to joining NCCS, Dr. Oral served as a Systems Analyst at NCI Inc. and as the
Associate Laboratory Director and a Research Associate at the
High-performance Computing and Simulation (HCS) Laboratory at the University of
Florida (UF). Dr. Oral received his Ph.D. in Computer Engineering from
the University of Florida in 2003.