Vault 2017: Full Schedule

11:00am EDT

Evolving Ext4 for Shingled Disks - Abutalib Aghayev, CMU & Theodore Ts'o, Google

Drive-Managed Shingled Magnetic Recording (SMR) disks offer a higher-capacity alternative to traditional disk drives. However, under non-sequential workloads they can show bimodal behaviour: after a short period of high performance, they enter a continuous period of low performance. We were able to make a small change (600 LOC) to ext4 that significantly improves throughput in both modes, resulting in 2-13x improvements on metadata-heavy workloads and 1.7-4.9x improvements on a file server benchmark. The changes also improved performance on conventional disk drives.

Speakers

Abutalib Aghayev

Graduate Student, Carnegie Mellon University

Abutalib Aghayev is a PhD student in the Computer Science department at Carnegie Mellon University.

Theodore Ts'o

Staff Programmer, Google

Theodore Ts'o is the first North American Linux Kernel Developer, and started working with Linux in September, 1991. He previously served as CTO for the Linux Foundation, and is currently employed at Google. Theodore is a Debian Developer, and is the maintainer of the ext4 file system...

Wednesday March 22, 2017 11:00am - 11:50am EDT
Thomas Paine AB

Filesystems

Experience Level Advanced

12:00pm EDT

Improving Block Discard Support throughout the Linux Storage Stack - Christoph Hellwig

Flash-based storage supports the concept of discarding data in blocks without actually overwriting them, a concept that helps with the internal wear-leveling and data placement algorithms. Linux has supported this concept, which has different names in different storage protocols (trim, unmap, deallocate), for a long time. But the concept of "online" or live discards, which notify the device immediately after data is deleted in the file system, has seen only limited traction in Linux, mostly due to the severe performance degradation it causes.

This talk explains optimizations to the file system and block layer that allow better batching and asynchronous execution of discard requests, as well as how the file system block allocator can be made more aware of ongoing discards.

It will also explore how discarding of data overlaps with fast zeroing operations, and why it really shouldn't at the interface level.

Speakers

Christoph Hellwig

Christoph Hellwig has been working on Linux Storage and File system projects for 15 years. He works all the way up and down the Storage and File system stack, and runs a business focused on Linux Storage architecture and training.

Wednesday March 22, 2017 12:00pm - 12:50pm EDT
William Dawes AB

Solid State

Experience Level Advanced

2:50pm EDT

The Extent of GFS2 - Steven Whitehouse, Red Hat

The GFS2 cluster filesystem has proved itself as a robust and reliable filesystem for cluster use cases. Most of the recent development has been focussed on performance improvements and on improving the ease of use and deployment. This talk will cover recent developments in GFS2 and also discuss future plans. When GFS was originally designed, it took a lot of inspiration from ext2/3, which resulted in bitmap-based resource groups and an equal-height pointer tree for inode metadata. The question arises as to whether and how GFS2 might integrate support for extents. The talk will present the issues involved and the latest thoughts of the development team on this topic.

Speakers

Steven Whitehouse

Senior Manager, RHEL Filesystems, Red Hat

Steven Whitehouse currently manages the RHEL Filesystems team at Red Hat. His introduction to Linux kernel development came in 1993, when he wrote a small patch for AX.25; he is also the previous maintainer of Linux DECnet and the GFS2 Filesystem. Steven has spoken at a number of conferences...

Wednesday March 22, 2017 2:50pm - 3:40pm EDT
Thomas Paine AB

Filesystems

Experience Level Advanced

5:00pm EDT

Predicting Storage Failures with Machine Learning - Ahmed El-Shimi, Minima

Disk drives fail at an average annual rate of ~2%. Any system with Availability and Durability requirements must mitigate such failures through a redundancy technique such as RAID, Erasure Coding, Replication or Backup.
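
To put the ~2% figure in perspective, here is a minimal sketch of the arithmetic. It assumes drives fail independently and at the same annual rate, which is a simplification (real fleets can show correlated failures). The fleet size of 100 and the function names are illustrative and do not come from the talk.

```python
def expected_failures(num_drives: int, annual_failure_rate: float) -> float:
    """Expected number of drive failures in one year."""
    return num_drives * annual_failure_rate


def prob_at_least_one_failure(num_drives: int, annual_failure_rate: float) -> float:
    """Probability that at least one drive fails within a year.

    Assumes independent failures with an identical rate; this is an
    assumption made for illustration only.
    """
    return 1.0 - (1.0 - annual_failure_rate) ** num_drives


if __name__ == "__main__":
    afr = 0.02   # ~2% annual failure rate, as quoted in the abstract
    fleet = 100  # hypothetical fleet size
    print(f"expected failures per year: {expected_failures(fleet, afr):.1f}")
    print(f"P(at least one failure):    {prob_at_least_one_failure(fleet, afr):.3f}")
```

Under these assumptions, a 100-drive system should expect about two failures a year, and at least one failure is far more likely than not, which is why redundancy is treated as a baseline requirement.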

With the wealth of monitoring data available nowadays and the ability to process the data in near real-time, can we predict such failures? How well can we do it? And how would that impact how we design and operate large distributed systems?

We examine and motivate predictive failure detection in the context of Availability, Rebuild Times and Recovery Objectives of large systems. We then train and evaluate multiple models, achieving accuracy (97.5%) that compares favorably to common datacenter practices. We demonstrate how we can tune our learners to achieve different Precision and Recall objectives, thus improving Availability, Protection or Operational Efficiency.
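
The abstract does not say how the learners are tuned. One common way to trade Precision against Recall is to move the decision threshold applied to a model's failure score, and the sketch below illustrates that idea only. The scores, labels, and function name are invented for illustration and are not the speaker's data or method.

```python
from typing import Sequence, Tuple

# Hypothetical model outputs: a higher score means "more likely to fail".
SCORES = [0.95, 0.80, 0.60, 0.40, 0.20, 0.10]
# Hypothetical ground truth: 1 = drive actually failed, 0 = stayed healthy.
LABELS = [1, 1, 0, 1, 0, 0]


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when drives scoring >= threshold are flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    # Convention used here: with nothing flagged (or nothing to find),
    # report 1.0 rather than dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


if __name__ == "__main__":
    for threshold in (0.9, 0.5, 0.3):
        p, r = precision_recall(SCORES, LABELS, threshold)
        print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, a high threshold flags few drives and raises no false alarms but misses failures, while a low threshold catches every failure at the cost of some unnecessary replacements. Which end to favour depends on whether the operator cares more about Operational Efficiency or Protection.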

Speakers

Ahmed El-Shimi

Founder, Minima

Ahmed El-Shimi has worked in Storage, Distributed Systems, and Cloud for over 15 years. He built technologies such as Deduplication, Automated Tiering, Hybrid Cloud Storage and Data Awareness. He is currently Co-Founder of Minima Inc., a Cloud Data Governance Startup. Prior he led...

Wednesday March 22, 2017 5:00pm - 5:50pm EDT
Paul Revere C

Management

Experience Level Advanced

10:25am EDT

Campaign Storage - Gary Grider, Los Alamos National Laboratory

Computing sites need long-term retention of cool data, often in “data lakes” that focus on capacity but have non-trivial bandwidth requirements. For many years, tape was the best economic solution, but bandwidth and access needs have outstripped tape solutions. Disk can be more economical for this storage tier. The Cloud Community uses erasure-based object stores to gain scalability and durability using commodity hardware. The Object Interface works for new applications, but legacy applications utilize POSIX. Campaign Storage is a Near-POSIX File System using cloud storage for data and many POSIX file systems for metadata. Campaign Storage scales namespace metadata to trillions of files, with billions of files in a single directory, and supports file sizes from 1 byte to petabytes. This solution is now available commercially. This talk describes Campaign Storage motivation, design, and performance.

Speakers

Gary Grider

HPC Division Leader, Los Alamos National Laboratory (LANL)

Gary Grider is the Leader of the High Performance Computing (HPC) Division at Los Alamos National Laboratory. Los Alamos’ HPC Division operates one of the largest governmental supercomputing centers in the world focused on US National Security for the US/DOE National Nuclear Security...

Thursday March 23, 2017 10:25am - 11:15am EDT
Paul Revere C

Distributed

Experience Level Advanced

3:25pm EDT

Container Interfaces for Storage: Are We There Yet? - James Bottomley, IBM Research

Many talks about containers start with Orchestration systems like Docker or Kubernetes. However, this one will look at the storage impacts on the actual in-kernel container API. With the addition of the superblock namespace (essentially a user namespace for the kernel-to-filesystem boundary), much of the stage is now set for fixing one of the biggest underlying container problems: that of translating unprivileged container writes into real filesystem uid/gids. This talk will examine how this system works, why it is necessary, and what pieces still need to be added for orchestration systems to make use of it (yes, we'll also cover fully unprivileged Docker ... but only briefly).
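
The abstract does not detail the kernel mechanism. As a rough illustration of what "translating" ids means, the sketch below models the range-based mapping that Linux user namespaces expose through /proc/<pid>/uid_map, where each entry gives a starting id inside the namespace, a starting id outside it, and a range length. The function name, the example map, and returning None for unmapped ids are assumptions made for illustration. This is not the superblock namespace implementation.

```python
from typing import List, Optional, Tuple

# Each entry: (first id inside the namespace, first id outside, range length),
# the same three-field layout used by /proc/<pid>/uid_map.
IdMap = List[Tuple[int, int, int]]

# Hypothetical map: container ids 0..65535 correspond to host ids 100000..165535.
EXAMPLE_MAP: IdMap = [(0, 100000, 65536)]


def to_outside_id(inside_id: int, id_map: IdMap) -> Optional[int]:
    """Translate an id seen inside the container to the id outside it.

    Returns None when the id falls in no mapped range (a simplification
    chosen for this sketch).
    """
    for inside_start, outside_start, length in id_map:
        if inside_start <= inside_id < inside_start + length:
            return outside_start + (inside_id - inside_start)
    return None


if __name__ == "__main__":
    for uid in (0, 1000, 70000):
        print(f"container uid {uid} -> outside uid {to_outside_id(uid, EXAMPLE_MAP)}")
```

With this example map, root inside the container (uid 0) corresponds to the unprivileged id 100000 outside it, so a file written by container root need not be owned by the real root.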

Speakers

James Bottomley

Distinguished Engineer, IBM

James Bottomley is a Distinguished Engineer at IBM Research, where he works on Cloud and Container technology. He is also the Linux Kernel maintainer of the SCSI subsystem. He has been a Director on the Board of the Linux Foundation and Chair of its Technical Advisory Board. He went to...

Thursday March 23, 2017 3:25pm - 4:15pm EDT
Paul Revere AB

Containers

Experience Level Advanced