Vault 2017 has ended
Wednesday, March 22 • 5:00pm - 5:50pm
Predicting Storage Failures with Machine Learning - Ahmed El-Shimi, Minima

Disk drives fail at an average annual rate of ~2%. Any system with Availability and Durability requirements must mitigate such failures through a redundancy technique such as RAID, Erasure Coding, Replication or Backup.
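To put the ~2% figure in perspective, here is a minimal sketch of the arithmetic. It assumes drive failures are independent and that every drive shares the same annual rate; both are simplifying assumptions for illustration, not claims from the talk. The function names and fleet sizes are hypothetical.

```python
def expected_failures(num_drives: int, annual_failure_rate: float = 0.02) -> float:
    """Expected number of drive failures per year in a fleet of num_drives."""
    return num_drives * annual_failure_rate


def prob_at_least_one_failure(num_drives: int, annual_failure_rate: float = 0.02) -> float:
    """Probability that at least one drive fails within a year.

    Assumes failures are independent, which real fleets only approximate.
    """
    return 1.0 - (1.0 - annual_failure_rate) ** num_drives


if __name__ == "__main__":
    # Illustrative fleet sizes only.
    for n in (10, 100, 1000):
        print(n, expected_failures(n), round(prob_at_least_one_failure(n), 3))
```

Under these assumptions, even a modest fleet is likely to see at least one failure per year, which is why redundancy is treated as a baseline requirement.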

With the wealth of monitoring data available nowadays and the ability to process the data in near real-time, can we predict such failures? How well can we do it? And how would that impact how we design and operate large distributed systems?

We examine and motivate predictive failure detection in the context of Availability, Rebuild Times and Recovery Objectives of large systems. We then train and evaluate multiple models, achieving accuracy (97.5%) that compares favorably to common datacenter practices. We demonstrate how we can tune our learners to achieve different Precision and Recall objectives, thus improving Availability, Protection or Operational Efficiency.
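As a rough illustration of tuning toward different Precision and Recall objectives, the sketch below sweeps a decision threshold over failure scores. This is not the speaker's method: the scores and labels are made up, the function name is hypothetical, and threshold selection is just one common way to trade precision against recall.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when drives scoring >= threshold are flagged as failing.

    labels: 1 means the drive actually failed, 0 means it stayed healthy.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    # Convention chosen for this sketch: report 1.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Made-up model scores and ground-truth outcomes, for illustration only.
SCORES = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
LABELS = [1, 1, 0, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    for threshold in (0.75, 0.50, 0.15):
        p, r = precision_recall(SCORES, LABELS, threshold)
        print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

A higher threshold flags fewer drives, which tends to raise precision (fewer unnecessary replacements) at the cost of recall. A lower threshold catches more true failures but raises more false alarms. Which end is preferable depends on whether the operator prioritizes protection or operational efficiency.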

Ahmed El-Shimi

Founder, Minima
Ahmed El-Shimi has worked in Storage, Distributed Systems, and Cloud for over 15 years. He built technologies such as Deduplication, Automated Tiering, Hybrid Cloud Storage and Data Awareness. He is currently Co-Founder of Minima Inc., a Cloud Data Governance Startup. Prior he led...

Wednesday March 22, 2017 5:00pm - 5:50pm EDT
Paul Revere C