Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twitter (Alex Levenson)

My talk at Hadoop Summit 2015

Published in: Software, Data & Analytics, Engineering

Transcript

1. Lessons Learned at Twitter: Hadoop Performance Optimization at Scale. Alex Levenson | Ian O'Connell | @THISWILLWORK | @0X138
2. Data Platform @Twitter, Core Data Libraries Team: Develop, maintain, and support the core data processing libraries used at Twitter. In a good position to make system-wide performance improvements.
3. Data Platform @Twitter, Core Data Libraries Team. Scalding: Idiomatic functional Scala library for writing Hadoop MapReduce jobs. Functional programming is a natural fit for MapReduce. Compile-time type checked. github.com/twitter/scalding
4. Data Platform @Twitter, Core Data Libraries Team. Apache Parquet: Columnar storage format for the Hadoop ecosystem. Uses the Google Dremel column shredding and assembly algorithm. github.com/apache/parquet-mr
5. Data Platform @Twitter, Core Data Libraries Team. Summingbird: Streaming MapReduce for hybrid realtime / batch topologies. Write once, execute in parallel on Storm / Heron (online) and Scalding (offline). github.com/twitter/summingbird
6. Hadoop at Twitter: Hadoop at Twitter Scale
7. 300+ petabytes of data
8. Multiples of 100k MapReduce jobs daily
9. Multiple 1000+ machine Hadoop clusters
10. Among the largest Hadoop clusters in the world
11. Cost at Scale: At this scale, even small system-wide improvements can save significant amounts of compute resources.
12. What to Improve? What does your Hadoop cluster spend most of its time doing?
13. Measure, Don't Guess: Profile your cluster; you might be surprised by what you find.
14. Enable JVM Profiling with -Xprof: Xprof is a low-overhead profiler built into the JVM (HotSpot), so there's nothing to install. Settings: mapreduce.task.profile='true', mapreduce.task.profile.maps='0-', mapreduce.task.profile.reduces='0-', mapreduce.task.profile.params='-Xprof'
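
The four settings on slide 14 can be supplied per job on the command line. The following is a minimal sketch, not taken from the talk: it assumes the job's driver goes through Hadoop's generic option parsing (for example via ToolRunner) so that `-D key=value` pairs are honored. The helper names `profiling_flags` and `hadoop_command`, the jar name, and the class name are hypothetical and only illustrate where the flags go.

```python
import shlex

# Profiling properties copied from slide 14 of the talk.
PROFILING_PROPS = {
    "mapreduce.task.profile": "true",
    # "0-" is the task range shown on the slide (from task 0 onward).
    "mapreduce.task.profile.maps": "0-",
    "mapreduce.task.profile.reduces": "0-",
    "mapreduce.task.profile.params": "-Xprof",
}


def profiling_flags(props=PROFILING_PROPS):
    """Render the properties as a flat list of -D generic options.

    Hypothetical helper: each property becomes the pair ["-D", "key=value"].
    """
    flags = []
    for key, value in props.items():
        flags.extend(["-D", f"{key}={value}"])
    return flags


def hadoop_command(jar, main_class, job_args=()):
    """Build a `hadoop jar` argument list with profiling enabled.

    Generic options are placed after the main class and before the
    job's own arguments, which is where generic option parsing expects them.
    """
    return ["hadoop", "jar", jar, main_class, *profiling_flags(), *job_args]


if __name__ == "__main__":
    # Hypothetical jar and class names, for illustration only.
    cmd = hadoop_command("my-job.jar", "com.example.MyJob", ["--input", "in"])
    print(shlex.join(cmd))
```

Keeping the properties in one dictionary makes it easy to narrow the task ranges later (for example, profiling only a few tasks) without touching the code that builds the command line.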

Thursday, June 11, 2015
