Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twitter (Alex Levenson)
Transcript
- 1. LESSONS LEARNED AT TWITTER: HADOOP PERFORMANCE OPTIMIZATION AT SCALE. ALEX LEVENSON | IAN O'CONNELL | @THISWILLWORK | @0X138
- 2. DATA PLATFORM @TWITTER, CORE DATA LIBRARIES TEAM: We develop, maintain, and support the core data processing libraries used at Twitter, which puts us in a good position to make system-wide performance improvements.
- 3. SCALDING (github.com/twitter/scalding): An idiomatic, functional Scala library for writing Hadoop map reduce jobs. Functional programming is a natural fit for map reduce, and jobs are type-checked at compile time.
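To make "functional programming is a natural fit for map reduce" concrete, here is a minimal sketch in plain Java streams. It is not Scalding, whose Scala API differs; the class `FunctionalWordCount` is hypothetical. The map phase becomes a `flatMap` over lines, and the shuffle plus reduce become a group-by-key with a per-key count, all with types checked at compile time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: word count as map -> group by key -> reduce, using
// plain Java streams. This is NOT Scalding's API; it only illustrates why
// map reduce lines up with ordinary functional operations.
final class FunctionalWordCount {

    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                // "Map" phase: split every line into lowercase words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // "Shuffle" + "reduce": group identical words and count them.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "to do");
        System.out.println(wordCount(lines).get("to")); // prints 3
    }
}
```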
- 4. APACHE PARQUET (github.com/apache/parquet-mr): A columnar storage format for the Hadoop ecosystem that uses the column shredding and assembly algorithm from Google's Dremel.
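The shredding and assembly idea can be sketched for the simplest nested case: a single top-level `repeated int32 x` column, where the maximum repetition level and the maximum definition level are both 1. This is a simplified illustration, not parquet-mr's implementation, which handles arbitrary nesting; the class `RepeatedColumn` is hypothetical. Each value gets a repetition level (0 = starts a new record, 1 = continues the current list) and a definition level (1 = value present, 0 = empty list), and only non-null values are stored.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical sketch of Dremel-style shredding/assembly for ONE
// column: a top-level `repeated int32 x` field (max rep level 1, max def level 1).
final class RepeatedColumn {
    final List<Integer> values = new ArrayList<>();           // only non-null values are stored
    final List<Integer> repetitionLevels = new ArrayList<>();
    final List<Integer> definitionLevels = new ArrayList<>();

    // Shred: one (r, d) pair per value, or a single (0, 0) pair for an empty list.
    static RepeatedColumn shred(List<List<Integer>> records) {
        RepeatedColumn col = new RepeatedColumn();
        for (List<Integer> record : records) {
            if (record.isEmpty()) {
                col.repetitionLevels.add(0); // r = 0: a new record starts here
                col.definitionLevels.add(0); // d = 0: field undefined, no value stored
                continue;
            }
            for (int i = 0; i < record.size(); i++) {
                col.repetitionLevels.add(i == 0 ? 0 : 1); // r = 1: repeats within the same list
                col.definitionLevels.add(1);              // d = 1: value present
                col.values.add(record.get(i));
            }
        }
        return col;
    }

    // Assemble: r = 0 opens a new record; d = 1 consumes the next stored value.
    List<List<Integer>> assemble() {
        List<List<Integer>> records = new ArrayList<>();
        int next = 0;
        for (int i = 0; i < repetitionLevels.size(); i++) {
            if (repetitionLevels.get(i) == 0) {
                records.add(new ArrayList<>());
            }
            if (definitionLevels.get(i) == 1) {
                records.get(records.size() - 1).add(values.get(next++));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<List<Integer>> records = List.of(List.of(1, 2), List.of(), List.of(3));
        RepeatedColumn col = shred(records);
        System.out.println(col.values);           // [1, 2, 3]
        System.out.println(col.repetitionLevels); // [0, 1, 0, 0]
        System.out.println(col.definitionLevels); // [1, 1, 0, 1]
        System.out.println(col.assemble());       // [[1, 2], [], [3]]
    }
}
```

Storing the levels alongside a dense array of values is what lets each column be read on its own while still reconstructing the original nested records.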
- 5. SUMMINGBIRD (github.com/twitter/summingbird): Streaming map reduce for hybrid realtime/batch topologies. Write your logic once and execute it in parallel on Storm/Heron (online) and Scalding (offline).
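The "write once, run online and offline" idea can be sketched as one job function applied to two partitions of the data, with the partial results merged per key. Summingbird relies on values that can be merged associatively (a semigroup) for this; the sketch below uses plain `long` addition instead and is not Summingbird's API. The class `HybridCount`, the event strings, and the "first word is the key" rule are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of "write once, run twice": the same logic runs over
// historical events (batch) and recent events (online), and the two partial
// results are merged per key with an associative sum. NOT Summingbird's API.
final class HybridCount {

    // The job, written once: count events per key.
    static Map<String, Long> run(List<String> events, Function<String, String> keyOf) {
        Map<String, Long> counts = new HashMap<>();
        for (String event : events) {
            counts.merge(keyOf.apply(event), 1L, Long::sum);
        }
        return counts;
    }

    // Merging is only correct because addition is associative.
    static Map<String, Long> merge(Map<String, Long> batch, Map<String, Long> online) {
        Map<String, Long> merged = new HashMap<>(batch);
        online.forEach((key, value) -> merged.merge(key, value, Long::sum));
        return merged;
    }

    public static void main(String[] args) {
        Function<String, String> firstWord = event -> event.split(" ")[0];
        Map<String, Long> batch = run(List.of("#hadoop a", "#scala b", "#hadoop c"), firstWord);
        Map<String, Long> online = run(List.of("#hadoop d"), firstWord);
        System.out.println(merge(batch, online).get("#hadoop")); // prints 3
    }
}
```

Merging the batch and online results gives the same answer as running the job over all events at once, which is the property a hybrid topology depends on.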
- 6. HADOOP AT TWITTER: Hadoop at Twitter scale
- 7. 300+ PETABYTES OF DATA
- 8. MULTIPLES OF 100K MAP REDUCE JOBS DAILY
- 9. MULTIPLE 1000+ MACHINE HADOOP CLUSTERS
- 10. AMONG THE LARGEST HADOOP CLUSTERS IN THE WORLD
- 11. COST AT SCALE: At this scale, even small system-wide improvements can save significant amounts of compute resources.
- 12. WHAT TO IMPROVE? What does your Hadoop cluster spend most of its time doing?
- 13. MEASURE, DON'T GUESS: Profile your cluster; you might be surprised by what you find.
- 14. ENABLE JVM PROFILING WITH -XPROF: Xprof is a low-overhead profiler built into the HotSpot JVM, so there's nothing to install. Enable it for every map and reduce task with:
    mapreduce.task.profile='true'
    mapreduce.task.profile.maps='0-'
    mapreduce.task.profile.reduces='0-'
    mapreduce.task.profile.params='-Xprof'
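A sketch of how the four properties from the slide might be passed to a job that parses Hadoop generic options (for example via `ToolRunner`), which accepts `-D property=value`. The range values use Hadoop's integer-range syntax, where an omitted bound means "open-ended", so `0-` selects every task; the shipped default for the map and reduce ranges is `0-2`, i.e. only the first few tasks. The `inRanges` helper is a hypothetical re-implementation for illustration, not Hadoop's own parser. Note that `-Xprof` is a HotSpot flag for JDK 8 and earlier; it was removed in JDK 10.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: the profiling properties from the slide rendered as Hadoop generic
// options, plus a hypothetical helper showing which task IDs a range selects.
final class XprofSettings {

    static Map<String, String> profilingProperties() {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("mapreduce.task.profile", "true");
        props.put("mapreduce.task.profile.maps", "0-");    // open-ended range: every map task
        props.put("mapreduce.task.profile.reduces", "0-"); // every reduce task
        props.put("mapreduce.task.profile.params", "-Xprof");
        return props;
    }

    // Renders each property as the argument pair "-D", "key=value".
    static List<String> genericOptions(Map<String, String> props) {
        List<String> options = new ArrayList<>();
        props.forEach((key, value) -> {
            options.add("-D");
            options.add(key + "=" + value);
        });
        return options;
    }

    // Hypothetical parser for comma-separated ranges such as "0-2", "5", "7-" or "-3".
    static boolean inRanges(String ranges, int taskId) {
        for (String part : ranges.split(",")) {
            String range = part.trim();
            if (range.isEmpty()) continue;
            int dash = range.indexOf('-');
            if (dash < 0) {                                   // single task ID
                if (Integer.parseInt(range) == taskId) return true;
                continue;
            }
            String lo = range.substring(0, dash).trim();
            String hi = range.substring(dash + 1).trim();
            int start = lo.isEmpty() ? 0 : Integer.parseInt(lo);               // "-3": from 0
            int end = hi.isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(hi); // "7-": no upper bound
            if (taskId >= start && taskId <= end) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", genericOptions(profilingProperties())));
        System.out.println(inRanges("0-", 4999)); // true: "0-" matches every task
        System.out.println(inRanges("0-2", 3));   // false: the narrower range stops at task 2
    }
}
```

On the command line this corresponds to something like `hadoop jar my-job.jar MyJob -D mapreduce.task.profile=true -D mapreduce.task.profile.maps=0- ...`, where the jar and class names are hypothetical.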