Transcript
1. How to Be Productive Data Engineer Rafal Wojdyla - rav@spotify.com Note: My views are my own and don't necessarily represent those of Spotify.
2. • Operations • Development • Organization • Culture
3. What is Spotify? For everyone: • Streaming Service • Launched in October 2008 • 60 Million Monthly Users • 15 Million Paid Subscribers And for me: • 1.3K-node Hadoop cluster
4. Automation
5. ME ADAM
6. Apache Ambari Cloudera Manager
7. + Puppet
8. Not Invented Here
9. Never Invented Here
10. Wild Wild West
11. Apache Bigtop
12. Enable log aggregation
13. To enable log aggregation yarn.log-aggregation-enable = true yarn.log-aggregation.retain-seconds = ?
14. <property> <name>yarn.log-aggregation-enable</name> <value>true</value> </property> <property> <name>yarn.log-aggregation.retain-seconds</name> <value>315569260</value> </property>
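The retention value on this slide, 315569260 seconds, works out to roughly ten years. As a minimal sketch, the two settings can be rendered in the usual Hadoop `<property>` form from a retention period given in days. The helper names `yarn_log_aggregation_properties` and `to_xml` are hypothetical and not part of Hadoop, and the 30-day figure is an arbitrary illustration rather than a value from the talk.

```python
from xml.etree import ElementTree as ET

SECONDS_PER_DAY = 24 * 60 * 60


def yarn_log_aggregation_properties(retain_days):
    """Build a <configuration> element with the two log aggregation settings.

    Hypothetical helper for illustration; it only produces XML and does not
    talk to a cluster.
    """
    settings = {
        "yarn.log-aggregation-enable": "true",
        "yarn.log-aggregation.retain-seconds": str(int(retain_days * SECONDS_PER_DAY)),
    }
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def to_xml(root):
    """Serialize the element tree to a string."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 30 days is an assumed example value, not the slide's setting.
    print(to_xml(yarn_log_aggregation_properties(30)))
```

The generated properties would go into `yarn-site.xml`. Expressing retention in days makes the chosen period easier to review than a raw number of seconds.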
15. Heap Memory used is 97%
16. Hellelephant
17. Custom logs • Profiling • Garbage collection
18. Right tool for the job
19. Right abstraction for the job
20. Scaling machines is easy, scaling people is hard
21. Automation • Map split size • Number of reducers • HDFS data retention • User feedback (ongoing)
22. Organization
23. Ownerless
24. Ownerless Squad
25. Ownerless Squad Upgrades
26. Ownerless Squad Upgrades Getting there
27. Culture
28. Experiment Fail Fast Embrace Failure
29. Spark But we have tried! Non grata
30. Spark spark.storage.memoryFraction (0.6) spark.shuffle.memoryFraction (0.2) In shuffle-heavy algorithms, reduce the cache (storage) fraction in favour of the shuffle fraction.
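As a rough sketch of the trade-off on this slide, the two fractions can be passed to `spark-submit` as `--conf` arguments. The helper `shuffle_heavy_conf` is hypothetical, the 0.4/0.4 split is an assumed illustration rather than a recommendation from the talk, and the sum check is only a sanity guard added for this example.

```python
def shuffle_heavy_conf(storage_fraction=0.4, shuffle_fraction=0.4):
    """Return spark-submit --conf arguments that shift memory towards shuffle.

    Hypothetical helper; the default values are illustrative assumptions.
    """
    for fraction in (storage_fraction, shuffle_fraction):
        if not 0 <= fraction <= 1:
            raise ValueError("fractions must be between 0 and 1")
    # Sanity guard for this sketch: the two fractions should not claim
    # more than the whole heap between them.
    if storage_fraction + shuffle_fraction > 1:
        raise ValueError("storage and shuffle fractions sum to more than 1")

    settings = {
        "spark.storage.memoryFraction": storage_fraction,
        "spark.shuffle.memoryFraction": shuffle_fraction,
    }
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print a command line fragment that could be appended to spark-submit.
    print(" ".join(shuffle_heavy_conf()))
```

Starting from the defaults shown on the slide (0.6 and 0.2), this example lowers the cache share and raises the shuffle share by the same amount.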
31. Spark spark.executor.heartbeatInterval (10K) spark.core.connection.ack.wait.timeout (60) Increase both in case of long GC pauses.
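A minimal sketch of raising both timeouts together, written out as `spark-defaults.conf` lines (one whitespace-separated key and value per line). It assumes the slide's 10K is in milliseconds and its 60 is in seconds. The helper `gc_tolerant_defaults` and the scaling factor are hypothetical, and a suitable factor depends on the GC pauses actually observed.

```python
# Defaults as shown on the slide; units are assumed to be
# milliseconds (heartbeat) and seconds (ack wait timeout).
SLIDE_DEFAULTS = {
    "spark.executor.heartbeatInterval": 10_000,
    "spark.core.connection.ack.wait.timeout": 60,
}


def gc_tolerant_defaults(factor=2):
    """Scale both timeouts by the same factor and render config lines.

    Hypothetical helper for illustration only.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1 to increase the timeouts")
    lines = [f"{key} {int(value * factor)}" for key, value in SLIDE_DEFAULTS.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(gc_tolerant_defaults(factor=3))
```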
32. Learnings • Operations Automation • Development Abstraction • Organization Team • Culture Experiment
33. Join the band Engineers wanted in NYC & Stockholm http://spotify.com/jobs