- The problem space and example applications
- Why don't traditional approaches scale?
- Hadoop history
- The ecosystem and stack: HDFS, MapReduce, Hive, Pig...
- Cluster architecture overview
- Hadoop distribution and basic commands
- Eclipse development
- The HDFS command line and web interfaces
- The HDFS Java API (lab)
- Key philosophy: move computation, not data
- Core concepts: Mappers, reducers, drivers
- The MapReduce Java API (lab)
- Optimizing with Combiners and Partitioners (lab)
- More common algorithms: sorting, indexing and searching (lab)
- Relational manipulation: map-side and reduce-side joins (lab)
- Chaining jobs
- Testing with MRUnit
- Patterns to abstract "thinking in MapReduce"
- The Cascading library (lab)
- The Hive data warehouse (lab)