Learn the fundamentals of Hadoop development, including data processing, storage, and analysis using tools from the Hadoop ecosystem. Gain hands-on experience with real-world projects.
Hadoop Development Training Syllabus
Curriculum Designed by Experts
Hadoop Introduction
Move computation not data
Hadoop performance and data scale facts
Hadoop in the context of other data stores
The Apache Hadoop Project
Hadoop - an inside view: MapReduce and HDFS
The Hadoop Ecosystem
What about NoSQL?
Comparison with Other Systems
RDBMS
Grid Computing
Volunteer Computing
A Brief History of Hadoop
Apache Hadoop and the Hadoop Ecosystem
Hadoop Releases
MapReduce
Analyzing the Data with Hadoop
Map and Reduce
Java MapReduce
Scaling Out
Data Flow
Combiner Functions
Running a Distributed MapReduce Job
Hadoop Streaming
Ruby
Python
Hadoop Pipes
Constructing the basic template of a MapReduce program
Counting things
Adapting for Hadoop's API changes
Streaming in Hadoop
Streaming with Unix commands
Streaming with scripts
Streaming with key/value pairs
Streaming with the Aggregate package
Improving performance with combiners
Distributing Data with HDFS
The Design of HDFS
HDFS Concepts
Blocks
Namenodes and Datanodes
HDFS Federation
HDFS High-Availability
The Command-Line Interface
Basic Filesystem Operations
Hadoop Filesystems
Interfaces
The Java Interface
Reading Data from a Hadoop URL
Reading Data Using the FileSystem API
Writing Data
Directories
Querying the Filesystem
Deleting Data
Data Flow
Anatomy of a File Read
Anatomy of a File Write
Coherency Model
Parallel Copying with distcp
Keeping an HDFS Cluster Balanced
Hadoop Archives
Using Hadoop Archives
Limitations
Understanding Hadoop I/O
Data Integrity
Data Integrity in HDFS
LocalFileSystem
ChecksumFileSystem
Compression
Codecs
Compression and Input Splits
Using Compression in MapReduce
Serialization
The Writable Interface
Writable Classes
Implementing a Custom Writable
Serialization Frameworks
Avro
File-Based Data Structures
SequenceFile
MapFile
Advanced MapReduce
Chaining MapReduce jobs
Chaining MapReduce jobs in a sequence
Chaining MapReduce jobs with complex dependencies
Chaining preprocessing and postprocessing steps
Joining data from different sources
Reduce-side joining
Replicated joins using DistributedCache
Semijoin: reduce-side join with map-side filtering