Course Content
Big Data Overview:
- What is Big Data
- Why Big Data is gaining popularity
- Big Data Case Studies
- Big Data Characteristics
- Solutions for working with Big Data.
Hadoop & Its Components:
- What is Hadoop and what are its components.
- Hadoop architecture and the characteristics of the data it can handle/process.
- A brief history of Hadoop, the companies using it, and why they adopted it.
- The Hadoop framework and its components, explained in detail.
- What HDFS (Hadoop Distributed File System) is, and how reads and writes to it work.
- How to set up a Hadoop cluster in different modes: pseudo-distributed and multi-node.
- This includes setting up a Hadoop cluster in VirtualBox/VMware or on individual machines, the network configuration that needs careful attention, running the Hadoop daemons, and testing the cluster.
- What the MapReduce framework is and how it works.
- Running MapReduce jobs on a Hadoop cluster.
- Understanding replication, mirroring, and rack awareness in the context of Hadoop.
- All of the above topics include demos and practice sessions so that learners get hands-on experience with the technology.
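To make the map, shuffle/sort, and reduce phases concrete, here is a minimal in-memory word count sketch. It is not the Hadoop API: in a real job the mapper and reducer would be Java `Mapper` and `Reducer` classes run by the framework across the cluster. The function names below are hypothetical and only mimic the data flow on a single machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Hadoop hands keys to each reducer in sorted order; mimic that here.
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each input line is treated as one record fed to the mapper.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # prints {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Separating the three phases into functions mirrors how the framework runs many mapper and reducer instances in parallel, with the shuffle moving data between them over the network.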
Hadoop Cluster Planning:
- How to plan your Hadoop cluster.
- Understanding the hardware and software requirements for your Hadoop cluster.
- Understanding workloads and planning the cluster to avoid failures and perform optimally.
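One common starting point for capacity planning is simple arithmetic: raw disk equals the data volume times the replication factor, plus some headroom. The sketch below assumes the HDFS default replication factor of 3 and an illustrative 25% headroom for intermediate MapReduce output, logs, and growth; the headroom figure and the function names are assumptions, not Hadoop settings.

```python
import math


def raw_storage_tb(data_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Raw disk (TB) needed to store data_tb of data in HDFS.

    replication: HDFS replication factor (3 is the HDFS default).
    headroom: ASSUMED fraction of disk kept free for temporary/intermediate
        data and growth; 0.25 is only an illustrative value.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return data_tb * replication / (1 - headroom)


def datanodes_needed(data_tb: float, disk_per_node_tb: float,
                     replication: int = 3, headroom: float = 0.25) -> int:
    """Round the raw requirement up to a whole number of DataNodes."""
    return math.ceil(raw_storage_tb(data_tb, replication, headroom) / disk_per_node_tb)


if __name__ == "__main__":
    # 100 TB of data, 3 replicas, 25% headroom -> 400 TB raw.
    print(raw_storage_tb(100))
    # With a hypothetical 48 TB of disk per DataNode -> 9 nodes.
    print(datanodes_needed(100, 48))
```

Workload characteristics (CPU- versus I/O-bound jobs, expected data growth) would then adjust this storage-driven node count.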
Working with a Hadoop Cluster (Hadoop Administration):
- Understanding the functions of the JobTracker: resource management and job scheduling.
- Understanding schedulers: Fair | FIFO | Capacity Scheduler
- Hadoop administration: setting parameters for setup | trash | schedulers and pools | storing metadata and data at specific locations | replication | Hadoop client | commissioning and decommissioning of DataNodes, and more.
- Hadoop administration commands for working on Hadoop clusters: balancer | job list, status, and setting priority | saveNamespace | metasave | dfsadmin commands | fs commands | distcp | fsck | setting space quotas | read/write access to HDFS | securing the Hadoop cluster, and more.
- Backup and recovery
- Analyzing and resolving problems, with examples from live production environments:
- Hadoop daemons not starting | namespace IDs out of sync | connectivity issues between slave and master nodes | under-replicated data | browsing the respective web UIs | job failures | etc.
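A detail that often surprises administrators setting space quotas (via `dfsadmin -setSpaceQuota`) is that HDFS charges every replica against the quota, and a new block is only allocated if a full block, times its replication, still fits. The sketch below models just that accounting; the function names are hypothetical and it does not call any HDFS API.

```python
def space_consumed(file_bytes: int, replication: int) -> int:
    """Bytes charged to a directory's space quota: every replica counts."""
    return file_bytes * replication


def can_allocate_block(quota_bytes: int, used_bytes: int,
                       block_size: int, replication: int) -> bool:
    """A new block is allowed only if a full replicated block still fits."""
    return used_bytes + block_size * replication <= quota_bytes


if __name__ == "__main__":
    MiB = 1024 ** 2
    quota = 1024 * MiB  # a 1 GiB space quota
    # A 100 MiB file with replication 3 consumes 300 MiB of quota.
    print(space_consumed(100 * MiB, 3) // MiB)
    # With 128 MiB blocks and replication 3, each new block needs 384 MiB.
    print(can_allocate_block(quota, 0, 128 * MiB, 3))
    print(can_allocate_block(quota, 768 * MiB, 128 * MiB, 3))
```

The last check fails even though 256 MiB of quota remains, which is why writes can be rejected while a directory appears to have space left.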
Hadoop Clusters with the Latest Features:
- Hadoop 1.x and 2.x differences
- Hadoop 2.x new features
- What are YARN, HDFS Federation, and High Availability?
- Hadoop daemons and what has changed.
Working on a Hadoop 2.x Cluster:
- Upgrading older Hadoop versions (0.22.x or 1.x.x) to Hadoop 2.x in different modes
- Setting up Hadoop 2.x clusters in different modes and verifying the setup.
- Running MapReduce jobs on Hadoop YARN.
- Revisiting Hadoop configuration files: deprecated parameters, additions to existing config files, and miscellaneous settings.
- Understanding the QJM (Quorum Journal Manager)
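With the QJM, the active NameNode writes each edit-log entry to a group of JournalNodes and treats it as durable once a majority acknowledges it, so N JournalNodes tolerate at most (N - 1) / 2 failures. The sketch below only models that quorum arithmetic; the function names are hypothetical and it does not talk to real JournalNodes.

```python
def majority(n: int) -> int:
    """Smallest number of JournalNodes that forms a quorum out of n."""
    if n < 1:
        raise ValueError("need at least one JournalNode")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """JournalNodes that may fail while a quorum is still reachable: (n - 1) // 2."""
    return n - majority(n)


def edit_committed(acks: int, n: int) -> bool:
    """An edit counts as committed once a majority has acknowledged it."""
    return acks >= majority(n)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
    # prints:
    # 3 2 1
    # 4 3 1
    # 5 3 2
```

Going from 3 to 4 JournalNodes enlarges the quorum without tolerating any extra failure, which is why an odd number of JournalNodes is used.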
Understanding and working with Hadoop Components:
- What Oozie, Flume, Hive, and HBase are, and why they are used.
- How to set up Hive/Pig/HBase on Hadoop clusters.
- Setting up and working with Hive, HBase, and Pig on Hadoop 1.x or Hadoop 2.x
- Some real-world case studies.
- What Cloudera Manager is and how it is used.
- How to run MapReduce jobs from the Java API.