Course Content
What is Big Data & Why Hadoop, Hive and Pig?
- Big Data Characteristics, Challenges with Traditional Systems
Hadoop Overview & Its Ecosystem
- Anatomy of a Hadoop Cluster, Installing and Configuring Hadoop
- Hands-on Exercise - Build a pseudo-distributed cluster
HDFS - Hadoop Distributed File System and MapReduce Anatomy
- HDFS Architecture, NameNode, DataNodes and Secondary NameNode
- How MapReduce Works
- Hands-on Exercise - Basic HDFS operations and running MapReduce programs
Monitoring & Management of Hadoop
- Managing HDFS with Tools like fsck and dfsadmin
- Using the HDFS & JobTracker Web UIs
Hive Basics
- Hive Architecture, Hive Variables, Creating Internal & External Tables, Partitioning Data, Configuring a Shared Metastore
- Loading Data into Hive, Storing Query Output
- Writing queries - Joining Tables, Union, Filtering, Grouping, Sorting, etc., and advanced queries
Advanced Hive
- Sampling, Buckets and Clusters
- TRANSFORM, Creating User Defined Functions and SerDes
- Debugging & Troubleshooting Hive Queries
- Hive Best Practices
- Hands-on Exercise - Configuring Hive and a shared metastore, creating tables and partitions, structured data analysis
- Hands-on Exercise - Writing UDFs and SerDes
Sqoop & Flume
- Importing and exporting data from RDBMS and log files
- Hands-on Exercise - Import and export data between MySQL and Hive using Sqoop
- Hands-on Exercise - Importing logs from applications using Flume
Pig Basics
- Pig Basics, Loading data files, dumping and storing results
- Writing queries - Filter, Group, Join and Sort, FOREACH, SPLIT, SAMPLE, etc.
Pig Advanced
- UDFs and Macros
- Diagnostic operators, debugging and collecting statistics
- Performance Optimizations, Multi-Query Execution
- Hands-on Exercise - Semi-structured Data Analysis (Tweets and Log Analysis)
- Hands-on Exercise - Writing UDFs