Hadoop Administration for Data Analysts & Solution Architects

Have questions? Call us at +91 72592 22234

Course Overview


Xpertised offers advanced, personalized, instructor-led online classroom training in Hadoop Administration for Data Analysts & Solution Architects. The course gives you the opportunity to interact directly with a Hadoop instructor and helps you build the skills the industry demands.

Learn from our instructors from the convenience of your home or office, and interact live with trainers and other participants. The Big Data Hadoop Solutions Architect Program is designed to help you become an expert Hadoop Solutions Architect. You will master core skills, including designing, deploying, maintaining, and securing Hadoop clusters and working with NoSQL database technologies, so that you can lead enterprise-wide Hadoop infrastructure administration across the Big Data ecosystem.

Course Content


Big Data Overview:

  • What is Big Data
  • Why Big Data is gaining popularity
  • Big Data Case Studies
  • Big Data Characteristics
  • Solutions for working with Big Data

Hadoop & Its Components:

  • What is Hadoop, and what are its components?
  • Hadoop architecture and the characteristics of the data it can handle/process
  • The Hadoop framework and its components, explained in detail
  • What is HDFS, and how reads and writes work in the Hadoop Distributed File System
  • How to set up a Hadoop cluster in different modes: pseudo-distributed and multi-node
  • This includes setting up a Hadoop cluster in VirtualBox/VMware or on individual machines, the network configurations that need careful attention, running the Hadoop daemons, and testing the cluster.
  • What the MapReduce framework is and how it works
  • Running MapReduce jobs on a Hadoop cluster
  • Understanding replication, mirroring, and rack awareness in the context of Hadoop
  • All of the above topics include demos and practice sessions so that learners gain hands-on experience with the technology.

Hadoop Cluster Planning:

  • How to plan your Hadoop cluster
  • Understanding the hardware and software requirements for your Hadoop cluster
  • Understanding workloads and planning the cluster to avoid failures and perform optimally

Working with a Hadoop Cluster (Hadoop Administration):

  • Understanding the functions of the JobTracker: resource management and job scheduling
  • Understanding schedulers: Fair | FIFO | Capacity
  • Hadoop administration commands for working on Hadoop clusters: balancer | job list and status | metadata and data storage at specific locations | replication | Hadoop client | commissioning and decommissioning of DataNodes | and more
  • Setting priority | saveNamespace | metasave | dfsadmin commands | fs commands | distcp | fsck | setting space quotas | read/write access to HDFS | securing the Hadoop cluster | and more
  • Backup and recovery
  • Analyzing and resolving problems, with examples from live, real-time environments:
  • Hadoop daemons not starting up | namespace IDs out of sync | connectivity issues between slave and master nodes | under-replicated data | browsing through the respective web UIs | job failures | etc.

Hadoop Cluster with the Latest Features:

  • Hadoop 1.x and 2.x differences
  • Hadoop 2.x new features
  • What are YARN, HDFS Federation, and High Availability?
  • Hadoop daemons and what has changed

Working on a Hadoop 2.x Cluster:

  • Upgrading older Hadoop versions (0.22.x or 1.x.x) to Hadoop 2.x in different modes
  • Setting up Hadoop 2.x clusters in different modes and verifying the setup
  • Running MapReduce jobs on Hadoop YARN
  • Revisiting Hadoop configuration files: deprecated parameters, add-ons to existing configuration files, and miscellaneous settings
  • What is Cloudera Manager, and how is it used?
  • Comparing Hadoop distributions

Hive

Introduction to Hive

  • What Is Hive?
  • Hive Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • What Is Pig?
  • Hive vs. Pig
  • Hive Use Cases
  • Interacting with Hive

Relational Data Analysis with Hive

  • Hive Data Formats
  • Basic HiveQL Syntax
  • Data Types
  • Joining Data Sets
  • Functions
  • Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue

Hive Data Management

  • Hive Databases and Tables
  • Creating Databases and Hive-Managed Tables
  • Loading Data into Hive
  • Altering Databases and Tables
  • Self-Managed Tables
  • Simplifying Queries with Views
  • Controlling Access to Data

Text Processing with Hive

  • Overview of Text Processing
  • Important String Functions
  • Using Regular Expressions in Hive
  • Sentiment Analysis and N-Grams
  • Gaining Insight with Sentiment Analysis

Hive Optimization

  • Understanding Query Performance
  • Controlling Job Execution Plan
  • Partitioning
  • Bucketing
  • Indexing Data

Extending Hive

  • SerDes
  • Data Transformation with Custom Scripts
  • Parameterized Queries
  • Data Transformation with Hive

HBase

Introduction to HBase & Its Architecture

Understanding HBase Internals

HBase Cluster Planning and Installation

HBase Schema and Row Design, I/O Considerations, and Advanced Configuration

The HBase Administration Client API: Basics and Advanced Features

Working with Data

HBase Cluster Monitoring Using Frameworks & Tools

HBase Cluster Administration: Operational Scenarios and Tasks

Case Studies in Detail with Standard Practices

Impala

Introduction to Impala

  • What is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell

Analyzing Data with Impala

  • Basic Syntax
  • Data Types
  • Filtering, Sorting, and Limiting Results
  • Joining and Grouping Data
  • Improving Impala Performance
  • Hands-On Exercise: Interactive Analysis with Impala

Choosing the Best Tool for the Job

  • Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
  • Which to Choose?

Customer Reviews


Thanks to Xpertised and the tutor, who walked me through all the topics with practical exposure, which is helping me in my current project.
-Waseem

The course was quite helpful in terms of understanding the concepts and their practical application. It's really a very friendly environment to learn in. The timings were mutually chosen, as we are both working professionals. I am quite satisfied with the course.
-Tanmoy
