Big Data Analytics

The Big Data and Hadoop course explains the importance of Big Data in real-world scenarios. It introduces Hadoop, MapReduce, and the Hadoop Distributed File System (HDFS), and takes you through developing distributed processing of large data sets across clusters of computers and administering a Hadoop environment.

Program Overview

Learning Outcomes

The importance of Hadoop in the current scenario
The role of Relational Database Management Systems (RDBMS) and grid computing
Using Hadoop I/O to write MapReduce programs
Developing MapReduce applications to solve problems
Installation and group membership in ZooKeeper
Setting up and administering a Hadoop cluster
Pig for creating MapReduce programs
Hive, a data warehouse software, for querying and managing large datasets residing in distributed storage
HBase implementation, installation, and services
Using Sqoop to control data import and consistency

Duration: 45 hours

Big Data career opportunities are on the rise, and Hadoop is quickly becoming a must-know technology.


Course Contents

Module 1

Hadoop Architecture
Introduction to Hadoop
Parallel Computing vs. Distributed Computing
How to install Hadoop on your system
How to install Hadoop cluster on multiple machines

Hadoop Daemons introduction:

NameNode, DataNode, JobTracker, TaskTracker
Exploring HDFS (Hadoop Distributed File System)
Exploring the HDFS Apache Web UI
NameNode architecture (EditLog, FsImage, location of replicas)
Secondary NameNode architecture
DataNode architecture

Module 2

MapReduce Architecture
Exploring JobTracker/TaskTracker
How to run a MapReduce job
Exploring Mapper/Reducer/Combiner
Shuffle: Sort & Partition
Input/output formats
Exploring the Apache MapReduce Web UI

Module 3

Hadoop Developer Tasks
Writing a MapReduce program
Reading and writing data using Java
Hadoop Eclipse integration
Mapper in detail
Reducer in detail
Using Combiners
Reducing Intermediate Data with Combiners
Writing Partitioners for Better Load Balancing
Sorting in HDFS
Searching in HDFS
Hands-On Exercise

Module 4

Hadoop Administrative Tasks

Module 5

HBase Architecture
Routine Administrative Procedures
Understanding dfsadmin and mradmin
Block Scanner, Balancer
Health Check & Safe mode
Monitoring and Debugging on a production cluster
NameNode Backup and Recovery
DataNode commissioning/decommissioning
ACL (Access Control List)
Upgrading Hadoop

Module 6

Hive Architecture
Introduction to Hive
HBase vs Hive
Installation of Hive on your system
HQL (Hive Query Language)
Basic Hive commands
Hands-On Exercise

Module 7

Pig Architecture
Introduction to Pig
Installation of Pig on your system
Basic Pig commands
Hands-On Exercise

Module 8

Sqoop Architecture
Introduction to Sqoop
Installation of Sqoop on your system
Import/export data between RDBMS and HDFS
Import/export data between RDBMS and HBase
Import/export data between RDBMS and Hive
Hands-On Exercise
