Big Data Analytics

The Big Data and Hadoop course explains why Big Data matters in real-world scenarios. It introduces Hadoop, MapReduce, and the Hadoop Distributed File System (HDFS), and takes you through developing distributed processing of large data sets across clusters of computers and administering a Hadoop environment.

Program Overview


Learning Outcomes

  • The importance of Hadoop in the current scenario
  • Role of Relational Database Management System (RDBMS) and Grid computing
  • Using Hadoop I/O to write MapReduce programs
  • Developing MapReduce applications to solve problems
  • Installation and group membership in ZooKeeper
  • Setting up and administering a Hadoop cluster
  • Pig for creating MapReduce programs
  • Hive, a data warehouse software, for querying and managing large datasets residing in distributed storage
  • HBase implementation, installation, and services
  • Use of Sqoop in controlling data import and consistency

Duration : 45 hours

Big Data career opportunities are on the rise, and Hadoop is quickly becoming a must-know technology

Course Contents

Module 1
  • Hadoop Architecture
  • Introduction to Hadoop
  • Parallel Computing vs. Distributed Computing
  • How to install Hadoop on your system
  • How to install Hadoop cluster on multiple machines
  • Introduction to Hadoop daemons: NameNode, DataNode, JobTracker, TaskTracker
  • Exploring HDFS (Hadoop Distributed File System)
  • Exploring the HDFS Apache Web UI
  • NameNode architecture (EditLog, FsImage, location of replicas)
  • Secondary NameNode architecture
  • DataNode architecture
Module 2
  • MapReduce Architecture
  • Exploring JobTracker/TaskTracker
  • How to run a MapReduce job
  • Exploring Mapper/Reducer/Combiner
  • Shuffle: Sort & Partition
  • Input/output formats
  • Exploring the Apache MapReduce Web UI
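
To give a feel for the Mapper, Shuffle, and Reducer stages listed in this module, here is a minimal sketch in plain Python. It only simulates the data flow on one machine and does not use Hadoop; the function names (mapper, shuffle, reducer, word_count) and the sample sentences are illustrative assumptions, not course material.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key and return the groups in sorted key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the map, shuffle, and reduce steps over in-memory lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    return [reducer(key, values) for key, values in shuffle(mapped)]


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a file stored in HDFS.
    print(word_count(["big data and hadoop", "hadoop and hdfs"]))
```

In a real cluster the map and reduce steps run as separate tasks on different machines, and the shuffle moves data between them over the network; the sketch keeps everything in one process so the three stages are easy to follow.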
Module 3
  • Hadoop Developer Tasks
  • Writing a MapReduce program
  • Reading and writing data using Java
  • Hadoop Eclipse integration
  • Mapper in detail
  • Reducer in detail
  • Using Combiners
  • Reducing Intermediate Data with Combiners
  • Writing Partitioners for Better Load Balancing
  • Sorting in HDFS
  • Searching in HDFS
  • Hands-On Exercise
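
The combiner and partitioner topics in this module can be pictured with a small Python sketch. It is a simulation under stated assumptions, not Hadoop code: combine, partition, and route are hypothetical helper names, and zlib.crc32 stands in for the key hash that a real partitioner would use.

```python
import zlib
from collections import Counter


def combine(pairs):
    """Pre-aggregate (word, count) pairs produced by one map task.

    Summing locally is safe here because addition is associative and
    commutative, so the reducer still receives the same totals.
    """
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def partition(key, num_reducers):
    """Pick a reducer index from a stable hash of the key (assumed: CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Place each pair in the bucket of the reducer chosen by partition()."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


if __name__ == "__main__":
    # Hypothetical output of a single map task.
    map_output = [("hadoop", 1), ("hdfs", 1), ("hadoop", 1), ("hadoop", 1)]
    combined = combine(map_output)
    print(len(map_output), "records before combining,", len(combined), "after")
    print(route(combined, 2))
```

The combiner shrinks the data that has to cross the network, while the partitioner decides which reducer receives each key; a custom partitioner is useful when the default hashing sends too many keys to the same reducer.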
Module 4
  • Hadoop Administrative Tasks
  • Routine Administrative Procedures
  • Understanding dfsadmin and mradmin
  • Block Scanner, Balancer
  • Health Check & Safe mode
  • Monitoring and Debugging on a production cluster
  • NameNode Backup and Recovery
  • DataNode commissioning/decommissioning
  • ACL (Access control list)
  • Upgrading Hadoop
Module 5
  • HBase Architecture
Module 6
  • Hive Architecture
  • Introduction to Hive
  • HBase vs Hive
  • Installation of Hive on your system
  • HQL (Hive Query Language)
  • Basic Hive commands
  • Hands-On Exercise
Module 7
  • Pig Architecture
  • Introduction to Pig
  • Installation of Pig on your system
  • Basic Pig commands
  • Hands-On Exercise
Module 8
  • Sqoop Architecture
  • Introduction to Sqoop
  • Installation of Sqoop on your system
  • Import/export data between RDBMS and HDFS
  • Import/export data between RDBMS and HBase
  • Import/export data between RDBMS and Hive
  • Hands-On Exercise