Essentials of Big Data Using Hadoop

Course Duration:  60 Hours

Course Fees:         Rs. 9520/- + Taxes

Participant Profile:

  • A basic understanding of databases and programming languages

End Objectives:

  • Appreciate the need for Big Data in today's computing environment
  • Install and work with Hadoop and its ecosystem tools (Map-Reduce, Hive, Sqoop, Pig, HBase, Oozie)

Course Outline:

1.            Big Data and Hadoop

1.1.         Introduction to Big Data and Hadoop

1.2.         Hadoop Architecture

1.3.         Installing Linux with Java 1.8 on VMware Workstation 11

1.4.         Hadoop Versioning and Configuration

1.5.         Single Node Hadoop 1.2.1 installation on Linux

1.6.         Multi Node Hadoop 1.2.1 installation on Linux

1.7.         Linux commands and Hadoop commands

1.8.         Cluster architecture and block placement

1.9.         Modes in Hadoop

1.9.1.     Local Mode

1.9.2.     Pseudo Distributed Mode

1.9.3.     Fully Distributed Mode

1.10.      Hadoop Daemons

1.10.1.   Master Daemons (Name Node, Secondary Name Node, Job Tracker)

1.10.2.   Slave Daemons (Data Node, Task Tracker)

1.11.      Task Instance

1.12.      Hadoop HDFS Commands

1.13.      Accessing HDFS

1.13.1.   CLI Approach

1.13.2.   Java Approach (see the sketch at the end of this module)
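
A minimal sketch of the Java approach to HDFS access (1.13.2), assuming a single-node cluster whose Name Node listens on hdfs://localhost:9000; the host, port, and file path are placeholders, not fixed by the course material:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Reads a file from HDFS line by line through the FileSystem API.
    public class HdfsCat {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://localhost:9000"); // placeholder Name Node address
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/training/sample.txt");    // hypothetical HDFS path
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(fs.open(file)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }

The CLI approach of 1.13.1 achieves the same result with "hadoop fs -cat /user/training/sample.txt".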

2.            Map-Reduce     

2.1.         Understanding Map Reduce Framework

2.2.         The Word-Count Example as Motivation (see the sketch at the end of this module)

2.3.         Developing a Map-Reduce Program using Eclipse Luna

2.4.         HDFS Read-Write Process

2.5.         Map-Reduce Life Cycle Methods

2.6.         Serialization (Java)

2.7.         Data Types

2.8.         Comparator and Comparable (Java)

2.9.         Custom Output File

2.10.      Analysing a Temperature Dataset using Map-Reduce

2.11.      Custom Partitioner & Combiner

2.12.      Running Map-Reduce in Local and Pseudo Distributed Modes
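
To give a flavour of the programming model, the following is a compact version of the word-count program developed in 2.2-2.3, written against the org.apache.hadoop.mapreduce API of Hadoop 1.x; the input and output paths are taken from the command line and are placeholders:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in an input line.
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts emitted for each word.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "word count"); // Hadoop 1.x constructor
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(SumReducer.class);               // combiner, as in 2.11
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar (the name wordcount.jar here is arbitrary), it can be run in local or pseudo-distributed mode (2.12) with "hadoop jar wordcount.jar WordCount <input> <output>".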

3.            HIVE

3.1.         Hive Introduction & Installation

3.2.         Data Types in Hive

3.3.         Commands in Hive

3.4.         Exploring Internal and External Tables

3.5.         Partitions

3.6.         Complex Data Types

3.7.         UDFs in Hive

3.7.1.     Built-in UDF

3.7.2.     Custom UDF

3.8.         Thrift Server

3.9.         Java to Hive Connection (see the sketch at the end of this module)

3.10.      Joins in Hive

3.11.      Working with HWI (Hive Web Interface)

3.12.      Bucket Map-side Join

3.13.      More commands

3.13.1.   View

3.13.2.   Sort By

3.13.3.   Distribute By

3.13.4.   Lateral View

3.14.      Running Hive in Cloudera
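
Item 3.9 connects Java to Hive over the Thrift service using JDBC. A minimal sketch, assuming a HiveServer2 instance on localhost:10000 and an existing table named employees (both placeholders); older Hive releases expose the same idea through the jdbc:hive:// URL and the org.apache.hadoop.hive.jdbc.HiveDriver class instead:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcClient {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC driver; host, port, and credentials are placeholders.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "", "");
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT name, salary FROM employees LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }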

4.            SQOOP

4.1.         Sqoop Installations and Basics

4.2.         Importing Data from Oracle to HDFS (see the sketch at the end of this module)

4.3.         Advanced Imports

4.4.         Real-Time Use Case

4.5.         Exporting Data from HDFS to Oracle

4.6.         Running Sqoop in Cloudera
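
The imports in 4.2 are normally launched from the sqoop command line; the sketch below shows the same Oracle-to-HDFS import driven from Java, assuming a Sqoop 1.x release whose org.apache.sqoop.Sqoop class exposes a runTool entry point. The connection string, credentials, table, and target directory are all placeholders:

    import org.apache.sqoop.Sqoop;

    // Runs "sqoop import" programmatically with placeholder connection details.
    public class OracleImport {
        public static void main(String[] args) {
            String[] sqoopArgs = new String[] {
                "import",
                "--connect", "jdbc:oracle:thin:@//localhost:1521/orcl", // hypothetical Oracle service
                "--username", "scott",
                "--password", "tiger",
                "--table", "EMP",                                       // hypothetical source table
                "--target-dir", "/user/training/emp",
                "--num-mappers", "1"
            };
            System.exit(Sqoop.runTool(sqoopArgs));
        }
    }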

5.            PIG

5.1.         Installation and Introduction

5.2.         WordCount in Pig

5.3.         The NYSE Dataset in Pig

5.4.         Working with Complex Data Types

5.5.         Pig Schema

5.6.         Miscellaneous Commands

5.6.1.     Group

5.6.2.     Filter

5.6.3.     Order

5.6.4.     Distinct

5.6.5.     Join

5.6.6.     Flatten

5.6.7.     Co-group

5.6.8.     Union

5.6.9.     Illustrate

5.6.10.   Explain

5.7.         UDFs in Pig (see the sketch at the end of this module)

5.8.         Parameter Substitution and Dry Run

5.9.         Pig Macros

5.10.      Running Pig in Cloudera
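
Item 5.7 covers user-defined functions, which in Pig are ordinary Java classes. A minimal sketch of an EvalFunc-based UDF that upper-cases a chararray field; the class name is a placeholder:

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // A Pig UDF that upper-cases its first (chararray) argument.
    public class UpperCase extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().toUpperCase();
        }
    }

Once packaged into a jar, it is made available in the Grunt shell with REGISTER and called inside a FOREACH ... GENERATE statement.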

6.            HBase

6.1.         HBase Introduction & Installation

6.2.         Exploring HBase Shell

6.3.         HBase Storage Technique

6.4.         Working with HBase from Java

6.5.         CRUD with HBase (see the sketch at the end of this module)

6.6.         Hive HBase Integration
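
Items 6.4-6.5 exercise the HBase Java client. A minimal create/read/delete sketch using the HTable API of the HBase releases that pair with Hadoop 1.x; the table name "employee", column family "personal", and ZooKeeper quorum are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrud {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "localhost");    // placeholder quorum

            HTable table = new HTable(conf, "employee");        // hypothetical table

            // Create / update: write one cell.
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
            table.put(put);

            // Read: fetch the cell back.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            System.out.println("name = " + Bytes.toString(value));

            // Delete: remove the row.
            table.delete(new Delete(Bytes.toBytes("row1")));

            table.close();
        }
    }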

7.            OOZIE

7.1.         Installing Oozie

7.2.         Running Map-Reduce with Oozie (see the sketch at the end of this module)

7.3.         Running Pig and Sqoop with Oozie
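
Items 7.2-7.3 run existing jobs through Oozie workflows, which are described by a workflow.xml kept on HDFS. The sketch below submits such a workflow from Java with the OozieClient API, assuming an Oozie server on localhost:11000; the application path and the nameNode/jobTracker properties are placeholders of the kind a typical workflow.xml parameterises:

    import java.util.Properties;

    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.WorkflowJob;

    // Submits a workflow (for example, one wrapping the word-count Map-Reduce job)
    // and polls until it leaves the RUNNING state.
    public class SubmitWorkflow {
        public static void main(String[] args) throws Exception {
            OozieClient client = new OozieClient("http://localhost:11000/oozie"); // placeholder URL

            Properties conf = client.createConfiguration();
            conf.setProperty(OozieClient.APP_PATH, "hdfs://localhost:9000/user/training/wordcount-wf");
            conf.setProperty("nameNode", "hdfs://localhost:9000");   // hypothetical workflow parameters
            conf.setProperty("jobTracker", "localhost:9001");

            String jobId = client.run(conf);
            while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
                Thread.sleep(10000);
            }
            System.out.println("Workflow finished with status: " + client.getJobInfo(jobId).getStatus());
        }
    }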