Best Big Data Training in Chennai

Learn data analytics from our Big Data experts, from scratch to an advanced level. The syllabus is designed to emphasize practical training, with scenario-based questions and mini projects that help you build expert-level skills.

What Is Big Data?

Large, complex data sets are very difficult to process with traditional systems such as RDBMSs and enterprise systems. Nowadays, businesses in banking, finance, and telecom, as well as social media platforms like Twitter and Facebook, generate far more data than they did a few years ago. Hadoop is a framework for storing and processing such large volumes of data in different formats: structured, semi-structured, and unstructured. It is designed to scale up to thousands of servers, and it is open source, provided by Apache. Most large companies, such as Google, Twitter, Facebook, and Amazon, have moved projects to the Hadoop ecosystem.

Why Should You Join Big Data Training at Infycle?

This course focuses on analyzing data, and its emphasis on analytics can lead you toward becoming a data scientist. It is a strong foundation for data-related technologies such as Data Science, Machine Learning, IoT, and AI. In today's digital world, every business development cycle will increasingly be based on data and data analytics, so the course helps anyone move further into analytics.

Introduction to Apache Hadoop and the Hadoop Ecosystem

Apache Hadoop Overview

Data Ingestion and Storage

Data locality

Data Analysis and Exploration

Other Ecosystem Tools

Hadoop Ecosystem Installation

Ubuntu 14.04 LTS Installation through VMware Player

Installing Hadoop 2.7.1 on Ubuntu 14.04 LTS (Single-Node Cluster)

Apache Spark Installation

JDK 8 Installation

Scala Installation

SBT Installation

Apache Hadoop File Storage

Why we need HDFS

Apache Hadoop Cluster Components

HDFS Architecture

Failures of HDFS 1.0

Reading and Writing Data in HDFS

Fault tolerance

Distributed Processing on an Apache Hadoop Cluster

Overview and Architecture of MapReduce

Components of MapReduce

How MapReduce works

Flow and Differences Between MapReduce Versions

YARN Architecture

Apache Hive

Hive Installation on Ubuntu 14.04 With MySQL Database Metastore

Hive Overview and Architecture

Hive command execution in shell and HUE

Hive Data Loading methods

Hive Partition and Bucketing

External and Managed tables in Hive

File formats in Hive

Hive Joins

Serde in Hive

Apache Sqoop

Apache Sqoop Overview and Architecture

Apache Sqoop Import Examples

Apache Sqoop Export Examples

Sqoop Incremental load

Introduction to Scala

Functional Programming vs. Object-Oriented Programming

Scala Overview

Configuring Apache Spark with Scala

Variable Declaration

Operations on variables

Conditional Expressions

Pattern Matching

Iteration

Deep Dive into Scala

Scala Functions

Scala OOP Concepts

Scala Abstract Class & Traits

Scala Access Modifier

Scala Array and String

Scala Exceptions

Scala Collections

Scala Tuples

Scala File handling

Scala Multithreading

Spark Ecosystem

Scala Fundamentals

Scala File handling

Introduction and Setting up of Scala

Setup Scala on Windows

Basic Programming Constructs

Functions

Object Oriented Concepts - Classes

Object Oriented Concepts - Objects

Object Oriented Concepts - Case Classes

Collections - Seq, Set and Map

Basic Map Reduce Operations

Setting up Data Sets for Basic I/O Operations

Basic I/O Operations and using Scala Collections APIs

Tuples

Development Cycle - Developing Source code

Development Cycle - Compile source code to jar using SBT

Development Cycle - Setup SBT on Windows

Development Cycle - Compile changes and run jar with arguments

Development Cycle - Setup IntelliJ with Scala

Development Cycle - Develop Scala application using SBT in IntelliJ

Spark Scala Environment setup in different ways

Setup Environment - Locally

Setup Environment - using Cloudera QuickStart VM

Using Windows - Putty and WinSCP

Using Windows - Cygwin

HDFS Quick Preview

YARN Quick Preview

Setup Data Sets

Apache Spark Basics

What is Apache Spark?

Starting the Spark Shell

Using the Spark Shell

Getting Started with Datasets and Data Frames

Data Frame Operations

Apache Spark Overview and Architecture

RDD and Paired RDD

RDD Overview

RDD Data Sources

Creating and Saving RDDs

RDD Operations

Transformations and Actions

Converting Between RDDs and Data Frames

Key-Value Pair RDDs

Map-Reduce operations

Other Pair RDD Operations

Transform, Stage and Store – Spark

Quick overview about Spark documentation

Initializing Spark job using spark-shell

Create Resilient Distributed Data Sets (RDD)

Previewing data from RDD

Reading different file formats - Brief overview using JSON

Transformations Overview

Manipulating Strings as part of transformations using Scala

Row level transformations using map

Row level transformations using flatMap

Filtering the data

Joining data sets - inner join

Joining data sets - outer join

Aggregations - Getting Started

Aggregations - using actions (reduce and countByKey)

Aggregations - understanding combiner

Aggregations using groupByKey - least preferred API for aggregations

Aggregations using reduceByKey

Aggregations using aggregateByKey

Sorting data using sortByKey

Global Ranking - using sortByKey with take and takeOrdered

By Key Ranking - Converting (K, V) pairs into (K, Iterable[V]) using groupByKey

Get topNPrices using Scala Collections API

Get topNPricedProducts using Scala Collections API

Get top n products by category using groupByKey, flatMap and Scala function

Set Operations - union, intersect, distinct as well as minus

Save data in Text Input Format

Save data in Text Input Format using Compression

Saving data in standard file formats - Overview

Revision of Problem Statement and Design the solution

Working with Data Frames, Schemas and Datasets

Creating Data Frames from Data Sources

Saving Data Frames to Data Sources

Data Frame Schemas

Eager and Lazy Execution

Querying Data Frames Using Column Expressions

Grouping and Aggregation Queries

Joining Data Frames

Querying Tables, Files, Views in Spark Using SQL

Comparing Spark SQL and Apache Hive-on-Spark

Creating Datasets

Loading and Saving Datasets

Dataset Operations

Running Apache Spark Applications

Writing a Spark Application

Building and Running an Application

Application Deployment Mode

The Spark Application Web UI

Configuring Application Properties

Distributed Processing

RDD Partitions

Example: Partitioning in Queries

Stages and Tasks

Job Execution Planning

Example: Catalyst Execution Plan

Example: RDD Execution Plan

Data Frame and Dataset Persistence

Persistence Storage Levels

Viewing Persisted RDDs

Differences between RDD, Data Frame and Dataset

Common Apache Spark Use Cases

Data Analysis - Spark SQL or HiveQL

Different interfaces to run Hive queries

Create Hive tables and load data in text file format

Create Hive tables and load data in ORC file format

Using spark-shell to run Hive queries or commands

Functions - Getting Started

Functions - Manipulating Strings

Functions - Manipulating Dates

Functions - Aggregations

Functions - CASE

Row level transformations

Joins

Aggregations

Sorting

Set Operations

Analytics Functions - Aggregations

Analytics Functions - Ranking

Windowing Functions

Create Data Frame and Register as Temp table

Writing Spark SQL Applications - process data

Writing Spark SQL Applications - Save data into Hive tables

Data Frame Operations

Apache Flume

Introduction to Flume & features

Flume topology & core concepts

Flume Agents: Sources, Channels and Sinks

Property file parameters logic

Apache Kafka

Apache Kafka Installation

Apache Kafka Overview and Architecture

Consumer and Producer

Deploying Kafka in real world business scenarios

Integration with Spark for Spark Streaming

Apache ZooKeeper

Introduction to ZooKeeper concepts

Overview and Architecture of ZooKeeper

ZooKeeper principles & usage in Hadoop framework

Use of ZooKeeper in HBase and Kafka

Apache Oozie

Oozie Fundamentals

Oozie workflow creations

Concepts of Coordinators and Bundles