Welcome to Big Data Hadoop

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a virtually limitless number of concurrent tasks or jobs.

Why Hadoop?

  • Capturing data
  • Curation
  • Searching
  • Presentation

Hadoop Architecture

  • History of Hadoop – Facebook, Dynamo, Yahoo, Google
  • Hadoop Core
  • YARN architecture, Hadoop 2.0

Hadoop Distributed File System (HDFS)

  • HDFS Clusters – NameNodes, DataNodes & Clients (see the client sketch after this list)
  • Metadata
  • Web-based Administration
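
To give a flavour of the client side, here is a minimal sketch of reading and writing HDFS over WebHDFS with the third-party `hdfs` (HdfsCLI) Python package; the NameNode address, user name and paths are placeholder assumptions.

```python
from hdfs import InsecureClient  # third-party HdfsCLI package

# WebHDFS endpoint of the NameNode (port 9870 on Hadoop 3; placeholder host).
client = InsecureClient("http://namenode:9870", user="hadoop")

client.makedirs("/user/hadoop/demo")

# Files are written once and read many times; appends are the exception.
with client.write("/user/hadoop/demo/hello.txt", encoding="utf-8") as writer:
    writer.write("hello from HDFS\n")

print(client.list("/user/hadoop/demo"))

with client.read("/user/hadoop/demo/hello.txt", encoding="utf-8") as reader:
    print(reader.read())
```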

MapReduce

  • Processing & generating large data sets
  • Map functions
  • Programming MapReduce using SQL / Bash / Python (Python streaming example after this list)
  • Parallel Processing
  • Failover
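
The classic word count shows the shape of a streaming job: the mapper emits key/value pairs, the framework sorts them by key, and the reducer aggregates each key's values. A minimal Python sketch follows; file paths and cluster details are placeholders.

```python
# --- mapper.py: emit "word<TAB>1" for every word on stdin ---
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word.lower()}\t1")

# --- reducer.py: sum counts; the shuffle phase delivers keys sorted ---
# (uses the same `import sys` when saved as a separate file)
current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Submitted through the Hadoop Streaming jar (e.g. `hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /in -output /out`), the map tasks run in parallel across the cluster, and a task that dies is simply rescheduled on another node, which is the failover behaviour listed above.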

Data warehousing with Hive

  • Data Summarisation
  • Ad-hoc queries
  • Analysing large datasets
  • HiveQL (SQL-like Query Language) – sample queries after this list
  • Integration with SQL databases
  • n-gram analysis
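
As a hedged illustration, the sketch below runs HiveQL from Python through the third-party PyHive client; the HiveServer2 host and the `web_logs` table are hypothetical.

```python
from pyhive import hive  # third-party PyHive package

# Connect to a HiveServer2 instance (placeholder host and database).
conn = hive.Connection(host="hive-server", port=10000, database="default")
cur = conn.cursor()

# Ad-hoc summarisation: row counts per category over a large table.
cur.execute("""
    SELECT category, COUNT(*) AS n
    FROM web_logs
    GROUP BY category
    ORDER BY n DESC
    LIMIT 10
""")
for category, n in cur.fetchall():
    print(category, n)

# Hive ships n-gram estimators: top 5 bigrams from a text column.
cur.execute(
    "SELECT explode(ngrams(sentences(lower(body)), 2, 5)) FROM web_logs"
)
print(cur.fetchall())
```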

Parallel Processing with Pig

  • Parallel evaluation
  • Query language interface (Pig Latin – sample script after this list)
  • Relational Algebra
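
A minimal sketch of the idea, assuming Pig is installed locally: a Pig Latin script expresses the relational-algebra pipeline, and `pig -x local` evaluates it against the local filesystem. The input file and its schema are made up.

```python
import os
import subprocess
import tempfile

# Load -> filter -> group -> aggregate: relational algebra in Pig Latin.
PIG_SCRIPT = """
logs    = LOAD 'input/logs.tsv' AS (user:chararray, bytes:long);
big     = FILTER logs BY bytes > 1024;
by_user = GROUP big BY user;
counts  = FOREACH by_user GENERATE group AS user, COUNT(big) AS n;
STORE counts INTO 'output/counts';
"""

with tempfile.NamedTemporaryFile("w", suffix=".pig", delete=False) as f:
    f.write(PIG_SCRIPT)
    script_path = f.name

# -x local runs against the local filesystem; drop it to run on a cluster.
subprocess.run(["pig", "-x", "local", script_path], check=True)
os.remove(script_path)
```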

Data Mining with Mahout

  • Clustering
  • Classification
  • Batch-based collaborative filtering (toy example after this list)
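
Mahout's batch recommenders run as MapReduce jobs, but the co-occurrence idea behind them fits in a few lines. The toy, in-memory sketch below (made-up data, not Mahout's API) scores unseen items by how often they co-occur with a user's items.

```python
from collections import Counter
from itertools import combinations

# Made-up interaction data: user -> set of items they touched.
baskets = {
    "alice": {"hadoop", "hive", "pig"},
    "bob":   {"hadoop", "hbase"},
    "carol": {"hive", "pig", "mahout"},
}

# Batch step: count how often every pair of items co-occurs.
cooccur = Counter()
for items in baskets.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1
        cooccur[(b, a)] += 1

def recommend(user, k=3):
    """Score items the user has not seen by co-occurrence with their items."""
    seen = baskets[user]
    scores = Counter()
    for (a, b), n in cooccur.items():
        if a in seen and b not in seen:
            scores[b] += n
    return [item for item, _ in scores.most_common(k)]

print(recommend("bob"))  # e.g. ['hive', 'pig'] (tie order may vary)
```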

Searching with Elasticsearch

  • Elasticsearch concepts
  • Installation and data import
  • API demonstration and sample queries (see the sketch after this list)
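
A minimal sketch using the official Python client (8.x-style API assumed); the node address, index name and document are placeholders.

```python
from elasticsearch import Elasticsearch  # official client, 8.x API assumed

es = Elasticsearch("http://localhost:9200")  # placeholder node address

# Index a JSON document; the index is created on first use.
es.index(index="courses", id="1", document={
    "title": "Big Data Hadoop",
    "topics": ["hdfs", "mapreduce", "hive"],
})
es.indices.refresh(index="courses")  # make the new document searchable now

# Full-text match query against the title field.
resp = es.search(index="courses", query={"match": {"title": "hadoop"}})
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["_source"]["title"])
```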

Structured Data Storage with HBase

  • Big Data: How big is big?
  • Optimised real-time read/write access (client sketch after this list)
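
For a feel of the read/write path, here is a hedged sketch using the third-party happybase client, which talks to HBase through its Thrift server; the host, table and column family are assumptions.

```python
import happybase  # third-party client; requires the HBase Thrift server

conn = happybase.Connection("hbase-host")  # placeholder host, default port 9090
table = conn.table("metrics")              # assumes a table with family 'd'

# Writes and reads are keyed by row; columns live inside column families.
table.put(b"sensor-42:2024-01-01", {b"d:temp": b"21.5", b"d:hum": b"40"})

row = table.row(b"sensor-42:2024-01-01")
print(row[b"d:temp"])  # b'21.5'

# Row keys are stored sorted, so prefix scans give cheap range reads.
for key, data in table.scan(row_prefix=b"sensor-42:"):
    print(key, data)
```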

Cassandra multi-master database

  • The Cassandra Data Model (see the CQL sketch after this list)
  • Eventual Consistency
  • When to use Cassandra
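
A minimal CQL sketch with the DataStax Python driver; the contact point and schema are placeholders. Tables are designed around the queries you will run, and the partition key (here `user_id`) decides which replicas own a row.

```python
from cassandra.cluster import Cluster  # DataStax cassandra-driver package

cluster = Cluster(["127.0.0.1"])  # placeholder contact point
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.users (
        user_id text PRIMARY KEY,
        name text
    )
""")

session.execute(
    "INSERT INTO demo.users (user_id, name) VALUES (%s, %s)",
    ("u1", "Alice"),
)
for row in session.execute("SELECT user_id, name FROM demo.users"):
    print(row.user_id, row.name)
```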

Redis

  • Redis Data Model (sketch after this list)
  • When to use Redis
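
A short sketch with the redis-py client (a local instance is assumed) showing the data model in action: plain keys with expiry, atomic counters, and lists.

```python
import redis  # redis-py client

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Plain key/value with expiry -- the classic cache pattern.
r.set("session:42", "alice", ex=3600)
print(r.get("session:42"))  # 'alice'

# Counters and lists are first-class, atomically updated data types.
r.incr("pageviews")
r.lpush("recent:searches", "hadoop", "hive")
print(r.lrange("recent:searches", 0, -1))
```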

MongoDB

  • MongoDB data model (sketch after this list)
  • Installation of MongoDB
  • When to use MongoDB
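
A minimal pymongo sketch (local server, made-up database and collection names): documents are JSON-like and need no predeclared schema.

```python
from pymongo import MongoClient  # official MongoDB driver

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["training"]

# Documents are schemaless dicts; collections are created lazily.
db.courses.insert_one({
    "title": "Big Data Hadoop",
    "topics": ["hdfs", "hive", "kafka"],
    "days": 5,
})

# Query by field value, including values inside arrays.
for course in db.courses.find({"topics": "hive"}):
    print(course["title"], course["days"])
```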

Kafka

  • Kafka architecture
  • Installation
  • Example usage (producer/consumer sketch after this list)
  • When to use Kafka
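
A hedged producer/consumer sketch with the kafka-python package; the broker address and topic name are placeholders.

```python
from kafka import KafkaProducer, KafkaConsumer  # kafka-python package

BROKER = "localhost:9092"  # placeholder broker address

# Producers append messages to a topic's partitioned, replicated log.
producer = KafkaProducer(bootstrap_servers=BROKER)
producer.send("events", key=b"user-1", value=b"page_view")
producer.flush()

# Consumers read the log at their own pace, tracked by offsets.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating when no new messages arrive
)
for msg in consumer:
    print(msg.key, msg.value, msg.offset)
```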

Lambda Architecture

  • Concept (toy sketch after this list)
  • Hadoop + Stream processing integration
  • Architecture examples
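
The essence of the pattern fits in a toy sketch: a query merges a bulk-computed batch view with a small real-time view, so answers are both complete and fresh. All numbers below are made up.

```python
# Batch view: rebuilt periodically by a bulk job (e.g. MapReduce over HDFS).
batch_view = {"hadoop": 10_000, "hive": 4_200}

# Speed view: kept up to date by a stream processor between batch runs.
speed_view = {"hadoop": 12, "kafka": 3}

def page_views(term: str) -> int:
    """Serving layer: batch total plus the delta since the last batch run."""
    return batch_view.get(term, 0) + speed_view.get(term, 0)

print(page_views("hadoop"))  # 10012
print(page_views("kafka"))   # 3
```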

Big Data in the Cloud

  • Amazon Web Services
  • Concepts: Pay-per-use model
  • Amazon S3, EC2, EMR (S3 example after this list)
  • Google Cloud Platform
  • Google BigQuery
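
As a small illustration of the pay-per-use workflow, the boto3 sketch below stages data in S3, the usual landing zone for data that EMR clusters later process; the bucket and key names are placeholders, and credentials are read from the environment.

```python
import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# Upload a local file into a bucket (placeholder names).
s3.upload_file("logs.csv", "my-data-bucket", "raw/logs.csv")

# List what landed under the prefix; you pay only for storage and requests.
resp = s3.list_objects_v2(Bucket="my-data-bucket", Prefix="raw/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```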