Welcome to Big Data Hadoop
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.
Why Hadoop?
- Capturing data
- Curation
- Searching
- Presentation
Course Content - Big Data
Hadoop Architecture
- History of Hadoop – Facebook, Dynamo, Yahoo, Google
- Hadoop Core
- YARN architecture, Hadoop 2.0
Hadoop Distributed File System (HDFS)
- HDFS Clusters – NameNodes, DataNodes & Clients
- Metadata
- Web-based Administration
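To make the NameNode/DataNode split concrete, here is a toy Python sketch (all names and sizes are illustrative, not real HDFS defaults except the 128 MB block size): the NameNode holds only metadata mapping each file to its blocks and each block to the DataNodes holding its replicas.

```python
# Toy model of NameNode metadata: a file is split into fixed-size
# blocks, and each block is replicated on several DataNodes.
BLOCK_SIZE = 128        # think "128 MB", the HDFS default block size
REPLICATION = 3         # default HDFS replication factor
datanodes = ["dn1", "dn2", "dn3", "dn4"]   # hypothetical cluster

def place_blocks(filename, filesize):
    # Number of blocks, rounding up.
    n_blocks = -(-filesize // BLOCK_SIZE)
    metadata = {}
    for i in range(n_blocks):
        block_id = f"{filename}#blk{i}"
        # Simple round-robin replica placement; real HDFS placement
        # is rack-aware, which this sketch ignores.
        replicas = [datanodes[(i + r) % len(datanodes)]
                    for r in range(REPLICATION)]
        metadata[block_id] = replicas
    return metadata

# A "300 MB" file needs three 128 MB blocks, each stored three times.
meta = place_blocks("/logs/2024.txt", 300)
```

Clients ask the NameNode for this mapping, then read and write block data directly from the DataNodes.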
MapReduce
- Processing & Generating large data sets
- Map functions
- Programming MapReduce using SQL / Bash / Python
- Parallel Processing
- Failover
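Since the course programs MapReduce in Python, the classic word-count job can be sketched as a mapper and reducer in the Hadoop Streaming style. The shuffle/sort phase that Hadoop performs between the two is simulated in-process here; in a real job the mapper and reducer would read stdin and write stdout.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.strip().split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce phase: sum all counts for a single word.
    return word, sum(counts)

def run_job(lines):
    # Map every input line to key/value pairs.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: group pairs by key, as the framework would.
    pairs.sort(key=itemgetter(0))
    # Reduce each group of identical keys.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )

result = run_job(["the quick brown fox", "the lazy dog"])
# "the" appears in both lines, so its count is 2.
```

Because mappers are stateless and reducers see each key independently, both phases can run in parallel across the cluster, which is what the "Parallel Processing" bullet refers to.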
Data warehousing with Hive
- Data Summarisation
- Ad-hoc queries
- Analysing large datasets
- HiveQL (SQL-like Query Language)
- Integration with SQL databases
- n-grams analysis
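Hive ships an `ngrams` aggregate function for exactly the n-grams analysis listed above; what it computes can be sketched in plain Python (the sample sentence is illustrative):

```python
from collections import Counter

def ngrams(tokens, n):
    # Slide a window of length n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "to be or not to be".split()
# Count how often each bigram (2-gram) occurs.
bigram_counts = Counter(ngrams(tokens, 2))
top = bigram_counts.most_common(1)
# ("to", "be") occurs twice, so it is the most frequent bigram.
```

In Hive the same idea runs as a distributed MapReduce job over tables of text, so the frequency table is computed across the whole dataset rather than one string.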
Parallel Processing with Pig
- Parallel evaluation
- Query language interface
- Relational Algebra
Data Mining with Mahout
- Clustering
- Classification
- Batch-based collaborative filtering
Searching with Elasticsearch
- Elasticsearch concepts
- Installation and data import
- API demonstration and sample queries
Structured Data Storage with HBase
- Big Data: How big is big?
- Optimised Real-time read/write access
Cassandra multi-master database
- The Cassandra Data Model
- Eventual Consistency
- When to use Cassandra
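Eventual consistency means replicas of a row may temporarily disagree, and Cassandra reconciles them by comparing per-write timestamps: the last write wins. A minimal sketch of that reconciliation rule (the values are hypothetical):

```python
def reconcile(replicas):
    # Each replica reports (value, write_timestamp).
    # Last-write-wins: the value with the newest timestamp survives.
    return max(replicas, key=lambda vt: vt[1])[0]

# Two replicas diverged while the network was partitioned; on read,
# the newer write (timestamp 250) wins and is repaired everywhere.
replicas = [("alice@old.example", 100), ("alice@new.example", 250)]
winner = reconcile(replicas)
```

This is why Cassandra favours availability in a multi-master setup: any replica can accept a write, and conflicts are resolved deterministically afterwards.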
Redis
- Redis Data Model
- When to use Redis
MongoDB
- MongoDB data model
- Installation of MongoDB
- When to use MongoDB
Kafka
- Kafka architecture
- Installation
- Example usage
- When to use Kafka
Lambda Architecture
- Concept
- Hadoop + Stream processing integration
- Architecture examples
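The core Lambda Architecture idea, merging a precomputed batch view with an incremental real-time view at query time, can be sketched in a few lines of Python (the page names and counts are illustrative):

```python
from collections import Counter

# Batch layer: a view precomputed (e.g. by a Hadoop job) over the
# immutable master dataset; accurate but hours old.
batch_view = Counter({"page_a": 1000, "page_b": 400})

# Speed layer: an incremental view (e.g. from a stream processor)
# covering only events that arrived since the last batch run.
speed_view = Counter({"page_a": 7, "page_c": 3})

def query(page):
    # Serving layer: merge both views to answer with fresh totals.
    return batch_view[page] + speed_view[page]

# page_a combines both layers; page_c exists only in the speed layer.
```

When the next batch run completes, its output absorbs the streamed events and the speed layer's view is discarded, which keeps the real-time state small.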
Big Data in the Cloud
- Amazon Web Services
- Concepts: Pay-per-use model
- Amazon S3, EC2, EMR
- Google Cloud Platform
- Google BigQuery
Infycle Technologies
Let Profession Search You