Application fee: 0 USD


Certification Body: Aegis School of Data Science
Location: On-campus (India, Mumbai, Pune, Bangalore)
Type: Certificate course
Director: Dr. Vinay Kulkarni
Coordinator: Ritin Joshi
Language: English
Course fee: 0 USD
GST: 18%
Total course fee: 0 USD


Course Details

What is Hadoop? 

Apache™ Hadoop® is an open-source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.

It is a data storage and processing system that supports data storage, file sharing, data analytics, and more. The technology is scalable and enables effective analysis of large volumes of unstructured data, thereby adding value. With the growing role of social media and internet communication, Hadoop is used by a wide spectrum of companies, from Facebook to Yahoo. Other big users of Hadoop include Cloudera, Hortonworks, IBM, Amazon, Intel, MapR, and Microsoft. It lets users handle more data through expanded storage capacity and also enables data retrieval in case of hardware failure.

Hadoop increasingly offers data retrieval and data security features, and these features are improving over time. The result is an enhanced solution for database management systems (DBMS). Within the Hadoop market, software is the fastest-growing segment compared with hardware and services.

Who is Using Hadoop? 

With the growing role of social media and internet communication, Hadoop is used by a wide spectrum of companies, including:

  • Yahoo (one of the biggest users and the contributor of more than 80% of Hadoop's code)
  • Facebook
  • Cloudera
  • Hortonworks
  • IBM
  • Intel
  • MapR
  • Microsoft
  • Netflix 
  • Amazon
  • Adobe 
  • eBay
  • Hulu
  • Twitter
  • Snapdeal
  • Tata Sky

Why use Hadoop?

Hadoop changes the economics and the dynamics of large-scale computing. Its impact can be boiled down to four salient characteristics. Hadoop enables a computing solution that is:


Scalable

 A cluster can be expanded by adding new servers or resources without having to move, reformat, or change the dependent analytic workflows or applications.

Cost effective

 Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.


Flexible

 Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide.

Fault tolerant

 When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.
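
The failover behaviour described above can be illustrated with a toy model. The following is a minimal sketch, not Hadoop code: the node names, the `BLOCK_REPLICAS` map, and the `pick_replica` helper are all hypothetical, and three copies of each block are assumed purely for illustration. It shows the idea of redirecting work to another node that holds a copy of the same data when one node is lost.

```python
# Toy model of replica failover; none of these names come from Hadoop itself.

# Hypothetical placement: each data block is stored on three nodes.
BLOCK_REPLICAS = {
    "block-1": ["node-a", "node-b", "node-c"],
    "block-2": ["node-b", "node-c", "node-d"],
}


def pick_replica(block_id, live_nodes):
    """Return the first live node that holds a copy of the block."""
    for node in BLOCK_REPLICAS[block_id]:
        if node in live_nodes:
            return node
    raise RuntimeError(f"no live replica for {block_id}")


live = {"node-a", "node-b", "node-c", "node-d"}
before = pick_replica("block-1", live)   # first listed replica is used

live.discard("node-a")                   # simulate losing a node
after = pick_replica("block-1", live)    # work moves to another replica

print(before, "->", after)
```

Processing only fails in this model if every node holding a given block is lost at the same time.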

Hadoop architecture

Hadoop is composed of four core components: Hadoop Common, the Hadoop Distributed File System (HDFS), MapReduce, and YARN.
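
Of these components, MapReduce is the programming model. As a rough illustration of the idea, here is a minimal single-process sketch in plain Python. It does not use any Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example. In a real cluster the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample input.
sample = ["Hadoop stores data", "Hadoop processes data"]
counts = word_count(sample)
print(counts)
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, the work can be split across machines without the pieces needing to coordinate.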

Hadoop for the Enterprise from IBM

IBM BigInsights brings the power of Hadoop to the enterprise, enhancing Hadoop by adding administrative, discovery, development, provisioning, security, and support features, along with best-in-class analytical capabilities. IBM® BigInsights™ for Apache™ Hadoop® is a complete solution for large-scale analytics. Explore Big Data Analytics using IBM InfoSphere BigInsights, taught by IBM experts at Aegis.

Sample Jobs in Hadoop

  • Hadoop Developer
  • Hadoop Architect
  • Hadoop Tester
  • Hadoop Administrator
  • Data Scientist

Companies Recruiting

  • IBM
  • Myntra
  • Snapdeal
  • HP
  • EMC
  • Cloudera

Hadoop Course Overview

Through lectures, hands-on exercises, case studies, and projects, students will explore the Hadoop ecosystem and learn to:

  • Explain what Hadoop is and the real-world problems it solves
  • Understand MapReduce concepts and how MapReduce works
  • Write MapReduce programs
  • Architect Hadoop-based applications
  • Understand Hadoop operations

Prerequisites and Requirements

  • Programming proficiency in Java or Python is required. Prior knowledge of Apache Hadoop is not required.
  • The primary language during lectures will be Java. However, assignments and projects completed in either Java or Python will be accepted.

Note: For students who do not have a programming background in Java or Python, additional readings or learning videos will be prescribed. The programming prerequisites will need to be completed within the first 2 weeks of the course.

Course Contents

  1. Introduction to Hadoop: Real-World Hadoop Applications and Use Cases, Hadoop Ecosystem & projects, Types of Hadoop Processing, Hadoop Distributions, Hadoop Installation.
  2. Hadoop MapReduce: Developing a MapReduce Application, MapReduce Internals, MapReduce I/O, Examples illustrating MapReduce Features.
  3. Hadoop Application Architectures: Data Modeling, Data Ingestion, and Data Processing in Hadoop.
  4. Hadoop Projects: Practical Tips for Hadoop Projects.
  5. Hadoop Operations: Planning a Hadoop Cluster, Installation and Configuration, Security, and Monitoring.
  6. Hadoop Case Studies: A series of Group Discussions and Student Project Presentations.


Reference Material

  1. Lecture notes will form the primary reference material for the course.
  2. Specific reading material, papers, and videos will be assigned during the course to augment the lecture notes.
  3. Reference Textbook: Hadoop: The Definitive Guide, 4th Edition, Tom White, O’Reilly Media, Inc.


Grading

  1. Homework Assignments: Students will be assigned 5 homework assignments during the course: 10%
  2. Quizzes: The best 2 out of 3 unannounced quizzes will count towards the final grade: 10%
  3. Mid-Term Exam: 20%
  4. Final Exam: 40%
  5. Student Project: The students will individually, or optionally in teams of two, research, design, implement, and present a substantial term project: 20%
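
As a worked example of how the weights above combine, here is a small sketch. The weights come from the list above. The `final_grade` helper, the assumption that every component is scored out of 100 and that homework scores are averaged, and the sample scores are all hypothetical and only illustrate the arithmetic.

```python
# Weights taken from the breakdown above (they sum to 100%).
WEIGHTS = {
    "homework": 0.10,
    "quizzes": 0.10,
    "midterm": 0.20,
    "final": 0.40,
    "project": 0.20,
}


def final_grade(homework, quizzes, midterm, final, project):
    """Combine component scores (each assumed to be out of 100)."""
    best_two = sorted(quizzes, reverse=True)[:2]   # best 2 of 3 quizzes
    scores = {
        "homework": sum(homework) / len(homework),  # assumed: simple average
        "quizzes": sum(best_two) / len(best_two),
        "midterm": midterm,
        "final": final,
        "project": project,
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
grade = final_grade(
    homework=[80, 90, 70, 100, 60],
    quizzes=[50, 90, 70],
    midterm=75,
    final=85,
    project=90,
)
print(round(grade, 2))
```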

  Consult our Big Data Career Advisers at +91 704 531 4371 / +91 981 900 8153 on how to add wings to your career.