Big Data - Apache Hadoop at a Glance

Big Data – Apache Hadoop – A Glance

Posted on February 9, 2022 by Muhammad Rawish Siddiqui

Apache Hadoop Overview

Apache Hadoop is a collection of open-source frameworks that are used to efficiently store and process big datasets in a distributed computing environment, ranging in size from gigabytes to petabytes of data. It runs applications on clusters of commodity hardware. It provides massive storage for any data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.

Hadoop Architecture Layers

The Hadoop architecture comprises mainly three layers:

Storage Layer – HDFS

HDFS holds a very large amount of data and provides easier access. To store such huge data, the files are stored across multiple machines. These files are stored in a redundant fashion to rescue the system from possible data losses in case of failure. HDFS also makes applications available to parallel processing.

Resource Management Layer – YARN

YARN stands for Yet Another Resource Negotiator. It is the resource management layer of Hadoop. It was introduced in Hadoop 2. YARN is designed with the idea of splitting up the functionalities of job scheduling and resource management into separate daemons.

Processing Layer – MapReduce in Apache Hadoop

It is the data processing layer of Hadoop. It is a software framework for writing applications that process vast amounts of data (terabytes to petabytes in range) in parallel on a cluster of commodity hardware. Hadoop is capable of running MapReduce programs written in various languages: Java, Ruby, Python, and C++. The programs of MapReduce in cloud computing are parallel in nature, and thus are very useful for performing large-scale data analysis using multiple machines in a cluster.

How is Hadoop special?

No high-end expensive systems are needed
- Built on commodity hardware
- Can run on your PC/Laptop, etc
Can run on Linux, Windows, Mac OS/X, as well as Solaris
- No discrimination, as it’s written in Java
Fault-Tolerant System
- Execution of the job continues even if nodes fail
- It accepts failure as part of the system
Highly Reliable and Efficient Storage System
Built-in Intelligence to Speed Up the Application
- Speculative Execution
Fit for a lot of Applications
- Web Log Processing
- Page Indexing and Page Ranking
MapReduce Framework

Big Data – Apache Hadoop – A Glance

Apache Hadoop Overview

Hadoop Architecture Layers

Storage Layer – HDFS

Resource Management Layer – YARN

Processing Layer – MapReduce in Apache Hadoop

How is Hadoop special?

Leave a Reply Cancel reply

Recent Posts

Archives

Categories