Big Data – Apache Hadoop – A Glance

Apache Hadoop is an open-source framework for efficiently storing and processing large datasets, ranging in size from gigabytes to petabytes, in a distributed computing environment. It runs applications on clusters of commodity hardware and provides massive storage for any kind of data, enormous processing power, and the ability to handle a very large number of concurrent tasks or jobs.

The Hadoop architecture comprises three main layers:

  • Storage Layer – HDFS: The Hadoop Distributed File System (HDFS) holds very large amounts of data and provides easy access to it. To store data at this scale, files are split and spread across multiple machines. The data is stored redundantly so that the system does not lose it when a machine fails. HDFS also makes the data available to applications for parallel processing.
  • Resource Management Layer – YARN: YARN stands for Yet Another Resource Negotiator. It is the resource management layer of Hadoop and was introduced in Hadoop 2. YARN is designed around the idea of splitting job scheduling and resource management into separate daemons.
  • Processing Layer – MapReduce: MapReduce is the data processing layer of Hadoop. It is a software framework for writing applications that process vast amounts of data (terabytes to petabytes) in parallel on a cluster of commodity hardware. Hadoop can run MapReduce programs written in various languages: Java natively, and others such as Ruby, Python, and C++ through utilities like Hadoop Streaming and Hadoop Pipes. MapReduce programs are parallel in nature, which makes them well suited to large-scale data analysis across multiple machines in the cluster.
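To make the processing layer more concrete, here is a minimal sketch of the map, shuffle, and reduce steps applied to a word count. It runs in a single Python process and does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration. On a real cluster the map and reduce calls would run in parallel on different machines, and the framework would handle the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (illustrative, single process)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is what makes this style of program a good fit for a cluster.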

How Is Hadoop Special?

  • No high end expensive systems are needed
    • Built on commodity hardware
    • Can run on your PC or laptop
  • Can run on Linux, Windows, Mac OS X, and Solaris
    • Platform independent because it is written in Java
  • Fault Tolerant System
    • Execution of the job continues even if nodes fail
    • It accepts failure as part of the system
  • Highly Reliable and Efficient Storage System
  • Built-in Intelligence to Speed Up Applications
    • Speculative Execution
  • Fit for a lot of Applications
    • Web Log Processing
    • Page Indexing and Page Ranking
  • MapReduce Framework
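
The fault tolerance and reliable storage points above rest on keeping several copies of each piece of data. The following is a rough sketch of that idea, not HDFS code: it splits a file into fixed-size blocks and places each block on several nodes, then checks whether every block is still readable after some nodes fail. The 128 MB block size and replication factor of 3 match the usual Hadoop 2 defaults, but the round-robin placement and the names `place_blocks` and `readable_after_failure` are assumptions for illustration; real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # common HDFS default block size in Hadoop 2 and later
REPLICATION = 3      # common HDFS default replication factor


def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Assign each block of a file to `replication` distinct nodes.

    Uses simple round-robin placement (an assumption for this sketch).
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_nodes):
    """Return True if every block still has at least one live replica."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(300, nodes)  # a 300 MB file needs 3 blocks
    print(placement)
    print(readable_after_failure(placement, ["n1", "n2"]))
```

With three replicas per block, any two nodes can fail without a block becoming unreadable, which is why a job can keep running when individual machines go down.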
