The Apache Hadoop system

Digital transformation / 20 January 2020


The Apache Hadoop system is the software most commonly associated with Big Data. It is designed to scale from single servers to hundreds of machines, and it works as a framework that allows large volumes of data to be processed across clusters of computers using simple programming models.

The Apache Hadoop system is designed to scale from individual servers to hundreds of machines, each offering local computation and storage. The system is written in Java and fragments computation tasks into separate processes that are distributed across the nodes of a cluster of interconnected computers so they can run in parallel. In fact, thousands of computers can be used, which makes better financial sense: a cluster of standard commodity servers is cheaper than a single latest-generation machine.
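The fragmentation of a task into parallel processes described above follows the MapReduce model. The following sketch (in Python, for illustration only; real Hadoop jobs are written against the Java MapReduce API) shows the idea with a word count: each map call could run on a different node, and Hadoop would shuffle the intermediate results to the reducers.

```python
from collections import defaultdict

# Illustrative sketch of the MapReduce model Hadoop implements.
# In a real cluster, each phase below runs in parallel on different nodes.

def map_phase(chunk):
    # Emit (word, 1) pairs for one fragment of the input.
    return [(word, 1) for word in chunk.split()]

def shuffle(mapped):
    # Group intermediate pairs by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Aggregate all values emitted for one key.
    return key, sum(values)

# The input is split into fragments, much as HDFS splits files into blocks.
chunks = ["big data big", "data hadoop hadoop big"]
mapped = [map_phase(c) for c in chunks]  # each call could run on its own node
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 2, 'hadoop': 2}
```

The key point is that neither the map nor the reduce functions share state, which is what lets the framework distribute them freely across the cluster.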

Rather than depending on the hardware to ensure high availability, Apache Hadoop is designed to detect and manage faults in the application layer.
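A minimal sketch of what application-layer fault handling means in practice: the framework itself detects that a task failed on one node and reschedules it on another, rather than assuming the hardware never fails. The node names and the retry loop below are hypothetical illustrations, not Hadoop's actual scheduler.

```python
# Hypothetical sketch of application-layer fault handling: the framework
# detects a failed task and reschedules it on another node.

def run_with_retries(task, nodes):
    """Try the task on each available node until one succeeds."""
    for node in nodes:
        try:
            return task(node)
        except RuntimeError:
            continue  # this node failed; reschedule on the next one
    raise RuntimeError("task failed on all nodes")

def flaky_task(node):
    # Simulated failure: the first node is unreachable.
    if node == "node-1":
        raise RuntimeError("node unreachable")
    return f"completed on {node}"

print(run_with_retries(flaky_task, ["node-1", "node-2", "node-3"]))
# completed on node-2
```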

Hadoop is a very extensive software package, which is why it is sometimes called the Hadoop ecosystem. Along with the central components (Core Hadoop), this package includes a wide variety of extensions (Pig, Chukwa, Oozie and ZooKeeper) that add many extra functions to the framework and serve to handle large data sets.

The basis of the Hadoop ecosystem is Core Hadoop, which comprises the following modules:

The first version comprised the basic Hadoop Common module, the Hadoop Distributed File System (HDFS) and a MapReduce engine. Starting with Hadoop 2, this last element was replaced by YARN (Yet Another Resource Negotiator), a cluster resource management technology also known as MapReduce 2.0.
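To make HDFS's role concrete, the sketch below shows the basic idea of how it stores a file: the file is split into fixed-size blocks and each block is replicated on several nodes. The block size, replication factor, node names and round-robin placement here are simplified illustrations (HDFS defaults to 128 MB blocks and three replicas, and its real placement policy is rack-aware).

```python
# Hypothetical sketch of HDFS-style storage: split a file into blocks
# and replicate each block on several nodes. Values are illustrative;
# HDFS defaults are 128 MB blocks and a replication factor of 3.

BLOCK_SIZE = 8          # bytes, for the demo
REPLICATION_FACTOR = 2

def place_blocks(data, nodes):
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placement = {}
    for i, block in enumerate(blocks):
        # Round-robin placement; real HDFS also considers rack topology.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(REPLICATION_FACTOR)]
        placement[i] = (block, replicas)
    return placement

layout = place_blocks(b"hadoop distributed file system", ["n1", "n2", "n3"])
for idx, (block, replicas) in layout.items():
    print(idx, block, replicas)
```

Because every block exists on more than one node, the loss of a single machine costs no data, which is what lets Hadoop run on commodity hardware.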


You can find more information on this software on the official website.

Are you interested in financial APIs? Discover all the APIs we can offer you at BBVA
