BBVA API Market
At the BBVA Innovation Center on Plaza Santa Bárbara in Madrid, Jorge López-Malla, Big Data Architect at Stratio, explained why Spark has been so highly praised for some time. He had no access to Spark himself until he joined his present company. Stratio is looking for developers, as its Human Resources representatives said during the presentation.
The event promised answers to key development questions in the Apache framework (how to combine Spark SQL processes with others launched from the Spark Core, or how to apply MLlib algorithms to real-time logic) and began like a lesson in recent history.
The concept of Big Data was born in 2003, in a Google paper on distributed file processing, but it was not until 2006 that the Yahoo! team launched Hadoop, which became the basis on which Big Data operations would take shape.
The problem, according to López-Malla, is that Hadoop emerged in response to a different type of problem from the one faced today by a developer who works with distributed file processing. Technology has changed in 10 years, but so have the market and the demand for software.
Flink (also open source) and Spark emerged in response to today’s problems and, according to López-Malla, the latter “is not the future, but the present of Big Data”. The ground-breaking feature of Spark is its processing speed. It is therefore “an evolution of Hadoop and its paradigm”, but with the advantage, according to the talk, of offering performance 10 to 100 times greater than that of other distributed computing platforms.
Everything is based on RDDs (Resilient Distributed Datasets), or “collections of distributed collections”, which are processed in partitions. These partitions are independent of each other, so work on one partition can proceed without interruption regardless of what happens in the others.
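The idea of independent partitions can be illustrated with a minimal sketch in plain Python. This is not Spark's API and was not shown in the talk: the helper names `split_into_partitions`, `map_partitions` and `square_all` are hypothetical, and a thread pool stands in for the cluster nodes that would process each partition in a real deployment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_partitions(data: List[int], num_partitions: int) -> List[List[int]]:
    """Split a list into contiguous partitions of roughly equal size."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    size, remainder = divmod(len(data), num_partitions)
    partitions = []
    start = 0
    for i in range(num_partitions):
        # The first `remainder` partitions receive one extra element.
        end = start + size + (1 if i < remainder else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def map_partitions(
    partitions: List[List[int]],
    func: Callable[[List[int]], List[int]],
) -> List[List[int]]:
    """Apply func to each partition independently; no partition reads another."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the input partitions.
        return list(pool.map(func, partitions))


def square_all(partition: List[int]) -> List[int]:
    """Per-partition work: square every element of this partition only."""
    return [x * x for x in partition]


numbers = list(range(1, 11))
partitions = split_into_partitions(numbers, 3)
squared = map_partitions(partitions, square_all)
# Combine the per-partition results only at the very end.
total = sum(sum(p) for p in squared)
print(total)
```

Because `square_all` only ever sees its own partition, the order in which partitions finish does not affect the result; the partial results are combined in a single final step.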
In Hadoop, improvements to the core did not carry over to the modules built on top of it. Spark changes this radically. Programmers now benefit from the fact that Spark has a single API for everything.
Three of Spark’s most popular modules were presented during the afternoon at the BBVA Innovation Center: Spark SQL (for querying structured data with SQL or an API), Spark Streaming (for managing data in real time instead of in batches) and MLlib, which provides Spark with machine learning functionality.
The full presentation by Jorge López-Malla, with visual examples and demonstrations, is available, along with many other videos, on the BBVA Innovation Center’s YouTube channel.