“Docker is changing the way we develop software”

Álvaro Barbero, Chief Data Scientist at the Knowledge Engineering Institute at the Autónoma University (Madrid), talks about Docker, an open-source technology for creating, running, testing and deploying applications distributed in an ingenious system of containers.

Development / 16 February 2017

The development world is constantly changing as information technology advances, and Docker is the latest revolution. This technology has transformed the way software is developed, streamlining processes through an ingenious system of containers.

Álvaro Barbero is an expert in machine learning and the head of the Algorithmic Engineering Team at the Knowledge Engineering Institute. He was the only Spaniard shortlisted in the Big Data World Championship (TEXATA) international data analysis competition, where he came second in 2015. He talked about data analysis and machine learning for BBVAOPEN4U, after the event entitled “Using Docker in machine learning projects” at the BBVA Innovation Center.

What are the advantages of using Docker?

It has lots of advantages. What we rate most highly is that it gives us reproducibility. When we have to build a complicated machine-learning model, we need a whole lot of pieces and dependencies, in addition to a lot of open source software that we connect with pieces we've built ourselves.

When you have to build this for use in a production system, if you don't follow each step exactly the same way you did when you were developing it, it doesn't work. Docker allows you to create a type of recipe so you know that whenever you build a solution it's going to be exactly the same and it's going to work, regardless of whether you deploy it on a laptop, in a data center or in the cloud. That's its main advantage.
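As a rough illustration of the "recipe" idea, the sketch below renders the text of a Dockerfile from pinned inputs. It is not taken from the interview: the function name, the base image, the package versions and the `serve_model.py` script are all assumptions chosen for the example. The point is only that the same pinned inputs always produce the same recipe.

```python
from typing import Dict, List


def render_dockerfile(
    base_image: str,
    pinned_packages: Dict[str, str],
    entrypoint: List[str],
) -> str:
    """Render a Dockerfile as text from pinned inputs (illustrative sketch)."""
    # Sort the packages so the same inputs always yield the same text,
    # whatever order the dictionary was written in.
    pins = " ".join(
        f"{name}=={version}" for name, version in sorted(pinned_packages.items())
    )
    # Exec-form CMD: a JSON-style list of quoted arguments.
    cmd = ", ".join(f'"{part}"' for part in entrypoint)
    lines = [
        f"FROM {base_image}",
        "WORKDIR /app",
        f"RUN pip install --no-cache-dir {pins}",
        "COPY . /app",
        f"CMD [{cmd}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project: image tag, versions and script name are assumptions.
    print(
        render_dockerfile(
            "python:3.11-slim",
            {"scikit-learn": "1.4.2", "numpy": "1.26.4"},
            ["python", "serve_model.py"],
        )
    )
```

Pinning exact versions is what makes the recipe repeatable; leaving them open would let two builds of the same file pull different libraries.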

Do you consider Docker to be an easy-to-use platform?

In my personal experience, the first time I saw Docker I said: “What’s this?”, because it’s one of those technologies that changes the way you think about developing software. So it’s hard at first to change the way you design a software system so it fits in Docker, but I think once you’ve made the leap you see it makes things easier. It’s certainly true that the changeover isn’t easy, particularly because it requires a shift in your mindset, but once you get used to it, it’s much easier to build large applications.

How does Docker apply to machine learning?

The same way it applies to any software development. That is, when you want to develop an app, you can do so using two main models:

· one we could call the monolithic model, where everything is stable and concentrated;

· and another which consists of making many small pieces that communicate with each other, meaning that each piece is smaller, easier to maintain and easier for another person to change.

However, the orchestration –making everything fit together– is very difficult. Although they always tell you at IT school that it’s better to make independent pieces, it’s very difficult in practice.

You can do all this much more simply with Docker. Machine learning is a field of major innovation, and new libraries keep appearing that are a vast improvement on what we had before. We can't keep relying on a single library. We must have an ecosystem with many pieces that can communicate with each other, and with Docker we can do this quite easily.
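To make the idea of small pieces that communicate with each other concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not the team's actual setup: the "scoring" piece, its `/score` path and the averaging rule are hypothetical, and both pieces run in one process here, whereas in a Docker setup each would typically live in its own container.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


class ScoringHandler(BaseHTTPRequestHandler):
    """Hypothetical 'model' piece: replies with the mean of the features it receives."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))["features"]
        body = json.dumps({"score": sum(features) / len(features)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging to keep the example quiet


def start_scoring_service() -> HTTPServer:
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), ScoringHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_score(port: int, features: List[float]) -> float:
    """Second piece: a client that only knows the other piece's HTTP interface."""
    payload = json.dumps({"features": features}).encode("utf-8")
    connection = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request(
            "POST", "/score", body=payload,
            headers={"Content-Type": "application/json"},
        )
        return json.loads(connection.getresponse().read())["score"]
    finally:
        connection.close()


if __name__ == "__main__":
    server = start_scoring_service()
    try:
        print(request_score(server.server_address[1], [1.0, 2.0, 3.0]))
    finally:
        server.shutdown()
        server.server_close()
```

Because the client depends only on the HTTP interface, the scoring piece could be rewritten with a different library without the client changing.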

Why, when we talk about Docker, do we talk about containers?

When you think about containers, you imagine those crates that are loaded onto ships. This is a little different. The idea of a container refers to the fact that it contains an application and, let's say, isolates it from its surroundings. But that application can be as large as you want. When you run an app in a container you see it as though it were isolated. But the operating system sees it as though it were just one more process. The limits on disk, memory and processor are only set by the system itself.

I’d say it’s like a virtual machine, but in Docker you can scale without any problem, provided you don’t overload the main machine. That’s why I think it gives you some additional advantages, particularly compared to developing in the virtual machines of recent years.

Does the use of Docker represent a revolution?

At the beginning it requires you to change your mindset, and that's exactly what defines a revolution: not doing things like you did before, but making a total change. This isn't just my opinion. I've heard several experts who agree with that view, and believe we're already seeing a change in how software is developed. Before, the normal procedure was for programmers to work in a code repository and gradually add versions of the code. Then there was a system of continuous integration, where each time the code changed you ran tests and then deployed it.

Docker is a way of doing this, and adds another step –namely, a container repository, that is, a repository of stable versions of the code. With Docker, programmers still develop in the code repository, but when they finish a stable version they test it. The Docker container that has already been tested –where we know the code's okay– goes into the artifact repository, and from there you can take it to any machine because you know it's going to work. This changes the way you deploy applications. It's now much simpler and less manual: you have a container that you can put wherever you like. It really cuts down on the time it takes to deploy apps, and on maintenance issues. Now we're closer to the paradigm of making large apps based on much smaller pieces that communicate with each other.
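The flow described here, where a version is tested and only then stored as a deployable artifact, can be sketched as a simple gate. Everything in this example is hypothetical: `ArtifactRepository` is an in-memory stand-in for a real image registry, and the tag, image identifier and tests are made up for illustration.

```python
from typing import Callable, Dict, Iterable


class ArtifactRepository:
    """Hypothetical in-memory stand-in for a repository of tested images."""

    def __init__(self) -> None:
        self._images: Dict[str, str] = {}

    def publish(self, tag: str, image_id: str) -> None:
        self._images[tag] = image_id

    def fetch(self, tag: str) -> str:
        # Raises KeyError if the tag was never published.
        return self._images[tag]


def release(
    tag: str,
    image_id: str,
    tests: Iterable[Callable[[], bool]],
    repository: ArtifactRepository,
) -> bool:
    """Publish the image only if every test passes; otherwise publish nothing."""
    if all(test() for test in tests):
        repository.publish(tag, image_id)
        return True
    return False


if __name__ == "__main__":
    repo = ArtifactRepository()
    # Made-up tag and image identifier, with one trivially passing test.
    released = release("model-api:1.0", "image-abc123", [lambda: True], repo)
    print(released, repo.fetch("model-api:1.0"))
```

Any machine that deploys from the repository only ever sees images that passed the gate, which is what makes the deployment step less manual.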

How do you see the future of machine learning/Big Data?

The media have singled out these areas mainly because industry is moving forward thanks to the application of all these techniques. Before, machine learning was something that belonged exclusively to the field of science, but the term is now common currency, it’s in the news, and that’s why almost any technology company on its way up knows how useful it is. I think it’s a trend that’s going to continue in the short term. There may also be a bubble around this, because when everybody talks so much about a subject it gives rise to unrealistic expectations. There’s no question it works, but not for everything –because I still can’t predict what number’s going to win the lottery, for example.

What’s more, many advances taking place in machine learning are rather fortuitous. For example, we’re hearing a lot now about deep learning –these deep neural networks–, and it’s true that many university research groups have been working on this for a long time. But some of the advances taking place are simply ideas that are being tried out –simple, but without the theoretical part that shows why they work.

There’s a ceiling there that we won’t be able to break through because we lack some fundamental knowledge. The fact of the matter is that we don’t know how the brain works. The systems we try to imitate, smart systems –we still don’t know how they work. When we talk about the long term and about smart robots, I think there’s something missing that we still haven’t quite got. That’s why I wouldn’t be surprised if this whole furore around machine learning starts to die down, a little like what happened earlier in the case of artificial intelligence. These are booms that crop up every so often, and which represent a technological benefit. They’re exploited, then they cool down until the next big thing comes along.

Do you believe data analytics can mark the difference in a company?

The answer is yes. Last year I was at an event in the United States, and one of the speakers said: “I challenge you to name ten successful start-ups that don’t use machine learning or data analysis”. And it’s complicated, because you think: Uber uses it; WhatsApp probably does to a certain extent too, because it’s been bought by Facebook, which is known for its data mining –so it looks like it’s something that does contribute a lot of value.

Should it be implemented by all companies? My view is that it’s most valuable for companies that are closely involved with data generation, such as banking, telecommunications and so forth. And another thing we’re seeing is that it’s spreading to other, perhaps less technology-driven sectors, such as agriculture.

You were the first Spanish finalist in the Big Data Analytics World Championship. What advice would you give to professionals in the sector?

Training. I began studying computer engineering because I was interested in it, and then I heard about things like artificial intelligence and neural networks, and I went on to get my master’s and my doctorate. Then you have to carry on training because this is a very new field, and within two years technologies start to become obsolete. You have to have passion and interest in looking at what’s going on, and you have to change what you’re doing so you can do it better. I think this is the fundamental ingredient –total ongoing training. You have to ask yourself if you’ve learned something new this month, and if you haven’t, you should start to worry.

What exactly do you need to be a data analyst?

It depends a lot on how you define a data analyst. It used to be said that a data analyst –one of those people known as “data unicorns” because they were so difficult to find– was someone who knew about computer technology and mathematics, who had a knowledge of business and an ability to explain and generate visualizations that could be understood by the general public… So that’s a pretty comprehensive profile. Today, however, there are very complete teams with different profiles that form a multidisciplinary unit to cover that role.

Speaking of data analysis leads us directly to data visualization. Why is this field becoming so important? Have companies become aware of its power?

Visualization is the first thing companies saw when they entered the world of Big Data, because it’s before your very eyes. You can listen for ages to people talking about predictive algorithms and data statistics, but when you actually see –when you’re able to summarize a vast amount of information in a graph that organizes everything your company has, and can show you if you’re doing well for one reason or another… From that point on you can take decisions. I’d say the first thing for any Big Data project is to have a visualization, then the phases of predictive algorithms and so on follow on later.
