BBVA API Market
Software development is constantly changing as information technology advances, and Docker is its latest revolution. This technology has transformed the way software is developed, streamlining processes through an ingenious system of containers.
Álvaro Barbero is an expert in machine learning and head of the Algorithmic Engineering Team at the Knowledge Engineering Institute. He was the only Spaniard shortlisted in the Big Data World Championship (TEXATA), an international data analysis competition, where he came second in 2015. He talked about data analysis and machine learning with BBVAOPEN4U after the event “Using Docker in machine learning projects” at the BBVA Innovation Center.
Docker has lots of advantages. What we value most is that it gives us reproducibility. When we build a complicated machine-learning model, we need many pieces and dependencies, plus a lot of open source software that we connect with pieces we've made ourselves.
When you have to build all this for a production system, it won't work unless you follow every step exactly as you did during development. Docker lets you write a kind of recipe, so you know that whenever you build the solution it will be exactly the same and it will work, whether you deploy it on a laptop, in a data center or in the cloud. That's its main advantage.
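To make the reproducibility problem concrete: when a Docker recipe pins every dependency version, the image built from it carries exactly those versions wherever it runs. Outside a container, that guarantee has to be checked by hand. The Python sketch below performs that check; `find_drift` is a hypothetical helper written for illustration (it is not part of Docker), and the package names and versions in `recipe` are placeholder assumptions.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional


def find_drift(
    pinned: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> Dict[str, Optional[str]]:
    """Compare pinned versions against what is actually installed.

    Returns {package: installed_version} for every pin that does not match;
    None means the package is not installed at all.
    """
    drift: Dict[str, Optional[str]] = {}
    for name, wanted in pinned.items():
        try:
            installed: Optional[str] = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            drift[name] = installed
    return drift


if __name__ == "__main__":
    # Hypothetical pins; a real recipe would list every dependency of the model.
    recipe = {"numpy": "1.26.4", "scikit-learn": "1.4.2"}
    for name, found in find_drift(recipe).items():
        print(f"{name}: expected {recipe[name]}, found {found or 'nothing'}")
```

An empty result means the current environment matches the recipe; inside a container built from that recipe, the check becomes largely redundant, which is the convenience the interview describes.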
In my own experience, the first time I saw Docker I said: “What's this?”, because it's one of those technologies that changes the way you think about developing software. At first it's hard to redesign a software system so it fits into Docker, particularly because it requires a shift in mindset. But once you've made the leap and got used to it, building large applications becomes much easier.
The same as it does to any software development. When you want to develop an app, you can follow two main models:
· one we could call the monolithic model, where everything is stable and concentrated;
· and another that consists of many small pieces communicating with each other, so that each piece is smaller, easier to maintain and easier for someone else to change.
However, the orchestration (making everything fit together) is very difficult. They always tell you at university that it's better to build independent pieces, but in practice it's very hard.
With Docker you can do all this much more simply. Machine learning is a field of major innovation, and new libraries keep appearing that are a vast improvement on what we had before. We can't stay tied to one specific library; we need an ecosystem of many pieces that communicate with each other, and Docker makes that quite easy.
When you think about containers, you picture the crates loaded onto ships. This is a little different. A container is so called because it contains an application and, let's say, isolates it from its surroundings. But that application can be as large as you want. When you run an app in a container, from the inside it looks isolated, but the operating system sees it as just one more process. The only limits on disk, memory and processor are those set by the host system itself.
I'd say it's like a virtual machine, but with Docker you can scale without any problem, provided you don't overload the host machine. That's why I think it offers some additional advantages, particularly compared with the virtual machines we've been developing on in recent years.
At the beginning it requires you to change your mindset, and that’s exactly what defines a revolution: not doing things like you did before, but making a total change. This isn’t just my opinion. I’ve heard opinions from several experts who agree with that view, and believe we’re already seeing a change in how software is developed. Before, the normal procedure was for programmers to work in a code repository and then gradually add versions of the code, and so on. Then there was a system of continuous integration, where each time you changed the code status you ran tests and then deployed it.
Docker is a way of doing this, and it adds another step: an artifact repository, a store of stable, already-tested versions of the code packaged as containers. With Docker, programmers still develop in the code repository, but when they finish a stable version they test it. Once it passes, the tested Docker container, where we know the code is fine, goes into the artifact repository, and you can take it to any machine because you know it's going to work. This changes the way you deploy applications. Deployment is now much simpler and less manual: you have a container that you can put wherever you like. It really cuts the time it takes to deploy apps, and the maintenance issues. We're now closer to the paradigm of building large apps from much smaller pieces that communicate with each other.
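The flow described here (develop, test, store the tested build in an artifact repository, then deploy that same build anywhere) can be modeled in a few lines. This is a toy Python sketch under stated assumptions: `Artifact`, `Registry`, `release` and `deploy` are hypothetical names, the "build" is just a source string, and each artifact is identified by a SHA-256 content hash, in the spirit of Docker image digests. It shows two properties: only builds that pass their tests get published, and every machine deploys the identical published build.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Artifact:
    source: str  # stands in for the packaged code plus its dependencies

    @property
    def digest(self) -> str:
        # Content-addressed identifier: same content, same digest.
        return "sha256:" + hashlib.sha256(self.source.encode()).hexdigest()


@dataclass
class Registry:
    """Hypothetical artifact repository holding only tested builds."""
    _store: Dict[str, Artifact] = field(default_factory=dict)

    def push(self, artifact: Artifact) -> str:
        self._store[artifact.digest] = artifact
        return artifact.digest

    def pull(self, digest: str) -> Artifact:
        return self._store[digest]


def release(source: str, tests: List[Callable[[str], bool]], registry: Registry) -> str:
    """Build, run the tests, and publish only if every test passes."""
    artifact = Artifact(source)
    failed = [t.__name__ for t in tests if not t(artifact.source)]
    if failed:
        raise RuntimeError(f"not publishing, failed tests: {failed}")
    return registry.push(artifact)


def deploy(digest: str, registry: Registry, machine: str) -> str:
    # Every machine pulls the same tested artifact by its digest.
    artifact = registry.pull(digest)
    return f"{machine} running {artifact.digest[:19]}"


if __name__ == "__main__":
    registry = Registry()
    digest = release("def main(): ...", [lambda s: "def main" in s], registry)
    for machine in ("laptop", "data center", "cloud"):
        print(deploy(digest, registry, machine))
```

Each of the three deployments reports the same digest prefix, which is the "put it wherever you like" property mentioned in the interview; a failing test stops the build before it ever reaches the repository.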
The media have singled out these areas mainly because industry is moving forward thanks to the application of all these techniques. Before, machine learning was something that belonged exclusively to the field of science, but the term is now common currency, it’s in the news, and that’s why almost any technology company on its way up knows how useful it is. I think it’s a trend that’s going to continue in the short term. There may also be a bubble around this, because when everybody talks so much about a subject it gives rise to unrealistic expectations. There’s no question it works, but not for everything –because I still can’t predict what number’s going to win the lottery, for example.
What's more, many of the advances in machine learning are rather fortuitous. For example, we're hearing a lot now about deep learning, meaning deep neural networks, and it's true that many university research groups have been working on this for a long time. But some of the current advances are simply ideas that are being tried out: simple ones, but without the theory that shows why they work.
There's a ceiling there that we won't be able to break through because we lack some fundamental knowledge. The fact of the matter is that we don't know how the brain works. The systems we try to imitate, intelligent systems: we still don't know how they work. When we talk about the long term and about smart robots, I think there's something missing that we haven't quite got yet. That's why I wouldn't be surprised if this whole furore around machine learning starts to die down, a little like what happened earlier with artificial intelligence. These are booms that crop up every so often and bring technological benefits. They're exploited, then they cool down until the next big thing comes along.
The answer is yes. Last year I was at an event in the United States, and one of the speakers said: “I challenge you to name ten successful start-ups that don't use machine learning or data analysis.” And it's hard, because you think: Uber uses it; WhatsApp probably does too to some extent, because it was bought by Facebook, which is known for its data mining. So it looks like it does contribute a lot of value.
Should all companies implement it? In my view it's most valuable for companies closely involved with data generation, such as banking, telecommunications and so forth. Another thing we're seeing is that it's spreading to other, perhaps less technology-intensive sectors, such as agriculture.
Training. I began studying computer engineering because I was interested in it, then I heard about things like artificial intelligence and neural networks, and I went on to do my master's and my doctorate. After that you have to keep training, because this is a very new field and within two years technologies start to become obsolete. You need the passion and interest to keep up with what's going on, and you have to change what you're doing so you can do it better. I think this is the fundamental ingredient: continuous training. Ask yourself whether you've learned something new this month, and if you haven't, you should start to worry.
It depends a lot on how you define a data analyst. People used to say that a data analyst (back then called a “data unicorn”, because they were so hard to find) was someone who knew about computing and mathematics, understood the business, and could explain findings and produce visualizations that the general public could understand. That's a very comprehensive profile. Today, however, that role is covered by complete multidisciplinary teams whose members have different profiles.
Visualization is the first thing companies saw when they entered the world of Big Data, because it's right before your eyes. You can listen for ages to people talking about predictive algorithms and data statistics, but when you actually see it, when you can summarize a vast amount of information in a graph that organizes everything your company has and shows whether you're doing well and why, from that point on you can make decisions. I'd say the first step in any Big Data project is a visualization; the predictive algorithm phases and so on come later.