The most widely used APIs in machine learning, apart from Google, IBM or Amazon

APIs , Developers / 14 April 2016


At BBVAOpen4U we have seen more than once how important machine learning is for business development and how strongly it affects large technology companies like Google, IBM, Amazon or Microsoft. But they are not the only market players in the creation of predictive models or natural language processing. Some smaller companies have successfully carved out a place in this highly competitive field with a bright future.

PredictionIO, AT&T Speech, a community-driven natural language processing platform and Diffbot are four practical examples proving that it is possible to emerge and grow on machine learning and natural language processing alone, even if this later leads to accepting integration offers from big companies. The success of these projects is explained by their APIs (application programming interfaces). Without them, doing business would be impossible.


PredictionIO

PredictionIO is an open source machine learning server that enables development and data science teams to build fully scalable prediction engines, a major consideration when working with data in real time. These are some of the most interesting features of PredictionIO:

●       Simplified data infrastructure management.

●       Support for well-known machine learning and data processing libraries such as Spark MLlib (the library offered by Apache Spark, the open source distributed computing platform, which contains algorithms for logistic regression, support vector machines (SVM), naive Bayes, decision tree models, least squares techniques, k-means clustering…) or OpenNLP (a machine-learning-based toolkit for natural language processing).

●       Incorporation of proprietary predictive models into the PredictionIO engine.

●       Response to dynamic queries in real time.

●       Unification of data from different platforms, from both batch and real-time sources, to make predictive analysis fully comprehensive.

PredictionIO has several SDKs for languages such as Java, Ruby, Python or PHP. The tool is built on three components:

●       PredictionIO platform: An open source development stack that enables clients to build, evaluate and implement engines using machine learning algorithms in an easy and scalable way.

●       Event server: This PredictionIO tool enables applications to send events to the server through an API.

●       Template gallery: A gallery of engine templates that can be downloaded for each type of machine learning application.
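As a rough illustration of how an application might send an event to the event server, the sketch below builds an HTTP POST request in Python without sending it. The field names follow the event format as commonly documented for PredictionIO; the server URL, port, access key, and the helper names `build_event` and `build_request` are assumptions made for this example and should be checked against the project's own documentation.

```python
import json
from datetime import datetime, timezone
from urllib import request

# Assumptions for illustration: a local event server and a placeholder key.
EVENT_SERVER_URL = "http://localhost:7070/events.json"
ACCESS_KEY = "YOUR_ACCESS_KEY"


def build_event(user_id, item_id, rating):
    """Describe a 'user rated item' event as a JSON-serializable dict."""
    return {
        "event": "rate",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": rating},
        "eventTime": datetime.now(timezone.utc).isoformat(),
    }


def build_request(event):
    """Wrap the event in an HTTP POST request without sending it."""
    return request.Request(
        url=f"{EVENT_SERVER_URL}?accessKey={ACCESS_KEY}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = build_event("u1", "i42", 4.5)
req = build_request(event)
# request.urlopen(req) would send it once a real server and key are in place.
```

Keeping the request construction separate from the network call makes it possible to inspect or log the payload before any data leaves the application.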

AT&T Speech

AT&T Speech APIs enable developers to include voice recognition functionality in both web applications and mobile apps. The service offers three application programming interfaces that transform voice into text and text into voice, in a general or customized way.

●       The voice-to-text API: It only accepts single-channel audio formats. It uses a grammar dictionary to complete transcriptions in both English and Spanish, and a contextualization system to optimize accuracy. The API transcribes voice in batches of four minutes. The different batches later need to be joined to obtain the full transcription.

●       The voice-to-customized text API: In this case, the interface creates transcriptions from the terms (grammar and suggestions) in a database generated by the developers themselves. The goal is higher accuracy.

●       The text-to-voice API: It accepts plain text or text in XML format with a maximum limit of 500 bytes (equivalent to a text containing around 100 words) and supports both male and female voices in two languages, English and Spanish. 
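Two of the constraints above lend themselves to small client-side helpers: keeping text-to-voice input under 500 bytes, and joining the transcriptions of separate voice-to-text batches. The Python sketch below shows one possible way to do both. It does not call the AT&T APIs, and the function names `split_for_tts` and `join_transcripts` are hypothetical.

```python
MAX_TTS_BYTES = 500  # limit stated above for the text-to-voice API


def split_for_tts(text, limit=MAX_TTS_BYTES):
    """Split text on whitespace into chunks of at most `limit` UTF-8 bytes."""
    chunks, current = [], ""
    for word in text.split():
        if len(word.encode("utf-8")) > limit:
            raise ValueError("a single word exceeds the byte limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= limit:
            current = candidate
        else:
            # The current chunk is full: store it and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def join_transcripts(batches):
    """Join per-batch transcriptions, in order, into one transcription."""
    return " ".join(part.strip() for part in batches if part.strip())


chunks = split_for_tts("hola " * 150)
full_text = join_transcripts(["first batch of speech ", " second batch"])
```

The limit is measured in encoded bytes rather than characters because accented Spanish letters take more than one byte in UTF-8.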

SDKs are available for download.

The next example is a natural language processing platform for developers, backed by a community of more than 20,000 professionals. Why do they use it? To include new functionality in web and mobile applications in fields such as robotics, messaging services or wearables. The API learns human language with each interaction.

Some of its key features are:

●       The platform is completely free, even for commercial use. Its applications are open because, according to its creators, “only private applications are used when there are privacy restrictions”. “An open application is capable of using the data provided by the community to be even smarter”.

●       The users or developers using the platform own their data, but they must be aware that it will be used to enhance the platform.

●       It now supports many languages: English, Spanish, French, German, Italian, Dutch, Polish, Swedish, Portuguese and Russian.

●       It has tutorials for mobile operating systems such as iOS, Android and Windows Phone, for all web browsers and for programming languages such as Python, Ruby, C or Rust.

To test the platform’s benefits, there is a web app that can be used to try its functionality through microphone access.


Diffbot

Diffbot is a platform that uses artificial intelligence (a combination of machine learning and natural language recognition) to automatically extract data from websites, such as text, pictures, videos or comments. It is therefore a tool that can be used for scraping anything retrievable from a website. This is possible thanks to the repertoire of APIs provided by the platform. However, Diffbot is not an open source tool.

Some of the key features of Diffbot are:

●       The Diffbot APIs are run in JavaScript.

●       It works on websites in English and in other languages.

●       Automatic tagging of scraped information.

●       Extraction of data in JSON or CSV formats. Bulk API enables developers to scrape hundreds of websites simultaneously.

●       Libraries in PHP, Python, JavaScript, Objective-C or Perl.
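To give an idea of what JSON extraction can look like from the client side, the following Python sketch builds a request URL for an article extraction endpoint and parses a JSON response. The endpoint path, the `token` and `url` parameters, and the `objects`, `title` and `text` fields are assumptions based on Diffbot's public v3 API and should be checked against the current documentation. The sample response is hand-written for this example and is not real output.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint and placeholder token; check the current Diffbot docs.
ARTICLE_API = "https://api.diffbot.com/v3/article"


def build_article_url(page_url, token):
    """Build the GET URL that asks the API to extract one web page."""
    return f"{ARTICLE_API}?{urlencode({'token': token, 'url': page_url})}"


def extract_titles_and_text(response_body):
    """Return (title, text) pairs from a JSON response body."""
    data = json.loads(response_body)
    return [(obj.get("title"), obj.get("text")) for obj in data.get("objects", [])]


# Hypothetical, hand-written response used only to exercise the parser.
SAMPLE = '{"objects": [{"title": "Example", "text": "Body text."}]}'
api_url = build_article_url("https://example.com/post", "YOUR_TOKEN")
articles = extract_titles_and_text(SAMPLE)
```

The page address is percent-encoded by `urlencode`, which matters because the target URL itself contains characters such as `:` and `/`.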
