Google releases API to convert audio into text: characteristics for developers

One of Google’s latest releases is a good example of how large technology companies are embracing the API economy: they use application programming interfaces to attract the developer community, make it easier to create products and services, and extend their influence beyond the walls of their head offices. Google’s Cloud Speech API converts audio into text in over 80 languages.

With it, you can transcribe audio captured by a phone’s microphone or by an application, or control a device by voice. This is possible because the tool applies neural network models designed for processing natural language. The first obvious question is: what is a neural network and what is it for?

There are many definitions of “neural network”, and some of them are extremely complex. One of the easiest to understand may be the one Dr. Simon Haykin gives in his book ‘Neural Networks: A Comprehensive Foundation’: “A neural network is a massively parallel distributed processor made up of simple processing units that has a natural propensity for storing experiential knowledge and making it available for use.”

How does an Artificial Neural Network (ANN) acquire and store knowledge? Through a learning process, and through the interconnections between its neurons, which store the information and generate an output stimulus. To some extent, the way it learns and processes information resembles the way the human brain does.

For a neural network to acquire knowledge, a learning algorithm is required. The algorithm presents a series of training data to the network, one example after another and in random order, and the network extracts information from that data and learns from it. It’s a matter of patterns.

There are three types of learning:

● Supervised learning: input values are fed to the network, which generates output values. These results are compared with the known correct values, and the network is adjusted to correct any deviation.

● Reinforcement learning: input values are fed to the network and its outputs are only judged as correct or incorrect; the network is not told what the correct values are.

● Unsupervised learning: no correct outputs are supplied. The neural network builds its own classification patterns and uses them to sort the supplied information.
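
To make the supervised case concrete, here is a minimal sketch in Python of a single artificial neuron trained with the classic perceptron rule. It is an illustration only and has nothing to do with how Google trains its speech models; the function names, the learning rate and the toy data set (the logical AND function) are assumptions chosen for the example.

```python
import random


def train_perceptron(samples, epochs=100, learning_rate=0.5, seed=0):
    """Train one artificial neuron with the perceptron rule (illustrative only)."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # present the training data in random order
        for inputs, target in data:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # deviation from the known correct value
            # Correct the deviation: nudge each weight in the direction of the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Return the neuron's output (0 or 1) for one input vector."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Toy training set: inputs paired with the correct output of a logical AND.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
```

After training, `predictions` matches the target column of `AND_SAMPLES`: the neuron was never given the AND rule, it only had its outputs corrected against the right answers.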

The main characteristics of any neural network are:

● Self-organization and adaptability: the network adjusts itself through an adaptive learning algorithm.

● Non-linear processing: it increases the capacity of the artificial neural network to extract and classify patterns in the presence of noise.

● Parallel processing: a large number of nodes work at the same time, allowing greater interconnectivity.
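
The role of non-linear processing can be shown with a very small sketch. A stack of purely linear units collapses into a single linear function, which cannot represent the exclusive-or (XOR) pattern; adding a non-linear step activation and one hidden layer is enough. The weights below are set by hand rather than learned, and all names are hypothetical.

```python
def step(value):
    """Non-linear activation: the unit fires (1) only above the threshold."""
    return 1 if value > 0 else 0


def xor_network(a, b):
    """Two hidden units plus one output unit, with hand-picked weights."""
    hidden_or = step(a + b - 0.5)   # fires if at least one input is 1
    hidden_and = step(a + b - 1.5)  # fires only if both inputs are 1
    # Output fires when "or" is on and "and" is off, which is exactly XOR.
    return step(hidden_or - hidden_and - 0.5)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [xor_network(a, b) for a, b in inputs]
```

Here `outputs` is `[0, 1, 1, 0]`, a pattern that no single linear threshold over the two inputs can produce.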

Cloud Speech API: characteristics

Google’s new API offers some of the most interesting functionalities for anyone who needs an application programming interface for natural language processing and speech recognition that returns results in real time. This matters because the application needs a sufficiently high processing speed to respond to the user immediately.

● Automatic Speech Recognition (ASR): a deep learning neural network is used to recognize speech, provide speech-based search features and transcribe speech.

● Streaming recognition: as the API processes and recognizes the user’s speech, it returns results in real time, so the application does not have to wait for the full recording before acting on the transcript.

● Buffered audio support: the API processes sound captured by an application or by the microphone of a mobile device, encoded in one of several audio formats: FLAC, AMR, PCMU and linear-16. The audio must be in one of these encodings before it can be processed.

● Speech recognition in over 80 languages. This characteristic offers a major competitive advantage over other providers of similar services for external developers.

● Integrated API.

● Inappropriate content filtering.
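
As a rough illustration of how the features above map onto a request, the sketch below builds (but does not send) a JSON body for a synchronous recognition call. The article does not document the request format: the endpoint and the field names (`encoding`, `sampleRateHertz`, `languageCode`, `profanityFilter`, `audio.content`) follow the later v1 REST reference as best understood here and should be checked against Google’s current documentation. The helper name and the sample bytes are hypothetical.

```python
import base64
import json

# Assumed endpoint of the synchronous v1 REST method; verify before use.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US",
                            encoding="FLAC", sample_rate_hertz=16000,
                            filter_profanity=True):
    """Build the JSON body for a recognition request (hypothetical helper)."""
    return {
        "config": {
            "encoding": encoding,                  # e.g. FLAC, AMR, LINEAR16
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,         # one of the supported languages
            "profanityFilter": filter_profanity,   # inappropriate content filtering
        },
        # Buffered audio is sent inline as base64 text.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Placeholder bytes standing in for a real FLAC recording.
fake_audio = b"\x00\x01\x02\x03"
request_body = build_recognize_request(fake_audio, language_code="es-ES")
payload = json.dumps(request_body)
```

Sending `payload` to the service would additionally require credentials (an API key or OAuth token), which are left out of this sketch.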

Nuance, largest market rival

For a long time, when developers needed to incorporate speech recognition and natural language processing functionalities into their applications, their usual provider was Nuance. Its technology is part of many current market leaders in language interpretation, such as Apple’s voice assistant Siri and Samsung’s S-Voice. Car manufacturers such as BMW and Chrysler also need this type of resource for their on-board computers.

By releasing Cloud Speech API, Google aims to attract large mobile device and car manufacturers away from their current providers. In addition to processing speech and responding in real time through the cloud, it supports more languages: over 80 for Cloud Speech API vs. the 40 currently supported by Nuance’s mobile SDKs (for Android, iOS and browsers).

At the moment, access to Cloud Speech API is limited, although the company has not yet revealed how limited it actually is. Any developer can fill out a simple form and start trying the application programming interface. In the medium term, Google is expected to charge developers for accessing and using Cloud Speech API.

If you are interested in APIs, you can now try BBVA’s Sandbox manager.
