Big data for big solutions

In 2011, NASA, the European Space Agency and the Royal Astronomical Society launched an open public challenge to better map the tiny distortions that dark matter creates in our photographs of galaxies. Within the space of a week, a doctoral student specializing in glaciology (the study of ice and glaciers) came up with an algorithm that mapped dark matter far better than any of the astronomical models used until then. In a matter of days, he managed to outdo all the work done over the previous 10 years.

Where does this kind of person come from, or rather, where can we find them? People with an exceptional capacity for solving problems, stowaways who go unnoticed in what Eggers and MacMillan call the solution economy? In this case, we can find them on Kaggle, a competition platform for data analysis and the design of predictive models.

In exchange for a fee, Kaggle creates a mutually beneficial incentive in the gap between those who need better analysis and those who have the right skills. The essence of this San Francisco-based company is competition, “solutionism”: it is a platform that hosts contests between statisticians, mathematicians, computer specialists, economists, scientists and anyone with a strong analytical profile who can propose the most accurate solution to a given problem. What is ultimately sought? The perfect predictive model.

In addition to payment for results (the glaciologist in question won a 3,000-dollar prize), Kaggle’s model also involves an exchange of public value: payment in the valuable currency of reputation. The website shows a ranking of the best data analysts, similar to the ATP ranking for tennis players or the golfing world’s OWGR. In fact, we will find a Spanish participant in this ranking; he currently stands in tenth position, but he knows what it is like to be right at the top. It is therefore logical that many big data companies follow these competitions, or post job offers, to find data analysts with an excellent capacity for problem-solving among the more than 200,000 participants. One of them is Tim Salimans, a Dutch student whose career changed course when he received a Microsoft Research grant for the originality and precision of his predictive models applied to chess. Since then, Salimans has taken part in 14 competitions and is usually found in the top positions.

For a data scientist (the most widely accepted term in English), it is not only a question of the prize: there is also the powerful incentive of knowing that their models can be applied, be useful and have a measurable impact on a product or business. A company with a wealth of data has an enormous range of possibilities. For example, the insurance company AXA has launched a competition to find an algorithm that could revolutionize its sector: opening up data on 50,000 car trips, it is looking for the algorithm best able to use that telematic record to identify the driving patterns that make each of us a unique driver, thereby predicting risk and offering a customized insurance policy based on this information.
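To make the idea more concrete, here is a minimal sketch of what such a driver-signature model might look like, assuming a hypothetical trips.csv with per-sample speed, acceleration and turn-rate readings; the column names, thresholds and scikit-learn pipeline are illustrative assumptions, not AXA’s actual data format or winning method:

```python
# Illustrative sketch only: summarize raw telematics samples into trip-level
# behavioral features and check whether drivers can be told apart by style.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical input: one GPS sample per row (driver_id, trip_id, speed, acceleration, turn_rate)
samples = pd.read_csv("trips.csv")

# Reduce each trip to a handful of driving-style features
features = samples.groupby(["driver_id", "trip_id"]).agg(
    mean_speed=("speed", "mean"),
    max_speed=("speed", "max"),
    harsh_braking=("acceleration", lambda a: (a < -3.0).mean()),   # share of hard decelerations
    sharp_turns=("turn_rate", lambda t: (t.abs() > 0.5).mean()),   # share of aggressive turns
).reset_index()

X = features[["mean_speed", "max_speed", "harsh_braking", "sharp_turns"]]
y = features["driver_id"]  # can the model recognize the driver from the trip alone?

# If driving style is distinctive enough to identify drivers, the same features
# can feed a pricing or risk model for customized policies.
model = GradientBoostingClassifier()
print(cross_val_score(model, X, y, cv=5).mean())
```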

What can I do with these data?

Can early care and hospitalization be improved by cross-referencing the data held by insurance companies? Can the Microsoft Kinect gesture detection system be improved? What can you contribute to the search for the Higgs boson? Answering these questions is neither easy nor intuitive, because not all companies know what to do with their data. This is why Kaggle has introduced a new twist: from competing for the best solution to competing to identify the best problem to solve. At the same time, in order to diversify its strategy, Kaggle is beginning to bring its big data analysis into lucrative sectors such as the energy industry, where it offers solutions to help producers generate more while also bringing down their extraction costs.

On other occasions, the challenge is simpler to grasp: a restaurant chain wants to know what factors make some establishments successful while others fail, or a large department store such as Macy’s wants to know what factors have an impact on sales. These are examples of the challenges found in another community, the CrowdAnalytics.com website, which brings together over 5,000 data scientists from over 50 countries, most of them holding a Ph.D. or MBA, who offer their models to solve a business’s needs. All the models are first tested using public and open data, not the company’s own information. Sometimes the solution knocks on the door in less than 24 hours.

Image: crowdAnalytics.com

Big data for a good cause

The big data environment does not consist only of tech companies, providers of banking products, financial services or business intelligence; in fact, more and more players are now emerging from the fields of civic technology and non-governmental organizations.

DataKind, for example, puts data scientists in contact with third-sector organizations so they can apply their knowledge to humanitarian and social problems. All their work is done for free and, from their base in New York, they coordinate teams of data experts in Bangalore, Dublin, San Francisco, Singapore, the UK and Washington. What do they apply their skills to? They help small NGOs work better on the ground, analyze social, child-related and educational policies in places such as England and the United States or, using statistical models applied to legal proceedings, identify patterns followed by judges of the European Court of Human Rights when ruling on different cases. The Economist has coined a term for these geek philanthropists: data huggers.

Bayes Impact works along the same lines, creating a data model to shorten ambulance response times to emergencies in the city of San Francisco, and another to understand how well a recipient will accept a kidney transplant. It has also collaborated with the Michael J. Fox Foundation, improving systems to diagnose the progression of Parkinson’s disease in patients.

For every solution, every response, every success story in the big data ecosystem, a wealth of new questions arises. We have looked at examples of companies, organizations and communities that create new forms of exchange, a kind of barter between big data on one side and algorithms on the other, reputation in exchange for brilliance and originality. If you don’t know what to do with your data, perhaps you can try opening them up in a highly specialized and competitive environment and see what happens. More and more people are taking the plunge, either by favoring collaboration or by encouraging competition, and achieving surprising results.
