Dark Data: what it is, with examples

Dark Data

Dark Data is a new yet old problem that analysts and statisticians have known about for years. The term refers to all the data and metadata that are not being gathered, structured and analyzed, and which represent a constant waste of potentially valuable information that businesses are letting slip away.

16 July 2018

Taking advantage of Dark Data is complicated. The first step is to identify which of the data a business has stored are not being analyzed; the second is to estimate the potential of that data before embarking on the development work needed to extract it.
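The first step, finding stored data that nobody analyzes, can start with a simple inventory. The sketch below is only an illustration under stated assumptions: it assumes a hypothetical catalog in which each dataset records when any report or model last read it, and it treats "never read" or "not read in the last year" as dark. The dataset names, the `Dataset` class and the one-year threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Dataset:
    name: str
    size_gb: float
    # None means no report, dashboard or model has ever read this dataset
    last_analyzed: Optional[date]


def find_dark_datasets(catalog: List[Dataset], as_of: date, max_age_days: int = 365) -> List[Dataset]:
    """Return datasets never analyzed, or not analyzed within max_age_days of as_of."""
    dark = [
        ds for ds in catalog
        if ds.last_analyzed is None or (as_of - ds.last_analyzed).days > max_age_days
    ]
    # Largest first: size is only a rough heuristic for where hidden value may sit
    return sorted(dark, key=lambda ds: ds.size_gb, reverse=True)


# Hypothetical catalog entries, for illustration only
catalog = [
    Dataset("monthly_sales", 2.0, date(2018, 6, 30)),
    Dataset("web_server_logs", 850.0, None),
    Dataset("call_center_notes", 40.0, date(2016, 1, 15)),
]
print([ds.name for ds in find_dark_datasets(catalog, as_of=date(2018, 7, 16))])
# ['web_server_logs', 'call_center_notes']
```

Estimating the potential of each flagged dataset, the second step, still requires domain judgment; the ordering by size is just a starting point for that review.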

Developing our own utilities that perfectly fit what we need can be an excessively intensive task, especially if we cannot foresee the final value we will get from them, either as immediate monetary revenue or as added value for other parts of the business. Fortunately, there are multiple tools and APIs for working with this mass of data and exploring it in depth.

IBM OpenWhisk

A clear example of Dark Data is the content of the videos hosted by many platforms. Analysis usually focuses on the metadata surrounding each video, such as its title, date, duration, or the tags generated or applied by humans.

With OpenWhisk you can analyze the content of each of a video's scenes. It does this by extracting individual shots and, in parallel, identifying what happens in each of them: who appears, what text is shown, what is depicted, which objects can be seen, and so on.

This is what IBM calls Dark Vision. Once data about each of the video's scenes has been obtained, the range of possible improvements and applications multiplies.
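The shot-by-shot pattern described above (split a video into shots, then analyze every shot independently and in parallel) can be sketched in plain Python. This is not IBM's Dark Vision code and does not use the OpenWhisk API: frames are assumed to be flat lists of grayscale pixel values, shot boundaries are detected with a simple mean pixel-difference threshold (an assumption chosen for brevity), and `analyze_shot` is a hypothetical placeholder for a call to a real visual-recognition service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Frame = List[int]  # assumption: one frame = flat list of grayscale pixels (0-255)


def extract_shots(frames: List[Frame], threshold: float = 30.0) -> List[List[Frame]]:
    """Start a new shot wherever consecutive frames differ sharply on average."""
    if not frames:
        return []
    shots = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            shots.append([cur])  # large change: treat it as a cut
        else:
            shots[-1].append(cur)
    return shots


def analyze_shot(shot: List[Frame]) -> Dict[str, float]:
    """Hypothetical stand-in for a remote image-recognition call.
    Here it only reports the shot length and the first frame's mean brightness."""
    first = shot[0]
    return {"frames": len(shot), "brightness": sum(first) / len(first)}


def analyze_video(frames: List[Frame]) -> List[Dict[str, float]]:
    shots = extract_shots(frames)
    # Shots are independent, so the (normally I/O-bound) calls can run concurrently;
    # pool.map returns the results in shot order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(analyze_shot, shots))


frames = [[10, 10, 10, 10]] * 3 + [[200, 200, 200, 200]] * 2
print(analyze_video(frames))
# [{'frames': 3, 'brightness': 10.0}, {'frames': 2, 'brightness': 200.0}]
```

A thread pool fits here because each real analysis would mostly wait on a network call; on a serverless platform the same fan-out would typically be one function invocation per shot.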

Stanford’s DeepDive

Researchers at Stanford University in California created DeepDive, another system for extracting data in a structured way. Its main advantage is that it creates SQL tables from the data it extracts from documents. Several universities and research groups have used the platform to categorize completely disorganized corpora of data, with surprising results.

DeepDive is a qualitative leap compared with other platforms and software that rely on identifying the data manually at the outset, because it automates much of the process with machine learning. It lets the team in charge of the analysis define the objectives to be achieved instead of programming concrete, specific tasks. Once these objectives are clear, the system begins the analysis and extraction.
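To make the idea of turning free text into SQL tables concrete, here is a deliberately simple sketch. It is not DeepDive and does not use its API: DeepDive learns relations with statistical inference, whereas this example uses a single hand-written regular expression (an assumption made for brevity) to pull hypothetical "person works at organization" facts out of documents and load them into an in-memory SQLite table. The table name `works_at` and the sample documents are invented for illustration.

```python
import re
import sqlite3
from typing import List

# Toy extraction rule: "<Capitalized Name> works at <Capitalized Organization>".
# A fixed pattern like this is only a stand-in for a learned extractor.
PATTERN = re.compile(
    r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) works at ([A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)


def extract_to_sql(documents: List[str], conn: sqlite3.Connection) -> None:
    """Extract (person, organization) pairs from each document into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS works_at (doc_id INTEGER, person TEXT, organization TEXT)"
    )
    for doc_id, text in enumerate(documents):
        for person, org in PATTERN.findall(text):
            conn.execute("INSERT INTO works_at VALUES (?, ?, ?)", (doc_id, person, org))
    conn.commit()


docs = [
    "Ana Lopez works at Acme Corp. She likes chess.",
    "No relation here.",
    "Bob works at Globex",
]
conn = sqlite3.connect(":memory:")
extract_to_sql(docs, conn)
rows = conn.execute("SELECT doc_id, person, organization FROM works_at ORDER BY doc_id").fetchall()
print(rows)
# [(0, 'Ana Lopez', 'Acme Corp'), (2, 'Bob', 'Globex')]
```

Once the facts sit in a table, ordinary SQL queries can be run over what used to be unstructured text, which is the benefit the paragraph above attributes to DeepDive.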

DeepDive's developers designed it to tolerate inaccuracies and to interpret ambiguous data. For example, it can recognize that two terms are the same even if one of them contains spelling mistakes.
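Matching a misspelled term to the term it was meant to be can be illustrated with Python's standard `difflib` module. This is a different and much simpler technique than DeepDive's probabilistic approach: it maps each input to the most similar entry in a known vocabulary using `difflib`'s sequence-similarity ratio. The vocabulary and the 0.8 cutoff are hypothetical values chosen for illustration.

```python
from difflib import get_close_matches
from typing import List, Optional

# Hypothetical vocabulary of canonical terms
CANONICAL_TERMS = ["mortgage", "transfer", "credit card", "savings account"]


def normalize_term(term: str, vocabulary: List[str] = CANONICAL_TERMS,
                   cutoff: float = 0.8) -> Optional[str]:
    """Return the closest canonical term, or None if nothing is similar enough."""
    matches = get_close_matches(term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize_term("morgage"))  # mortgage
print(normalize_term("pizza"))    # None
```

Raising the cutoff makes matching stricter (fewer false merges, more misses); lowering it does the opposite.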

Background

Experts in Dark Data say the first step is “restoring the context”: analyzing each piece of data by recreating the situation it was in before it was stored. These techniques can greatly improve the chances that the analysis will succeed.

Each business is different, and the Dark Data generated by a bank is very different from that of a law firm, a social network or an e-commerce site. “Lighting up” dark data poses many technical challenges, and solutions can range from applying a better methodology to existing development work to hiring a specialized team if the hidden value is expected to be very large.

In fact, the ideal situation is for data to remain structured from the moment they are gathered, which prevents them from becoming Dark Data through technical negligence. If the technical resources are in place, no data should be considered lost once stored.
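One way to keep data structured from the moment it is gathered is to validate every incoming record against a schema at ingestion time. The sketch below assumes a hypothetical `Transaction` schema; records that do not fit it are kept alongside the reason for rejection rather than silently discarded, so that nothing is lost once stored.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Transaction:
    # Hypothetical schema, for illustration only
    account_id: str
    amount: float
    timestamp: datetime


def ingest(raw_records: List[Dict[str, Any]]) -> Tuple[List[Transaction], List[Tuple[Dict[str, Any], str]]]:
    """Split raw records into validated Transactions and rejected (record, reason) pairs."""
    accepted, rejected = [], []
    for raw in raw_records:
        try:
            tx = Transaction(
                account_id=str(raw["account_id"]),
                amount=float(raw["amount"]),
                timestamp=datetime.fromisoformat(raw["timestamp"]),
            )
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the bad record and the reason instead of silently dropping it
            rejected.append((raw, repr(exc)))
        else:
            accepted.append(tx)
    return accepted, rejected


records = [
    {"account_id": "A1", "amount": "12.5", "timestamp": "2018-07-16T10:00:00"},
    {"account_id": "A2", "amount": "n/a", "timestamp": "2018-07-16T11:00:00"},
    {"amount": 3},
]
accepted, rejected = ingest(records)
print(len(accepted), len(rejected))  # 1 2
```

The rejected list can then be reviewed or reprocessed, so malformed input becomes a visible queue rather than another pocket of Dark Data.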
