Is it Possible to Standardize Big Data?


I was recently asked a question that I had heard before:

“Could we provide the raw data results from Big Data into a usable format so that it could be analyzed and validated using traditional methods?”

The question was posed by a traditional researcher who had minimal exposure to Big Data analysis. In essence, what he was asking was: “Could we transpose the filtered data into a standard format?” It got me thinking about the larger question:

Could we fit big data into the existing practices that we’ve all conformed to and have accepted?

Right now, at a time when the use of Big Data is still in its infancy, I compare these beginnings to chaos. Consider this:

Chaos theory is the study of nonlinear dynamics, in which seemingly random events are actually predictable from simple deterministic equations. In a scientific context, the word chaos has a slightly different meaning than it does in its general usage as a state of confusion, lacking any order.

Traditional Research has ALWAYS been controlled

When I stop to think about it, I realize how much things have changed in the last decade.

The way business defines itself, however, hasn’t really changed. Businesses have typically created markets for products and identified the right customers for them, and they have controlled this process successfully over the years. So it stands to reason that research methodology has been designed to meet the needs of the business.

Consider this:

Research has always been managed in a controlled environment. This article provides an example. I’ve used the same methodology to portray the following scenario:

What if you were trying to gauge whether a football player was a good field goal kicker? You set your goal to 80%. If the kicker took only 10 shots, a single lucky or unlucky kick would swing the result by 10 percentage points. To reduce the effect of such anomalies, the kicker would have to kick the ball many more times, say 100. At that number, an observed 80% success rate can be reported at a 95% confidence level with a margin of error of roughly +/-8 percentage points; tightening the margin to +/-5% would take about 250 kicks.
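To put rough numbers on the kicker example, here is a minimal Python sketch of the textbook normal-approximation (Wald) margin of error for a success rate. The function names are my own, and the approximation is only rough at very small samples such as 10 kicks, where an exact binomial interval would be more appropriate.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) confidence interval
    for a success rate p observed over n independent attempts.
    z = 1.96 corresponds to a 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)


def kicks_needed(p: float, target_moe: float, z: float = 1.96) -> int:
    """Smallest number of attempts whose margin of error is at most target_moe."""
    return math.ceil(z * z * p * (1 - p) / target_moe ** 2)


if __name__ == "__main__":
    # Margin of error for an 80% kicker at a few sample sizes.
    for n in (10, 100, 250):
        print(f"{n:>4} kicks at 80%: +/-{margin_of_error(0.8, n):.1%}")
    print("kicks needed for +/-5%:", kicks_needed(0.8, 0.05))
```

Because the margin shrinks with the square root of the number of kicks, halving it takes roughly four times as many attempts.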

If you want to compare this field goal kicker to other players, you will need to ensure the conditions are the same: the distance to the goal post, the ball standards, the wind conditions, the time of day, etc. The more varied the conditions, the less accurate the results.

Once the research is complete, you can state the results as facts with a degree of confidence.

Controlled Conditions + Statistical Significance = Facts.

In comparison…

Big Data is Unwieldy… controlling it continues to be a challenge

Big Data’s unstructured nature makes it increasingly difficult to capture, let alone to develop any real standards for. Because it’s aggregated from thousands of sites, channels, and devices, each with its own fields and data types, there is no one system or standard that could possibly make sense of it all:

Big data is an all-encompassing term for any collection of data sets so large or complex that it becomes difficult to process using traditional data processing applications. The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations.

So imagine we take the same example of the field goal kicker, but this time we assume that every kicker starts out as an average scorer, and we adjust our estimate for each player as his kicks come in. If a new kicker came in and outperformed the others, we might at first assume he had a lucky day. However, as he keeps kicking and as we add more kickers, our assessment may change.
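One way to formalize “assume everyone is average, then adjust as the kicks come in” is Bayesian updating. The sketch below uses a Beta-Binomial model, which is my choice of technique rather than anything the scenario itself specifies; the 80% league average and a prior worth 20 kicks are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class KickerEstimate:
    """Beta-Binomial estimate of one kicker's success rate.

    Every kicker starts out "average": prior_mean is an assumed league-wide
    success rate and prior_strength is how many imaginary kicks that belief
    is worth. Both defaults are illustrative, not real league figures."""

    prior_mean: float = 0.8
    prior_strength: float = 20.0
    made: int = 0
    attempts: int = 0

    def update(self, made: int, attempts: int) -> None:
        # Fold in a new batch of kicks; earlier evidence is kept, not re-run.
        self.made += made
        self.attempts += attempts

    @property
    def rate(self) -> float:
        # Posterior mean of Beta(a0 + made, b0 + missed), where
        # a0 = prior_mean * prior_strength and b0 = (1 - prior_mean) * prior_strength.
        return (self.prior_mean * self.prior_strength + self.made) / (
            self.prior_strength + self.attempts
        )


if __name__ == "__main__":
    newcomer = KickerEstimate()
    newcomer.update(made=10, attempts=10)  # a perfect first day
    print(f"after 10 kicks:  {newcomer.rate:.3f}")
    newcomer.update(made=85, attempts=90)  # he keeps it up
    print(f"after 100 kicks: {newcomer.rate:.3f}")
```

A perfect 10-for-10 debut only lifts the estimate to about 0.867, because 10 kicks barely outweigh the prior; after 95 makes in 100 attempts it reaches 0.925, and the kicker’s own record dominates the estimate.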

As per this article,

Now imagine you had billions of data points all being collected and analyzed in real time and real conditions.

That’s the secret to the transformative power of big data.  By vastly increasing the data we use, we can incorporate lower quality sources and still be amazingly accurate.  What’s more, because we can continue to re-evaluate, we can correct errors in initial assessments and make adjustments as facts on the ground change.

That is the value of big data. When events occur in enough volume, accuracy improves over time, so we don’t need to manufacture controlled environments to report on them accurately.
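The idea that volume lets us “incorporate lower quality sources and still be amazingly accurate” holds for random noise: averaging n independent readings shrinks the error roughly in proportion to 1/sqrt(n). It does not hold for systematic bias, such as a source that consistently over-represents one audience, which no amount of volume averages away. The simulation below is a minimal sketch of both cases; the true value, noise level and bias are made-up illustrative numbers.

```python
import random
import statistics


def estimate_from_noisy_readings(
    true_value: float, noise_sd: float, n: int, bias: float = 0.0, seed: int = 42
) -> float:
    """Average n simulated readings of true_value. Each reading is corrupted by
    zero-mean Gaussian noise (standard deviation noise_sd) plus an optional
    systematic bias that is the same for every reading."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return statistics.fmean(true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n))


if __name__ == "__main__":
    # Very noisy individual readings still average out as volume grows...
    for n in (10, 1_000, 100_000):
        est = estimate_from_noisy_readings(0.30, noise_sd=0.5, n=n)
        print(f"n={n:>7}: estimate {est:.3f}")
    # ...but a biased source converges to the wrong answer, however large n is.
    biased = estimate_from_noisy_readings(0.30, noise_sd=0.5, n=100_000, bias=0.05)
    print(f"biased source: estimate {biased:.3f}")
```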

Beyond the File Format

I’m not a Big Data analyst, nor am I a traditional researcher, but I have been exposed to both. The methodologies are very different – right down to audience segmentation and how the information is extracted. They are two entirely different beasts…

Traditional research gives you:

  • The power to choose the audience
  • A finite universe based on company-defined criteria
  • The power to choose the questions
  • The power to moderate the answers in a controlled setting
  • The likelihood of the target audience’s receptiveness based on statistical significance

Big data analysis gives you:

  • An audience that has been organically defined
  • An audience already predisposed to a brand or category – hand-raisers
  • The power to gain deep insights into specific queries
  • Raw, un-moderated, unfiltered insights
  • Scale – discussion volumes in the thousands and even millions, at which statistical significance is rarely the limiting concern
  • Real-time ability to leverage ongoing insights
  • The power to inform traditional research based on the most recent results

I have often written about the amount of guesswork businesses have traditionally done to form hypotheses about their ideal customers, their purchase triggers and their propensities.

With the advent of social intelligence technologies, we have greatly reduced the guesswork involved in identifying customers and their motivations. In fact, we now have the ability to tap even deeper into their behavior and draw inferences (in real time) that go beyond the purchase. We let the pieces fall where they may. I have personally seen the merits of social intelligence and its ability to influence business decisions in a very short amount of time.

My early exposure to Big Data:

When I was first exposed to Social Data, I was doing some random searches for my client, Taco Bell. I found a couple of key forums that bashed the brand about its “mystery meat,” alongside counterclaims about its health benefits. As I delved deeper, I also found some creative forums where people were willingly posting their “breakfast experiences,” or why they loved the brand. I also found one forum where fans would upload pics depicting creative ways to replicate Taco Bell recipes at home.

Remember the “I love Taco Bell” campaign that launched some time ago? It propelled a host of “I love Taco Bell” forum threads. A search for “I love Taco Bell” returns over 27.6 million results, from sites like Deadspin, Tumblr and Grubstreet, to name a few.

At the time, these user-generated sites, not the brand itself, were driving much of the organic traffic for “I love Taco Bell.” The client was unfazed by these findings, having been convinced by its media team that its core audience did not frequent these sites. This failure to see what is going on below the fold is something I have encountered with many clients.

But I digress…

Social Intelligence yields different results

The client was conducting its own focus group research at the time and had determined that males 18-28 were the key target demographic for its new products.

I conducted a social audit using multiple algorithms to scrape a multitude of social platforms for answers, and it turned up very different findings: females were also strong proponents of the brand.

These findings came from data gathered after the product had launched in the U.S. In particular, the intelligence uncovered influential discussions among moms who were advocates for the new products as well as for Taco Bell. In addition, we were able to extract verbatims (both good and bad) about the U.S. campaign launch and about how the product itself was received.

The ancillary findings revealed customer sentiment about overall service and experience, as well as the communities and discussions where strong advocacy existed.

What social intelligence had been able to do, in this case, was inform the business about the current perception of its brand, where the current customers of the product were and why they mattered. It shaped marketing decisions about customer definition and messaging strategy.

While Taco Bell was uncomfortable with some of the negative comments about its service and brand, it was able to address franchise service issues that had existed for some time. Because it captured current perception of specific products, campaigns and locations, the audit became a resource that armed the business beyond just marketing.

Can we bottle up social data so that it fits nicely into a box?

Let’s get back to the question that was asked in the beginning, “How do we standardize big data?”

This is what our social intelligence researcher said:

Our data consists of integration of disparate dataset types/sources and collection protocols. In order to provide a singular file of this aggregated data/protocols it would be necessary to develop a standardization of output across each of our technology vendors in a consistent reporting format. This is essentially one of the greatest challenges in today’s data economy: harnessing “big data”. While this is a worthy and long term goal for anyone in an industry leveraging data, standardizing “big data” is a challenge we are ultimately not equipped to undertake.
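The researcher’s answer points at the practical obstacle: each vendor reports in its own format. A common engineering pattern is one small adapter per source that maps raw records into a shared schema. The sketch below assumes two hypothetical vendors (“vendor_a” and “vendor_b”) with invented field names and sentiment scales. Note that it standardizes only the output format; it cannot make the underlying collection protocols comparable, which is the harder problem described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class Mention:
    """Hypothetical common reporting schema for one social mention."""

    vendor: str
    platform: str
    text: str
    published: datetime  # timezone-aware, UTC
    sentiment: float  # -1.0 (negative) to 1.0 (positive)


def from_vendor_a(rec: dict[str, Any]) -> Mention:
    # Assumed format: ISO-8601 timestamp with offset, sentiment already on -1..1.
    return Mention(
        vendor="vendor_a",
        platform=rec["network"].lower(),
        text=rec["text"],
        published=datetime.fromisoformat(rec["posted"]).astimezone(timezone.utc),
        sentiment=float(rec["sentiment_score"]),
    )


_POLARITY = {"pos": 1.0, "neu": 0.0, "neg": -1.0}


def from_vendor_b(rec: dict[str, Any]) -> Mention:
    # Assumed format: Unix epoch seconds and a categorical polarity label.
    return Mention(
        vendor="vendor_b",
        platform=rec["source"].lower(),
        text=rec["body"],
        published=datetime.fromtimestamp(rec["timestamp"], tz=timezone.utc),
        sentiment=_POLARITY[rec["polarity"]],
    )


# One adapter per source; adding a vendor means adding one function here.
ADAPTERS: dict[str, Callable[[dict[str, Any]], Mention]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def normalize(vendor: str, records: list[dict[str, Any]]) -> list[Mention]:
    """Convert one vendor's raw records into the shared schema.
    Raises KeyError for an unknown vendor or a missing field."""
    adapter = ADAPTERS[vendor]
    return [adapter(r) for r in records]


if __name__ == "__main__":
    a = normalize("vendor_a", [{"network": "Twitter", "text": "Love the new taco",
                                "posted": "2014-06-01T12:00:00+00:00", "sentiment_score": 0.8}])
    b = normalize("vendor_b", [{"source": "Forum", "body": "Service was slow",
                                "timestamp": 1401624000, "polarity": "neg"}])
    for m in sorted(a + b, key=lambda m: m.published):
        print(m)
```

The adapters make the records line up column for column, but a sentiment of 0.8 from one vendor’s model and a “pos” label from another’s still mean different things, which is why a single file does not by itself make the data comparable.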

Like everything else these days, you can’t change things in a vacuum. Eventually it will have an impact on other parts of the business as well.

Big data cannot conform to existing research methodologies that easily. The research must transform to accommodate the complexities of this Wild West. Let’s not kid ourselves: it’s because of the current social dynamics that every facet of the business must adapt. Business must also transform. But that’s another story for another day…

Image Reference: Pixabay

One thought on “Is it Possible to Standardize Big Data?”

  1. Joe Cardillo says:

    Rather interesting chicken or egg thing going on. I think there is validity to each approach and we need both, but more importantly we need to access and structure what’s in between. The section where you compare / contrast approaches is the most intriguing, would be interesting at some point to see you lay that out with an example or two.
