Democratizing Data: Bringing Data Harvesting and Analytics to the Masses

To date, access to this kingmaking power is very unequal. It has been concentrated in the hands of a select few companies, most of which were already quite powerful to begin with. They alone possess the enormous resources required to collect quality data and turn it into value. 

Meanwhile, the rest of the world is drowning in data, but that data and the power it confers remain out of our reach. By extension, our ability to use that data for the collective good of society is limited.

Fortunately, that is now changing. New means of providing access to data, as well as novel tools for correlating, contextualizing and analyzing data in real time, promise to usher in an age of democratized data.

In this article, Flux provides a look at how data is being democratized. It focuses on the specific use case of collecting and analyzing environmental data, although the trends discussed below have the potential to play out anywhere that data leads to insight. 

The Challenges of Data Democratization

In theory, anyone can collect data or access the various open data sets that are collected by government agencies and other organizations committed to open data.

Anyone can theoretically analyze and process data, too; after all, most of the major frameworks for big data processing, like Hadoop and Spark, are open source. There's no technical or legal barrier stopping someone from downloading and running them. 

At a practical level, however, actually collecting, transforming, storing and/or analyzing data on a large scale is unfeasible for most individuals and organizations today. That's true for a number of reasons:

  • Data is harvested at high, broad levels. Most open-access environmental
    data sets focus on broad regions. If you want to study a particular place
    or microclimate, it can be hard to find the data you need. And while you
    could theoretically collect the data yourself, deploying and managing your
    own sensors or other data collectors is often not realistic because
    affordable, open data sensors have traditionally been scarce.
  • Data may be biased. Even if you have access to raw open-access data, you
    can’t be certain that the data is not presented in ways that skew your
    ability to interpret it accurately or fairly.
  • Data storage is expensive. Sure, you can now store data in the public
    cloud for just pennies per gigabyte. But when your data sets reach
    terabytes in size, those costs add up and few organizations have the
    budget to sustain them over the long term. (And if you don't collect data
    over the long term, you are likely to miss out on important insights,
    especially in contexts like environmental data where change typically
    results from infrequent, sudden events.)
  • Pre-collected data is outdated data. If you rely on data that was
    collected by someone else, chances are that the data will be stale by the
    time you access it. Plus, it will take you more time to transform the
    data, clean up data-quality issues and run analysis. By the time all of
    this is done, the insights you can glean from the data may be outdated.
    The only way to solve these problems is to collect and analyze data in
    real time, but again, most organizations lack the resources to do this on
    their own.
  • Lack of data interpretation and artificial intelligence (AI) tools.
    Again, many frameworks for collecting and processing large data sets are
    open source. But advanced tools for making sense of data, like AI
    algorithms, tend to be proprietary. The companies that develop these tools
    invest huge amounts of money in them and rarely make them available to
    third parties.
  • Poor incentives for data and AI sharing. Part of the reason for the
    challenge described in the preceding point is that few organizations have
    strong incentives to share their data and proprietary AI tools. To date,
    most companies that benefit from data monetize it through advertising or
    internal research; there has been little reward for them in sharing their
    data and data tools with third parties.
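
To make the storage-cost point above concrete, here is a minimal Python sketch of how "pennies per gigabyte" compounds for a data set that keeps growing. The price of $0.02 per gigabyte per month, the 5 TB starting size and the 1 TB of new data per month are illustrative assumptions, not figures from any cloud provider or from Flux.

```python
def cumulative_storage_cost(initial_tb, growth_tb_per_month, months, price_per_gb_month):
    """Total spend on storing a data set that grows linearly over time.

    All inputs are hypothetical; the point is the shape of the arithmetic.
    """
    total = 0.0
    for month in range(months):
        # Size of the data set during this month, in decimal gigabytes (1 TB = 1000 GB).
        stored_gb = (initial_tb + growth_tb_per_month * month) * 1000
        # Every gigabyte already stored is billed again each month.
        total += stored_gb * price_per_gb_month
    return total


# Assumed scenario: five years of continuous environmental monitoring.
five_year_cost = cumulative_storage_cost(
    initial_tb=5,
    growth_tb_per_month=1,
    months=60,
    price_per_gb_month=0.02,  # assumed price, not a quoted rate
)
print(f"Estimated five-year storage spend: ${five_year_cost:,.0f}")
```

Because old data is billed again every month while new data keeps arriving, the total grows roughly with the square of the retention period, which is why long-term collection is the hard part to fund.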

All of this means that data has so far tended to be very undemocratic. It increases the power of organizations that are powerful to begin with and therefore have the resources to undertake large-scale, proprietary data collection and analysis programs. It leaves everyone else struggling to make sense of the tidbits of data that are available from open data sets, which are usually of limited value for gaining real-time insights. And even if you do gain access to meaningful, relevant data, you may not have the advanced AI tools that are necessary to turn the data into value.

This is why we struggle to maximize the value of all of the data that is being generated around us. As Microsoft’s Lucas Joppa noted in Nature:

“Today, we know more than ever about human activity. More than one-quarter of the 7.6 billion people on Earth post detailed information about their lives on Facebook at least once a month. Nearly one-fifth do so daily. ... Yet we are flying blind when it comes to understanding the natural world.” 

Joppa continued by pointing to some of the reasons why we do such a poor job of transforming all of the environmental data surrounding us into insight. The problem is not only that scientists “don’t have the kinds of data needed to make such predictions” but also that they “lack the algorithms to convert data into useful information.” 

When only a handful of organizations have the power to glean meaningful insight from environmental data, and they are not actually doing it, people who interact in critical ways with the environment (like foresters and builders) cannot make data-driven decisions that are in the best interests of all stakeholders. Nor can anyone hold them to account.

What It Takes to Democratize Data

It doesn’t have to be this way. Data can be democratized in ways that make it practical for any person or group to derive insights from the data surrounding us.

Doing so requires:

  • The ability to store and share data in an open, decentralized way. Shared
    data would not only make more data available to people who need it but
    would also (and this is a really important part of data democratization)
    allow us to place data from many different sources on the same plane and
    in the same context, so that we maximize visibility and
  • An incentives system that rewards
    people for sharing data with each other and makes it feasible to monetize
    data in ways that are not purely self-serving
  • Access to AI-powered data analysis
    tools that anyone can use
  • Open, affordable data harvesters that
    anyone can deploy

These solutions are all part of the platform that Flux is building. Flux is the antidote to the natural tendency of big data and AI to monopolize power, rather than democratize it. The platform leverages blockchain technology for open, affordable data storage, provides advanced AI tools and rewards organizations for collecting and sharing data via open-source hardware sensors called MICOs. In summary, Flux is creating a new environmental standard for data collection, storage and intelligence.

Learn more by downloading the Flux white paper.