Democratizing Data: Bringing Data Harvesting and Analytics to the Masses

To date, access to the kingmaking power of data is very unequal. It has been concentrated in the hands of a select few companies, most of which were already quite powerful to begin with. They alone possess the enormous resources required to collect quality data and turn it into value.

Meanwhile, the rest of the world is drowning in data, but that data and the power it confers remain out of our reach. By extension, our ability to use that data for the collective good of society is limited.

Fortunately, that is now changing. New means of providing access to data, as well as novel tools for correlating data, contextualizing data and analyzing data in real time, promise to usher in an age of democratized data. 

In this article, Flux provides a look at how data is being democratized. It focuses on the specific use case of collecting and analyzing environmental data, although the trends discussed below have the potential to play out anywhere that data leads to insight. 

The Challenges of Data Democratization

In theory, anyone can collect data or access the various open data sets that are collected by government agencies and other organizations committed to open data.

Anyone can theoretically analyze and process data, too; after all, most of the major frameworks for big data processing, like Hadoop and Spark, are open source. There’s no technical or legal barrier stopping someone from downloading and running them. 

At a practical level, however, actually collecting, transforming, storing and/or analyzing data on a large scale is unfeasible for most individuals and organizations today. That’s true for a number of reasons:

  • Data is collected at a coarse, regional scale. Most open-access environmental data sets cover broad regions. If you want to study a particular place or microclimate, it can be hard to find the data you need. And while you could theoretically collect the data yourself, deploying and managing your own sensors or other data collectors is often not realistic, because affordable, open data sensors have traditionally been scarce.
  • Data may be biased. Even if you have access to raw open-access data, you can’t be certain that the data is not presented in ways that skew your ability to interpret it accurately or fairly.
  • Data storage is expensive. Sure, you can now store data in the public cloud for just pennies per gigabyte. But when your data sets reach terabytes in size, those costs add up and few organizations have the budget to sustain them over the long term. (And if you don’t collect data over the long term, you are likely to miss out on important insights, especially in contexts like environmental data where change typically results from infrequent, sudden events.)
  • Pre-collected data is outdated data. If you rely on data that was collected by someone else, chances are that the data will be stale by the time you access it. Plus, it will take you more time to transform the data, clean up data-quality issues and run analysis. By the time all of this is done, the insights you can glean from the data may be outdated. The only way to solve these problems is to collect and analyze data in real time, but again, most organizations lack the resources to do this on their own.
  • Lack of data interpretation and artificial intelligence (AI) tools. Again, many frameworks for collecting and processing large data sets are open source. But advanced tools for making sense of data, like AI algorithms, tend to be proprietary. The companies that develop these tools invest huge amounts of money in them and rarely make them available to third parties.
  • Poor incentives for data and AI sharing. Part of the reason for the challenge described in the preceding point is that few organizations have strong incentives to share their data and proprietary AI tools. To date, most companies that benefit from data monetize it through advertising or internal research; there has been little reward for them in sharing their data and data tools with third parties.
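
To see how the storage costs mentioned in the list above add up, consider a rough sketch in Python. The ingest rate (5 TB of new data per month) and the price ($0.02 per gigabyte per month) are illustrative assumptions, not figures from any provider, and the function name is hypothetical. The point is only that when data is kept indefinitely, each month's bill is larger than the last.

```python
def cumulative_storage_cost(tb_added_per_month, price_per_gb_month, months):
    """Total spend after `months` when no data is ever deleted.

    All inputs are hypothetical; substitute your own ingest rate and price.
    """
    gb_stored = 0.0
    total_cost = 0.0
    for _ in range(months):
        gb_stored += tb_added_per_month * 1000       # decimal TB -> GB
        total_cost += gb_stored * price_per_gb_month  # this month's bill
    return total_cost


# Assumed inputs: 5 TB of new sensor data per month at $0.02 per GB-month.
for years in (1, 3, 5):
    cost = cumulative_storage_cost(5, 0.02, years * 12)
    print(f"{years} year(s) of retention: ${cost:,.0f} spent so far")
```

Because the stored volume grows every month, the total spend grows roughly with the square of the retention period rather than in a straight line, which is why long-term collection strains small budgets.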

In short, data has so far tended to be very undemocratic. It increases the power of organizations that are already powerful and therefore have the resources to undertake large-scale, proprietary data collection and analysis programs. It leaves everyone else struggling to make sense of the tidbits of data that are available from open data sets, which are usually of limited value for gaining real-time insights. And even if you gain access to meaningful, relevant data, you may not have the advanced AI tools that are necessary to turn the data into value.

This is why we struggle to maximize the value of all of the data that is being generated around us. As Microsoft’s Lucas Joppa noted in Nature:

“Today, we know more than ever about human activity. More than one-quarter of the 7.6 billion people on Earth post detailed information about their lives on Facebook at least once a month. Nearly one-fifth do so daily. … Yet we are flying blind when it comes to understanding the natural world.” 

Joppa continued by pointing to some of the reasons why we do such a poor job of transforming all of the environmental data surrounding us into insight. The problem is not only that scientists “don’t have the kinds of data needed to make such predictions” but also that they “lack the algorithms to convert data into useful information.” 

When only a handful of organizations have the power to glean meaningful insight from environmental data, and they are not actually doing it, people who interact in critical ways with the environment, like foresters and builders, cannot make data-driven decisions that are in the best interests of all stakeholders. Nor can anyone hold them to account.

What It Takes to Democratize Data

It doesn’t have to be this way. Data can be democratized in ways that make it practical for any person or group to derive insights from the data surrounding us.

Doing so requires:

  • The ability to store and share data in an open, decentralized way. Shared data would not only make more data available to people who need it but would also — and this is a really important part of data democratization — allow us to place data from many different sources on the same plane and in the same context, so that we maximize visibility and insights.
  • An incentives system that rewards people for sharing data with each other and makes it feasible to monetize data in ways that are not purely self-serving.
  • Access to AI-powered data analysis tools that anyone can use.
  • Open, affordable data harvesters that anyone can deploy.

These solutions are all part of the platform that Flux is building. Flux is the antidote to the natural tendency of big data and AI to monopolize power, rather than democratize it. Using blockchain technology for open, affordable data storage, Flux provides advanced AI tools for analysis, rewards organizations for sharing data, and supports data collection via open-source hardware sensors called MICOs. In summary, Flux is creating a new environmental standard for data collection, storage and intelligence.

Learn more by downloading the Flux white paper.