The potential impact of the ongoing global data explosion continues to spark the imagination. A 2018 report estimated that an average of 1.7 MB of data is created every second for every person on Earth – and annual data production has more than doubled since then and is expected to more than double again by 2025. A report by the McKinsey Global Institute estimates that clever use of big data could generate $3 trillion in additional economic activity, enabling applications as diverse as self-driving cars, personalized healthcare, and traceable food supply chains.
However, all of this new data also creates confusion about how to find, use, manage, and share it legally, safely, and efficiently. Where does a particular data set come from? Who owns what? Who can see certain things? Where does it live? Can it be shared? Can it be sold? Can people see how it was used?
As the uses of data grow and become more ubiquitous, producers, consumers, owners, and stewards of data find that they don’t have a playbook to follow. Consumers want to connect to data they trust so they can make the best decisions possible. Producers need tools to securely share their data with those who need it. But technology platforms fall short, and there are no real common sources of truth to bind the two sides together.
How do we find data? When should we move it?
In a perfect world, data would flow freely like an accessible utility. It could be packaged and sold like raw materials. It could be viewed easily and without complications by anyone authorized to do so. Its origins and movements could be traced, eliminating any concerns about nefarious uses anywhere along the line.
Today’s world doesn’t work that way, of course. The massive data explosion has created a long list of problems and opportunities that make it difficult to share information.
Since data is created almost everywhere, both inside and outside a company, the first challenge is identifying what is being collected and how to organize it so that it can be found.
A lack of transparency and sovereignty over stored and processed data and infrastructure leads to trust problems. Moving data from multiple technology stacks to central locations is expensive and inefficient. The lack of open metadata standards and generally accessible application programming interfaces can make data hard to access and use. Sector-specific data ontologies can make it difficult for people outside the sector to benefit from new data sources. And with multiple stakeholders and hard-to-access data services, sharing is difficult without a governance model.
Europe takes the lead
Despite the problems, data sharing projects are being carried out on a large scale. One, backed by the European Union and a non-profit group, is creating an interoperable data exchange called Gaia-X, where companies can share data under the protection of strict European data protection laws. The exchange is intended as a vehicle for cross-industry data exchange and as a repository for information about data services relating to artificial intelligence (AI), analytics, and the Internet of Things.
Hewlett Packard Enterprise recently announced a solution framework to help businesses, service providers, and public organizations participate in Gaia-X. The Dataspaces platform, currently under development, is based on open standards and is cloud native. It democratizes access to data, data analytics, and AI by making them more accessible to domain experts and general users alike. It provides a place where domain experts can more easily identify trustworthy data sets and securely carry out analyses of operational data – without always incurring the cost of moving data to central locations.
By using this framework to integrate complex data sources across IT landscapes, companies can provide data transparency at scale, so that everyone – data scientist or not – knows what data they have, how to access it, and how it can be used in real time.
Data sharing initiatives are also high on the corporate agenda. A major priority for businesses is reviewing the data used to train internal AI and machine learning models. AI and machine learning are already widely used in business and industry to drive continuous improvement in everything from product development to recruiting to manufacturing. And we’re just getting started. IDC predicts that the global AI market will grow from $328 billion in 2021 to $554 billion in 2025.
To unlock the true potential of AI, governments and corporations need to better understand the collective lineage of all the data that powers these models. How do AI models make their decisions? Do they have biases? Are they trustworthy? Were untrustworthy people able to access or change the data a company trained its model on? A more transparent and efficient connection between data producers and data consumers can help answer some of these questions.
Build data maturity
Companies will not be able to unlock all of their data overnight. But they can prepare to take advantage of technologies and management concepts that contribute to a data-sharing mentality. They can make sure they develop the maturity to consume or share data strategically and effectively rather than doing so on an ad hoc basis.
Data producers can prepare for a wider distribution of data by taking a number of steps. They need to understand where their data is and how it is collected. Then they need to ensure that the people who will be consuming the data can access the right records at the right time. That is the starting point.
Then comes the harder part. Once a data producer has consumers – who can be inside or outside the company – those consumers need to connect to the data. This is both an organizational and a technological challenge. Many organizations want governance over how data is shared with other organizations. The democratization of data – at a minimum, being able to find it across organizations – is a question of organizational maturity. How do you deal with that?
Companies that supply the automotive industry actively share data with vendors, partners, and subcontractors. It takes a lot of parts – and a lot of coordination – to assemble a car. Partners willingly share information on everything from engines to tires to internet-enabled repair channels. Automotive dataspaces can serve more than 10,000 providers. In other industries, however, data can be more siloed. Some large companies may not want to share sensitive information, even within their own network of business units.
Create a data mentality
Companies on both sides of the consumer-producer continuum can improve their data-sharing mentality by asking themselves these strategic questions:
- When companies develop AI and machine learning solutions, where do teams get their data from? How do they connect to that data? And how do they trace its history to ensure the data’s trustworthiness and origin?
- If data has value to others, what monetization path is the team taking today to increase that value, and how is it being managed?
- If a company is already exchanging or monetizing data, can it authorize a wider range of services on multiple platforms – on premise and in the cloud?
- For companies that must exchange data with providers, how do they keep those providers coordinated on the same data sets and updates today?
- Do producers want to replicate their data, or do they want to require people to bring their models to the data? Data sets can be so large that they cannot be replicated. Should a company host developers on its own platform, where the data resides, and let them build and run their models there?
- How can employees in a data-consuming department influence the practices of upstream data producers within their organization?
Taking action
The data revolution creates business opportunities – along with a great deal of confusion about how to strategically find, collect, manage, and learn from data. Data producers and data consumers are increasingly separated from one another. HPE is building a platform that supports both on-premises infrastructure and public clouds, using open source as the foundation and solutions like the HPE Ezmeral Software Platform, to create the common ground both sides need to make the data revolution work for them.
Read the original article on Enterprise.nxt.
This content was created by Hewlett Packard Enterprise. It was not written by the editorial staff of the MIT Technology Review.