The massive upswell of data will only increase in the decade ahead, a force set to transform industry and infrastructure much as it has transformed consumer internet services. This is a fundamental aspect of the “industrial internet” vision that Pivotal CEO Paul Maritz and General Electric VP Bill Ruh articulated at the company’s launch. Concurrently, the release of the White House’s new open data mandate, along with advances in sophisticated analytics tools and machine learning, could accelerate massive shifts in critical infrastructure and core industries such as healthcare, energy, and agriculture.
During a talk at the G-8 Open Data for Agriculture Conference, Secretary of Agriculture Tom Vilsack spoke of the potential impact of these efforts and technologies. He stated, “The digital revolution fueled by open data is starting to do for the modern world of agriculture what the industrial revolution did for agricultural productivity over the past century.” To support this goal, Vilsack announced the USDA’s new Food, Agriculture, and Rural data portal, featuring over 200 open datasets related to global satellite data mapping agricultural preparedness and crop conditions, genetics and genomics resources, and statistical research undertaken by the agency.
While increasing access to government data is a laudable goal, such releases also need to be useful, lest they become what open data advocates derisively refer to as a “data dump.” Government data is notoriously messy, often stored in legacy systems or in difficult-to-scrape formats such as PDF. The White House’s open data mandate aims to address this issue, calling specifically for the release of machine-readable datasets. Paired with sophisticated machine learning and analytics tools, such releases can transform industries, accelerate innovation, and even spawn new businesses.
The machine learning component of this is detailed in a recent interview by Alex Howard at O’Reilly Radar with Nigel Shadbolt, a professor at the University of Southampton and co-founder of the Open Data Institute:
What the web has done is finish this with something that looks a lot like a supremely distributed database. Now, that distributed knowledge base is one version of the Semantic Web. The way I got into open data was the notion of using linked data and semantic Web technologies to integrate data at scale across the web — and one really high value source of data is open government data.
Shadbolt characterizes the twin pillars of open data advocacy — economic value and increased government transparency — as complementary, rather than separate or in opposition, as they are often portrayed:
The demand side [of open data] can be characterized. It’s not just economic. It will have to do with transparency, accountability and regulatory action. The economic side of open data gives you huge room for maneuver and substantial credibility when you can say, “Look, this dataset of spending data in the UK, published by local authorities, is the subject of detailed analytics from companies who look at all data about how local authorities and governments are spending their data. They sell procurement analysis insights back to business and on to third parties and other parts of the business world, saying ‘This is the shape of how the UK PLC is buying.’”
This burgeoning “open data economy” is investigated in a series of articles at O’Reilly detailing how open data can spur innovation in existing industries and create new economic opportunities. This view was supported by a Forrester report released in February, which outlined the many sectors already benefiting from increased access to government data, including healthcare services, banking, and personal finance. As government releases become more ambitious and large industries embrace data-driven development and innovation, the impact of open data is set to increase significantly in the years ahead.