Open data for AI

Time to invest, again and always

Jaimie Boyd
6 min read · Mar 6, 2020

I love open data!

Today, I want to offer a pitch: invest in open data because it makes government technology better. Better gov tech means better services to people, which means better lives.

Worth thinking about, eh? Today I’ll focus on open data for artificial intelligence.

Context: the robots are here

In the digital age, algorithms are more common than ever. Together with new ways of analysing data, they offer opportunities for great digital services and insights. But some algorithms also create risks.

[Image: cute cartoon robot]
Automation is increasingly common… and that can be a good thing!

Open data can help address some of the risks that come with algorithms. It can make the algorithms more effective and ethical. That means better outcomes for our communities and the people affected by new technologies.

I’d like to offer a few considerations for building and using algorithms, especially in the public sector. I’d also like to draw a connection to open data and encourage governments to continue to invest in it as an enabler for emerging technology.

What’s open data?

Open data is data that anyone can easily access, use, manipulate and share for any reason. Usually, it’s machine-readable government data. Governments take their data, clean it, review it for security and privacy risks, and then release it on portals like the Government of British Columbia’s and the Government of Canada’s.
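
These portals are built so that anyone can pull the catalogue programmatically. Here’s a minimal sketch of searching a portal’s catalogue; it assumes the portal exposes the standard CKAN search API, and the portal URL and search term are illustrative only.

```python
# Minimal sketch: searching an open data portal's catalogue.
# Assumes the portal exposes the standard CKAN API (package_search);
# the portal URL and search term below are illustrative only.
import requests

PORTAL = "https://open.canada.ca/data/en"  # Government of Canada open data portal

resp = requests.get(
    f"{PORTAL}/api/3/action/package_search",
    params={"q": "transit", "rows": 5},
    timeout=30,
)
resp.raise_for_status()

# CKAN wraps results in {"success": ..., "result": {"count": ..., "results": [...]}}
for dataset in resp.json()["result"]["results"]:
    print(dataset.get("title"))
```

No accounts, keys or data-sharing agreements are needed, which is part of what makes open data such a low-friction substrate for experimentation.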

Open data is a critical enabler for digital government. It supports transparency, accountability and collaboration across an ecosystem. This kind of engagement is important for digital government to be effective and sustainable.

Increasingly, open data also enables breathtaking innovation. And it can help make algorithms more useful, legitimate and ethical.

AI is here, in your government

Computers have never been faster or cheaper. As this trend accelerates, people are innovating, including by simulating human intelligence. Artificial intelligence (AI) is the use of machines to do things that human minds would traditionally do. This includes things like applying rules at scale, identifying patterns, reasoning and using data to learn and improve.

AI is becoming more common in government. I see a lot of innovation in machine learning, which is the basis for systems that automatically learn as they consume and analyse data. Exciting stuff!

AI offers lots of benefits, including in the public sector where I work. In the Government of BC, we use it to estimate bus times, improve how we regulate safety systems and equipment, and facilitate the work of multilingual, interactive court interpreters. These are great examples from the west coast of Canada, but there are many more around the world. AI helps people do things like file their taxes, identify companies that are at risk of bankruptcy, predict opioid-related deaths, and find regulatory infractions.

Don’t sleep, there are snakes: the risks of AI

I’m excited about AI playing a big role in the public sector. Digital government is about using new technology from the Internet age to provide people with great services that they can trust. AI can help do this efficiently and in a way that serves the needs of our diverse communities.

That said, AI brings lots of risks. For individuals, we’re seeing AI used for facial recognition, and these use cases can create privacy risks. For example, facial recognition can provide access to services (including students forgoing roll call in Shenzhen!), but it can also help entities collect reams of intimate data. This data may enable insights that individuals haven’t anticipated or consented to.

AI can also undermine legitimacy and fairness. There are thousands of cases of mistaken identity through inaccurate facial recognition. These errors can affect diverse folks, from soccer hooligans to school children. Check out AI-based courts in China. Some of the risks are horrific. The data we feed into algorithms is often biased, and the algorithms spit that bias back out. People deserve better.

Now that we can do anything, what will we do? (Bruce Mau)

It’s reassuring to see governments talking and acting to combat some of the ethical risks associated with algorithms. The Government of Canada created a Directive on Automated Decision-Making. Europe’s GDPR and Brazil’s Civil Rights Framework get us closer to a healthy way of managing the risks of AI. Governments are embracing ethics, including the City of Edmonton that recently appointed a Data and Analytics Ethics Advisor. Kudos.

These are good moves. But across the board, the public sector needs to double down on data protection and digital rights, and it needs to build them based on citizens’ views. In Canada, we know that the more intimate the effects, the less people like AI (check out the wonderful Ipsos work on the topic, especially slides 22–24). Governments should act accordingly.

Open data can help

In a brave new world where we use AI and confront its risks, there are tremendous opportunities to systematically apply ethical guidelines. AI Global and others are embracing fantastic approaches in this space. The Government of BC’s new digital principles reinforce our commitment to ethics in applying emerging technology.

As a complement to these ethical commitments and guides, governments need to double down on open data. Data is the lifeblood of AI. Without data, algorithms are hypothetical reflections of society with little actionable insight. Open data offers excellent opportunities for AI:

  • Training — To be useful, AI needs to ingest lots of data. Open data is often really rich, offering untapped potential. For example, in developing Canada’s 4th National Action Plan on Open Government, officials engaged over 10,000 Canadians and released their anonymized input as open data. Cool, eh? That content can serve as training data for algorithms that analyze unstructured consultation data (see the sketch after this list).
  • Explainability and replicability — To be legitimate, entities that apply algorithms need to be able to explain how they use them. They also need to be able to produce the same results consistently given the same inputs. In a world where so many valuable insights come from data that may have privacy implications, it’s useful to be able to show the viability of an algorithm using openly available data.
  • Transparency — We’ve all heard the stories of Heidi and Howard, and the implicit bias that we often bring to decision-making. When we build algorithms using open data, we make the bias in our legacy data more apparent. And that allows us to correct for it. On a societal basis, biased algorithms may be easier to fix than biased people. Easy, public access to the data feeding the algorithms makes for more eyes scrutinizing the data.
  • Legitimacy — Data is rife with privacy and security risks, and algorithms can expose and exacerbate those risks. But thanks to robust protocols and data standards, open data is systematically subjected to reviews that anonymize the data and aggregate it to a level that mitigates risks. So training algorithms on open data can obviate concerns around data protection.
  • Ease — Data is messy. Data scientists have lots of techniques to clean data and make it easier to analyze, but it’s always easiest to draw insights from data that already aligns with established data standards. Open data does that.
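
To make the training point concrete, here’s a minimal sketch of the kind of model you could train on released consultation comments. It’s an illustration under assumptions, not a description of any government system: the file name and the “text” and “theme” columns are hypothetical, and the actual released data would need to be mapped into this shape.

```python
# Minimal sketch: training a simple classifier on open consultation data.
# The CSV file and its "text"/"theme" columns are hypothetical stand-ins
# for anonymized, openly released consultation input.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

comments = pd.read_csv("consultation_comments.csv")

X_train, X_test, y_train, y_test = train_test_split(
    comments["text"], comments["theme"], test_size=0.2, random_state=42
)

# TF-IDF features plus a linear classifier: a simple, inspectable baseline.
model = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```

Because the inputs are open and the random seed is fixed, anyone can re-run the same pipeline on the same file and check the result — the replicability point above in miniature.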

Technology is cool, but you’ve got to use it as opposed to letting it use you (Prince)

Our brave, new, technologically enabled world is rich with algorithms, and that’s a good thing. But for investments in digital change to be legitimate and sustainable, those algorithms need to align with ethical standards. Open data can help.

The case for open data is strong. It facilitates innovation. It contributes to great businesses and enables communities to hold government to account. In a world where technology is evolving and improving at an exhilarating pace, open data is a powerful enabler for ethical algorithms.

Let’s double down: it’s time to invest in open data, now and always.

Meme saying “Yeah so if you could just go ahead and make your data open that would be great”
Courtesy of @OA_Button

I was inspired to write this article from a panel at the Fourteenth Annual Meeting of the Internet Governance Forum, a United Nations sanctioned multi-stakeholder forum for policy dialogue on issues of Internet governance. The panel was entitled Human-centered Design and Open Data: How to Improve AI.

Many thanks to colleagues at the Brazilian Center for the Study of Web Technologies (Ceweb.br) for convening such an important conversation.


Jaimie Boyd

National Digital Government Leader at Deloitte Canada