Data inputs must be the focus of AI and its regulation

The world’s first-ever law on artificial intelligence (AI) looks set to be adopted by the European Union parliament this year. The AI Act¹ is the first comprehensive government legislation to oversee how the technology will be used. It takes a risk-based approach to prevent harmful outcomes and could have an impact beyond Europe. New research suggests a different approach should be taken.

The EU AI Act has grabbed the global spotlight because it will set the guardrails for fast-moving AI tools, which are becoming ubiquitous. The rules will also regulate foundation models and generative AI, such as OpenAI’s ChatGPT and Google’s Gemini. These AI systems are trained on large data sets and can learn from new data in order to perform various tasks.

The EU regulatory approach, which will become law in April, focuses on assessing the risks associated with each AI use case and its output. But these risks can be difficult to judge, especially for use cases that are yet to be discovered. The EU AI Act will also require organisations to use high-quality input data, but how to implement this is not well understood. A different method is needed to make sure organisations are compliant with the law.

“We haven’t seen much language about liability in this law. It’s an issue. AI is complex. It’s continually evolving. Regulation should focus on ensuring high-quality inputs to AI, not trying to anticipate all possible outputs. This can be more easily understood and controlled before AI is deployed,” explains Christian Peukert, Associate Professor of Digitisation, Innovation and Intellectual Property at HEC Lausanne.

Research by Peukert and his colleagues² proposes that organisations take a closer look at data inputs for AI models to reduce compliance risks. In particular, it is helpful to think about who is responsible for low-quality data sources in order to allocate liability.

Some datasets are incomplete for reasons beyond the control of the AI provider or its users. Such exogenous reasons for low data quality can exist, for instance, because of privacy regulations. Other times the cause is endogenous: data is of poor quality because someone has effectively altered the information in their favour. This can lead to issues with AI bias. Carefully allocating liability between the developer and the deployer of the AI can help incentivise all parties to work hard to achieve high data quality.

The other consideration is whether the AI is installed once and is fixed in how it works, such as an AI system trained for vehicle number plate recognition, or is continuous in its deployment and learning, such as a chatbot based on a large language model. On this basis Peukert’s team created a liability framework.

Liability framework focused on AI inputs and deployment models

	One-shot deployment	Continuous deployment
Exogenous data issues	Liability of developer	Liability of developer
Endogenous data issues	Liability of deployer	Joint liability of developer and deployer

Allocating liability to developers and deployers in different situations based on data inputs allows organisations to codify responsibilities. This framework also specifies when developers of AI need to work together with deployers.
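
As a rough illustration of what codifying these responsibilities could look like, the four cells of the table can be written as a simple lookup. This is a minimal sketch in Python and not part of the research itself; the names DataIssue, Deployment, Party, LIABILITY and liable_parties are hypothetical and chosen only for this example.

```python
from enum import Enum


class DataIssue(Enum):
    EXOGENOUS = "exogenous"    # cause outside the parties' control, e.g. privacy regulation
    ENDOGENOUS = "endogenous"  # cause within someone's control, e.g. altered information


class Deployment(Enum):
    ONE_SHOT = "one-shot"      # installed once and fixed in how it works
    CONTINUOUS = "continuous"  # keeps learning while deployed


class Party(Enum):
    DEVELOPER = "developer"
    DEPLOYER = "deployer"


# One entry per cell of the liability table above (hypothetical encoding).
LIABILITY = {
    (DataIssue.EXOGENOUS, Deployment.ONE_SHOT): frozenset({Party.DEVELOPER}),
    (DataIssue.EXOGENOUS, Deployment.CONTINUOUS): frozenset({Party.DEVELOPER}),
    (DataIssue.ENDOGENOUS, Deployment.ONE_SHOT): frozenset({Party.DEPLOYER}),
    (DataIssue.ENDOGENOUS, Deployment.CONTINUOUS): frozenset(
        {Party.DEVELOPER, Party.DEPLOYER}  # joint liability
    ),
}


def liable_parties(issue: DataIssue, deployment: Deployment) -> frozenset:
    """Return the set of parties liable for a given data issue and deployment model."""
    return LIABILITY[(issue, deployment)]
```

For example, liable_parties(DataIssue.ENDOGENOUS, Deployment.CONTINUOUS) returns both the developer and the deployer, which is the one case in the table where the two need to work together.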

“If society wants safe AI, we need to think seriously about the data inputs that go into these models. Let’s design a framework where the actors that build and deliver the system are legally liable, and let’s specify where that liability lies,” states Peukert.

“Regulators cannot anticipate all potential problems in the future concerning AI outputs. Therefore, focusing on data inputs and allocating liabilities makes sense. This can also provide the right incentives to improve data quality, reduce risk and regulate AI effectively.”

References: