

Assessing AI: Surveying the Spectrum of Approaches to Understanding and Auditing AI Systems

With contributions from Chinmay Deshpande, Ruchika Joshi, Evani Radiya-Dixit, Amy Winecoff, and Kevin Bankston

Graphic for CDT AI Gov Lab’s report, “Assessing AI: Surveying the Spectrum of Approaches to Understanding and Auditing AI Systems.” Illustration of a collection of AI “tools” and “toolbox” – a hammer and red toolbox – and a stack of checklists with a pencil.

What do we mean when we talk about “assessing” AI systems?

The importance of a strong ecosystem of AI risk management and accountability has only increased in recent years, yet critical concepts like auditing, impact assessment, red-teaming, evaluation, and assurance are often used interchangeably. Without a clearer understanding of the specific goals that drive the underlying accountability exercise, these terms risk losing their meaning. Articulating and mapping the goals of various AI assessment approaches against policy proposals and practitioner actions can help tune accountability practices to best suit their desired aims.

That is the purpose of this Center for Democracy & Technology report: to map the spectrum of AI assessment approaches, from narrowest to broadest and from least to most independent, to identify which approaches best serve which goals.

Executive Summary

Goals of AI assessment and evaluation generally fall under the following categories:

  • Inform: practices that can facilitate an understanding of a system’s characteristics and risks
  • Evaluate: practices that involve assessing the adequacy of a system, its safeguards, or its practices
  • Communicate: practices that help make systems and their impacts legible to relevant stakeholders
  • Change: practices that support incentivizing changes in actor behavior

Understanding the scope of inquiry, or the breadth or specificity of questions posed by an assessment or evaluation, can be particularly useful in determining whether that activity is likely to surface the most relevant impacts and motivate the desired actions. Scope of inquiry exists on a spectrum, but for ease of comprehension the following breakdown can be a useful mental model for understanding different approaches and their respective theories of change:

  • Exploratory: Broad exploration of possible harms and impacts of a system, generally informed but unbounded by a set of known risks. 
  • Structured: Consideration of a set of harms and impacts within a defined taxonomy. 
  • Focused: Evaluation of a specific harm or impact or assessment against a procedural requirement. 
  • Specific: Analysis of a specific harm or impact using a defined benchmark, metric, or requirement.

Meanwhile, recognizing the degree of independence of a particular assessment or evaluation effort (for instance, whether the developer or deployer of the system in question controls which systems are included in a given inquiry, what questions may be asked about them, and whether and to what extent findings are disclosed) is important to understanding the degree of assurance such an effort is likely to confer.

  • Low Independence: Direct and privileged access to an organization or the technical systems it builds 
  • Medium Independence: Verification of system characteristics or business practices by a credible actor who is reasonably disinterested in the results of their assessment 
  • High Independence: Impartial efforts to probe and validate the claims of systems and organizations — without constraint on the scope of inquiry or characterization of their findings

Assessment and evaluation efforts can shift up and down each of these two axes somewhat independently: a low-specificity effort can be conducted in a high-independence manner, while a highly specific inquiry may be at the lowest level of independence and still lead to useful and actionable insights. Ultimately, though, the ability of different efforts to drive desired outcomes depends on where they sit on this matrix.

Recommendations

  • Evaluation and assessment efforts should be scoped to best support a defined set of goals. Practitioners and policymakers should be particularly attentive to whether the independence and/or specificity of their intended assessment and evaluation activities are well-matched to the goals they have for those efforts. 
  • Stakeholders involved in evaluation and assessment efforts should be transparent and clear about their goals, methods, and resulting recommendations or actions. Auditors and assessors should clearly disclose the methods they have employed, any assumptions that shaped their work, and what version of a system was scrutinized. 
  • Accountability efforts should include as broad an array of participants and methods as feasible, with sufficient resources to ensure they are conducted robustly. AI assessment and evaluation activities must include a pluralistic set of approaches that are not constrained to practitioners with technical expertise but rather encompass a sociotechnical lens (i.e., considering how AI systems might interact in unexpected ways with one another, with people, with other social or technical processes, and within their particular context of deployment).

Ultimately, no one set of accountability actors, single scope of assessment, or particular degree of auditor independence can accomplish all of the goals that stakeholders have for AI assessment and evaluation activities. Instead, a constellation of efforts — from research, to assurance, to harm mitigation, to enforcement — will be needed to effectively surface and motivate attention to consequential impacts and harms on people and society. 

Read the full report.