HireVue “AI Explainability Statement” Mostly Fails to Explain What it Does
The increasing prevalence and prominence of automated employment decision tools in recruitment and hiring has led regulators and advocacy organizations to demand greater transparency and accountability from the vendors of such tools. CDT supports efforts to make AI systems more explainable; in the context of hiring assessment technologies, explainability means ensuring that workers are “meaningfully notified about how they will be assessed so they can seek redress under existing civil rights protections or request a reasonable accommodation.”
This spring, one of the most prominent vendors of AI-based assessment tools, HireVue, released what it billed as a “first of its kind” explainability statement “intended to provide information on how [HireVue’s] Artificial Intelligence (AI)-based assessments” work and how it makes employment decisions and recommendations. The statement provides a general overview of the types of AI assessments that HireVue uses, including some details about how HireVue developed, trained, and now monitors its assessments.
HireVue’s explainability statement offers a concrete opportunity to assess what works well and what doesn’t in providing transparency about the use of AI in hiring tools. While its statement sheds some useful light on how HireVue’s technology works, it is also incomplete in important respects. Moreover, the information it does provide suggests crucial deficiencies in the fairness and job-relatedness of HireVue’s approach to assessments.
Overview
HireVue sells AI-based assessment tools that employers can use when making hiring/employment decisions. HireVue claims that its AI assessment technologies can be used to assess more than 20 different “competencies” relevant to different jobs, such as adaptability, teamwork, problem-solving skills, communication, and “drive for results.” HireVue uses two basic types of AI assessments to assess these competencies:
- Video interview assessments that use AI to both transcribe and analyze the content of candidates’ recorded responses to preselected questions; and
- Game-based assessments in which candidates play a “series of short online games” and HireVue uses AI to analyze their performance.
To evaluate a candidate for a particular job using HireVue’s technology, an employer chooses from the competencies HireVue claims to measure to build a “competency model” for that job. HireVue evaluates the candidate against that competency model using a series of video interviews and game-based questions. Per the explainability statement, “[a] typical assessment will consist of 3-6 video interview questions … and 2-3 game questions.”
The statement provides information – albeit in sometimes paltry levels of detail – regarding how these AI assessments were constructed, how they are tested for job-relatedness and discriminatory impact, and how they are monitored and updated over time. Below are some of the notable takeaways (both positive and negative) from the explainability statement.
The Good
Informative description of the development and structure of HireVue’s video interview assessments
The explainability statement provided a reasonable level of detail on how HireVue developed its video interview assessments. The document:
- Names the speech-to-text transcription system (Rev.ai) that HireVue uses to convert candidates’ spoken responses into text;
- Provides a fair amount of detail regarding the natural language processing (NLP) model that HireVue built to analyze the meaning of candidates’ responses; and
- Describes in general terms how HireVue trained its evaluation model by using “expert interviewers” to score candidates’ responses during the development phase.
This overview strikes a good balance between concision and detail – it provides enough information to give a fair understanding of how HireVue developed and built the key components of its video interview assessments, but not so much that regulators, employers, and other educated laypeople would be overwhelmed in trying to understand the assessments’ development process or basic architecture.
That, in turn, allows for a well-informed critique of both the positive and negative aspects of how the development process may have affected the fairness and validity of the assessment technology. Enabling such analysis and criticism is one key goal of explainability in the hiring technology space. The document does well in that regard with respect to the video interview (as opposed to its game-based) assessments.
Video interview assessments were tested on a sample of workers that was representative in terms of race and gender
The explainability statement also provided demographic information regarding the population of workers whose data HireVue used to train its video interview assessments. [1] According to the tables HireVue provided, the sample included a representative share of women (52%) compared to the labor force at large, and actually slightly oversampled Black (17%) and Hispanic (33%) workers. Such representative sampling helps reduce one avenue through which systemic biases are reinforced by ensuring that the training data includes sufficient examples of historically underrepresented groups.
Reasonable overall approach to ongoing monitoring and review
Finally, HireVue’s overall approach to monitoring and reviewing its assessments after deployment makes sense. The Civil Rights Principles for Hiring Assessment Technologies, of which CDT is a signatory, calls for organizations to “engage in rigorous self-testing of their own hiring assessment technologies before and after deployment” and “continually audit[]” the technologies for disparities once deployed. HireVue’s explainability statement states that after employers begin using its assessments, HireVue (1) monitors them continuously and is alerted “if a particular metric goes ‘out of bounds’” (although the statement does not provide examples of such metrics or alerts) and (2) does deeper checks for performance and adverse impacts on a regular (typically annual) basis. At a high level, this combination of dynamic/real-time monitoring plus more thorough reviews at standardized intervals is a good approach that generally tracks with the Civil Rights Principles.
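Because the statement gives no examples of the metrics HireVue monitors, the following is only an illustrative sketch of what an "out of bounds" check could look like. It uses the impact ratio, a common measure in adverse impact analysis: each group's selection rate divided by the highest group's rate, flagged when it falls below the four-fifths (0.8) rule of thumb. The group names, counts, function names, and the choice of threshold are all hypothetical assumptions, and nothing here describes HireVue's actual monitoring.

```python
from typing import Dict, List

# Assumption: the conventional four-fifths rule of thumb is used as the bound.
FOUR_FIFTHS_THRESHOLD = 0.8


def selection_rates(passed: Dict[str, int], assessed: Dict[str, int]) -> Dict[str, float]:
    """Share of assessed candidates in each group who passed the assessment."""
    return {g: passed.get(g, 0) / n for g, n in assessed.items() if n > 0}


def impact_ratios(rates: Dict[str, float]) -> Dict[str, float]:
    """Each group's selection rate relative to the highest group's rate."""
    top = max(rates.values())
    if top == 0:
        # Nobody passed in any group, so there is no disparity to measure.
        return {g: 1.0 for g in rates}
    return {g: r / top for g, r in rates.items()}


def out_of_bounds(ratios: Dict[str, float],
                  threshold: float = FOUR_FIFTHS_THRESHOLD) -> List[str]:
    """Groups whose impact ratio falls below the threshold."""
    return sorted(g for g, r in ratios.items() if r < threshold)


# Hypothetical counts, for illustration only.
assessed = {"group_a": 200, "group_b": 150}
passed = {"group_a": 100, "group_b": 54}

rates = selection_rates(passed, assessed)
ratios = impact_ratios(rates)
flagged = out_of_bounds(ratios)
print(flagged)
```

With these made-up counts, group_b's selection rate (36%) is 72% of group_a's (50%), so it would be flagged. A disclosure of this kind, naming the metric and the bound, is the sort of detail the statement omits.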
The Bad: Lack of Explanation
Lack of explanation regarding game-based assessments
The explainability statement provides virtually no explanation of how HireVue devised, operated, or tested its game-based assessments. What little information the document does provide on the games is confusing – in the first mention of game-based assessments, the document states that the games measure both “cognitive” and “non-cognitive abilities,” but the description of the games’ design states only that they measure “cognitive ability.” It is not at all clear from the document how the games were validated or whether and how they were tested for issues relating to accessibility or bias.
HireVue kicks disclosure responsibility to employers under EU/UK data regulations
HireVue claims that because employers make the ultimate hiring decision, they are responsible for providing explanations to candidates regarding hiring decisions involving the use of HireVue’s tools under prevailing UK and EU data regulations. The problem is that, as described further below, HireVue does not tailor its products to individual employers or provide employers with enough information to provide an adequate explanation as to how HireVue’s assessments work. The various explanatory products that HireVue offers (samples of which are included in the explainability statement) are decidedly generic, with no explanation of how different competencies were measured. By disclaiming disclosure responsibilities, HireVue essentially ensures that candidates in Europe remain in the dark when its assessments influence employment decisions.
HireVue’s approach to “competency” assessment is unclear
HireVue provides a similarly incomplete explanation of the competencies that its interview and game-based assessments test – and the information it does provide is troubling. While the explainability statement mentions or alludes to several of the competencies its assessments supposedly measure, HireVue does not actually provide a list of all such competencies. Failing to list the competencies deprives workers and their advocates of the information needed to determine what sorts of candidates, particularly those with disabilities, might be disadvantaged by HireVue’s assessment and/or require accommodation to demonstrate the competencies that it measures.
The Bad: Concerning Disclosures
Some disclosed “competencies” that HireVue assesses pose a risk of discrimination to disabled workers
The few competencies the statement mentions may actually lead to discrimination. Among the broad competencies the statement does mention are “interpersonal skills,” “empathy,” “influence,” and “personality traits.” As discussed in CDT’s December 2020 report on disability discrimination in automated hiring tools, assessments that measure personality traits may screen out candidates with depression or anxiety, and game-based assessments may disadvantage workers with a wide variety of disabilities, including ADHD, autism, and visual impairments.
While the explainability document describes efforts to detect and minimize adverse impacts based on race and sex, it does not indicate that HireVue paid comparable attention to how its assessments could unfairly disadvantage disabled workers. The “accessibility” section of the statement does not suggest that HireVue took any steps to actually design its assessments in a manner that ensures they can measure the competencies of disabled workers. Instead, HireVue merely provides a general sense of what the tests involve and leaves it to applicants and employers to determine whether a particular candidate requires accommodations.
The only HireVue accommodation mentioned in the explainability statement is that candidates can request additional time to complete assessments. Given that HireVue delivers its assessments through its integrated “end-to-end hiring platform,” it is not clear if or how employers could offer any other accommodations to disabled workers, aside from having them assessed by completely different means.
HireVue assesses only generic candidate qualities, and makes no effort to alter or supplement its assessments to match the essential functions of any actual jobs
The “competencies” that HireVue claims to measure through its assessments are not moored to the actual responsibilities and functions of specific jobs, and HireVue does not allow employers to incorporate more job-specific content into its assessments. All the examples of competencies that HireVue purports to assess – such as “empathy,” “influence,” “personality,” “attention,” “communication,” and “problem-solving” – are highly abstract qualities, not specific knowledge, skills, abilities, or other characteristics that are tailored to particular jobs. HireVue links these competencies to somewhat more specific “behaviors” – for example, an appendix showing the rating scale for the “Communication” competency includes “Shares Information” and “Engages Others” as key behaviors associated with that competency. But even these behaviors are highly generic, not accounting for the ways different behaviors and skills manifest themselves in different jobs, settings, or sectors.
HireVue’s claim is that it can build – in its own words – a “single comprehensive assessment of each candidate” or a test of “overall job aptitude” simply by asking a small number of questions that supposedly shed light on a comparably small number of abstract candidate qualities. In reality, no job’s essential functions can be reduced to a few items from an assortment of generic competencies.
Moreover, competencies themselves manifest differently in different jobs – and even different workers. Problem-solving in engineering is quite different from problem-solving in social work; some people may convey empathy through the substance of their words, while others rely more on body language and tone of voice. While HireVue allows employers to choose competencies and adjust how they are weighted relative to each other, it is not clear whether and how it takes such variation into account. HireVue’s explainability statement emphasizes that its competency models are trained solely on HireVue’s own data drawn from a series of “Rater Studies” it conducted using interview transcripts.
The explainability statement provides no information regarding what sorts of industries and positions the interview transcripts came from, but the description of the Rater Studies states that more than 30,000 video interviews were aggregated together, suggesting that the resulting models are not tailored to specific employers, occupations, or industries. This generic training data, combined with the similarly generic competencies, means that HireVue’s assessments measure candidates against an abstraction of an abstraction. That approach cannot, contrary to HireVue’s claims, provide employers with even a glimpse of a candidate’s “overall job aptitude” for any actual, specific job.
Conclusion
HireVue’s explainability statement, while showing some promise, is ultimately disappointing, particularly because transparency is such a basic and fundamental aspect of developing artificial intelligence systems. It’s clear that policy action is needed so that workers, enforcement agencies, and other stakeholders can access the information needed to understand the tools that make or influence employment decisions and hold vendors and employers alike accountable for ineffective, inaccessible, or discriminatory tools.
[1] It is not clear from the explainability statement whether the same worker population was used to develop/test its game-based assessments.