From Our Fellows: A Perspective on Query Recommendation in Search Engines
By Sucheta Soundarajan, Associate Professor, Syracuse University, and CDT Non-Resident Fellow
Disclaimer: The views expressed by CDT’s Non-Resident Fellows and any coauthors are their own and do not necessarily reflect the policy, position, or views of CDT.
Online search engines have become important tools for individuals seeking information. However, it has been known for several years that the results (or the ordering of results) returned by these search engines may exhibit socially harmful forms of bias: for example, in a variation on a classic example given by Bolukbasi et al., a query for “computer science student” may return disproportionately more results corresponding to men than to women, or may rank results corresponding to men higher than those corresponding to women. This sort of systemic bias can stem from a number of sources, including underlying bias in the data used to generate these results.
Modern search engines use so-called word embeddings to mathematically represent words and phrases. A word embedding is, effectively, a numerical representation of a word or phrase, learned by observing which words tend to appear in close proximity in large collections of text such as web pages and articles. Pairs of words that commonly appear close to one another in such text end up near one another in the embedding space. When a document search is performed on some query (a word or phrase), a document ranks higher if it contains words from the query or words that are close to the query words in the embedding space. For example, synonyms like “autumn” and “fall,” or related words like “brother” and “sister,” often appear in similar contexts, and so will be near each other in the embedding space. Because a greater proportion of computer science students and practitioners are male than female, male-related words (such as “him,” “his,” etc.) appear near computer science-related words (such as “computer,” “technology,” etc.) more frequently than do female-related words (such as “her,” “hers,” etc.). As a result, the male-related words sit closer to the computer science-related words in the embedding space, and search results containing male-related words will score more highly on a computer science query than will results containing female-related words. Such bias has potentially major societal implications, particularly in areas like hiring, as existing prejudices are then reinforced.
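To make that scoring intuition concrete, here is a minimal sketch using toy, hand-picked vectors and a cosine-similarity score. The words, dimensions, and scoring function are illustrative assumptions, not the embedding or ranking function of any particular search engine.

```python
# Toy illustration of embedding-based scoring (hand-made vectors, not a real
# trained embedding). A document scores higher when its words lie close to
# the query words in the embedding space.
import numpy as np

# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of
# dimensions and are learned from large text corpora.
embeddings = {
    "computer":   np.array([0.9, 0.1, 0.0]),
    "technology": np.array([0.8, 0.2, 0.1]),
    "him":        np.array([0.7, 0.1, 0.6]),  # toy values placing the male-coded
    "her":        np.array([0.2, 0.1, 0.9]),  # word nearer the query words
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def score(query_words, doc_words):
    # Average, over document words, of the similarity to the closest query word.
    return np.mean([max(cosine(embeddings[d], embeddings[q]) for q in query_words)
                    for d in doc_words])

query = ["computer", "technology"]
print(score(query, ["him"]))  # higher score: the male-coded word sits nearer the query
print(score(query, ["her"]))  # lower score: the female-coded word sits farther away
```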
This problem can be addressed in different ways. One method debiases the embedding itself: specific biases (e.g., gender or racial biases) are directly and automatically addressed, for instance by shifting gender-related words so that they are equidistant in the embedding space from profession-related words. Another method re-ranks search results with respect to some fairness criterion: for example, one might require that the top 100 results contain equal proportions of male-coded and female-coded results.
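One simple way to picture the re-ranking approach is to interleave the highest-ranked results from each group so that the top of the list is balanced. The sketch below is an illustrative strategy under that assumption, not the method of any specific system, and it assumes group labels for results are available.

```python
# Illustrative sketch of fairness-constrained re-ranking: interleave the
# highest-ranked results from each group so the top-k contains equal shares.
from itertools import zip_longest

def rerank_balanced(results, group_of, k=100):
    """results: list ordered by relevance; group_of(r) -> 'M' or 'F' (assumed labels)."""
    male = [r for r in results if group_of(r) == "M"]
    female = [r for r in results if group_of(r) == "F"]
    merged = []
    for m, f in zip_longest(male, female):
        if m is not None:
            merged.append(m)
        if f is not None:
            merged.append(f)
    return merged[:k]

# Toy usage: documents tagged with a coded group, already ordered by relevance.
docs = [("doc1", "M"), ("doc2", "M"), ("doc3", "F"), ("doc4", "M"), ("doc5", "F")]
print(rerank_balanced(docs, group_of=lambda d: d[1], k=4))
# -> [('doc1', 'M'), ('doc3', 'F'), ('doc2', 'M'), ('doc5', 'F')]
```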
In our work, we instead consider the problem of balanced query recommendation, in which an algorithm suggests less biased or oppositely biased alternatives to a query. As a hypothetical example, note that the terms “secretary” and “administrative assistant” are often used interchangeably. However, because of sexist connotations, men may be unlikely to use the term “secretary” to refer to themselves; in contrast, a search for “administrative assistant” may return less gender-biased results. Our approach was originally motivated by conversations with an academic administration recruiter who recounted her experiences searching for job candidates online: when searching for individuals with a particular qualification, she noticed that the returned results were primarily white men. Deeper investigation suggested that women and non-white candidates tended to use different keywords to describe the same type of qualifications. In such cases, a recruiter searching for one term may wish to know of similar but less biased keywords. Additionally, job candidates selecting keywords for their resumes may wish to know whether their choice of keyword encodes some sort of bias.
Our work presents BalancedQR, an approach for recommending balanced query keywords. BalancedQR works on top of an existing search algorithm: it uses a word embedding to identify terms related to the original query and then measures the bias and relevance of those related terms. Bias can be computed in whatever way is appropriate for the context: for example, if searching for candidate profiles on a hiring website, one could examine the fraction of male and female profiles returned; if searching for news articles, one could use external annotations of platform bias.
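At a high level, the kind of loop BalancedQR performs can be sketched as follows. The candidate generation, relevance score, and bias metric here are simplified placeholders, and the callables are assumed interfaces for illustration rather than a published API; the actual algorithm's scoring and its trade-off between relevance and bias are described in our paper.

```python
# Highly simplified sketch of a BalancedQR-style loop on top of an existing
# search engine. All four callables are assumed (hypothetical) interfaces.

def recommend_balanced(query, embedding_neighbors, search, group_of, bias_threshold=0.1):
    """
    embedding_neighbors(query) -> candidate terms near the query in embedding space
    search(term)               -> ranked results for that term from the existing engine
    group_of(result)           -> group label for a result, e.g. 'M' or 'F'
    """
    recommendations = []
    for term in embedding_neighbors(query):
        results = search(term)
        if not results:
            continue
        # Bias here: signed imbalance between the two groups among the results.
        share_m = sum(group_of(r) == "M" for r in results) / len(results)
        bias = share_m - 0.5
        relevance = len(results)  # stand-in for a real relevance score
        if abs(bias) <= bias_threshold:
            recommendations.append((term, relevance, bias))
    # Among sufficiently balanced candidates, prefer the most relevant terms.
    return sorted(recommendations, key=lambda x: -x[1])
```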
Initial tests using data from Reddit and Twitter produced interesting results: for instance, terms such as “longing” and “sorrow” were more likely to be found in posts on r/AskMen, while r/AskWomen posts were more likely to use “grief” and “sadness.” In this experiment, bias was measured by examining which subreddit a particular post or comment came from. A user searching Reddit for, e.g., “grief” would disproportionately receive posts from r/AskWomen, while similar concepts (“longing” and “sorrow”) would return more posts from r/AskMen. In this example, BalancedQR would recommend “loneliness,” which produces results with high relevance but very little bias, since results for that term were returned more equally across both subreddits. Similar results were seen for political bias on political subreddits when searching for “rioting” vs. “protests”: the former was disproportionately represented on r/Republicans, and the latter is BalancedQR’s recommendation, producing high-relevance, low-bias results.
There are a number of important use cases for BalancedQR. For instance, it could be implemented as a browser plug-in or as part of a search engine’s recommended queries, potentially reducing echo chambers and information segregation. In the context of hiring, it could be used to reduce gender bias at the search stage. In future work, we look forward to conducting user studies to observe how recommendations produced by BalancedQR (or other alternatives) are used in practice. Additional work on developing automated bias metrics, particularly metrics that capture bias across multiple intersectional dimensions, would also be of practical significance to the implementation of BalancedQR.