rene  ·  549 days ago  ·  post: Machine learning algorithms exhibit racial and gender biases, research reveals

Link to Paper (ScienceMag Paywall):

Author's Homepage: Anthony G. Greenwald, PhD

Algorithm used for analysis: GloVe - Global Vectors for Word Representation

    GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

    GloVe is essentially a log-bilinear model with a weighted least-squares objective. The main intuition underlying the model is the simple observation that ratios of word-word co-occurrence probabilities have the potential for encoding some form of meaning . . . The training objective of GloVe is to learn word vectors such that their dot product equals the logarithm of the words' probability of co-occurrence. Owing to the fact that the logarithm of a ratio equals the difference of logarithms, this objective associates (the logarithm of) ratios of co-occurrence probabilities with vector differences in the word vector space. Because these ratios can encode some form of meaning, this information gets encoded as vector differences as well. For this reason, the resulting word vectors perform very well on word analogy tasks, such as those examined in the word2vec package.
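
The quoted objective can be sketched in a few lines of NumPy. This is a toy illustration under my own assumptions (a tiny made-up co-occurrence matrix, plain SGD rather than the optimizer the GloVe authors use), not the reference implementation:

```python
import numpy as np

# Sketch of GloVe's weighted least-squares objective: learn vectors so that
# w_i . w~_j + b_i + b~_j approximates log X_ij for each nonzero count X_ij.

rng = np.random.default_rng(0)

# Toy 4-word co-occurrence matrix (symmetric counts, purely illustrative).
X = np.array([[0, 10, 3, 1],
              [10, 0, 6, 2],
              [3, 6, 0, 8],
              [1, 2, 8, 0]], dtype=float)

V, d = X.shape[0], 5                      # vocabulary size, embedding dim
W = rng.normal(scale=0.1, size=(V, d))    # main word vectors
Wc = rng.normal(scale=0.1, size=(V, d))   # context word vectors
b, bc = np.zeros(V), np.zeros(V)          # word and context biases

def weight(x, x_max=100.0, alpha=0.75):
    # GloVe's weighting caps the influence of very frequent pairs.
    return np.minimum((x / x_max) ** alpha, 1.0)

def loss():
    total = 0.0
    for i, j in zip(*np.nonzero(X)):
        diff = W[i] @ Wc[j] + b[i] + bc[j] - np.log(X[i, j])
        total += weight(X[i, j]) * diff ** 2
    return total

lr = 0.05
initial = loss()
for _ in range(200):                      # plain SGD over nonzero entries
    for i, j in zip(*np.nonzero(X)):
        diff = W[i] @ Wc[j] + b[i] + bc[j] - np.log(X[i, j])
        g = 2.0 * weight(X[i, j]) * diff
        W[i], Wc[j] = W[i] - lr * g * Wc[j], Wc[j] - lr * g * W[i]
        b[i] -= lr * g
        bc[j] -= lr * g
final = loss()
print(initial, final)                     # the objective should shrink
```

The "vector differences encode meaning" property the quote mentions falls out of this: since the dot products track log co-occurrence, differences between word vectors track log ratios of co-occurrence probabilities.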

Perhaps more interestingly, some extended reading:

Implicit Bias: How Should Psychological Science Inform the Law?

Statistically Small Effects of the Implicit Association Test Can Have Societally Large Effects

    OMBJT characterized their average correlation finding for IAT measures (which they estimated as r = .148, in the domain of intergroup behavior) as indicating that the IAT was a “poor” predictor (pp. 171, 182, 183). This section’s analysis reaches a very different conclusion by applying well-established statistical reasoning to understand the societal consequences of small-to-moderate correlational effect sizes. The first step of this analysis shows that OMBJT’s and GPUB’s meta-analytic findings had very similar implications for the average percentage of criterion-measure variance explained by IAT measures. The second step explains how statistically small effects can have societally important effects under two conditions: if they apply to many people or if they apply repeatedly to the same person. In combination, the two steps of this analysis indicate how conventionally small (and even subsmall) effect sizes can have substantial societal significance . . .
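
The "variance explained" step is quick to make concrete. The r = .148 figure is from the quote; the binomial effect size display re-expression (Rosenthal & Rubin's technique, not necessarily the paper's own computation) and the million-decision scale are my own illustrative assumptions:

```python
r = 0.148
variance_explained = r ** 2   # about 2.2% of criterion-measure variance

# Binomial effect size display: re-express r as the gap between two groups'
# "favorable outcome" rates centered on 50%.
rate_high = 0.5 + r / 2       # 57.4%
rate_low = 0.5 - r / 2        # 42.6%

# Applied to many people, a small per-decision gap becomes a large count:
n = 1_000_000
outcomes_shifted = (rate_high - rate_low) * n
print(variance_explained, rate_high, rate_low, round(outcomes_shifted))
```

This is the quote's first condition (the effect applies to many people) in miniature: an effect that explains only ~2% of variance still corresponds to roughly 148,000 decisions going the other way across a million people.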

    Small effect sizes comprise significant discrimination. For most of the time since the passage of the United States’ civil rights laws in the 1960s, U.S. courts have used a statistical criterion of discrimination that translates to correlational effect sizes that are often smaller than r = .10. This criterion is the “four-fifths rule,” which tests whether a protected class (identified by race, color, religion, national origin, gender, or disability status) has been treated in discriminatory fashion. A protected class’s members receiving some favorable outcome less than 80% as often as a comparison class can be treated by courts as indicating an “adverse impact” that merits consideration as illegal discrimination (U.S. Equal Employment Opportunity Commission, 1978, §1607.4.D).
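
The four-fifths rule described in the quote reduces to a simple ratio test. A minimal sketch (the function name and the example counts are hypothetical):

```python
def adverse_impact(sel_protected, n_protected, sel_comparison, n_comparison):
    # Four-fifths rule (EEOC Uniform Guidelines, §1607.4.D): adverse impact
    # is indicated when the protected class's selection rate is less than
    # 80% of the comparison class's selection rate.
    rate_p = sel_protected / n_protected
    rate_c = sel_comparison / n_comparison
    return (rate_p / rate_c) < 0.8

# Example: 30 of 100 protected-class applicants hired vs. 50 of 100
# comparison-class applicants: 0.30 / 0.50 = 0.6 < 0.8, adverse impact.
print(adverse_impact(30, 100, 50, 100))   # True
print(adverse_impact(45, 100, 50, 100))   # False (0.45 / 0.50 = 0.9)
```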