DEV Community

Junade Ali

Notes on Ethnic Minority Representation Data (and relevance outside the US)

I was recently speaking to a friend who was considering using the ethnic minority data and categorisation from their company's US headquarters to set targets for its UK satellite offices in Manchester and London. I wanted to share some data on why I think using another region's data is a bad idea, in case anyone else is facing a similar dilemma, or is already doing this without realising the negative impact.

I'd be very interested in hearing your thoughts on this, and whether you think I'm drawing the wrong conclusions here.

If you work in data processing or machine learning, you may have heard the phrase "Garbage In, Garbage Out": regardless of the strengths of a given algorithm, if you provide questionable data, you'll get questionable predictions. I think the same applies when Americentric diversity data is used to drive diversity efforts elsewhere.

Tech companies' diversity reports usually include groups with categorisations similar to "Asian", "Black", "Latinx", "Native American", "Mixed Race" and "White". For example, see the chart below of Uber's 2018 diversity data:

Uber - Global Gender and US Race/Ethnicity

It is common practice for San Francisco-based tech companies to provide racial diversity data only for the US and not for their overseas offices. In some respects, similar forms of categorisation are used by some arms of the UK government. For example, the chart below of "Average Pay per Hour (£) by Ethnicity - 2017" is split into five groups of ethnic minorities, but these are tailored more towards the minority groups found in Britain. It is worth noting that the groups occupying both the top rank (Indian and Chinese) and the bottom rank (Pakistani/Bangladeshi) would ordinarily be lumped together as "Asian" in Silicon Valley metrics.

Average Pay per Hour (£) by Ethnicity - 2017

A more detailed metric is GCSE attainment by race (GCSEs are the school qualifications that measure attainment by around the age of 16). In this grouping, "White Gypsy/Roma" and "White Irish Traveller" both occupy the bottom rank. They are followed by various "Black" groups, although Pakistani children underperform those in the "Black" and "Black African" categories. Chinese and Indian children have the highest attainment.

% British Pupils Achieving A* to C in English and Maths GCSE - 2017

Note that such metrics are not comparable to the US: according to the United States Census Bureau, the median household income for Pakistani Americans is $62,848, whilst for Black or African Americans it is $38,555. Despite the US and the UK sharing a broadly similar language, diversity data from the two regions is not comparable.

To conclude, I'm inclined to believe that broad and inaccurate ethnic groupings risk further undermining the representation of people in particular groups.

With thanks to ethnicity-facts-figures.service.gov.uk, which provided most of the data used here.
