After completing the Google Analytics certificate course, I seized the opportunity to conduct a data analytics case study on open source contribution. Despite my background as a tech generalist, I had always been fascinated by the world of open source contribution. In the past, I hesitated to pursue it and made flimsy excuses for why I couldn't begin, mainly because I didn't know how to get started. This year, however, I took the plunge. So far, it has been an enriching experience, as I have had the chance to work and communicate with people from diverse demographics.
In this article, I will explain how I conducted a data analytics case study on open source demographics and share the observations I made during the analysis.
The datasets
Obtaining the necessary datasets was a challenge, but after several hours of web scraping and research, I finally found the relevant data. For this case study, I will be using datasets obtained from open source surveys conducted by the following sources:
- GitHub
- The Linux Foundation
GitHub dataset
In 2017, GitHub collected demographic data by inviting individuals who used open source repositories on their platform to participate in a survey through a dialog box. Furthermore, smaller groups from other open source communities were contacted through various methods, including mailing lists, to participate in the survey. The objective of collecting this data was to track the following:
- Behaviors and preferences around consumption and contribution to open source.
- Attitudes and practices around privacy in online spaces when participating in open source projects.
- Seeking and providing technical help.
- Negative experiences encountered in the open source community, and the resulting consequences.
- Employer policies regarding the use and contribution to open source.
- The demographics of open source participants and their background in technology.
Linux Foundation dataset
Hilary Carter, a researcher at the Linux Foundation, collected this dataset in 2021. The survey was designed to investigate diversity, equity, and inclusion (DEI) within open source communities. The aim was to gain a better understanding of the demographics of contributors and the dynamics of participation, and to identify areas where open source communities can improve in order to promote inclusive cultures. By identifying sentiments and gaps, this survey can help advance DEI efforts within open source communities.
Technologies used
To clean and transform the data, I utilized Microsoft Excel. Additionally, I used Power BI for data visualization.
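As a rough illustration of the kind of cleaning and percentage calculation behind the figures in this article, here is a minimal Python sketch. It is not the Excel workflow I used. The function name `percentage_shares` and the sample answers are hypothetical and made up purely for illustration; they are not taken from either survey.

```python
from collections import Counter


def percentage_shares(responses):
    """Return each answer's share of the non-blank responses, in percent."""
    # Basic cleaning: trim whitespace, lowercase, and drop blank or missing answers.
    cleaned = [r.strip().lower() for r in responses if r and r.strip()]
    counts = Counter(cleaned)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {answer: round(100 * n / total, 2) for answer, n in counts.items()}


# Made-up answers for illustration only; not real survey rows.
sample = ["Man", "woman ", "man", "", "Non-binary", "Woman", "man", None]
shares = percentage_shares(sample)
print(shares)
```

Blank answers are dropped before counting, so the percentages describe only the people who answered the question. That choice affects every share reported, so it is worth stating whichever way it is made.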
Insights
Despite the goal of open source communities to provide equal opportunities to all contributors, a notable gap exists across all demographics, particularly in terms of gender, sexual orientation, and race. This discrepancy raises the question of whether open source is genuinely open to everyone, or whether certain demographics face greater obstacles or have less interest in contributing to open source projects.
Obstacles faced
According to the data, a majority of respondents, 74.63%, reported no difficulties participating in open source projects. However, it is worth looking more closely at the respondents who did report challenges.
According to the data, 53.44% of respondents reported frequently receiving no response when asking about open source projects, while only 8.98% said they had not been ignored.
Next, we will examine which specific demographics make up the open source community.
Age
According to the data visualization above, the largest group of contributors is between 35 and 54 years old. This suggests that people with more stable careers are more likely to contribute to open source projects.
Employment
Let's take a look at the job types of the respondents.
People who are employed full time or part time may have more opportunity to contribute to open source projects, since they tend to have more financial stability, access to technology, and time. However, open source contribution is not limited to people who are employed. In fact, it has been known to provide a source of employment.
Just 13.21% of respondents reported receiving payment for their open source contributions. Of all the people being paid, 39.16% identify as male, 37.07% identify as female, and 23.57% identify as non-binary.
Gender
It is important to have gender diversity in open source communities and to create an environment that is inclusive of all gender identities. By having a diverse group of contributors, there can be a wider range of perspectives, experiences, and ideas that can lead to more innovative solutions.
Unfortunately, the open source community has historically been male-dominated, and there have been barriers to entry for women and other underrepresented gender identities. It is important to address these barriers and actively work towards creating a more inclusive environment for everyone.
Of the total number of respondents in the survey, 51.55% identified as male, 24% as female, and 21.18% as non-binary, while other gender identities comprised only 2% of the population. It's important to note that other genders require greater representation.
Efforts to promote gender diversity in open source communities can include creating mentorship programs, offering scholarships or grants for underrepresented groups to attend conferences and events, and implementing codes of conduct to ensure respectful and inclusive behavior. It is also important to highlight and celebrate the contributions of women and other underrepresented gender identities in the community.
Education
Education has always been a significant factor in determining who contributes to open source projects. The survey found that 38% of the respondents had a master's degree or higher, while 23% had a bachelor's degree and 16.4% went to college to learn a vocation.
Higher education levels may provide individuals with the technical skills and knowledge necessary to contribute effectively to open source projects. Additionally, formal education can provide individuals with access to mentorship, networking opportunities, and resources that can help them navigate the open source community.
However, it is important to note that formal education is not the only pathway to contributing to open source projects. Many individuals have developed their skills and knowledge through self-study, online courses, and other forms of informal education. There are also resources and initiatives available to support individuals who may not have access to formal education, such as mentorship programs, online communities, and hackathons.
In summary, while education can be a significant factor in determining who contributes to open source projects, it is not the only pathway, and there are resources and initiatives available to support individuals with diverse educational backgrounds.
Diversity
Regrettably, there is a significant lack of diversity in the open source community, with white individuals comprising over half of the respondents. Only 9% of respondents identified as Black or African American, 9% identified as Asian, and only 7.67% identified as Hispanic or Latino. These statistics underscore the importance of cultivating an inclusive environment that welcomes contributions from individuals of all backgrounds.
Regarding sexual orientation, 41.16% of respondents identified as heterosexual or straight, 12.76% identified as queer, and other sexual orientations comprised the remaining percentage.
It is important to recognize and respect the diverse sexual orientations represented in the open source community, as this can lead to a more inclusive and welcoming environment. The fact that 12.76% of respondents identified as queer highlights the importance of creating a safe space for LGBTQ+ individuals in the open source community.
To support individuals with diverse sexual orientations, open source communities can implement codes of conduct that prohibit discrimination based on sexual orientation or gender identity. They can also promote diversity and inclusivity by featuring LGBTQ+ speakers at conferences and events, and by providing resources and support for individuals who may be facing discrimination or harassment.
Ultimately, creating an open and accepting environment for individuals of all sexual orientations can lead to a more vibrant and innovative open source community.
Conflicts
Building a community of individuals from diverse backgrounds inevitably leads to the occasional conflict. Therefore, I conducted a more in-depth analysis of the issues surrounding conflicts between different demographics and examined which groups are most likely to be affected.
According to the data, only 8% of respondents frequently witnessed or encountered conflicts, while 28.3% reported occasionally witnessing conflicts.
Interestingly, a higher proportion of asexual and bisexual respondents reported experiencing frequent threats compared to heterosexual respondents. However, it is crucial to note that the total number of heterosexual respondents in the survey was significantly higher than that of other groups.
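To make the caveat about group sizes concrete, here is a minimal Python sketch of why I compare the rate within each group rather than raw counts. The helper `within_group_rate`, the group names, and all of the numbers are hypothetical and invented for illustration; they are not figures from the survey.

```python
def within_group_rate(affected, group_size):
    """Percentage of a group's own members who reported the experience."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return round(100 * affected / group_size, 2)


# Hypothetical counts for illustration only; not survey figures.
groups = {
    "large_group": {"affected": 40, "size": 1000},
    "small_group": {"affected": 15, "size": 100},
}

rates = {
    name: within_group_rate(g["affected"], g["size"])
    for name, g in groups.items()
}
print(rates)
```

In this made-up example the larger group has more affected people in absolute terms, yet a member of the smaller group is far more likely to be affected. The trade-off is that a rate computed from a small group is more sensitive to a handful of responses, so it should be read with caution.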
Inappropriate behavior
The data shows that 47.89% of women and 42.6% of non-binary individuals reported experiencing inappropriate sexual advances from other contributors. In contrast, only 5.02% of men reported encountering such behavior.
The survey also included individuals who reported never experiencing inappropriate behavior on open source platforms. Of this population, 58.36% were men, 21.55% were women, and 18.17% identified as non-binary.
This data highlights the need to create a safe space for women and non-binary individuals in open source communities, as they are more likely to encounter inappropriate behavior while contributing to open source.
How do people react to conflicts and inappropriate behavior?
Let's look at how the data answers this question.
Among respondents who spoke up when they witnessed inappropriate behavior, 45.56% were male, 26.18% were non-binary, and 24.75% were women. It is encouraging to see that some members of the community are willing to stand up against inappropriate behavior, but it is also clear from the data below that more needs to be done to encourage people to speak up and create a safe space for everyone.
While maintaining a professional and safe space for all individuals is a responsibility that falls on the maintainers and sponsors of open source projects, it is equally important for community members to take an active role in identifying and reporting individuals who may be contributing to conflicts and inappropriate behavior. By reporting such behavior to the necessary authorities, members can help to ensure that appropriate measures are taken to safeguard the community and create a more inclusive environment.
Conclusion
The Linux Foundation and GitHub surveys provide valuable insight into the demographics of open source communities and contributions. The data shows that open source communities attract individuals from various backgrounds and career paths. Still, there is a significant gender and diversity gap, and more needs to be done to create an inclusive environment for all.
Furthermore, the surveys highlight the importance of education, employment, and collaboration in open source contributions. Open source projects attract contributors from all over the world, making it a global phenomenon.
Overall, the surveys confirm what many have always known: open source is vital to the growth of digital technology, and the community needs to work towards creating an inclusive environment for all individuals.
Links to datasets
Linux Foundation
GitHub survey repository
GitHub survey dataset