Computer science (CS) struggles with a lack of diversity and inclusion, despite efforts to broaden participation. While online CS courses have expanded global access, women face othering and lower retention, and have reported feeling less motivated and less technologically capable.
Research has also shown that software can inadvertently introduce inclusivity bugs that disadvantage certain user groups. What's more, research on diversity and inclusion in computer science education has mostly ignored online course pages, the primary medium through which these courses are delivered. What role do online course pages play in the inclusivity of CS education? Luckily, there are techniques to find, fix, and avoid these bugs. The catch? They tend to be cost-intensive because they rely heavily on manual labor.
To address this issue, using design science research, I:
Investigated how and where online courses create barriers for diverse students
Designed and developed AID - a tool that leverages AI and natural language processing to automatically detect gender-inclusivity bugs in a scalable manner
🔮 New product & collaboration: This was the first product to support faculty in creating inclusive courses, and it paved the way for applying AID to online education at Oregon State University.
🪙 Secured Funding: This work secured $36,000 in research funding for the team, in addition to funding my graduate studies for 9 months.
🏅 Publications: This work was published at ICSE 2021 and ICER 2022, which had acceptance rates of 22% and 16%, respectively.
🧑‍💻 Outreach: Communicated the findings to more than 100 individuals, including faculty from Oregon State University and Kean University during a workshop, and through a guest lecture at Universidad Carlos III de Madrid, Spain
Figma
Qualtrics
GitHub
Python
Product Owner
UX Researcher
Developer
1 Product Owner
1 UX Researcher
3 Developers
2 Undergraduate students
4 Academic Researchers
Formative Phase: Discovery of the problem
Research Plan
Cognitive Walkthrough
Affinity Diagramming
Triangulation with student data
Results
I conducted stakeholder meetings to discuss and create a research overview so our stakeholders could understand how we planned to turn this study into a tangible outcome. We outlined the UX research objectives, a step-by-step process, and a breakdown of the research methodology.
Decision Point: What method would we use to find inclusivity bugs?
We decided on the GenderMag method. Why?
Because of its wide use across various domains to identify inclusivity issues (digital libraries, machine learning interfaces, robotics, search engines, etc.) and its high accuracy: 95% of the bugs it detects arise in real-world scenarios with end users
Snapshot of the persona used
During the ideation phase of the project, we recruited experienced online CS faculty at Oregon State University and then helped them conduct gender-inclusivity evaluations of their own courses using GenderMag, a specialized cognitive walkthrough. They evaluated 5 use-cases from the perspective of the GenderMag personas.
Decision point: How did we decide on the use-cases?
The participants (CS faculty) chose the use-cases based on problems they thought their own students might encounter or problems they had already seen among their students.
After the cognitive walkthrough sessions, we gathered the session materials including the use-cases, the personas used, completed GenderMag forms and observation notes.
We used affinity diagramming to group the ways in which bugs could hinder a student's completion of the use-case - unveiling 6 categories of inclusivity bugs.
We assigned a different color to each kind of "inclusivity bug"
So far, we had learned about potential inclusivity issues from the faculty's perspectives across a limited set of use cases. We needed deeper insight into the actual lived experiences of students.
What specific challenges are they facing on the ground?
We collected student discussions of online CS courses posted on the Ed discussion board on Canvas, a Learning Management System (LMS).
💡 The Ed discussion board serves as a platform for students to discuss their learning, ask and answer questions, share challenges, give opinions, and post comments on discussions posted by other participating students.
🧑‍🎓 The instructors emailed students to obtain consent to use their posts, resulting in 126 posts. All data was collected during the 2022-2023 academic year.
🔍 We inductively open-coded these posts and grouped the findings into categories, until reaching team consensus.
We found 6 categories of inclusivity bugs across both the cognitive walkthroughs and the student posts
Throughout this process, we heard some key things from the faculty participants:
⚠️ They need a lower-cost way to accomplish this work
⚠️ They wanted to find bugs in their course without working in a team
⚠️ Learning to do a cognitive walkthrough was also a pain point
Survey
Building the tool
Contextual Inquiry
Thematic Analysis
Results
Before we attempted to address this need, we had to determine whether existing development tools and processes already met it.
We turned to GitHub, a platform that is frequently used in STEM education and has been shown in past research to contain inclusivity bugs. This made it a good starting point for this part of our investigation.
Using Qualtrics, we developed a survey with 8 questions, comprising Yes/No, multiple-choice, and open-ended questions.
We asked about existing inclusivity evaluation practices. If a participant indicated that their development practices included some form of inclusivity evaluation, we asked more about it; otherwise, we asked what challenges prevented them from doing so. In 2 weeks, we received 266 responses.
We learned a lot from this survey - 46% of the reasons survey participants gave were potential fodder for software engineering tools.
(Left): No-respondents' reasons/challenges for not using inclusivity techniques, as a percentage of all challenges.
(Right): Yes-respondents' uses of inclusivity techniques.
[Respondent-130] “...there are no such guidelines or tools ... I hope that just as we have guidelines for code quality we [could] also have inclusivity principles...Lack of knowledge and tools makes developers reinventing the wheels all the time.”
User insight from the survey
We took our first step by automating parts of GenderMag & following a decision rule approach
We analyzed the GenderMag session data - each inclusivity bug, the "whys" behind it, and the corresponding user interface.
Through inductive reasoning and multiple rounds of negotiated agreement, we abstracted them into a final set of 6 decision rules.
High level instructions should be followed by step-by-step instructions
Tasks should have clear purpose
Assignments should mention at least 1 prior learning
... This is an incomplete set
We implemented these rules using natural language processing methods, leveraging large language models (LLMs).
We used Meta’s Llama 2 13B chat model, which is open-source.
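To make the decision-rule approach concrete, here is a minimal sketch of how a single rule could be checked with an open chat LLM via the Hugging Face transformers pipeline. This is not AID's actual code: the model ID, prompt wording, and example course text are illustrative assumptions.

```python
# Minimal sketch of the rule-checking step (not AID's actual implementation).
# Assumptions: an open chat LLM served through Hugging Face `transformers`;
# the model ID, prompt wording, and example inputs are illustrative only.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-13b-chat-hf",  # gated weights; any local chat model works for the sketch
)

# One of the six decision rules, paraphrased from the list above.
RULE = "High-level instructions should be followed by step-by-step instructions."

def check_rule(course_text: str) -> str:
    """Ask the LLM whether a piece of course content satisfies one decision rule."""
    prompt = (
        "[INST] You review online course pages for gender-inclusivity issues.\n"
        f"Rule: {RULE}\n"
        f"Course page text:\n{course_text}\n\n"
        "Does the text satisfy the rule? Answer YES or NO, then explain briefly. [/INST]"
    )
    result = generator(prompt, max_new_tokens=150, do_sample=False)
    # The text-generation pipeline returns the prompt followed by the completion.
    return result[0]["generated_text"][len(prompt):].strip()

print(check_rule("Assignment 1: Build a web scraper. Submit on Canvas by Friday."))
```

In a full pipeline, each of the six rules would be run over every course artifact, with the YES/NO judgments and explanations aggregated into the kind of evaluation report the faculty participants reviewed.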
AID had an accuracy of 74%.
This figure shows the architecture of AID and portions of the graphical user interface we created
Evaluative Phase
How would CS faculty use AID?
Snapshot of interview questions
We conducted a contextual inquiry to understand how online faculty use the Automated Inclusivity Detector (AID) tool to evaluate their course for inclusivity issues.
We recruited 7 faculty and each faculty member dedicated an hour to the study, connecting via Zoom to share their screen while maintaining privacy by keeping their camera off. They used their own equipment and course so we could observe them in their normal work environment (they all worked from home). Participants followed a think-aloud process - which allowed us to directly observe their workflows, challenges, and decision-making processes.
After the evaluation, we conducted semi-structured interviews to gather insights into participants' existing inclusivity practices, their experience using AID, and feedback for improving the tool.
By combining think-aloud protocols and interviews in the participants' actual work contexts, we gained a comprehensive understanding of their needs and pain points related to inclusive course evaluation.
Two researchers transcribed each participant's think-aloud audio recordings and interview sessions. Then, we performed inductive thematic analysis on the transcripts and study notes through multiple rounds of coding.
The analysis began with open coding, where the researchers read the transcripts repeatedly to become immersed in the data. We labeled relevant phrases and sentences using a constant comparative approach, capturing participants' actions while using the tool and their reflections on course designs. As new themes emerged, we revisited previously coded transcripts to ensure consistency.
Next, the researchers performed axial coding, comparing and contrasting the initial themes to identify connections and eliminate redundancies. We conducted this iterative process by merging and refining the themes until deriving a final set of coherent and distinct themes.
💡 All 7 faculty found at least 1 inclusivity bug in their course, but their approaches differed:
Some began by thinking through the rules while others were more output-driven and reflected on the results later
😯 They were surprised by some of the bugs AID pointed out. It led them to realize that portions of their course they had thought were unproblematic actually did have inclusivity bugs:
[P5] "...having a fresh eye or a fresh automated tool to say, Hey, you think you may have said this but you never did. I think is very useful."
[P6 interview] "...if I hadn't looked and you said, do you think it's easy for the students to find a schedule, then yeah absolutely. But now I'm not so sure...that was definitely enlightening"
🤨 Some disagreed with the tool’s output
[P8, while going through the evaluation report] "[AID] would have expected to see [the word] 'tools' on the homepage, but I'm not sure I necessarily would agree..."
🤔🙂↕️ But even in those cases, they were still led to thinking about course improvements!
[P8, while going through the evaluation report] "I'm not sure I would want to put 'tools' here [homepage], but I do think that it would make sense to better include that information on [different page in the course] because [installing the tools] is a key part [for the course]."
💡 Most of them anticipated using AID early while (re-)designing a course artifact & checking it frequently thereafter
Those who encountered bugs beyond their control (due to the learning management system or university guidelines) said:
💡 AID's automated reports could be used as a communication tool to argue for particular changes in the platform [P6] and
💡 Getting instructional designers access to the tool would directly influence university guidelines [P1]
One participant, who had manually evaluated their course materials with the GenderMag method before using AID, shared:
[P3 interview] "...because the time that GenderMag evaluation walkthrough basically took..[AID] just removes all that load from me. And it just gives the results, which is what I'm interested in."