Last week Twitter published an article titled “Sharing learnings about our image cropping algorithm”. In the article they describe their findings about gender and racial biases in their cropping algorithms.

These biases have affected hundreds of millions of people using the platform, so let’s try to figure out what happened and how they discovered it.

Imagine you want to make a post on Twitter with 3 vertical images. If Twitter showed the images in their full size, your post would take a lot of screen real estate on a mobile device, giving an awful experience. This is why they’d rather crop your images and let users see the full image by clicking on them.

Cropping an image sounds like a trivial task, but it’s not. A crop will inevitably leave some parts of the image out. The question is what do you leave out and what do you emphasise? How do you choose where to crop?

Twitter’s strategy to answer this question is to use “saliency algorithms”. These are AI models that look at a picture and identify its most interesting parts to decide how to crop it. From Twitter’s blog:

Saliency models are trained on how the human eye looks at a picture as a method of prioritizing what’s likely to be most important to the most people. The algorithm, trained on human eye-tracking data, predicts a saliency score on all regions in the image and chooses the point with the highest score as the center of the crop.
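Twitter hasn’t published the exact cropping code, but the idea of “center the crop on the point with the highest saliency score” is simple enough to sketch. Here’s a minimal, hypothetical version: the saliency map is a stand-in for the model’s output, and the clamping just keeps the crop inside the image borders.

```python
import numpy as np

def saliency_crop(image, saliency_map, crop_h, crop_w):
    """Crop `image` so the highest-saliency point sits as close to the
    center of the crop as the image borders allow."""
    h, w = saliency_map.shape
    # Pick the pixel with the highest predicted saliency score.
    cy, cx = np.unravel_index(np.argmax(saliency_map), saliency_map.shape)
    # Center the crop on that point, clamping so it stays inside the image.
    top = min(max(cy - crop_h // 2, 0), h - crop_h)
    left = min(max(cx - crop_w // 2, 0), w - crop_w)
    return image[top:top + crop_h, left:left + crop_w]

# Toy example: a fake 100x100 "image" whose saliency peaks at (20, 70).
img = np.arange(100 * 100).reshape(100, 100)
sal = np.zeros((100, 100))
sal[20, 70] = 1.0
crop = saliency_crop(img, sal, 40, 40)
print(crop.shape)  # (40, 40)
```

Note how everything hinges on that single `argmax`: whatever the model systematically scores as “most salient” decides who stays in the frame, which is exactly where bias can enter.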

Bias can creep into cropping algorithms as well. If these algorithms consistently rate a specific gender or skin color as less important, then the millions of cropped pictures will underrepresent millions of people on the platform.

It looks like this may have been the case for Twitter. Probably sparked by some users' reports, Twitter decided to launch a first exploration into their cropping algorithms in October 2020.

The results of this first analysis didn’t seem to show any bias, but it started a conversation within Twitter about ways to solve the image-cropping problem without relying on algorithms. It also prompted a more rigorous exploration, the results of which were published last week.

From Twitter’s blog:

To quantitatively test the potential gender and race-based biases of this saliency algorithm, we created an experiment of randomly linked images of individuals of different races and genders […]. If the model is demographically equal, we’d see no difference in how many times each image was chosen by the saliency algorithm. In other words, demographic parity means each image has a 50% chance of being salient. Here’s what we found:

  • In comparisons of men and women, there was an 8% difference from demographic parity in favor of women.
  • In comparisons of black and white individuals, there was a 4% difference from demographic parity in favor of white individuals.
  • In comparisons of black and white women, there was a 7% difference from demographic parity in favor of white women.
  • In comparisons of black and white men, there was a 2% difference from demographic parity in favor of white men.
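The metric behind these numbers is straightforward: in each paired comparison the saliency model “picks” one of two images, and demographic parity would mean each group wins 50% of the time. A hedged sketch of how such a gap could be computed from pairwise outcomes (the data below is made up to mirror the 4% black/white figure, not Twitter’s actual dataset):

```python
def parity_difference(choices, group):
    """Given the winning group of each paired comparison, return how far
    (in percentage points) `group`'s win rate sits above the 50%
    demographic-parity baseline."""
    wins = sum(1 for c in choices if c == group)
    return 100 * wins / len(choices) - 50

# Toy data: out of 100 paired comparisons, the model picked the
# white individual's image 54 times and the black individual's 46 times.
choices = ["white"] * 54 + ["black"] * 46
print(parity_difference(choices, "white"))  # 4.0
print(parity_difference(choices, "black"))  # -4.0
```

A positive gap for one group is, by construction, a matching negative gap for the other, which is why Twitter can report each finding as a single “difference from demographic parity in favor of” one group.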

Twitter concluded that “not everything on Twitter is a good candidate for an algorithm, and in this case, how to crop an image is a decision best made by people”. They stopped relying on their saliency algorithm, and worked on a new UI to let people see and choose how they wanted their images to be cropped.

I think there are plenty of lessons in this story about how to manage AI bias incidents.

Let’s start from the beginning. The problem arose from a flaw in Twitter’s governance processes: every algorithm that can discriminate based on visual attributes should be tested for bias. Twitter took responsibility for this lack of proper governance in their first blog post:

While our analyses to date haven’t shown racial or gender bias, we recognize that the way we automatically crop photos means there is a potential for harm. We should’ve done a better job of anticipating this possibility when we were first designing and building this product.

I wonder what would have happened if the algorithm had been designed today. Twitter released this technology in 2018, when attention to these issues was not as high as it is today. Hopefully, there are better processes in place now. In any case, kudos to Twitter for acknowledging their mistake. If tech companies want to be trusted more (and they should work on this), being open about one’s own mistakes is the first step.

The second lesson we can learn from this episode is about involving users in the hunt for AI biases. As a Data Scientist, I can empathise with the difficulty of testing for every possible issue that can arise from an ML model. Tech companies are working on the problem using the tool they know best: tech. Amazon, Microsoft, Google, and Facebook are all working on systems that use varying degrees of automation to detect biases in ML models.

What if the solution was to involve people instead? We have bounty programs that pay hackers for finding security bugs; why not involve users in spotting AI biases? The problem is that many tech companies are scared of asking people to hunt for and report their problems, yet are happy to push more and more code to hand off responsibility for their decisions to computers. If all you have is a hammer, everything looks like a nail. I’m skeptical that the solution to flawed algorithms is more algorithms; we need to involve people more.

This is what Twitter has done, launching a hashtag dedicated to questions about their ML algorithms: #AskTwitterMETA (ML Ethics, Transparency and Accountability). Instead of shielding themselves from critics, Twitter decided to actually ask people for input.

The last lesson is about Twitter’s response and decision. When people started talking (and getting outraged) about the problem, Twitter gave an immediate response while starting a more rigorous investigation, all while designing and testing alternatives. This sends a strong message: “we listen, and we care”.

It took them roughly six months to publish the results in a scientific paper and communicate them to the public, together with a new design solution to the cropping problem. That may sound like a long time, but we’re talking about a decision that led to a major redesign of the UI of an app used by almost 200 million people. I think we can be quite happy with that.

The solution Twitter found is also brave: stop relying on algorithms, and give people more agency over how their pictures look once cropped:

not everything on Twitter is a good candidate for an algorithm, and in this case, how to crop an image is a decision best made by people.

I strongly believe that the path to fairer Artificial Intelligence goes through more Human Values. Giving people the opportunity to speak, listening to them, and making decisions accordingly is what every responsible tech company should do.

So for today, good job Twitter 👏