ML, even when perfect, learns the world as it is, not as people want it to be. There's nothing inherently sexist about a machine noticing that the background of an image can be used to identify a class label, e.g. noticing that a mug is more likely to contain coffee than tea in a coffee shop, but given how shitty we have been to each other, some correlations are considered taboo. The only way for machines to learn our taboos is to explicitly teach them.
Machine learning is only as good as your training set. Readily available training sets, for instance those built from what users volunteer on social networks, may have certain skews due to how they are collected.
For supervised learning methods, another possible source of bias is the labeling process.
If you see that an algorithm has issues with imbalanced classes, you could try to address that, or you could frame your problem as sexism, write a paper with that in the headline and then call Wired.
And in any case, if you read the paper, they examine a single algorithm (Conditional Random Fields) on two datasets, which Wired extrapolates to the entire field. Their own solution is to add constraints saying that inference should preserve the woman:man ratio for cooking found in the original dataset. And while there is no loss of accuracy, there is also no improvement, so it's just shifting the errors around. And there is no analysis at all of why a CRF would exhibit these properties in the first place.
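To make "shifting the errors around" concrete, here is a toy sketch of a ratio-preserving constraint. This is not the paper's actual constrained CRF inference; it's a hypothetical post-hoc version where, given some upstream model's per-image scores, you simply force the predicted label ratio to match a target ratio by flipping the least-confident predictions:

```python
def match_ratio(scores, target_share):
    """Predict 'woman' for exactly the top-scoring fraction of images,
    so the predicted share of 'woman' equals target_share (e.g. the
    training-set ratio). scores[i] is an upstream model's P(woman)."""
    n = len(scores)
    k = round(n * target_share)
    # Rank images by confidence; the k highest get the 'woman' label.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    labels = ["man"] * n
    for i in ranked[:k]:
        labels[i] = "woman"
    return labels

# Hypothetical scores for 8 images; constrain predictions to a 50:50 ratio.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
preds = match_ratio(scores, 0.5)
print(preds.count("woman") / len(preds))  # 0.5
```

Note what this does: it guarantees the aggregate ratio looks right, but any individual image near the decision boundary can still be labeled wrong; the constraint redistributes errors rather than eliminating them.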
But this is kind of my point: even after you solve the issues that arise from class imbalance (which ML practitioners and researchers are already highly motivated to solve, because fixing them improves average performance), you are still left with a bias that society considers taboo and will say must be fixed, and that cannot be fixed simply by more accurate ML or more accurate data.
This is something I know very little about, but will machine learning evolve over time?
I mean imagine if this technology was being created before the civil war and it was fed images of slavery the assumptions it would make!
Our world and what is normalized changes over time, will machine learning reflect that?
The world I've experienced is different from the world a generation below me has experienced, but if it's my generation feeding the machine the images, what happens?
Images from my parents' generation would have shown women who didn't go to college unless it was for an MRS, nursing, or teaching. My Mom's only sport option was cheerleading. This is drastically different from my sister's life experience.
There are really a few problems here that I think the article is mixing together a little, and I wanted to lay them out.
First, that the image corpuses used for machine learning have a strong gender bias, perhaps more than exists in the "real world." More images of men than women, more images of men working on computers, more images of women in kitchens, etc.
These images are sourced from http://imsitu.org/ which is sourced from http://image-net.org/, which (after some digging) looks like it gets most of its images from Flickr, stock photography sites, and random corporate sites. Are these representations of the "real world"? I would argue not. Professional photography, stock photography, and photos taken to be used in some unknown future context, and/or to appeal to the most people, tend to err on the side of being "universally applicable" and emphasizing the "common idea of a thing" rather than how the thing actually is, with all its variations. An image of a man in a kitchen may be perceived as more controversial, and may be less universally usable, than an image of a woman in a kitchen. So if you want to take a photo with as many potential uses as possible, you'd tend to fall back on established social norms MORE often than they actually occur.
Second, machine learning tends to emphasize small differences when it has nothing else to go on, or is improperly trained. If you have a dataset featuring people in kitchens where 75% of the time the person in a photo is a woman, you could get an algorithm that is 75% accurate simply by saying "the person in the photo is a woman" every single time. While the dataset reflects 75% women, the algorithm reflects 100% women. It emphasizes small differences in order to gain accuracy.
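The 75%-to-100% amplification above can be demonstrated in a few lines. This is a deliberately degenerate sketch with made-up labels: when the model has no informative features, the accuracy-maximizing strategy is a constant prediction of the majority label, so the predicted rate of "woman" jumps from the dataset's 75% to 100%:

```python
from collections import Counter

# Hypothetical dataset: 75% of "person in kitchen" photos labeled "woman".
labels = ["woman"] * 75 + ["man"] * 25

# With nothing else to go on, the accuracy-maximizing constant
# prediction is the majority label.
majority = Counter(labels).most_common(1)[0][0]
preds = [majority for _ in labels]

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
predicted_share = preds.count("woman") / len(preds)

print(accuracy)         # 0.75 -- matches the base rate
print(predicted_share)  # 1.0  -- the model says "woman" every time
```

Real classifiers with weakly informative features land somewhere between these extremes, but the pull toward the majority label is the same mechanism.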
This isn't just hypothetical. Many times I've worked on a categorization/labeling dataset that turns out to have no actual underlying pattern, and I wind up, after many hours, with a best-fit algorithm that, say, predicts the dataset correctly 85.166667% of the time... only to realize that my dataset is spectacularly unbalanced and exactly (EXACTLY) 85.166667% of it is in a single category. It's amazing how the accuracy just snaps to that number once you start layering machine learning algorithms, and you realize that the real problem is that there's no real pattern in the data (something data scientists don't often like to admit).
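A cheap sanity check for exactly this trap is to compute the majority-class baseline up front and compare your model's accuracy against it. A minimal sketch, with hypothetical labels chosen to reproduce that 85.166667% figure (511 of 600 examples in one class):

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy you'd get by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical imbalance matching the anecdote: 511 of 600 in class "A".
labels = ["A"] * 511 + ["B"] * 89
baseline = majority_baseline_accuracy(labels)
print(round(baseline * 100, 6))  # 85.166667
```

If a tuned model's accuracy lands exactly on this number, it has very likely just learned to predict the majority class, which is a strong hint there's no real pattern in the features.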
Third, sometimes the algorithms just get it wrong in ways that seem minor and rare from a data science perspective, but have large social consequences. Like improperly labeling a black couple as gorillas. It might actually be the case that the algorithm was improperly trained because it lacked photos of black people and photos of gorillas and didn't have much to go on (an example of the first issue) but I don't know enough about the situation to say for sure.
And fourth, of course, is that these patterns DO exist to some degree in the "real world," and this is a point that's been hammered on over and over again on Hacker News. The problem is that machine learning is a sort of big leveler that finds these patterns wherever it can and applies them universally (and often while emphasizing the differences for the reasons stated above). I mean, that's the point of it, after all! But knowing this fact, I think it makes sense to be careful where and how it's used.