Another Example of AI’s Issues
If you’ve been following news about artificial intelligence lately, you may already have seen this one: an Asian-American computer scientist at MIT uploaded a picture of herself into an AI tool and asked it to make her look “more professional.” The software turned the woman white.
The computer scientist in question, Rona Wang, was experimenting with a tool called Playground AI, a free online AI image creator. She wanted a portrait photo suitable for a LinkedIn profile, and uploaded a picture of herself in an MIT sweatshirt. Playground AI returned an image roughly similar to Wang, but with blue eyes and white skin.
Wang posted a before-and-after comparison of the photos on Twitter. You can see them for yourself, below.
was trying to get a linkedin profile photo with AI editing & this is what it gave me pic.twitter.com/AZgWbhTs8Q
— Rona Wang (@ronawang) July 14, 2023
Yikes. Clearly that outcome from Playground AI is undesirable; even the company’s founder admitted, “We’re quite displeased with this and hope to solve it.” But how to solve that problem is the real question here. Wang’s misadventure in photography brings me back to a question about AI that’s been bugging me for months.
How, exactly, are companies supposed to audit this stuff?
Like, is an IT audit team supposed to examine the code in an AI program so they can declare, “Yep, these lines right here: they’re the ones causing the program to conflate ‘professional’ with whiteness”? Because any auditor skilled enough to make that judgment isn’t going to work in IT auditing; he or she will be recruited by Playground AI or OpenAI or any other AI startup that comes along to develop the code itself.
Moreover, just about every auditor I know, including some who have spent years specifically auditing code for cybersecurity flaws, privately tells me that they have no idea how to audit artificial intelligence. GPT-4, the model behind the paid version of ChatGPT, reportedly uses 1.76 trillion parameters to perform the calculations that drive the answers it gives us humans, and it’s only one among scores of AI tools that companies are using these days. Asking auditors to inspect a system that complex so that you can root out unwanted behaviors is like asking a doctor to identify which brain cells to zap in a person’s skull so he’ll stop being racist or bullying. You can’t; that’s not how thinking works.
We’re using technology when we don’t understand how it works. If we don’t understand how it works, how can we provide assurance that it will work according to desired risk tolerances or required compliance obligations?
The Evolutionary Challenges of Audit
Think of it this way. If you’re examining an ERP software system to find cybersecurity flaws, that can be done. Sometimes the work is tedious (especially if you’re not using automated tools), but it is straightforward. You can benchmark the company’s ERP code against known software vulnerabilities and get a list of cybersecurity flaws that need to be fixed. Then you fix them.
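Here is a minimal sketch of that benchmarking step, assuming the audit team already has an inventory of the ERP system’s components and their versions, plus an advisory feed listing the first fixed version for each known flaw. The component names, advisory IDs, and data structures are all hypothetical; in practice, teams usually lean on automated scanners and public vulnerability databases rather than hand-rolled code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    advisory_id: str           # hypothetical identifier, not a real CVE
    component: str
    fixed_in: tuple[int, ...]  # first version that is NOT vulnerable


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '4.2.10' into (4, 2, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def find_flaws(inventory: dict[str, str], advisories: list[Advisory]) -> list[tuple[str, str]]:
    """Return (component, advisory_id) for each installed component older than its fix."""
    findings = []
    for adv in advisories:
        installed = inventory.get(adv.component)
        if installed is not None and parse_version(installed) < adv.fixed_in:
            findings.append((adv.component, adv.advisory_id))
    return findings


# Hypothetical ERP component inventory and advisory feed.
inventory = {"erp-core": "4.2.10", "report-engine": "2.0.1", "auth-module": "1.9.0"}
advisories = [
    Advisory("ADV-001", "erp-core", (4, 3, 0)),
    Advisory("ADV-002", "report-engine", (2, 0, 0)),
    Advisory("ADV-003", "auth-module", (2, 0, 0)),
]

for component, adv_id in find_flaws(inventory, advisories):
    print(f"{component}: fix required ({adv_id})")
```

The output is exactly the kind of list the paragraph describes: two components (erp-core and auth-module) fall below their fixed versions, so those are the flaws to fix.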
Along similar lines, if you want to audit code for compliance risks — say, confirming that the app handles personal data in accordance with various data privacy laws — that’s possible too. You audit the code, get a list of weaknesses, and fix them.
So you can audit security, and you can audit compliance. What Wang demonstrated with Playground AI is something more complex. We need to find ways to audit an AI program’s behavior, without any easy way of understanding the code that generates the behavior.
Maybe that’s not so far-fetched. For example, when you audit a business function’s performance, you don’t dwell on whether the employees like to follow certain procedures for vendor payments or onboarding; you dwell on transaction records to understand whether they are following procedure. You audit results, not thinking.
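In the spirit of auditing results rather than thinking, here is a minimal sketch of a transaction test for vendor payments. The procedure it checks (every payment must cite an approved purchase order and must not exceed that order’s amount), the record layout, and the sample data are all hypothetical; a real audit would draw them from the company’s own systems and written procedures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Payment:
    payment_id: str
    po_number: Optional[str]  # None means no purchase order was cited
    amount: int               # whole dollars, to keep the arithmetic exact


def find_exceptions(payments: list[Payment], approved_pos: dict[str, int]) -> list[str]:
    """Flag payments that break the hypothetical procedure: each payment must cite
    an approved PO and must not exceed that PO's approved amount."""
    flagged = []
    for p in payments:
        # Look up the approved amount; a missing or unapproved PO yields None.
        limit = approved_pos.get(p.po_number) if p.po_number is not None else None
        if limit is None or p.amount > limit:
            flagged.append(p.payment_id)
    return flagged


# Hypothetical sample records.
approved_pos = {"PO-100": 5000, "PO-101": 1200}
payments = [
    Payment("PAY-1", "PO-100", 4800),  # within an approved PO
    Payment("PAY-2", None, 300),       # no PO at all
    Payment("PAY-3", "PO-101", 1500),  # exceeds the approved amount
    Payment("PAY-4", "PO-999", 100),   # cites a PO that was never approved
]

print(find_exceptions(payments, approved_pos))  # → ['PAY-2', 'PAY-3', 'PAY-4']
```

Nothing in the test asks why an employee skipped a step; it only checks whether the records show the procedure was followed.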
Except, I’m not sure how well that analogy works here, because AI doesn’t think in the way that humans do. It relies on reams of raw data to extrapolate the answer that’s most likely to be correct, given the question it was asked.
Go back to Wang and her newly Caucasian photo. Did Playground AI generate that image simply because the millions of images it learned from were predominantly white? Or is it because LinkedIn is primarily for white-collar workers, and those workers still tend to have white skin? If it’s the latter, that sounds more like structural racism in the labor market to me, and we can’t really fault Playground AI for misunderstanding how to navigate that. Plenty of people can’t navigate it either.
So do we recalibrate the reams of source data an AI app uses to generate its results? How do we decide which individual pieces of data go into that pool? Because those decisions are a reflection of our own biases and moral priorities, and last time I checked, we humans had no consensus on those issues either.
Look at things that way, and suddenly this challenge is way above IT audit’s pay grade. It touches on compliance obligations, because the IT systems we use do need to adhere to laws, rules, and regulations; and it also raises difficult ethical questions, since moral values are the foundation for every choice we make. And every choice we make is now indexed and stored for future analysis by AI.
AI and Audit Today
Enough theory; let’s bring this back to tangible AI audit challenges already bearing down on large companies.
New York City recently enacted a rule that employers there using AI in their hiring processes must first perform a “bias audit” to ensure that the AI isn’t treating protected groups of job applicants unfairly. The bias audits must be performed annually, and the results posted on the employer’s website. How’s that supposed to work?
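New York City has published final rules that give us some answers to that question. Broadly speaking, companies will need to calculate the AI tool’s “selection rate” (the share of applicants it advances) for each demographic category, such as sex and race/ethnicity, and then divide each category’s rate by the rate of the most-selected category. The result is called an “impact ratio,” and a category with a markedly low impact ratio suggests that bias might be the culprit.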
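To make that arithmetic concrete, here is a minimal sketch of the selection-rate comparison, assuming the impact-ratio definition in the city’s final rules (a category’s selection rate divided by the selection rate of the most-selected category). The category names and applicant counts are invented for illustration, and the 0.8 flag threshold is an assumption borrowed from the EEOC’s “four-fifths” rule of thumb, not a figure taken from the city’s rules.

```python
def selection_rates(applicants: dict[str, int], selected: dict[str, int]) -> dict[str, float]:
    """Selection rate = number selected / number of applicants, per category."""
    return {cat: selected[cat] / applicants[cat] for cat in applicants}


def impact_ratios(rates: dict[str, float]) -> dict[str, float]:
    """Each category's selection rate divided by the rate of the most-selected category."""
    top = max(rates.values())
    return {cat: rate / top for cat, rate in rates.items()}


# Hypothetical applicant pool for one AI screening tool.
applicants = {"Group A": 200, "Group B": 150, "Group C": 100}
selected = {"Group A": 60, "Group B": 42, "Group C": 18}

FLAG_BELOW = 0.8  # assumption: EEOC four-fifths rule of thumb, not part of the NYC rule

rates = selection_rates(applicants, selected)
ratios = impact_ratios(rates)
for cat, ratio in ratios.items():
    status = "review for possible bias" if ratio < FLAG_BELOW else "ok"
    print(f"{cat}: rate={rates[cat]:.2f}, impact ratio={ratio:.2f} ({status})")
```

With these made-up counts, Group C’s impact ratio works out to 0.60 and is the only one flagged. Note what the calculation does and doesn’t tell you: it measures outcomes, but says nothing about which data or which logic inside the tool produced them.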
So New York City is defining bias based on outcomes. Fair enough, but let’s say you have an AI tool that does raise questions about possible bias. Then what? Does the employer junk that AI tool and move on to another? Does the IT audit team camp out with the software coding team to identify suspicious code in the AI or flawed data sources the AI is using to learn? If you recalibrate the data sources, will that introduce other forms of bias that might manifest in other ways?
I don’t know the answers to those questions, but you see where I’m going here. Audit and compliance teams are going to have difficult journeys ahead as we try to tame artificial intelligence — a technology we barely understand right now, and that’s only going to get more complex in the future. Brace yourselves.
