What is AI Bias and Why It Matters to IT?

Notice there’s no question whether bias exists; it does. The question, rather, is how it gets introduced and why IT should care.

The hype around AI could not be higher right now. Interest is piqued, demand is overwhelming, and everyone is scrambling to find “the killer app” for their market.

But under the hype, there are concerns being voiced, with good reason. It is fairly simple to introduce bias into AI, and that bias is raising alarms in some circles.

A quick introduction to AI learning models

To understand how bias is introduced into AI, it’s necessary to have a basic understanding of how AI models are trained.

Depending on who you ask and how pedantic they want to be, you’ll get different answers on how many different learning methods there are. And indeed, the methods, algorithms, and models used today are extensive and, in many cases, beyond comprehension to those not deeply steeped in the field. But it’s important to understand, at a high level, how models are trained because ultimately that is how bias is introduced. Keeping that in mind, there are three basic ways to train AI models:

  1. Supervised learning. Input data is labeled. The system knows what the output should be based on the data set and labels used to train it and is able to use that knowledge to predict other values. Within this category, there are two major types of algorithms used. One is based on classification, in which data is grouped into categories based on attributes such as colour, size, and shape. Image recognition usually falls into this category, but other common uses are spam detection and email filtering. The second uses mathematical regression to discover patterns that are based on a linear relationship between the input and the output. In this method, output is categorised outside the model, such as measurements of the actual weather. Market and weather trends often use this method.
  2. Unsupervised learning. As the term ‘unsupervised’ implies, there is no guidance given to the system on the nature of the data. It is unlabeled. The system is expected to uncover patterns and relationships on its own and predict a result. Unsupervised learning algorithms rely on two different techniques: clustering and association. With clustering, the system is asked to group data based on similarities such that data in one group has few or no similarities with other groups. Customer buying behaviour is a common use of clustering. With association, the system is asked to find relationships between data, such as dependencies between them. This approach is entirely one of correlation, not causation. Unsupervised systems simply uncover “things that go together with other things”, not “things that cause other things to happen.” Association is often used for web usage mining.
  3. Reinforcement learning. Reinforcement learning is a compromise between supervised and unsupervised training that seeks to minimise the disadvantages of each. With reinforcement learning, systems are given unlabeled data which they explore. They are then either positively or negatively rewarded for output, which the system ‘learns’ from to hone its decisions. This is the model closest to how humans learn, as seen in the use of quizzes and tests in the education process. Video games, robotics, and text mining are common uses of reinforcement learning.
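As a concrete (if toy) illustration of the supervised case, the sketch below trains a nearest-centroid classifier on labeled examples. The animals, features, and values are all invented for illustration; real image classifiers are vastly more sophisticated, but the principle of learning from labels is the same.

```python
# Toy supervised learning: a nearest-centroid classifier.
# Labeled training data: features are (weight in kg, height in cm) -- invented values.
train = [
    ((30.0, 60.0), "dog"),
    ((35.0, 65.0), "dog"),
    ((4.0, 25.0), "cat"),
    ((5.0, 28.0), "cat"),
]

def centroids(samples):
    """Average the feature vectors for each label."""
    sums, counts = {}, {}
    for (w, h), label in samples:
        sw, sh = sums.get(label, (0.0, 0.0))
        sums[label] = (sw + w, sh + h)
        counts[label] = counts.get(label, 0) + 1
    return {lbl: (sw / counts[lbl], sh / counts[lbl])
            for lbl, (sw, sh) in sums.items()}

def predict(model, point):
    """Assign the label of the nearest centroid (squared Euclidean distance)."""
    return min(model, key=lambda lbl: (model[lbl][0] - point[0]) ** 2
                                      + (model[lbl][1] - point[1]) ** 2)

model = centroids(train)
print(predict(model, (32.0, 62.0)))  # prints "dog" -- nearest to the dog centroid
```

The labels drive everything: the model has no notion of "dog" or "cat" beyond what the labeled examples told it, which is exactly why the labeling step matters so much.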

How bias creeps into AI

Okay, now to the real topic—how bias can be introduced into these systems.

The answer, and I’m sure you’ve already figured it out, is based on the fact that humans are often involved in the training process.

The easiest way to bias supervised learning is to poison the data, as it were, by mislabeling it. For example, if I’m classifying animals, labeling enough ‘dogs’ as ‘cats’ can result in misidentification at scale. One risk is intentional mislabeling with the goal of corrupting the output; other mislabeling is merely the product of human judgment, such as deciding whether a panther is a cat or whether a statue of a cat counts as a cat. With reinforcement learning, positively rewarding the wrong answer, or the wrong move in a game, could produce a system that intentionally gives wrong answers or always loses.
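A minimal sketch of label poisoning, using an invented majority-vote “model” (all data and proportions are made up for illustration):

```python
# Sketch: how mislabeled ("poisoned") training data corrupts a model.
# The "model" here is just a majority-label lookup per feature value.
from collections import Counter

def train_model(samples):
    """Map each feature value to the most common label seen for it."""
    bins = {}
    for feature, label in samples:
        bins.setdefault(feature, Counter())[label] += 1
    return {f: votes.most_common(1)[0][0] for f, votes in bins.items()}

clean = [("barks", "dog")] * 10 + [("meows", "cat")] * 10
# An attacker (or a sloppy labeler) flips most of the 'barks' labels.
poisoned = [("barks", "dog")] * 3 + [("barks", "cat")] * 7 + [("meows", "cat")] * 10

print(train_model(clean)["barks"])     # prints "dog"
print(train_model(poisoned)["barks"])  # prints "cat" -- the poisoned labels win the vote
```

The point of the toy: no code changed, only labels did, yet the model’s answers flipped. That is what makes labeling such an attractive attack surface.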

Which for some folks might be an appealing option.

Now obviously this has implications for conversational (generative) AI such as ChatGPT, which was, according to OpenAI’s site, fine-tuned using “supervised learning as well as reinforcement learning” that “used human trainers to improve the model’s performance.” When you choose the “up” or “down” option to rank responses, that data can potentially be used to further fine-tune the model. You, dear reader, I assume are human. Ergo, the potential for further biasing the system exists. The reality is that ChatGPT is often flat-out wrong in its answers; feedback is necessary to further train the system so it can generate the right answer more often.
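To see why biased feedback matters, here is a deliberately oversimplified sketch of thumbs-up/down votes acting as a reward signal. The answers and vote counts are invented, and real reinforcement-learning-from-human-feedback pipelines are far more involved, but the failure mode is the same: coordinated bad feedback steers the system.

```python
# Sketch: human feedback as a reward signal that re-ranks candidate answers.
from collections import defaultdict

scores = defaultdict(float)

def feedback(answer, thumbs_up):
    """Accumulate +1 for an upvote, -1 for a downvote."""
    scores[answer] += 1.0 if thumbs_up else -1.0

def best(candidates):
    """Pick the highest-scoring candidate answer."""
    return max(candidates, key=lambda a: scores[a])

# A handful of biased raters consistently upvote the wrong answer...
for _ in range(5):
    feedback("wrong answer", thumbs_up=True)
# ...while the right answer gets only one honest vote.
feedback("right answer", thumbs_up=True)

print(best(["right answer", "wrong answer"]))  # prints "wrong answer"
```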

Now that’s interesting—and we could have a fascinating conversation about the ways in which we could manipulate those systems and the consequences—but the real reason I wanted to explore this topic is because the problem of bias extends to telemetry, the operational data we all want to use to drive automation of the systems and services that deliver and secure digital services.

AI, bias, and telemetry

You may recall I’ve written on the topic of data bias as it relates to telemetry and the insights 98% of organisations are missing.

In most cases related to analysing telemetry, models are trained using data that has been labeled. Bias can be introduced into that system by (a) mislabeling the data, (b) not having enough diversity of data in a specific category, or (c) the method used to introduce new data. The reason mislabeling data is problematic should be obvious: in large enough quantities, it results in misidentification. The issue with diversity is that data falling outside a narrow training set will inevitably be misclassified.

A classic example of this was an AI model trained to distinguish tanks from other types of transportation. It turned out that all the tanks had been photographed in daylight, but the other vehicles had not. As a result, the AI did a great job at tank versus not-tank on its training data, but what it had actually learned was day versus night. The lack of diversity in the input set produced a biased correlation.
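The tank anecdote can be sketched with the same nearest-centroid idea. The features here are (image brightness, turret visibility) on a 0-to-1 scale, and all values are invented: because brightness perfectly separates the two classes in training, it dominates the decision.

```python
# Sketch of the tank anecdote: when every 'tank' photo is bright and every
# 'other' photo is dark, a distance-based model latches onto brightness.
# Features: (image brightness, turret visibility), both 0..1 -- invented values.

model = {
    "tank":  (0.9, 0.9),   # every tank photo was taken in daylight
    "other": (0.2, 0.1),   # every other vehicle was photographed at night
}

def predict(point):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lbl: sum((c - p) ** 2
                                          for c, p in zip(model[lbl], point)))

# A tank photographed at night: dark image, turret only partly visible.
print(predict((0.2, 0.3)))  # prints "other" -- the model effectively learned day vs night
```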

Even if an operational AI is relying on reinforcement learning, the lack of diversity of data is problematic because the system does not have all the variables necessary to determine the next move, as it were.

The reason an AI might not have a diverse set of data or all the variables it needs is, you guessed it, data bias. Specifically, the data bias introduced by selective monitoring, in which only *some* telemetry is ingested for analysis. For example, the impact of DNS performance on the user experience is well understood. But if a model is trained to analyse application performance without telemetry from DNS, it may claim that performance is just fine even if there’s an issue with DNS, because it has no idea that DNS is in any way related to the end-to-end performance of the app. If the next move is to alert someone of a performance degradation, the system will fail due to bias in data selection.
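The DNS blind spot can be sketched in a few lines. The SLO threshold, metric names, and latency figures below are invented; the point is only that a model can judge nothing it was never shown.

```python
# Sketch: selective monitoring hides a DNS problem from a performance model.
# Thresholds and latencies (ms) are invented for illustration.

SLO_MS = 300  # hypothetical end-to-end latency objective

def assess(telemetry, signals):
    """Judge health using only the telemetry streams the model was given."""
    observed = sum(telemetry[s] for s in signals)
    return "degraded" if observed > SLO_MS else "healthy"

sample = {"app_ms": 120, "network_ms": 40, "dns_ms": 900}  # DNS is failing

print(assess(sample, ["app_ms", "network_ms"]))            # prints "healthy" -- blind spot
print(assess(sample, ["app_ms", "network_ms", "dns_ms"]))  # prints "degraded"
```

Same telemetry, same logic; the only difference is which signals were selected for ingestion, and that selection is where the bias lives.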

It won’t surprise you if I tell you our annual research discovered that over half of all organisations cite “missing data” as a top challenge to uncovering the insights they need.

Thus, even if organisations were all in on leveraging AI to drive operational decisions, missing data would present a challenge. Without a diverse data set on which to train such a system, bias inevitably creeps in.

A third way bias can be introduced is in the methods used to introduce data to the model. The most common operational example of this is using the results of synthetic testing to determine the average performance of an application, and then using the resulting model to analyse real traffic. Depending on the breadth of locations, devices, network congestion, etc. that form the dataset from synthetic testing, perfectly acceptable performance for real users might be identified as failure, or vice versa.
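A sketch of that failure mode, with invented latency figures: a baseline built from synthetic probes run at one well-connected site ends up flagging slower, but perfectly acceptable, real-user traffic as anomalous.

```python
# Sketch: a baseline built from narrow synthetic tests misflags real users.
# Latency numbers (ms) are invented for illustration.
from statistics import mean, stdev

synthetic = [100, 105, 98, 102, 101, 99]  # probes from one well-connected site

mu, sigma = mean(synthetic), stdev(synthetic)
threshold = mu + 3 * sigma  # flag anything beyond three standard deviations

def is_anomaly(latency_ms):
    return latency_ms > threshold

# A real user on a congested mobile network: slower, but acceptable for this app.
print(is_anomaly(220))  # prints True -- acceptable performance flagged as failure
```

Because the synthetic probes were so uniform, the computed threshold is tight, and anything outside that narrow band looks like a failure, whether it is one or not.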

The risk to digital transformation

The risk is an erosion of trust in technology to act as a force multiplier and enable the scale and efficiency needed for organisations to operate as a digital business. Because if the AI keeps giving the ‘wrong’ answers, or suggesting the ‘wrong’ solutions, well, no one’s going to trust it.

This is why full-stack observability is not just important, but one of the six key technical capabilities needed for organisations to progress to the third phase of digital transformation: AI-assisted business.

Missing data, whether due to selective monitoring or the opinionated curation of metrics, has the potential to bias AI models used to drive operational decisions.

Careful attention to the sources and types of data, coupled with a comprehensive data and observability strategy, will go a long way toward eliminating bias and producing more accurate—and trustworthy—results.

About the Author

Lori MacVittie is F5 Distinguished Engineer. F5 is a multi-cloud application services and security company committed to bringing a better digital world to life. F5 partners with the world’s largest, most advanced organizations to optimize and secure every app and API anywhere, including on-premises, in the cloud, or at the edge. F5 enables organizations to provide exceptional, secure digital experiences for their customers and continuously stay ahead of threats. For more information, go to f5.com.
