In machine learning, bias can occur when a model weights certain features or elements of a dataset more heavily than others, or when training data do not properly represent the intended use case, resulting in problematic model output. A well-known example, reported in 2018, was Amazon’s effort to create an ML-based recruiting algorithm that produced results biased against female candidates. The bias occurred because the historical labels of successful hires used to train the model were skewed towards men.
Bias in machine learning can take different forms and arise from different causes. Common examples, as with the Amazon hiring algorithm, result from “selection bias,” whereby the data used for training a model inherently reflect historical human bias that gets perpetuated by the model.
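One simple way to look for this kind of inherited bias is to compare how often the positive label appears in each group of the training data. The following is a minimal sketch rather than a definitive method: the function name `positive_rate_by_group`, the group codes, and the toy labels are all hypothetical and are not taken from the Amazon case or from any particular library.

```python
from collections import defaultdict


def positive_rate_by_group(groups, labels):
    """Return the fraction of positive labels (1) observed within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, label in zip(groups, labels):
        totals[group] += 1
        positives[group] += label
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical toy data: 1 means "labeled as a successful hire".
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

rates = positive_rate_by_group(groups, labels)
print(rates)
```

A large gap between groups does not prove that the labels are unfair, but it is a signal that a model trained on them may reproduce the same pattern and that the labels deserve closer review.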
Bias can significantly affect the performance and results of a machine learning model. Data scientists need to be particularly aware of the potential for bias in training data and model design. Selection bias, for example, is common in situations where prototyping teams are narrowly focused on solving a specific problem without regard to how the solution will be used and how the datasets will generalize. A machine learning modeler must ensure that training data properly represent the population, or take alternative steps to mitigate the introduction of bias into the model.
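A basic representativeness check compares each group's share of the training data with its share of the population the model is meant to serve. The sketch below assumes that the reference shares are known in advance; the function name `representation_gap`, the group names, and the numbers are illustrative assumptions only.

```python
from collections import Counter


def representation_gap(train_groups, population_shares):
    """Return training share minus reference population share for each group.

    Positive values mean the group is over-represented in the training data;
    negative values mean it is under-represented.
    """
    if not train_groups:
        raise ValueError("train_groups must not be empty")
    counts = Counter(train_groups)
    n = len(train_groups)
    return {
        group: counts.get(group, 0) / n - share
        for group, share in population_shares.items()
    }


# Hypothetical example: group "a" makes up 80% of the training rows,
# while the assumed target population is split evenly.
train_groups = ["a"] * 8 + ["b"] * 2
population_shares = {"a": 0.5, "b": 0.5}

gaps = representation_gap(train_groups, population_shares)
print(gaps)
```

If the gaps are large, possible mitigations include collecting more data for the under-represented groups or reweighting the training examples; which one is appropriate depends on the use case.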
The C3 AI Platform and C3 AI Applications provide numerous features enabling data scientists and developers to identify bias in datasets and models. These include tools to thoroughly explore the characteristics of training data, visualize data in numerous ways, and evaluate model output in the context of the intended use case.