A note on risks
Across this series of articles, we aim to facilitate a conversation about risks: risks born of the technologies powering AI, as well as risks born of the use of such technologies in industrial applications. By nature, machine learning learns from patterns in data and makes decisions based on those patterns to complete assigned tasks.
We’ve explored the concepts of “supervised learning” and “unsupervised learning,” which refer more to target activity and resource allocation than to risk management. Supervised learning, for example, is generally used to classify data or make predictions, whereas unsupervised learning is used to help understand the relationships within data to facilitate predictions (or for quantitative research purposes). Supervised learning is more resource-intensive because it requires data to be classified or “labeled.” Labeling can be performed by another algorithm or by a person, but performance is usually best with a “human in the loop.” While algorithms have advanced dramatically in performance and capability, guarding against poor decision-making still requires careful thought. Similarly, human decision-making, particularly “gut checking,” is complex and difficult to model reliably.
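To make the distinction concrete, here is a minimal sketch contrasting the two approaches in Python with scikit-learn; the bundled iris dataset and the specific estimators (LogisticRegression for the supervised case, KMeans for the unsupervised case) are illustrative choices, not part of the discussion above.

```python
# A minimal sketch contrasting supervised and unsupervised learning,
# using scikit-learn's bundled iris dataset purely for illustration.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # y holds the human-provided labels

# Supervised learning: the model is fit on labeled examples (X, y)
# and then used to classify unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("supervised test accuracy:", clf.score(X_test, y_test))

# Unsupervised learning: no labels are supplied; the algorithm looks
# for structure (here, three clusters) in the data on its own.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments for first five rows:", clusters[:5])
```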
One underlying risk to manage in these instances is bias. Bias is the result of a model being trained on data that skews decision-making toward one particular outcome. A particularly problematic example would be using machine learning to make decisions on credit applications based purely on historical observations of prior decisions. Such an approach would run the risk of reinforcing past issues with unfair credit decisions by learning from and perpetuating historically enacted bias. On the flip side, machine learning could also be used to help identify biases, as well as discriminatory and unfair practices, which may in turn yield advancements in ensuring fairness and equity in future decision-making.
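As a simple illustration of how such bias might be surfaced, the sketch below compares approval rates across two groups; the data, group labels, and column names are entirely hypothetical, and a rate gap on its own is a prompt for investigation rather than proof of unfair treatment.

```python
# A toy sketch of one way bias can be surfaced: comparing a model's
# approval rates across groups. Column names and data are hypothetical.
import pandas as pd

# Hypothetical model outputs: one row per credit application.
decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "A"],
    "approved": [1,   1,   0,   0,   0,   1,   0,   1],
})

# Approval rate by group; a large gap is a signal worth investigating,
# though it is not, by itself, proof of unfair treatment.
rates = decisions.groupby("group")["approved"].mean()
print(rates)
print("approval-rate gap:", rates.max() - rates.min())
```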
Another common and important category of risk to manage in machine learning and deep learning is overfitting and underfitting. Overfitting happens when a model learns its training data too closely, including noise and idiosyncrasies that do not represent the conditions in which it will be used, so its predictions fail to generalize to new data. Conversely, underfitting happens when a model is too simple, or the data too limited, to capture the underlying patterns at all. The disclaimer used across the finance industry that “past performance is no guarantee of future results” captures the spirit of these issues, but the challenges can be managed. Techniques to avoid overfitting and underfitting include choosing high-quality data, using cross-validation, applying regularization, and tuning hyperparameters.
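As a concrete sketch of two of these techniques, the example below applies regularization (ridge regression) and 5-fold cross-validation to synthetic data with scikit-learn; the model choice, penalty values, and data are assumptions made for brevity.

```python
# A minimal sketch of two of the techniques mentioned above:
# cross-validation and regularization, using scikit-learn.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # synthetic features
y = X[:, 0] * 3.0 + rng.normal(size=200)  # only one feature is truly informative

# Ridge regression penalizes large coefficients (regularization);
# GridSearchCV picks the penalty strength via 5-fold cross-validation,
# so the choice is based on held-out data rather than the training fit.
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print("best alpha:", search.best_params_["alpha"])
print("cross-validated R^2:", search.best_score_)
```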
As this research series expands, we will address specific risk-related issues that emanate from the application of these technologies in specific contexts. We believe this is a clear and actionable way to provide insights on such risks, bound by a cohesive frame around the specific risk to be managed.
Machine learning algorithms are considered the “core” of artificial intelligence applications. They serve a foundational function, and their importance to many downstream processes requires care to ensure properly balanced, risk-managed outcomes. Machine learning’s rich and storied history also entails a balance: huge leaps in technical advancement and promise set against periods of public doubt and relative drought in research funding.
A brief history of AI, from the Turing test to today’s transformers
Discussions of AI history frequently mention the “Turing test,” a thought experiment conceived by (and named after) British mathematician and computer scientist Alan Turing. The test considers one way to determine whether a machine could exhibit human-like intelligence. Specifically, it contemplates the capability of a machine to generate language-based responses that are indistinguishable from those of a human. In the scenario Turing envisioned, scientists would ask a human to have a typed conversation without knowing whether they were communicating with another person or a machine. If the person were to believe they were talking to a human when in fact they were conversing with a machine, the machine would pass the Turing test.
The modern concept of AI is deeply rooted in Turing’s operational definition of machine intelligence: determining success based on how convincingly a machine can replicate human-like results in performing a task, rather than attempting to directly answer the question of whether a machine can “think.” Fast-forward 70-plus years, and this framework gives us an enhanced understanding of the excitement that arose when ChatGPT (initially powered by the GPT-3.5 model) was released for public trial. Using a unique application of machine learning foundations and advancements in modeling, the technology produces results that, by the parameters of the Turing test, are quite convincing.
In that 70-plus-year span between the Turing era and today, several major phases of machine learning development led to the current cusp of AI ubiquity.
Many of the foundations of machine learning were established in the 1950s through the 1970s. With the realization that handcrafting every computing rule would become unsustainable, computer science shifted toward teaching computers to “learn” from data. In this period, the concept of an artificial “neuron” took hold and the first perceptron algorithms were developed. The perceptron, put simply, is a simple network in which each input is connected to an output by a weight that indicates the strength of that connection; changing the weights changes the output.
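For readers who want to see this in code, here is a minimal sketch of the classic perceptron learning rule in NumPy; the AND task, learning rate, and number of passes are illustrative assumptions.

```python
# A minimal sketch of the classic perceptron learning rule in NumPy:
# weighted inputs are summed, thresholded, and the weights are nudged
# whenever the prediction is wrong.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs
y = np.array([0, 0, 0, 1])                      # target: logical AND

w = np.zeros(2)   # one weight per input connection
b = 0.0           # bias term
lr = 0.1          # learning rate

for _ in range(20):                      # a few passes over the data
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        error = target - pred
        w += lr * error * xi             # strengthen or weaken connections
        b += lr * error

print("learned weights:", w, "bias:", b)
print("predictions:", [(1 if xi @ w + b > 0 else 0) for xi in X])
```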
Despite these advances, funding and interest dropped for a time in the 1970s and 1980s, beginning a period sometimes referred to as “AI winter.” Nevertheless, development continued with increasingly powerful algorithms, including two now-famous classes of neural networks developed in the 1980s: feedforward networks trained with backpropagation and recurrent neural networks. Neural networks are algorithms that teach computers to process data in a way loosely modeled on neural processes in the human brain. In the 1980s, when these ideas were developed, the work was largely theoretical, as it was difficult to obtain enough data and amass adequate computing resources to put them into practice.
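To show backpropagation in miniature, the sketch below trains a tiny two-layer network in NumPy on the XOR problem, a task a single perceptron cannot solve; the architecture, learning rate, and iteration count are illustrative assumptions.

```python
# A minimal sketch of a two-layer neural network trained with
# backpropagation in NumPy, learning XOR. All hyperparameters are
# illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8));  b1 = np.zeros((1, 8))   # hidden layer
W2 = rng.normal(size=(8, 1));  b2 = np.zeros((1, 1))   # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error back through the layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # should approach [[0], [1], [1], [0]]
```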
Roughly 15 years later, the digital era sowed the seeds of further advancements in machine learning models, making them both more viable and increasingly essential. In this phase, ensemble methods, which combine the predictions of multiple machine learning models, were developed; the “random forest” is a significant example. Other influential algorithms of the era include support-vector machines, originally described as “support-vector networks.”
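The following sketch shows an ensemble method in practice: a random forest that aggregates the votes of many decision trees, evaluated with cross-validation; the bundled breast-cancer dataset and the number of trees are illustrative choices.

```python
# A minimal sketch of an ensemble method: a random forest combines the
# votes of many decision trees trained on random subsets of the data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())
```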
In the 2000s and 2010s, with the advent of “big data” and thanks to advancements in computational power, significant breakthroughs occurred in machine learning and AI, especially in neural networks. A new family of approaches known as “deep learning” (so called because the architectures were deeper and more complex, containing several hidden layers between the input and the output) enabled data processing on a deeper, more accurate, and more flexible level. Benchmarks that had seen only incremental progress for years improved dramatically across almost all the classic applications, such as machine translation in natural language processing and image classification in computer vision.
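The “deeper” architecture described above can be sketched with scikit-learn’s MLPClassifier, which places several hidden layers between input and output; the layer sizes, iteration budget, and the bundled digits dataset are illustrative choices.

```python
# A minimal sketch of a multilayer ("deep") network: several hidden
# layers sit between the input and the output.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three hidden layers of 64 units each between the input and output.
model = MLPClassifier(hidden_layer_sizes=(64, 64, 64), max_iter=1000,
                      random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```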
Most recently, the transformer architecture (an encoder–decoder model) has given birth to a growing list of “killer apps” since its introduction in 2017 by Google researchers in a paper titled “Attention Is All You Need.” The transformer has become the foundation for many subsequent models in natural language processing, such as BERT, T5, GPT, and more.
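At the heart of the transformer is scaled dot-product attention, which can be sketched in a few lines of NumPy; the token count, dimensions, and random values below are illustrative only, and a full transformer adds multi-head attention, positional encodings, and feed-forward layers on top of this operation.

```python
# A minimal NumPy sketch of scaled dot-product attention, the core
# operation introduced in "Attention Is All You Need." Shapes and
# values are illustrative only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted mix of the value vectors, where the
    weights reflect how well a query matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 "tokens," 8-dimensional queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```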