Glossary of AI Terms | Acronyms & Terminology
Artificial intelligence (AI) is a hotly discussed topic right now—especially with the release of AI writing tools like ChatGPT. It’s a field that more and more people are interested in, but it involves a lot of terminology that can be intimidating to a non-expert.
To help you get to grips with various commonly used AI terms, we’ve prepared a glossary covering a wide variety of acronyms, technical language, and big names in the world of AI.
Check the table below for a quick definition of the term you’re interested in, and click on the term to navigate to a more in-depth explanation below.
|Algorithm: A finite sequence of instructions followed by a computer system||Alignment: The extent to which an AI’s goals are in line with its creators’ goals|
|AlphaGo: An AI system that plays Go||Anthropomorphism: Attribution of human traits to non-human entities|
|Artificial general intelligence (AGI): AI that equals or surpasses human intelligence||Artificial intelligence (AI): Intelligence demonstrated by machines|
|AI detector: A tool designed to detect when a text was AI-generated||AI safety: The study of how AI can be developed and used safely|
|AI writing: Use of AI technology to produce or edit text (e.g., chatbots, paraphrasing tools)||Automation: Handling a process with machines or software so that less human input is needed|
|Autonomous: Able to perform tasks without human input||Bard: A chatbot developed by Google, released in March 2023|
|Bias: The assumptions that an AI makes to simplify its tasks||Big data: Very large datasets that normal data-processing software can’t handle|
|Bing Chat: A chatbot feature integrated into Bing, released in February 2023||Burstiness: A measurement of variation in sentence structure and length|
|CAPTCHA: A test used online to ensure that the user is human||Chatbot: A software application that mimics human conversation, usually through text|
|ChatGPT: A chatbot released by OpenAI in November 2022||Chinese room: A philosophical thought experiment about AI|
|Computer: A machine that can automatically perform the tasks it is programmed for||DALL-E: An AI image generator released by OpenAI in January 2021|
|Deep Blue: An AI system that plays chess||Deepfake: AI-generated images and videos designed to look real|
|Deep learning: A form of machine learning based on neural networks||ELIZA: An early chatbot developed in the 1960s|
|Emergent behavior: Complex behavior resulting from basic processes||Generative AI: AI systems that generate output in response to prompts|
|Generative pre-trained transformer (GPT): A type of LLM used in ChatGPT and other AI applications||Hallucination: Tendency of AI chatbots to confidently present false information|
|Large language model (LLM): A neural net trained on large amounts of text to imitate human language||Machine learning (ML): The study of how AI acquires knowledge from training data|
|Machine translation: Use of software to translate text between languages||Midjourney: An AI image generator released in July 2022|
|Natural language processing (NLP): The study of interaction between computers and human language||Neural net(work): Computer systems designed to mimic brain structures|
|OpenAI: A leading AI company that developed ChatGPT and DALL-E||Parameter: A variable in an AI system that it uses to make predictions|
|Paraphrasing tool: An AI writing tool that automatically rephrases the text you type into it||Perplexity: A measure of how unpredictable a text is|
|Plagiarism: Unacknowledged use of the words or ideas of others||Programming: The process of giving instructions to a computer (using computer code)|
|Prompt: The input from the user to which the AI system responds||QuillBot: A company offering a popular paraphrasing tool and other AI writing tools|
|Reinforcement learning from human feedback (RLHF): Training method used to fine-tune GPT model responses||Robot: Machine capable of carrying out physical actions automatically|
|Temperature: The level of randomness in an LLM’s output||Training data: The dataset that was used to train an AI system|
|Token: The basic unit of text (a word or part of a word) processed by LLMs||Turing test: A test of a machine’s ability to display human intelligence|
Glossary of AI terms
An algorithm may, for example, be a series of mathematical steps used to calculate some variable. Algorithms can also be designed for more complex tasks like deciding what content to promote in a social media feed.
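To make the idea of “a series of mathematical steps” concrete, here is a short sketch of one classic algorithm, Euclid’s method for finding the greatest common divisor of two numbers. It is only an illustration chosen for its simplicity; the function name `gcd` is our own.

```python
def gcd(a: int, b: int) -> int:
    """Return the greatest common divisor of two non-negative integers."""
    # Repeat one simple step until the remainder is zero:
    # replace (a, b) with (b, a mod b).
    while b != 0:
        a, b = b, a % b
    return a


print(gcd(48, 18))  # prints 6
```

Each pass through the loop is one well-defined instruction, and the loop is guaranteed to stop, which is what makes this a finite sequence of steps.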
Alignment is the extent to which an AI system’s goals and incentives are in line with the intentions of its creators. AI is said to be aligned if its actions will result in the outcomes the creators intended and misaligned if they lead to some other undesired outcomes.
Alignment becomes more and more important as the capabilities of AI increase. Emergent behaviors can occur, resulting in outcomes we are unable to predict.
This becomes more likely as we move towards long-term goals like artificial general intelligence (AGI), since the way AI systems work becomes harder for humans to understand. AI safety research is strongly concerned with alignment.
But AlphaGo was able to beat the world’s most highly ranked Go player, Ke Jie, in 2017, just as Deep Blue had defeated Garry Kasparov at chess in 1997.
Anthropomorphism is the attribution of human traits, motivations, intentions, and emotions to non-human entities. Humans tend to anthropomorphize all kinds of things, from animals to the weather.
We also increasingly anthropomorphize technology, especially technology like AI that seems to exhibit human intelligence.
For example, people often form emotional attachments to chatbots or believe that a chatbot that hallucinates is lying to them deliberately. They may refer to AI systems or robots using gendered pronouns, giving them human traits that they don’t in fact possess.
Artificial general intelligence (AGI)
Artificial general intelligence (AGI) refers to a hypothetical future AI system whose intelligence equals or surpasses that of humans across all intellectual tasks.
Many AI researchers believe that AI is likely to reach this point and surpass human intelligence sometime this century due to the acceleration of machine learning.
This scenario is frequently discussed in the context of alignment and AI safety: if AI becomes more intelligent than humans, it’s essential that it continues to work towards our goals and doesn’t endanger people.
Artificial intelligence (AI)
Artificial intelligence (AI) is human- or animal-like intelligence exhibited by machines: for example, the ability to perceive, understand, and reason about the world in the same way as people do.
In practice, AI covers a wide range of applications. AI technology is used in chatbots based on natural language processing (NLP), in autonomous (self-driving) vehicles, in search engines, and in game-playing bots like AlphaGo and Deep Blue.
An AI detector is a tool designed to detect when a piece of text (or sometimes an image or video) was created by generative AI tools (e.g., ChatGPT, DALL-E). These tools aren’t 100% reliable, but they can give an indication of the likelihood that a text is AI-generated.
AI detectors are built on large language models (LLMs) similar to those behind the tools they’re trying to detect. These models essentially look at a text (or image or video) and ask “Is this something I would have generated?” Specifically, they’re looking for low levels of two criteria: perplexity and burstiness.
AI detectors may be used by universities and other institutions to identify AI-generated content.
AI safety is a field of research concerned with how AI can be developed and used in a safe and ethical way. Issues discussed by AI safety researchers include alignment, bias, the disruptive effects of automation, hallucination, and the use of generative AI to create deepfakes and misinformation.
AI writing is a general term for the use of AI (generally LLMs) to generate or modify text. It encompasses chatbots like ChatGPT as well as paraphrasing tools and grammar checkers like those offered by QuillBot and Scribbr.
Automation involves making a process or system operate automatically—independently, without requiring human input. Automation can reduce the amount of human labor involved in a process or can completely eliminate the need for some kinds of human labor.
Since the Industrial Revolution, more and more kinds of work (e.g., manufacturing) have been automated. This has led to huge gains in efficiency but also to disruption in the affected professions, which can result in unemployment.
The adjective autonomous means “operating without outside control.” It’s commonly used in the world of AI to refer to systems, robots, software, and so on that can operate independently, without direct human input.
Bard is a chatbot developed by Google and released in a limited capacity on March 21, 2023. At its release, it was seen as the company’s answer to OpenAI‘s ChatGPT and to Bing Chat. Like those tools, it’s based on LLM technology.
However, it has received mixed reviews since its release, with users claiming that it is less powerful than ChatGPT when given similar prompts.
In the context of AI, bias refers to the assumptions made by an AI system in order to simplify the process of learning and performing the task(s) it was designed for. AI researchers aim to minimize bias, since it can lead to poor results or unexpected outcomes.
Big data refers to datasets so large or complex that they can’t be handled by normal data-processing software. This is key to the power of technologies like LLMs, whose training data consists of enormous quantities of text.
Bing Chat (also called Bing AI) is a chatbot developed by Microsoft and integrated into their search engine, Bing, from February 7, 2023, onwards. The chatbot was developed in collaboration with OpenAI, based on their GPT-4 technology.
Upon release, Bing Chat was met with some controversy due to its tendency to hallucinate and to give inappropriate answers. Many in the field of AI safety commented on the apparent lack of work put into aligning this technology before release.
Others were excited by the potential of integrating GPT technology into a search engine, since ChatGPT (at the time) was unable to search the internet.
Burstiness is a measurement of variation in sentence structure and length. AI writing tends to display low levels of burstiness, while human writing tends to have higher burstiness.
In combination with perplexity, burstiness is generally what AI detectors look for in order to label a text as AI-generated. Using a higher temperature setting in an LLM will result in text with higher perplexity and burstiness.
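There is no single agreed formula for burstiness, and AI detectors don’t generally publish theirs. As a rough sketch, one simple proxy is the standard deviation of sentence lengths (in words). The functions below are hypothetical and capture only the “length” half of the definition, not sentence structure.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Split text on ., ! and ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Assumed proxy: population standard deviation of sentence lengths."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # a single sentence has no variation to measure
    return statistics.pstdev(lengths)


uniform = "The cat sat down. The dog ran off. The bird flew up."
varied = "Stop. The dog ran all the way across the field to greet us."

print(burstiness(uniform))  # every sentence has 4 words, so this prints 0.0
print(burstiness(varied))   # a 1-word and a 12-word sentence score higher
```

Under this proxy, a text made up of evenly sized sentences scores low, while a mix of very short and very long sentences scores high.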
A CAPTCHA (“Completely Automated Public Turing test to tell Computers and Humans Apart”) is a test that’s frequently used online to check that the user is human. CAPTCHAs are used to prevent spambots from flooding sites with, for example, spam comments.
To complete a CAPTCHA, the user must carry out simple tasks like typing in some text from a distorted image or clicking on images that contain a certain object.
They are designed to be straightforward for humans but difficult for computer systems to solve. But some have succeeded in solving them automatically using machine learning.
A chatbot is a software application that simulates human conversation, usually in the form of a text chat. Modern chatbots are based on deep learning and NLP technology and may be used for tasks such as customer support and content writing.
LLM-based chatbots have gained enormous popularity since the release of ChatGPT, Bing Chat, and Bard in the last few months. Less sophisticated chatbots had existed for a long time, going back to ELIZA, released in the 1960s.
ChatGPT has been a huge success, reportedly becoming the fastest-growing app of all time. It has been widely discussed as a major step forward in AI technology and has brought the topic of AI writing to the attention of the public.
OpenAI has also received criticism from AI safety researchers, who argue that the tool is not effectively aligned and has the potential to be used as a tool for spreading misinformation. Some also argue that the company has failed to live up to its stated values by not publicly disclosing the technical details of its models.
The Chinese room is a philosophical thought experiment proposed by John Searle. The reader is asked to imagine an AI system that behaves as if it understands Chinese. It passes the Turing test, convincing a human Chinese speaker that they are speaking to a human being. Searle asks: Does the AI understand Chinese, or is it just simulating the ability to understand Chinese?
He then proposes a second scenario in which he (Searle) is inside a closed room, receiving the inputs of the Chinese speaker outside the room and producing coherent responses by following the instructions of the AI. The Chinese speaker again believes they are talking to a human being.
Searle, who speaks no Chinese, argues that it would be unreasonable to say that he “understood” Chinese just because he was able to facilitate the conversation by following a program. He states that since the AI is doing just that—following a program to produce responses—it also doesn’t “understand” what it is “saying.”
The argument encourages the reader to think about whether and how artificial intelligence differs from human intelligence—and about where “intelligence” and “understanding” come from.
A computer is a kind of machine that is programmed to carry out various tasks automatically. The term can refer to a wide variety of devices, including personal computers (PCs), smartphones, games consoles, calculators, and supercomputers.
Charles Babbage is usually named as the “father of the computer”; he invented the first mechanical computers early in the 19th century. Since the middle of the 20th century, computers have become more and more widespread and accessible to the general public.
Computers, often enormously powerful supercomputers, are essential to the development of AI systems.
DALL-E is an AI image generator released by OpenAI on January 5, 2021. A second version, DALL-E 2, began to roll out on July 20, 2022. The name is a portmanteau of the names of the Pixar character WALL-E and the artist Salvador Dalí.
Both versions are powered by a version of GPT-3 that has been modified to generate images instead of text. They therefore have the same technological building blocks as ChatGPT, despite their very different output.
As with other AI image generators (e.g., Midjourney), there has been a lot of discussion about how DALL-E may influence the future of art, with some detractors claiming that these tools plagiarize existing images and can be used to create deepfakes.
This was seen as a milestone in the development of AI, demonstrating the system’s mastery of the complex strategies involved in chess play. Years later, a system called AlphaGo would accomplish the same feat in the game of Go.
A deepfake is a piece of media created by deep learning or other AI technology to misrepresent reality—for example, by showing someone doing or saying something that they did not do or say. Deepfakes can be images, video, or audio. They are often easy to identify as fake, but they’re becoming more and more convincing.
Deepfakes are a major concern of AI safety experts, since they can be used to spread misinformation and fake news, to generate sexually explicit images of people without their consent, or to facilitate fraud.
Social media platforms, governments, and other institutions often create rules against deepfakes, but they are difficult to enforce if they can’t be reliably detected. Some AI detectors are designed to detect deepfakes, although they are reportedly not very reliable so far.
They consist of a series of layers (which is why they’re called “deep”) that gradually transform raw data (e.g., pixels) into a more abstract form (e.g., an image of a face), allowing the model to understand it on a higher level.
Its programming was quite basic, typically just repeating the user’s input back at them in the form of a question, but it proved surprisingly persuasive: users often came to believe that it could understand them, an example of anthropomorphism.
Emergent behavior refers to unexpected abilities or actions taken by AI systems—things they weren’t explicitly trained to do but which have emerged from their training. Complex behaviors arise from the interaction of a large number of simple elements.
For example, LLMs, on a basic level, work by guessing what token (word) should come next in a sentence. Despite this quite simple mechanism, they are able to produce coherent, intelligent-seeming responses to a huge range of prompts.
Other emergent behaviors can be more negative, such as the tendency of LLMs to hallucinate or give inappropriate answers. A lot of AI research relies on feeding big data into a model and seeing what happens, which involves a lot of trial and error and unpredictable results.
Generative AI refers to AI systems that generate text, images, video, audio, or other media in response to prompts from the user. This includes chatbots like ChatGPT, image generators like Midjourney, video generators, music generators, and so on.
Generative pre-trained transformer (GPT)
A generative pre-trained transformer (GPT) is a type of large language model (LLM) used to power generative AI applications. GPT models are neural networks based on a kind of deep learning model called a transformer. They are currently the most popular form of LLM.
A number of GPT models have been released by the AI company OpenAI: GPT-1 in 2018, GPT-2 in 2019, GPT-3 in 2020, GPT-3.5 in 2022, and GPT-4 in 2023. GPT-3.5 and GPT-4 are used to power ChatGPT, while DALL-E runs on GPT-3.
In the context of AI, hallucination refers to the phenomenon of an AI model confidently responding in a way that isn’t justified by its training data—in other words, making things up. For example, ChatGPT might respond to a question it doesn’t know the answer to with a confident response that is completely wrong.
Hallucination is an emergent behavior that has been observed in several LLM-powered chatbots. It seems to result from the fact that these models are just predicting plausible answers and can’t check them against sources. Because they are encouraged to provide helpful answers, they try to do so even when they don’t have the information requested.
Hallucination is one issue discussed by AI safety researchers, who are concerned that AI systems may inadvertently spread misinformation.
Large language model (LLM)
A large language model (LLM) is a neural network with a large number (usually billions) of parameters, with large amounts of text used as its training data. It is currently a highly popular approach to natural language processing (NLP) research.
LLMs are trained for general purposes rather than for specific tasks. They appear to function well because of the big data used in their training, exhibiting various emergent behaviors and capable of producing highly sophisticated responses.
Machine learning (ML)
Machine learning (ML) is a field of research focused on how machines can “learn” (i.e., become capable of more and more advanced tasks). It involves the use of algorithms interacting with training data to make autonomous decisions and predictions.
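As a very small illustration of “algorithms interacting with training data to make predictions,” the sketch below fits a straight line to a handful of made-up data points using ordinary least squares and then predicts a new value. Real ML systems are far more complex; the function names and numbers here are assumptions for the example only.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Learn the slope and intercept that best fit the training data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x: float, slope: float, intercept: float) -> float:
    """Use the learned parameters to predict a value for unseen input."""
    return slope * x + intercept


# Made-up training data that follows y = 2x + 1
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(train_x, train_y)
print(predict(5.0, slope, intercept))  # close to 11.0
```

The two learned numbers (slope and intercept) play the same role, on a tiny scale, as the parameters of a larger model.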
Machine translation is the use of software to translate text or speech between languages. Machine translation apps typically make use of neural networks. Popular apps include Google Translate and DeepL.
On a basic level, this can consist of replacing a word in one language with its equivalent in another. But for better results, machine translation systems need to consider the syntax of the two languages and the context in which each word appears, so that they can arrive at the translation most likely to be correct.
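The sketch below shows the “basic level” approach of swapping each word for its dictionary equivalent, and why it falls short. The three-word English-to-Spanish lexicon is a toy assumption; real translation apps do not work this way.

```python
# Hypothetical toy lexicon, for illustration only
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}


def translate_word_by_word(sentence: str) -> str:
    """Replace each word with its dictionary equivalent, keeping word order."""
    words = sentence.lower().split()
    # Words missing from the lexicon are left untranslated
    return " ".join(LEXICON.get(word, word) for word in words)


print(translate_word_by_word("the red car"))  # prints: el rojo coche
```

The output keeps English word order, whereas natural Spanish puts the adjective after the noun (“el coche rojo”). That gap is why translation systems have to account for syntax and context rather than individual words.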
Midjourney is a generative AI system that produces images in response to prompts from the user. It was developed and released by the San Francisco–based company Midjourney, Inc. It is currently in open beta.
Like DALL-E and other AI image generators, Midjourney has been the subject of both praise and criticism, with some claiming that the app can be used to create deepfakes or that it plagiarizes the work of human artists, while others stress the creative potential and accessibility of the tool.
Natural language processing (NLP)
Natural language processing (NLP) is the study of how AI systems can be used to process and generate human language.
The current focus of the field is on neural networks and large language models (LLMs), which show strong potential in their ability to generate fluent text in response to user prompts (as seen in applications like ChatGPT).
A neural network (or neural net) is a computer system inspired by the structures of human and animal brains. Such systems are technically called artificial neural networks (ANNs) to distinguish them from the biological neural networks that make up brains.
A neural network consists of a collection of artificial “neurons” (based on the neurons in biological brains) that send signals to and receive signals from other neurons in the network. This structure allows the network to gradually learn to perform tasks with greater accuracy.
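To give a feel for what a single artificial “neuron” does, here is a minimal sketch. It assumes one common design: the neuron multiplies each incoming signal by a weight, adds a bias, and squashes the result into the range 0 to 1 with a sigmoid function. The weights and inputs are made up; in a real network they would be learned from training data.

```python
import math


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Combine weighted input signals and pass them through a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))


# Made-up signals from two upstream neurons
signals = [1.0, 0.5]
print(neuron(signals, weights=[2.0, -1.0], bias=0.0))
```

A network connects many of these units so that the outputs of some become the inputs of others; learning consists of adjusting the weights and biases.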
OpenAI is a major AI research and development company. It has released popular tools including ChatGPT and DALL-E, and it developed the generative pre-trained transformer (GPT) technology behind these tools.
The company was founded in 2015 by Ilya Sutskever, Greg Brockman, Sam Altman, and Elon Musk (who is no longer involved). Its current CEO is Sam Altman. OpenAI began as a nonprofit company but transitioned to for-profit in 2019.
A parameter is a variable within an AI system that is used to make predictions. As LLMs become more advanced, they tend to involve more and more parameters. It is believed that GPT-4 contains several hundred billion parameters.
Perplexity is a measurement of how unpredictable (perplexing) a text is. A text with high perplexity is more likely to read unnaturally or be nonsensical than a text with low perplexity. AI writing tools tend to produce text with relatively low perplexity, as this gives them a higher chance of making sense.
Because of this, perplexity is used in combination with burstiness as a way for AI detectors to label writing as AI or human. AI writing tools may include a temperature setting that allows the user to produce text with higher perplexity.
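In language modeling, perplexity is usually defined from the probabilities a model assigns to each token in a text: it is the exponential of the average negative log-probability. The sketch below applies that standard definition to made-up probabilities. How any particular AI detector computes its score is not something we can confirm, so treat this as an illustration of the concept only.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Exponential of the average negative log-probability of the tokens."""
    if not token_probs or any(p <= 0 for p in token_probs):
        raise ValueError("need at least one probability, all greater than 0")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Made-up probabilities a model might assign to each token in two texts
predictable_text = [0.9, 0.8, 0.95]   # the model expected these tokens
surprising_text = [0.1, 0.2, 0.05]    # the model found these unlikely

print(perplexity(predictable_text))  # low perplexity
print(perplexity(surprising_text))   # much higher perplexity
```

If every token has probability 0.5, the perplexity is about 2: the model is, on average, as uncertain as if it were choosing between two equally likely options.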
Plagiarism is the act of passing off someone else’s words or ideas as your own. It occurs when you, for example, copy text from another writer and submit it as your own work without citing the source. Plagiarism issues have surrounded the rise of generative AI tools.
Commentators are debating whether the tools themselves can be said to be plagiarizing from their training data (e.g., do AI image generators steal the work of artists?) and whether the unacknowledged use of AI writing tools in academic writing can be considered plagiarism.
Programming is the act of giving a computer instructions in the form of programs, which consist of computer code. This code can be in various programming languages, such as C++. Computer programs often make use of algorithms.
A prompt is the user input that a generative AI model (such as ChatGPT) responds to. It usually consists of text typed in by the user. The process of developing clear prompts that will produce useful results and avoid unwanted outcomes is called prompt engineering.
Reinforcement learning from human feedback (RLHF)
It involves the use of human feedback to establish a “reward model” that then automatically optimizes the system’s responses. It was used in the development of ChatGPT to fine-tune the model’s responses before release.
A robot is a machine that is capable of carrying out actions automatically (sometimes autonomously). Robots usually contain computer systems that are programmed to allow them to carry out their tasks. The study and design of robots is called robotics.
They are often used in industrial contexts such as factories to carry out repetitive labor (thus automating tasks previously done by humans).
With a relatively low temperature, the model is likely to produce predictable but coherent text, giving very similar responses if you repeat the same prompt. With a higher temperature, the model tends to produce text that is less predictable but more likely to contain errors. At very high temperatures, it often produces nonsense.
For tools like ChatGPT, the temperature setting usually defaults to something like 0.7: high enough to give unpredictable responses but unlikely to produce nonsense.
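A common way to implement temperature is to divide the model’s raw scores for each candidate token by the temperature before converting them into probabilities with a softmax function. The sketch below assumes that approach and uses made-up scores; it is not taken from any specific tool.

```python
import math


def softmax_with_temperature(scores: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens
scores = [2.0, 1.0, 0.5]

low = softmax_with_temperature(scores, 0.2)
high = softmax_with_temperature(scores, 2.0)
print(low)   # almost all probability on the top-scoring token
print(high)  # probability spread more evenly across the candidates
```

At a low temperature the top token dominates, which gives predictable output; at a high temperature the less likely tokens get a real chance of being picked.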
The training data of an AI system is the data (e.g., text, images) that was used to train it. Machine learning (ML) usually involves training the system on a large amount of data to recognize patterns and learn to produce desirable responses.
User conversations in tools like ChatGPT may be used as training data for future models.
A token is the basic unit of text processed by a large language model (LLM). It’s often a complete word, but it can also be part of a word, or a punctuation mark.
LLMs produce text by repeatedly predicting what token should come next based on patterns they have learned from their training data. For example, it’s very likely that the next token after the text “I couldn’t get to sleep last …” is “night.”
LLMs don’t simply pick the token with the highest probability every time, though; this tends to produce very repetitive text that doesn’t seem human. Instead, they have a temperature setting that influences how likely they are to pick, for example, the third most likely token in each case. This is why AI writing tools can give a variety of responses to a single prompt.
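Next-token prediction can be illustrated with a toy model that simply counts which word follows which in a tiny made-up corpus. This is a drastic simplification: real LLMs use neural networks with billions of parameters and split text into subword tokens, whereas this sketch assumes whole words and looks only one word back.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    following: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            following[current][nxt] += 1
    return following


def most_likely_next(following: dict[str, Counter], token: str) -> Optional[str]:
    """Return the word seen most often after `token`, or None if unseen."""
    counts = following.get(token)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up training sentences
corpus = [
    "i could not get to sleep last night",
    "we went out last night",
    "she called me last week",
]

model = train_bigrams(corpus)
print(most_likely_next(model, "last"))  # prints: night
```

Always choosing the most frequent follower, as this function does, corresponds to picking the highest-probability token every time.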
The Turing test (also called the imitation game) is a test invented by the computer scientist Alan Turing in 1950. It’s designed to test a machine’s ability to exhibit apparently human intelligence (i.e., to “seem like a human”).
The test involves a human judge who engages in text-based conversations with a machine and a human, without knowing which is which. The judge must try to establish which is the machine and which the human. If they cannot reliably distinguish between the two, the machine has passed the test.
“Passing the Turing test” has come to be seen in popular culture as a standard that AI systems should aim to meet, but it’s not accepted as a reliable measure of intelligence by AI researchers themselves and has been criticized in other thought experiments like the Chinese room.
Frequently asked questions about AI tools
- What is information extraction?
Information extraction refers to the process of starting from unstructured sources (e.g., text documents written in ordinary English) and automatically extracting structured information (i.e., data in a clearly defined format that’s easily understood by computers). It’s an important concept in natural language processing (NLP).
For example, you might think of using news articles full of celebrity gossip to automatically create a database of the relationships between the celebrities mentioned (e.g., married, dating, divorced, feuding). You would end up with data in a structured format, something like MarriageBetween(celebrity1,celebrity2,date).
The challenge involves developing systems that can “understand” the text well enough to extract this kind of data from it.
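As a rough sketch of the celebrity example, the code below uses a hand-written pattern to pull `MarriageBetween` records out of plain sentences. The names, the sentence template, and the pattern are all made up for illustration; real information extraction systems have to cope with far more varied wording than a single pattern can handle.

```python
import re
from typing import NamedTuple


class MarriageBetween(NamedTuple):
    celebrity1: str
    celebrity2: str
    date: str


# Assumed template: "<First Last> married <First Last> in <year>"
NAME = r"[A-Z][a-z]+ [A-Z][a-z]+"
PATTERN = re.compile(rf"(?P<a>{NAME}) married (?P<b>{NAME}) in (?P<year>\d{{4}})")


def extract_marriages(text: str) -> list[MarriageBetween]:
    """Turn matching sentences into structured MarriageBetween records."""
    return [
        MarriageBetween(m["a"], m["b"], m["year"])
        for m in PATTERN.finditer(text)
    ]


# Fictional gossip text
article = (
    "Sources say Alex Rivera married Jamie Chen in 2019. "
    "Pat Morgan married Sam Lee in 2021."
)

for record in extract_marriages(article):
    print(record)
```

The unstructured sentences go in and rows with named fields come out, ready to be stored in a database.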
- What is knowledge representation and reasoning?
Knowledge representation and reasoning (KRR) is the study of how to represent information about the world in a form that can be used by a computer system to solve and reason about complex problems. It is an important field of artificial intelligence (AI) research.
An example of a KRR application is a semantic network, a way of grouping words or concepts by how closely related they are and formally defining the relationships between them so that a machine can “understand” language in something like the way people do.
A related concept is information extraction, concerned with how to get structured information from unstructured sources.
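A very small semantic network can be sketched as a set of “is a” links between concepts, plus a function that reasons over them by following the links. The concepts and the single relationship type below are assumptions for illustration; real KRR systems define many kinds of relationships.

```python
# Hypothetical "is a" links: each concept points to its broader category
IS_A = {
    "canary": "bird",
    "penguin": "bird",
    "bird": "animal",
}


def is_a(concept: str, category: str) -> bool:
    """Follow "is a" links upward to check whether concept belongs to category."""
    seen = set()
    current = concept
    while current in IS_A and current not in seen:
        seen.add(current)  # guard against accidental loops in the data
        current = IS_A[current]
        if current == category:
            return True
    return False


print(is_a("canary", "animal"))  # prints: True
print(is_a("bird", "canary"))    # prints: False
```

Nothing in the data states directly that a canary is an animal; the program infers it by chaining two stored relationships.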
- What kind of data are needed for deep learning?
Deep learning requires a large dataset (e.g., images or text) to learn from. The more diverse and representative the data, the better the model will learn to recognize objects or make predictions. Only when the training data is sufficiently varied can the model make accurate predictions or recognize objects from new data.
- How can I use AI writing tools?
AI writing tools can be used to perform a variety of tasks.
These tools can also be used to paraphrase or summarize text or to identify grammar and punctuation mistakes. You can also use Scribbr’s free paraphrasing tool, summarizing tool, and grammar checker, which are designed specifically for these purposes.
- How does ChatGPT work?
ChatGPT is a chatbot based on a large language model (LLM). These models are trained on huge datasets consisting of hundreds of billions of words of text, based on which the model learns to effectively predict natural responses to the prompts you enter.
ChatGPT was also refined through a process called reinforcement learning from human feedback (RLHF), which involves “rewarding” the model for providing useful answers and discouraging inappropriate answers—encouraging it to make fewer mistakes.
Essentially, ChatGPT’s answers are based on predicting the most likely responses to your inputs based on its training data, with a reward system on top of this to incentivize it to give you the most helpful answers possible. It’s a bit like an incredibly advanced version of predictive text. This is also one of ChatGPT’s limitations: because its answers are based on probabilities, they’re not always trustworthy.