Are Large Language Models creative?

Authors

Louise Röska-Hardy

Affiliation: Institute for Advanced Study in the Humanities Essen (KWI)

Category: Philosophy

Keywords: creativity, transformer architecture, divergent thinking, Torrance Tests of Creative Thinking, Remote Associates Task

Schedule & Location

Date: Wednesday 3rd of September

Time: 14:30

Location: Room 232

View the full session: Creativity

Abstract

Until recently, creativity was considered a unique capacity that sets humans apart from nonhuman animals and machines. However, with the advent of Large Language Models (LLMs) like OpenAI’s GPT-4o, Meta’s Llama 3, Google’s Gemini, Anthropic’s Claude and DeepSeek-R1, which produce humanlike texts, conversation, poetry and programming code as well as process and generate images, the question has arisen whether such large-scale Generative Artificial Intelligence (GAI) systems are creative, too. Deciding the issue requires (1) clarifying what is meant by ‘creativity’ and ‘creative’, (2) grasping what widely used creative thinking tests measure, and (3) understanding the nature of Large Language Models. It will be argued that LLMs are creative in the sense that they can produce outcomes that human users judge to be creative. However, they fall short of creativity in the full sense in which humans are creative: they can neither initiate the processes that produce outcomes judged to be creative nor self-evaluate the novelty and appropriateness of the output they generate. Human prompting is required to initiate the processes that lead to such outcomes, and human judgment is required to identify which outcomes actually exhibit creativity.

Creativity is a complex, multi-faceted psychological construct. Since it is not directly observable, all measures of creativity are indirect: creative capacity must be inferred from observations that allow researchers and psychologists to test, study and measure proposed aspects of it. Tests of creative thinking fall into two categories. Psychometric tests of convergent thinking require finding one optimal solution to a predefined problem.
Convergent thinking is assessed by the Remote Associates Task, while divergent thinking is measured by tests that involve generating multiple new solutions to a problem, such as the Alternative Uses Task and the Torrance Tests of Creative Thinking. A high score on tests of divergent thinking indicates creative potential but does not ensure creative achievement. These tests do not purport to measure creativity per se, although they are often taken to do so; they measure a proxy for creative ability. Recent claims of LLM creativity based on the Alternative Uses Task or the Torrance Tests of Creative Thinking often overlook this point.

Understanding the structure of LLMs and how they work is essential to determining whether they are creative. LLMs are massive neural network-based models that represent language as vectors (arrays of numbers) and use statistical correlations in multi-layer computations to generate responses to input. Specifically, LLMs are machine learning algorithms that operate on multi-dimensional vector embeddings to generate a statistical prediction of the next token in a text sequence. Their responses to prompts are the result of mathematical calculations on vectors in multiple layers of neural networks, repeated many times. Neural networks are machine learning algorithms inspired by the structure and function of the human brain. They consist of multiple layers of interconnected nodes or artificial “neurons”: an input layer, one or more “hidden” layers and an output layer. The input is passed through the network layer by layer: each layer receives input from the previous layer, performs a computation and sends the result to the next layer. LLMs consist of layers of neural networks in a transformer architecture. The transformer receives an input, encodes it and then decodes it to produce an output prediction, e.g., the statistically most probable next token or word in a text sequence.
With each layer, the model takes increasingly complex relationships and patterns into account, improving its prediction of the next word.
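The layer-by-layer computation described above can be illustrated with a toy sketch: a vector for an input token is passed through two hidden layers and an output layer, and a softmax turns the final scores into a probability distribution over a tiny vocabulary, from which the most probable next token is read off. The vocabulary, random weights and dimensions here are invented for illustration; a real LLM has billions of trained parameters.

```python
import numpy as np

def softmax(z):
    """Convert raw scores into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Toy vocabulary and a 4-dimensional embedding vector for each token.
vocab = ["the", "cat", "sat", "mat"]
embeddings = rng.normal(size=(len(vocab), 4))

# Random weight matrices standing in for two hidden layers and an output layer.
W_hidden1 = rng.normal(size=(4, 8))
W_hidden2 = rng.normal(size=(8, 8))
W_output = rng.normal(size=(8, len(vocab)))  # projects back onto the vocabulary

# Each layer receives the previous layer's result, performs a computation,
# and sends the result on to the next layer.
x = embeddings[vocab.index("the")]   # input: the embedding of "the"
h1 = np.tanh(x @ W_hidden1)          # first hidden layer
h2 = np.tanh(h1 @ W_hidden2)         # second hidden layer
logits = h2 @ W_output               # output layer: one score per token

probs = softmax(logits)
prediction = vocab[int(np.argmax(probs))]
print(dict(zip(vocab, probs.round(3))), "->", prediction)
```

With trained (rather than random) weights, the distribution `probs` would reflect the statistical regularities of the training corpus, which is all that "prediction" means here.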

Before an LLM like the one powering ChatGPT can answer a prompt like “Come up with as many original and creative uses for pants as you can”, it must be trained. The first step is pretraining a foundational neural network. LLMs are pretrained on large human language corpora sourced from the web, digitized books, Wikipedia, etc. The LLM is fed vast amounts of text with the goal of having it predict the next word, or a hidden part of an input sequence, in self-supervised learning. The result is a statistical model of how the words and phrases in its dataset, comprising billions to trillions of words, are related. The pretrained network is then trained with supervised fine-tuning to perform specific tasks like translation, coding, content creation or question answering. After multiple iterations, pretraining and fine-tuning enable the LLM to use the transformer architecture to process, predict and generate text content in response to input from human users, e.g., a question asked of ChatGPT.

The transformer consists of a stack of transformer blocks. Each block contains a multi-head self-attention mechanism and a position-wise feedforward layer, a neural network in which information flows in one direction from input to output. Instead of processing data serially, the attention mechanism lets the model process all parts of the input simultaneously, performing parallel computation to identify dependencies between words and to determine which parts are most important. The repetition of these processing steps produces humanlike texts in response to user prompts, creating the impression that the LLM understands human language, whereas an LLM only “understands” numbers. Nonetheless, the output of LLMs is often judged by users to be creative. Does this mean that LLMs are actually creative?
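The self-attention mechanism mentioned in the abstract can be sketched in a few lines of scaled dot-product attention: every position in the sequence is compared with every other position at once, and the resulting weights say how much each token "attends to" each other token. The sequence length, dimensions and random matrices below are illustrative assumptions, not a real model's parameters.

```python
import numpy as np

def softmax(z, axis=-1):
    """Row-wise softmax: each row becomes a probability distribution."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over all positions in parallel."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every position vs. every position
    weights = softmax(scores, axis=-1)       # importance of each position
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                      # 5 tokens, 8-dimensional embeddings
X = rng.normal(size=(seq_len, d_model))      # stand-in for token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
print(weights.round(2))  # each row sums to 1: one token's attention over the sequence
```

Note that all five positions are processed in one matrix multiplication rather than one after another, which is the parallelism the abstract describes; a full transformer block would add multiple attention heads and the position-wise feedforward layer on top of this.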