LLMs and Creativity: Definitely Not Einstein
November 25, 2025
Another dinobaby original. If there is what passes for art, you bet your bippy that I used smart software. I am a grandpa but not a Grandma Moses.
I have a vague recollection of a very large lecture room with stadium seating. I think I was at the University of Illinois when I was a high school junior. Part of the oddball program in which I found myself involved a crash course in psychology. I came away from that class with an idea that has lingered in my mind for lo these many decades; to wit: People who are into psychology are often wacky. Consequently, I don’t read too much from this esteemed field of study. (I do have some snappy anecdotes about my consulting projects for a psychology magazine, but let’s move on.)

A semi-creative human explains to his robot that he makes up answers and is not creative in a helpful way. Thanks, Venice.ai. Good enough, and I see you are retiring models, including your default. Interesting.
I read this article in PsyPost: “A Mathematical Ceiling Limits Generative AI to Amateur-Level Creativity.” The main idea is that the current approach to smart software does not just output dead-wrong answers; the algorithms themselves run into a creative wall.
Here’s the alleged reason:
The investigation revealed a fundamental trade-off embedded in the architecture of large language models. For an AI response to be effective, the model must select words that have a high probability of fitting the context. For instance, if the prompt is “The cat sat on the…”, the word “mat” is a highly effective completion because it makes sense and is grammatically correct. However, because “mat” is the most statistically probable ending, it is also the least novel. It is entirely expected. Conversely, if the model were to select a word with a very low probability to increase novelty, the effectiveness would drop. Completing the sentence with “red wrench” or “growling cloud” would be highly unexpected and therefore novel, but it would likely be nonsensical and ineffective. Cropley determined that within the closed system of a large language model, novelty and effectiveness function as inversely related variables. As the system strives to be more effective by choosing probable words, it automatically becomes less novel.
Let me take a whack at translating this quote from PsyPost: LLMs like Google-type systems have to decide. [a] Be effective and pick words that fit the context well, like “jelly” after “I ate peanut butter and…”. Or [b] be novel and pick infrequent, unexpected words. That may lead to LLM wackiness. Therefore, effectiveness and novelty work against each other: more of one means less of the other.
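The trade-off can be sketched with the standard “temperature” knob used in LLM sampling. The toy word probabilities below are made up for illustration (they are not from the article or any real model): lowering the temperature concentrates probability on the expected word (effective but boring), while raising it shifts probability toward the unlikely words (novel but risky).

```python
import math

# Toy logits for completing "The cat sat on the ..."
# Illustrative numbers only; not taken from the article or a real model.
logits = {"mat": 5.0, "sofa": 3.5, "floor": 3.0, "wrench": 0.5, "cloud": 0.2}

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities. A higher temperature flattens
    the distribution, trading effectiveness for novelty."""
    scaled = {word: value / temperature for word, value in logits.items()}
    peak = max(scaled.values())  # subtract max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scaled.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

low = softmax_with_temperature(logits, 0.5)   # conservative: "mat" dominates
high = softmax_with_temperature(logits, 2.0)  # adventurous: mass leaks to "wrench"

print(f'"mat"    T=0.5: {low["mat"]:.2f}   T=2.0: {high["mat"]:.2f}')
print(f'"wrench" T=0.5: {low["wrench"]:.4f}   T=2.0: {high["wrench"]:.4f}')
```

The point the article’s math seems to formalize: there is no temperature setting that makes “wrench” both likely and sensible. Turning the knob just moves you along the same trade-off curve.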
The article references some fancy math and points out:
This comparison suggests that while generative AI can convincingly replicate the work of an average person, it is unable to reach the levels of expert writers, artists, or innovators. The study cites empirical evidence from other researchers showing that AI-generated stories and solutions consistently rank in the 40th to 50th percentile compared to human outputs. These real-world tests support the theoretical conclusion that AI cannot currently bridge the gap to elite [creative] performance.
Before you put your life savings into a giant can’t-lose AI data center investment, you might want to ponder this passage in the PsyPost article:
“For AI to reach expert-level creativity, it would require new architecture capable of generating ideas not tied to past statistical patterns … Until such a paradigm shift occurs in computer science, the evidence indicates that human beings remain the sole source of high-level creativity.”
Several observations:
- Today’s best-bet approach is the Google-type LLM. It has creative limits as well as the familiar problems of old-fashioned Google search: selling advertising and outputting incorrect answers.
- The method itself erects a creative barrier. This is good for humans who can be creative when they are not doom scrolling.
- A paradigm shift could turn those giant data centers into extremely large white elephants, which lenders are not very good at herding along.
Net net: I liked the angle of the article. I am not convinced I should drop my teen impression of psychology. I am a dinobaby, and I like land line phones with rotary dials.
Stephen E Arnold, November 26, 2025