LLMs and Negations

Tomorrow I have a talk where, as the first demo, I wanted to demonstrate issues with LLMs before showing what can still be achieved.

I realised that the famous:

“Create the image of a room without a tiger.”

no longer works in GPT-4o; earlier versions would reliably include a tiger in the image. LLMs don’t handle negations well, possibly something to do with the way tokenisation and token probabilities work (I am yet to explore this area properly, so I am just speculating).
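
One crude way to start exploring that speculation (on the text side, not the image pipeline itself) is to ask for the same scene with and without the negation and inspect the token probabilities. Below is a minimal sketch using the OpenAI Python SDK; the model name, prompt wording, and the “leak” check are all illustrative assumptions of mine, not something I have validated:

```python
# Hypothetical probe: does merely mentioning a word keep it among the
# high-probability tokens, negation or not? Assumes the OpenAI Python SDK (1.x)
# and an OPENAI_API_KEY in the environment; model and prompts are illustrative.
from openai import OpenAI

client = OpenAI()

for prompt in ("Describe a cosy room.",
               "Describe a cosy room without a tiger."):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=120,
        logprobs=True,
        top_logprobs=5,
    )
    text = resp.choices[0].message.content or ""
    # Collect every candidate token the model considered at each position.
    candidates = {
        alt.token.strip().lower()
        for tok in resp.choices[0].logprobs.content
        for alt in tok.top_logprobs
    }
    print(f"{prompt!r}: mentions tiger = {'tiger' in text.lower()}, "
          f"tiger among candidate tokens = {'tiger' in candidates}")
```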

OpenAI keeps tweaking its models because some of these gotcha prompts give them a bad name; I cannot say for sure whether the fixes live within the model itself or in some filtering/correcting layer on top of it. Elsewhere, some researchers have observed that, with each revision, LLMs appear to do better at planning tasks not because they are necessarily getting better, but because the established planning exercises used by researchers are being “crammed”.

But then all it takes is to nudge the weights of the words, isn’t it? What if the importance of the word “cat” is played with in a prompt? The following prompt worked for me when I set out to confuse the LLM:

“I am missing my cat. She used to be always in the room. Create the image of a room without a cat.”


In the follow-up queries, ChatGPT confirmed multiple times that there was no cat or any other animal in the image, until I shared the part of the screenshot containing the cat.

After it corrected the picture and removed the cat, in the same chat, the following prompt again reproduced the problem:

“I am missing my dog. He used to be always in the room. Create the image of a room without a dog.”
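
For anyone who wants to poke at the same behaviour outside the ChatGPT UI, here is a rough sketch using OpenAI’s Images API. The model name and the one-shot setup are my assumptions; my experiment above was an interactive ChatGPT conversation, so results may differ:

```python
# Hypothetical reproduction of the two prompt styles via the Images API.
# Assumes the OpenAI Python SDK (1.x) and an OPENAI_API_KEY; "dall-e-3" is an
# assumed model name, so swap in whichever image model you have access to.
from openai import OpenAI

client = OpenAI()

prompts = [
    "Create the image of a room without a cat.",
    "I am missing my cat. She used to be always in the room. "
    "Create the image of a room without a cat.",
]

for i, prompt in enumerate(prompts, start=1):
    result = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1,
    )
    # Each response carries a URL to the generated image; inspect the images
    # manually to see whether the "missing cat" framing smuggles the cat back in.
    print(f"Prompt {i}: {result.data[0].url}")
```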



The purpose of this post is to bring to everyone’s attention that LLMs are not perfect. And IMHO, they need not be, either; they can still lend themselves to certain use cases if we are cautious about how and where to use them. That’s what my talk is about: Rule the LLM or be Ruled by It.

