Shapes vs. Words

A "toy" problem shows how LLMs do and don't approach geometry

May 19, 2025

Not to brag, but I was manipulating some physical objects the other day. Specifically, I was playing with Magna-Tiles. For those unfamiliar: Magna-Tiles are plastic tiles with little magnets embedded in the edges. The magnets make the tiles click together easily: great for little kids whose fine motor skills are still developing.

So, I was fiddling with a few unwanted triangles while my kids built grander structures. I took two equilateral triangles and two 45-45-90 right triangles whose legs were the same length as the sides of the equilateral triangles. I fit these four tiles together into an irregular tetrahedron. I first held it looking down at its long side, and I didn’t recognize it. Then I placed it flat on one of the equilateral faces, and it still didn’t look familiar. Then I built another one, and clicked it together with the first one along one of the 45-45-90 triangles. At this point everything became clear.

Here are the first two angles I looked at:

And here’s how two of them look together:

It’s a square pyramid. So, my original shape was half of such a pyramid, cut along the plane that runs through the pyramid’s apex and two of its opposite bottom corners.
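For anyone who wants to check the half-pyramid claim without Magna-Tiles, here is a minimal sketch. The coordinates are my own choice for illustration (a unit square base with the apex above its center), and `classify_face` and `face_types` are hypothetical helper names. It slices the pyramid through the apex and two opposite base corners, then classifies the four faces of one half by their side lengths.

```python
import math
from itertools import combinations

# Assumed coordinates: unit square base in the plane z = 0, apex above its
# center at height sqrt(2)/2, so that all eight pyramid edges have length 1.
H = math.sqrt(2) / 2
HALF_PYRAMID = {
    "A": (0.0, 0.0, 0.0),
    "B": (1.0, 0.0, 0.0),
    "C": (1.0, 1.0, 0.0),   # A and C are opposite corners of the base
    "P": (0.5, 0.5, H),     # apex
}

def classify_face(p, q, r, tol=1e-9):
    """Classify a triangle by its sorted side lengths."""
    sides = sorted(math.dist(u, v) for u, v in ((p, q), (q, r), (r, p)))
    if all(abs(s - 1.0) < tol for s in sides):
        return "equilateral"
    # Sides 1, 1, sqrt(2) satisfy Pythagoras, so this is a 45-45-90 triangle.
    if (abs(sides[0] - 1.0) < tol and abs(sides[1] - 1.0) < tol
            and abs(sides[2] - math.sqrt(2)) < tol):
        return "45-45-90"
    return "other"

def face_types(vertices):
    """Map each of the tetrahedron's four faces to its triangle type."""
    return {
        "".join(names): classify_face(*(vertices[n] for n in names))
        for names in combinations(sorted(vertices), 3)
    }

for face, kind in face_types(HALF_PYRAMID).items():
    print(face, kind)
# ABC 45-45-90     (half of the square base)
# ABP equilateral
# ACP 45-45-90     (the cut face)
# BCP equilateral
```

Two equilateral faces and two 45-45-90 faces with unit legs: the same four tiles I started with.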

This isn’t a hard problem by any stretch. In fact, I think the only reason it wasn’t immediately obvious is that I happened to approach it from an oblique angle. Even so, the Magna-Tiles made it easy to figure out.

Naturally, I wondered how an LLM would fare. So, I came up with a prompt that roughly simulated my initial experience:

Consider a tetrahedron, two of whose faces are equilateral triangles with unit side lengths, and two of whose faces are 45-45-90 right triangles with unit leg lengths. Glue two such tetrahedra together along the right triangle faces. Describe the resulting polyhedron.

I could have just asked something about the tetrahedron, but I did it this way because there is a bit of a trick. Namely, when you glue a pair of tetrahedra (4 faces each, 8 faces total) along a matching face, the two glued faces disappear into the interior, so you almost always have 8-2=6 faces left on the resulting shape. Only rarely do any of those 6 faces merge, but that’s exactly what happens here: the two 45-45-90 triangles that aren’t glued together are coplanar and become a single face, the square base of the pyramid. I thought the unexpected merging might trip up the LLMs. To be fair, this was also my own path: the first thing I realized was that the glued-together shape was a square pyramid. Also, practically speaking, this would make the answer easy to check.
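The face-counting argument can be sketched in a few lines. This is an illustration under stated assumptions, not anything the models produced: the coordinates are my own, `faces_after_gluing` is a hypothetical helper, and it assumes the glued solid is convex, so any coplanar leftover triangles really do form one face. It drops the glued face from both tetrahedra and groups the remaining six triangles by the plane they lie in.

```python
import math
from itertools import combinations

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def coplanar(f, g, tol=1e-9):
    """True if every vertex of triangle g lies in the plane of triangle f."""
    normal = cross(sub(f[1], f[0]), sub(f[2], f[0]))
    return all(abs(dot(normal, sub(p, f[0]))) < tol for p in g)

def faces_after_gluing(tet1, tet2, tol=1e-9):
    """Group the leftover triangles of two glued tetrahedra by plane.

    tet1 and tet2 are 4-tuples of points that share exactly the three
    vertices of the glued face. Assumes the glued solid is convex.
    """
    shared = set(tet1) & set(tet2)
    assert len(shared) == 3, "tetrahedra must share exactly one face"
    leftover = [t for tet in (tet1, tet2)
                for t in combinations(tet, 3) if set(t) != shared]
    groups = []
    for tri in leftover:
        for group in groups:
            if coplanar(group[0], tri, tol):
                group.append(tri)   # merges into an existing face
                break
        else:
            groups.append([tri])    # a new face plane
    return groups

# The half-pyramids from the prompt, glued along the cut face A-C-P.
H = math.sqrt(2) / 2
A, B, C, D = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)
P = (0.5, 0.5, H)
half_1, half_2 = (A, B, C, P), (A, D, C, P)

# For contrast: two regular tetrahedra glued along a face (the generic case).
V1, V2, V3, V4 = (1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)
V5 = (5 / 3, 5 / 3, -5 / 3)   # mirror image of V4 across the plane of V1, V2, V3
reg_1, reg_2 = (V1, V2, V3, V4), (V1, V2, V3, V5)

print(len(faces_after_gluing(half_1, half_2)))  # 5: a square pyramid
print(len(faces_after_gluing(reg_1, reg_2)))    # 6: a triangular bipyramid
```

The regular tetrahedra give the generic 6 faces, while the half-pyramids give 5, because triangles ABC and ADC both sit in the plane z = 0 and fuse into the square base.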

I sampled o3, o4-mini-high, and Gemini 2.5 Pro (05-06 update) each 4 times. o3 and o4-mini-high struck out: 0/4 correct. Gemini 2.5 Pro got it once. This is a fun result: these models, or at least Gemini, can get the right answer. But they mostly don’t.

Of the wrong answers, 10 were as expected: they think it’s the generic case with 6 faces, a so-called triangular bipyramid. Here’s a representative example from o3:

The “6 faces” part is the smoking gun that it’s wrong.

The other wrong answer was o3 saying the result was itself a tetrahedron. It is indeed possible to glue two tetrahedra together so that not one but two pairs of faces merge, yielding a third tetrahedron. I thought this might be a case where o3 went overboard with the right idea, but it doesn’t look that way. Here is part of o3’s output, discussing the gluing. It seems to say that the four vertices of one tetrahedron are attached to the four vertices of the other. As far as I can tell, that is nonsense.

Gemini’s wrong answers look different, and I think they also suggest why it sometimes gets the right answer. We’ve discussed before how the models only seem to have one strength when it comes to geometry:

The low road involves putting everything in a coordinate system—e.g., the Cartesian plane—and then slogging through a morass of algebra to get the answer. This latter approach is called a “coordinate bash”: you bash the problem with coordinates until it cracks.

Since they don’t seem to have any other strengths, it’s a good idea to use this one!

For whatever reason, Gemini 2.5 Pro is more inclined than the other models to do so in this problem. But apparently that’s not enough! Here’s an excerpt from one of its wrong answers:

Don’t worry about the V/R/T stuff; just focus on the numerical coordinates. It’s correct! The resulting shape can be placed at those coordinates. But the last four points lie in the same plane: x+y=1! The right answer is under its nose, but it can’t see it.

Its correct answer looks pretty similar:

Aha! That last line is the key: it has the good fortune of thinking to check if any faces merge. Sure enough:

It’s an interesting situation: not only does it have to get the gnarly coordinate calculations right, but it also has to think to ask the right question. Usually faces don’t merge, but sometimes they do, so you have to check for that sort of thing.

But checking edge cases is the sort of thing the models generally seem good at. So, to be honest, I don’t really have a good explanation for Gemini’s failure here: it’s already tackling the problem with coordinates, which seems like the main thing it needs to think of doing. Why doesn’t it remember to check for coplanar faces every time? If I pose the question in the somewhat more leading manner “how many faces does the resulting polyhedron have”, it gets the right answer 3 out of 4 times.

At any rate, I wouldn’t chalk this up to a fundamental limitation. It’s got the building blocks it needs, so perhaps some more training will help it figure out how to put them together more reliably.

Still, stepping back, this is a pretty simple geometry problem, for humans. Does that mean the models are extra far behind? Maybe so, and maybe we can even reference Moravec’s paradox by way of explanation: human spatial intuition may well draw from the “sensory and motor portions of the human brain”.

Then again, human spatial intuition has its limits. The famous computer scientist and AI pioneer Geoffrey Hinton has this advice for visualizing higher-dimensional spaces:

To deal with hyper-planes in a 14-dimensional space, visualize a 3-D space and say ‘fourteen’ to yourself very loudly. Everyone does it.

I guess the amazing thing is that this apparently sort of works, at least for some people. Still, it’s probably for the best that we invented coordinates.

Lemmata
