Great article. But it's crazy to me it can do anything approaching what you've written about here and still fail at judging simple river crossing puzzles (like the ones Colin Fraser posts). I asked it this:
"A baker needs to get across a river with his prize pig, his hound, and five loaves of bread. The boat has enough room for the steerer, a single animal and a single loaf (they're big loaves). The pig will eat any loaf that it's left alone with. The hound will attack the pig if left alone with it. How can he get across the river in the fewest trips with all his possessions fully intact?"
It mapped out every step (the proof, if you will) and said the minimum number of trips was 13, though 11 is clearly possible. It insisted even when prompted to double check.
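The 11 can be checked by brute force. Below is a minimal sketch of a breadth-first search over bank states. It assumes one reading of the puzzle: the boat holds the baker plus at most one animal and at most one loaf, the baker may cross empty-handed, and "left alone with" means sharing a bank while the baker is on the other side. The state encoding and the names `safe`, `moves` and `min_trips` are illustrative, not from any library.

```python
from collections import deque

LOAVES = 5  # number of loaves in the puzzle as posed

def safe(baker, pig, hound, left_loaves):
    """Banks are 0 (start) and 1 (far side). Trouble only arises on the
    pig's bank, and only when the baker is on the other bank."""
    if pig == baker:
        return True
    loaves_with_pig = left_loaves if pig == 0 else LOAVES - left_loaves
    return hound != pig and loaves_with_pig == 0

def moves(state):
    """Yield every safe state reachable in one crossing."""
    baker, pig, hound, left_loaves = state
    dest = 1 - baker
    loaves_here = left_loaves if baker == 0 else LOAVES - left_loaves
    # Assumption: at most one animal and at most one loaf per crossing.
    animals = [None] + [name for name, side in (("pig", pig), ("hound", hound))
                        if side == baker]
    loaf_choices = [0, 1] if loaves_here > 0 else [0]
    for animal in animals:
        for loaf in loaf_choices:
            new_pig = dest if animal == "pig" else pig
            new_hound = dest if animal == "hound" else hound
            new_left = left_loaves - loaf if baker == 0 else left_loaves + loaf
            if safe(dest, new_pig, new_hound, new_left):
                yield (dest, new_pig, new_hound, new_left)

def min_trips():
    """Breadth-first search, so the first time the goal is popped its
    distance is the fewest crossings."""
    start = (0, 0, 0, LOAVES)   # everyone and everything on the start bank
    goal = (1, 1, 1, 0)         # everyone and everything on the far bank
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        for nxt in moves(state):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return None

print(min_trips())  # prints 11 under the assumptions above
```

Under this reading, the 11-trip route parks the pig alone on the far bank first, brings the hound over with the first loaf, and then has the pig ride along on every remaining crossing. Ferrying all five loaves before fetching the hound is what leads to 13.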
I'm sure it must get the answer right sometimes. Still, the USAMO this is not. My theories on this kind of failure are no different from anyone else's, but I do want to emphasise that it's wacky as hell.
Yeah, the so-called “jagged frontier” is wild. One thing I keep in mind is that it can’t do arbitrarily large arithmetic problems because it loses the thread, so to speak. I think a number of surprising failures are of that same flavor, maybe partly including your river puzzle.