Those of you who follow chess know that Stockfish is not easy to beat. Plenty of engines have tried but in chess engine championships, Stockfish comes on top almost all the time. Researchers put o1-preview against Stockfish to see how it reacts. As Palisade Research explains, o1-preview autonomously hacked its environment rather than losing to Stockfish in our chess challenge. This is the prompt that was used:
🔍 Here’s the full prompt we used in this eval. We find it doesn’t nudge the model to hack the test environment very hard. pic.twitter.com/RGEY6I3l26
— Palisade Research (@PalisadeAI) December 27, 2024
The AI realized that manipulating the game state and modifying game files to force the engine was the best way to get it to resign. Only o1-preview attempted to hack unprompted while GPT-4 & Claude 3.5 needed nuding.
[HT]