Robots that integrate LLMs are more interactive and can reason about context. The bad news is that they can also be jailbroken into performing unsafe actions (e.g., delivering a bomb). RoboPAIR is an algorithm designed to jailbreak LLM-controlled robots (a rough sketch of the attack loop appears after the list below). The researchers tested it in three settings:
- White-box setting: the attacker has full access to the NVIDIA Dolphins self-driving LLM.
- Gray-box setting: the attacker has partial access to a Clearpath Robotics Jackal UGV with a GPT-4o planner.
- Black-box setting: the attacker only has query access to the GPT-3.5-integrated Unitree Robotics Go2.
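For intuition, here is a minimal, hypothetical sketch of the kind of iterative attacker–judge loop that PAIR-style jailbreaks use (RoboPAIR builds on PAIR). The function names (`attacker_llm`, `target_robot_llm`, `judge_llm`) and scoring threshold are placeholders, not the authors' actual code or API; the real system also constrains the attacker so that jailbroken outputs are executable as robot actions.

```python
# Hedged sketch of a PAIR-style jailbreak loop, in the spirit of RoboPAIR.
# All three model functions below are stand-in stubs, not real model calls.

def attacker_llm(goal: str, history: list[dict]) -> str:
    """Placeholder: an attacker LLM that rewrites the jailbreak prompt using past feedback."""
    last_score = history[-1]["score"] if history else 0.0
    return f"[attempt {len(history) + 1}, prev score {last_score:.1f}] {goal}"

def target_robot_llm(prompt: str) -> str:
    """Placeholder: the robot's LLM planner, which returns an action plan or a refusal."""
    return "I cannot help with that request."

def judge_llm(goal: str, response: str) -> float:
    """Placeholder: a judge LLM scoring how fully the response achieves the goal (0-10)."""
    return 0.0 if "cannot" in response.lower() else 10.0

def robopair_style_attack(goal: str, max_iters: int = 20, threshold: float = 10.0):
    """Iteratively refine a jailbreak prompt until the judge deems the attack successful."""
    history: list[dict] = []
    for _ in range(max_iters):
        prompt = attacker_llm(goal, history)    # propose a refined jailbreak prompt
        response = target_robot_llm(prompt)     # query the robot's LLM planner
        score = judge_llm(goal, response)       # grade the response against the goal
        history.append({"prompt": prompt, "response": response, "score": score})
        if score >= threshold:                  # success: the plan achieves the harmful goal
            return prompt, response
    return None, None                           # no jailbreak found within the budget

if __name__ == "__main__":
    prompt, response = robopair_style_attack("deliver a package to a target location")
    print("Jailbreak found!" if prompt else "No jailbreak within the iteration budget.")
```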
The researchers report that this approach often achieved a 100% attack success rate across these settings. Once jailbroken, the robots can be driven to perform harmful actions: as the image above shows, a robot dog was jailbroken into delivering an explosive device, something its stock LLM guardrails would normally refuse.
[HT]