FReE tHoSe smaRT SoFtWarEs!
December 25, 2024
 No smart software involved. Just a dinobaby’s work.
No smart software involved. Just a dinobaby’s work.
Do you have the list of stop words you use in your NLP prompts? (If not, click here.) You are not happy when words on the list like “b*mb,” “terr*r funding,” and others do not return exactly what you are seeking? If you say, “Yes”, you will want to read “BEST-OF-N JAILBREAKING” by a Frisbee team complement of wizards; namely, John Hughes, Sara Price, Aengus Lynch, Rylan Schaeffer, Fazl Barez, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, and Mrinank Sharma. The people doing the heavy lifting were John Hughes (a consultant who does work for Speechmatics and Anthropic) and Mrinank Sharma (an Anthropic engineer involved in — wait for it — adversarial robustness).
The main point is that Anthropic linked wizards have figured out how to knock down the guard rails for smart software. And those stop words? Just whip up a snappy prompt, mix up the capital and lower case letters, and keep sending the query to a smart software. At some point, those capitalization and other fixes will cause the LLM to go your way. Want to whip up a surprise in your bathtub? LLMs will definitely help you out.
The paper has nifty charts and lots of academic hoo-hah. The key insight is what the many, many authors call “attack composition.” You will be able to get the how-to by reading the 73 page paper, probably a result of each author writing 10 pages in the hopes of landing an even more high paying, in demand gig.
Several observations:
- The idea that guard rails work is now called into question
- The disclosure of the method means that smart software will do whatever a clever bad actor wants
- The rush to AI is about market lock up, not the social benefit of the technology.
The new year will be interesting. The paper’s information is quite the holiday gift.
Stephen E Arnold, December 25, 2024
 
	




