Do LLMs really “show their work” when they perform chain-of-thought reasoning? “Measuring Faithfulness in Chain-of-Thought Reasoning” is a new paper from Anthropic that studies this question empirically through a series of tests.
00:00 – Measuring Faithfulness in Chain-of-Thought Reasoning
00:53 – What is Chain-of-Thought reasoning?
03:15 – Do the Chain-of-Thought Steps Really Reflect the Model’s Reasoning?
07:03 – Possible Faithfulness Failures
08:44 – Encoded Reasoning/Steganography
12:01 – Experiment Details
15:44 – Does Truncating the Chain of Thought Change the Predicted Answer?
16:53 – Does Editing the Chain of Thought Change the Predicted Answer?
17:14 – Do Uninformative Chain of Thought Tokens Also Improve Performance?
18:28 – Does Rewording the Chain of Thought Change the Predicted Answer?
20:20 – Does Model Size Affect Chain of Thought Faithfulness?
22:04 – Limitations
24:38 – Externalized Reasoning Oversight
Topics: #ai #anthropic #CoT #reasoning
Link to the paper: https://www-files.anthropic.com/production/files/measuring-faithfulness-in-chain-of-thought-reasoning.pdf
For related content:
– Twitter: https://twitter.com/SamuelAlbanie
– Research lab: https://caml-lab.com/
– Personal webpage: https://samuelalbanie.com/
– YouTube: https://www.youtube.com/@SamuelAlbanie1
– TikTok: https://tiktok.com/@samuelalbanie
– Instagram: https://instagram.com/samuelalbanie
– LinkedIn: https://www.linkedin.com/in/samuel-albanie
– Threads: https://www.threads.net/@samuelalbanie
– Discord server for filtir: https://discord.gg/FV97NByG2b
(Optional) if you’d like to support the channel:
Image credit (Chelsea photo) https://en.wikipedia.org/wiki/2004%E2%80%9305_Chelsea_F.C._season#/media/File:Champions_2004-5.jpg