Posts

The setup. Thinking LLMs spend extra compute between <think> and </think> tokens before emitting an answer; pass@1 climbs with thinking length. This was a launch headline when OpenAI o1 and DeepSeek R1 shipped; by the time GPT-5.5, Opus 4.7, Gemini 3.5, Qwen 3.6, and Kimi K2.5 arrived, the curve had become table stakes — every frontier model has it, nobody markets it anymore (Snell et al. 2024; OpenAI 2024; DeepSeek 2025; Muennighoff et al. 2025). The mechanistic question — what’s the rate, and what determines it — is less settled. This post adds one angle. ...

dlt-proof-writing: An Agent Skill for LaTeX Proofs in Deep Learning Theory

Reasoning as Optimization: A Rate for Test-Time Scaling