Discussion about this post

Arjun Ramani

nice post! I shared it with metrics folks as: "omitted variable bias in the test-time compute scaling law"

phil

Hi guys. About the o1 test-time scaling chart: are you sure you're not mixing it up with the DeepSeek-R1 response-length scaling chart?

IIRC, the meaning of the RHS of the o1 chart is just what it looks like: they set o1 to run for longer (by some crude mechanism like adjusting the probability of the stop token). This is presumably the same mechanism behind the "medium" and "high" versions available on the ChatGPT website.
