Great post - thanks for sharing. Lots to digest. Exposing the metrics that quietly shape our day is fascinating.
Thanks for the read, Alex! Highly recommend C. Thi's book!
It's interesting how accurately you've captured the insidious way tech metrics redefine our internal value systems. I couldn't agree more with the premise that external quantification inherently alters our perception of self-worth and genuine accomplishment.
Appreciate the read, RR!
"The Score" sounds really interesting. Thanks for bringing it to my attention. I personally derive tremendous aesthetic pleasure from metrics.
Definitely check that out and the linked Twitter paper he wrote.
Great post, thanks. Demis Hassabis talked on the Lex Fridman pod a couple of years ago about the value of using games during AI research, for exactly the reasons you cited: clear metrics of progress in a well-defined container (e.g., Go, chess, StarCraft II, …). The end goal is ASI; the way to get there is to learn on tasks you can quantify. And in a recent Dwarkesh (I still can’t believe that’s the dude’s actual name) podcast, Ilya Sutskever talks about how doing explicit RL training to crush eval scores may be why models keep getting better at evals while not getting more “useful” generally. Goodhart’s law, meet AI development… And please don’t stop posting borderline cringey stuff for likes. There’s almost nothing that doesn’t merit some bit of humor :-)
Thanks for the read, Eric!
"how doing explicit RL training to crush eval scores may be why models keep getting better at evals while not getting more “useful” generally"...this is a great connection to the GenAI wave.
Nguyen's only AI reference in The Score is about how some of the labs are building their video models on popular Netflix and YouTube content...which is optimized for engagement...so that's getting embedded into the video tools.
And, fair enough, the cringe will continue hahaaha
As a longtime reader, I really liked this post - more personal, thoughtful, and thought-provoking. I really like your usual style of post as well - I hope you’ll throw a few of these in occasionally too!