It's interesting how accurately you've captured the insidious way tech metrics redefine our internal value systems. I couldn't agree more with the premise that external quantification inherently alters our perception of self-worth and genuine accomplishment.
Appreciate the read, RR!
"The Score" sounds really interesting. Thanks for bringing it to my attention. I personally derive tremendous aesthetic pleasure from metrics.
Definitely check that out, and also the linked paper he wrote about Twitter.
Great post, thanks. Demis Hassabis talked on the Lex Fridman pod a couple of years ago about the value of using games in AI research, for exactly the reasons you cited: clear metrics of progress in a well-defined container (e.g., Go, chess, StarCraft II,…). The end goal is ASI; the way to get there is to learn on tasks you can quantify. And in a recent Dwarkesh (I still can’t believe that’s the dude's actual name) podcast, Ilya Sutskever talks about how doing explicit RL training to crush eval scores may be why models keep getting better at evals while not getting more “useful” generally. Goodhart’s law, meet AI development… And please don’t stop posting borderline cringey stuff for likes. There’s almost nothing that doesn’t merit some bit of humor :-)
Thanks for the read, Eric!
"how doing explicit RL training to crush eval scores may be why models keep getting better at evals while not getting more “useful” generally"...this is a great connection to the GenAI wave.
Nguyen's only AI reference in The Score is about how some of the labs are building their video models on popular Netflix and YouTube content...which is optimized for engagement...so that optimization is getting embedded into the video tools.
And, fair enough, the cringe will continue hahaaha