Satpost is an efficient aggregator of links.
Satpost never disappoints, but this was especially informative and entertaining. Didn’t think I’d laugh that hard about AI.
Thanks for the read, Phil! It was an epic weekend haha.
Re: the dopamine receptors. Had to just lock the phone away. Kept doomscrolling. I don't have any social on the laptop, so that helps.
I did a deep dive into DeepSeek R1's technical/strategic decisions... nothing they did was completely unknown to the rest of the industry; they were just the first to take some things from experiment to release, like using 8-bit for training and inference. And they didn't spend only $5M on the model; the salaries of their top developers are multiples of that. The biggest difference between DeepSeek and ByteDance? You're expected to work 24/7 at both, but DeepSeek offers US$2M salaries to top talent. That's rumoured to be how much they gave one of their developers, Luo Fuli, to jump from Alibaba. The drop in Nvidia stock was more an excuse by investors to take money off the table after an 11X run since 2022.
Thanks for the read Joseph.
+1 on the fact that the efficiency gains were known. Amodei writes about that. But knowing about research and implementing it in a good package is very different (e.g. Google wrote the transformer paper but didn't make ChatGPT first).
Re: Nvidia sell-off, if investors were looking for excuses to sell, why that exact weekend? Why was the $600B wipeout on that Monday and not any other trading day since the V3 paper was released?
Thanks again for the read!
Why does my dog chase his tail? Who knows why things happen in the capital markets? LOL. All I can say is that NVDA has become as divisive as BTC over the past year... you're either a true believer and laughed at for it, or you're a skeptic and laughed at for it; there's no middle ground. Thanks for a great column.
That's a very fair point. Literally could find correlation with the snowdrop levels if we looked hard enough hahaha.
Going to be a wild 2025!
Post request: how do you scroll until your dopamine receptors “get taken to the woodshed” and then manage to repeatedly produce such solid pieces?
This was phenomenal. Hands down the most helpful and engaging analysis I've read of DeepSeek and its impact! Answered so many questions.
Thanks for the read, Troy! It felt pretty monumental so I wanted to just crank this long piece out... happy it's useful for people to read.
When I worked on AI and NNet research in the early 90s, efficient use of HW and compute was 90% of the effort. Nice to see it still matters. I'm jealous of the compute resources of today, but retirement is still pretty sweet. Maybe when the grandkids are a bit older I can make a comeback?
Thanks for the read, Roger. That's a fantastic insight and makes a lot of sense. You should dabble with some Nvidia chips!
I bought my old computer with a programmable graphics chip and a second port, but I never got around to it. I moved on to songwriting and have considered starting a blog here.
Such a great post. Thank you for making me feel like I understand AI more… even if I’m not sure I really understand.
Hi Trung, please check your inbox. Thanks.