Software Supply Chain, NPUs, and SLMs
Hey folks! Welcome to another edition of EveryOpsGuy!
This time, we'll talk about one of my favorite topics - Small Language Models. We'll also cover some RISC-V news, a software supply chain attack in VSCode, a little bit of OpenAI and DeepSeek, and Lenovo at Mobile World Congress 2025.
But first, if you've noticed, you didn't get this newsletter last week. This is because I was traveling internationally and didn't get a chance to put words down on the screen. Hopefully, such disruptions will be rare in the future, and I'll make an effort to let you know in advance if I won't be shipping an edition in a particular week. However, I'd love to hear from you about one thing - what's the best time for you to read EveryOpsGuy? Weekday or weekend? Morning or evening? Hit reply and let me know!
Ok, let's talk about Small Language Models (SLMs). While OpenAI, Gemini, and DeepSeek have been pushing the boundary on Large Language Models (LLMs), Microsoft has been working on a family of SLMs it calls Phi. The latest additions to this family are Phi-4-multimodal and Phi-4-mini. But just because they're smaller than the behemoths doesn't mean they're lacking in any way. Phi-4-multimodal is the first of its kind in the Phi lineup: it can ingest text, images, and speech and process your request on-device rather than on Azure servers.
The link has more information about metrics and benchmarks. You'll notice that Phi-4-multimodal isn't the best-rated model across all metrics. That's OK, because fine-tuning and further training will keep improving these scores. Besides, the main goal of SLMs is not to be a massive store of knowledge; rather, they're meant to be very focused tools that bring machine learning to your devices. Pair an SLM with a RAG pipeline, and you've got yourself a very good deal!
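To make "pair an SLM with a RAG pipeline" concrete, here's a minimal sketch of the retrieval-augmented pattern: fetch relevant snippets first, then hand them to the model as grounding context. The keyword-overlap retriever and toy corpus are illustrative assumptions standing in for a real embedding index, and no actual model is invoked.

```python
import re

def retrieve(query, corpus, k=1):
    """Rank documents by naive keyword overlap with the query
    (a toy stand-in for a real embedding index)."""
    q_words = set(re.findall(r"\w+", query.lower()))
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(re.findall(r"\w+", doc.lower()))),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, context):
    """Ground the model's answer in the retrieved snippets."""
    joined = "\n".join(f"- {doc}" for doc in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

corpus = [
    "Phi-4-mini is a small language model from Microsoft.",
    "RISC-V is an open instruction set architecture.",
]
context = retrieve("What is Phi-4-mini?", corpus)
prompt = build_prompt("What is Phi-4-mini?", context)
# `prompt` would then go to a locally running SLM via an on-device
# runtime; the model call itself is omitted in this sketch.
print(prompt)
```

The point of the pattern: the SLM doesn't need to memorize your documents, it only needs to read the handful of snippets the retriever puts in front of it.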
Suffice it to say that Microsoft is headed in the right direction with SLMs. My take has always been that LLMs may push the envelope, but SLMs are the ones that will preserve privacy and deliver real-world results on mobile devices and laptops. Phi-4 is definitely a step toward that future. Click this link to play around with the model in the Nvidia NIM API catalog.
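If you'd rather script against the model than use the web playground, the NIM catalog exposes models behind an OpenAI-compatible chat-completions API. Below is a hedged sketch that only builds the request body; the endpoint URL and model id are assumptions - check the catalog page for the exact values - and no network call is made.

```python
import json

# Assumed OpenAI-compatible NIM endpoint and model id; verify both
# against the model's NIM catalog page before using them.
NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
payload = {
    "model": "microsoft/phi-4-mini-instruct",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Summarize RISC-V in one sentence."}
    ],
    "max_tokens": 128,
    "temperature": 0.2,
}
body = json.dumps(payload)
# To send it for real: POST `body` to NIM_URL with an
# "Authorization: Bearer <your API key>" header.
print(body)
```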
When a trade restriction prevents a critical technology from reaching a determined country, more often than not the result is not a reduction in technology but more innovation. The US has banned high-end Nvidia GPUs from being shipped to Chinese firms, so companies and government research labs in China are now spending billions to build their own chips.
The latest news on this front comes from Alibaba, whose research arm has recently announced a server-grade RISC-V based processor. RISC-V is open source, so anyone with the means and the money can pick up the architecture and start designing and fabricating their own chips. "The means and the money" is an understatement in Alibaba's case: the company has promised to spend over $50 billion on cloud infrastructure over the next three years.
The chip is part of a series, an extension of Alibaba's ongoing work to increase self-reliance amid an increasingly polarized global economic... "situation". What's different about this chip, though, is that it's server-grade and meant for AI acceleration. This is great news for Alibaba's own applications and its cloud customers.
Increasingly, we've seen cloud operators build their own chips, both to decrease reliance on Nvidia and other vendors and to create some performance- and capability-based lock-in. Clouds are highly commoditized right now, so they're no doubt looking for an edge. Providing that edge in the AI space makes a lot of sense: customers willing to spend the big bucks on cloud MLOps will remain their exclusive customers. You see this strategy in all three major public clouds.
A bit of open source drama going on right now revolves around a VSCode theme called "Material Theme". The original developer first decided to close-source the long-running, Apache 2.0-licensed theme, delisting the work of hundreds of contributors and replacing the Apache license with one that supposedly gives him the right to sell the theme. Then he threatened to sue anyone who used the same colors or named their theme Material Theme, even outside VSCode.
He also added some obfuscated code that seems to include analytics. Eventually someone noticed, forked the theme, and called the original developer out on their BS. As the issue escalated, Microsoft responded to a community report that the theme contained malware, removed it from the marketplace, and locked the developer's account.
Now, the person who is maintaining the fork (Theo) is telling more of the story. They haven't seen anything actually malicious in the codebase, but have removed large swathes of unnecessary lines of code anyway.
All of this shows how fragile the Open Source ecosystem is. At any time, a developer can decide that their work should mean something more than Internet brownie points. This can go in one of three ways -
- They do an elegant job of explaining their financial situation and the reasons for commercializing their open source work. They use the platform they have as the developer of a popular piece of software to find work, raise donations, or launch a paid version of their product.
- They do a shoddy job of moving their work towards commercialization, and someone else steps in to provide the same material, forked from the last sane point, free of cost. This leads to threats of lawsuits, and since marketplaces are risk-averse and have more supply of free open source labor than demand, they shutter accounts like it's nothing. This is what seems to have happened here.
- They are exhausted after working on the project alone for years, helping everyone from indie devs to multinational companies succeed, providing free support, and burning out under the weight of obligation. Then someone comes along and offers help, and they readily accept. Except the new maintainer is slowly building up to adding malware to that small but important utility, and now the entire project's safety is called into question. You can thank the stars this didn't happen here. But that doesn't mean it hasn't happened in the past and won't happen in the future.
Quick link - podcast
Go read (or listen to) this conversation between The Record and Anne Neuberger, Deputy National Security Advisor for cyber under Biden. Neuberger makes some great points about DeepSeek, AI, and innovation. The main takeaway: while export restrictions pushed China (and DeepSeek) to innovate and squeeze the maximum out of the watered-down GPUs they did have access to, it's a complicated space, and the fact that they open sourced their innovations means Western companies can implement them in their own work.
Neuberger is also a big believer in the quality of data. News to me is that the Chinese government has a stake in security camera company Hikvision, which has sold millions of cameras around the world. Neuberger states that China has access to those camera feeds, and that data can easily be used for facial recognition training.
By the way, check out The Record for very good Cybersecurity news!
We're not done with DeepSeek, but Nvidia CEO Jensen Huang is. The company's revenue is still going up; DeepSeek may have caused an initial dent (understatement of the year) in its valuation, but demand is still strong, and Jensen is confident that US data centers will continue to deploy the larger and more complicated GPU platforms that Nvidia sells.
Reasoning models like DeepSeek R1 may have lower training requirements thanks to technological innovations, but they are still massive and need a lot of power and processing to deploy. Nvidia is very much in high demand across the world, and this is not going to change any time soon.
Now let's look at OpenAI.
This post by Gary Marcus talks about how OpenAI is in trouble, based on the release of its latest and greatest - GPT 4.5.
Apparently, early reviews are bad: the capability increase does not match the cost increase. Some are even calling it the end of the hype cycle, with reality hitting the pivotal company. In fact, this model was apparently supposed to be called GPT 5, but internal testing showed OpenAI what is visible to every user - the expected bump in capabilities didn't show up. So, to save face, they called it GPT 4.5 instead.
But here's the thing - yes, we need OpenAI to step away from the pace of innovation they've maintained till now. We need them to understand that this is not an iPhone middle-years sort of deal, where every year they can and must come up with a new model and only then will the tech media reward them.
Instead, they need to hunker down and work on something different. Either go the Microsoft route and start launching SLMs. Or hire some app makers and launch a series of apps that do specific tasks. Or even sit down with every software vendor and focus for a year on just building integrations.
Marcus is right about one thing - there is no moat. Every other model on the market - Anthropic's Claude, Meta's Llama, even X's Grok, to say nothing of DeepSeek R1 - quickly catches up to and often surpasses OpenAI's models in specific capabilities.
Just like Google squandered its lead in ML by dismissing the transformer as an esoteric white paper until OpenAI came around and opened that Pandora's box, OpenAI may now be squandering opportunities by being too research-focused. They need to commercialize and monetize these models, and fast.
I'm not suggesting they completely change the company's focus on building the next greatest model. But they've reached a level of capability which is exceptional, if used appropriately. As Marcus says - "They need a killer app and a clear business model."
Lastly, let's talk about Lenovo. At Mobile World Congress 2025, the company released a slate of concept AI-powered devices showcasing its innovation in the AI hardware space.
Out of the many, two devices that caught my eye are the AI display and the Lenovo AI Stick.
The display is a curved monitor that responds to a user's movements by automatically rotating and tilting the screen to provide the best viewing angle. But rather than constantly streaming your image and facial data to an off-site data center for processing, the monitor packs an NPU that does all the processing on board. Connected laptops can also use the unit to run LLMs. As I said at the beginning of this post, smaller ML models that focus on specific tasks and are easier to run at the edge are the future. This device is proof that companies are heading in that direction too.
Once Lenovo took the NPU out of the motherboard, they realized they could package it separately too. Just as you can buy an external GPU today to connect to underpowered laptops, the Lenovo AI Stick provides an NPU that can be accessed via USB-C and, in conjunction with the Lenovo AI Now software, brings ML to non-AI-capable laptops. The renders of the device show a separate power cable running out of it, so be aware of the sheer mess of cables you'll be carrying around if you decide to opt for this eNPU.
That's it for this time, folks! Hope you had fun reading this post. If you did, please repost my social media (LinkedIn, Mastodon) post so more folks can check it out!