AI Is Not Just Faster Old Software: The Paradigm Inflection Point in Karpathy's Sequoia Talk
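When a leading AI researcher, Andrej Karpathy, says outright at Sequoia Ascent 2026 that LLMs are about far more than speeding up coding, when an indie developer starts winning Stripe disputes for the first time in a decade with a vibe-coded dispute responder, and when the open-source community ships a trillion-parameter model billed as "built for real work", the signals all point the same way: what we are living through is not a tool upgrade but a shift in what software is.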
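Core claim: the greatest value of LLMs lies not in accelerating existing software workflows but in enabling entirely new kinds of functionality that traditional code cannot deliver, and that forces us to rethink what "AI native" really means.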
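What have large language models actually changed? Ask an ordinary practitioner and the most likely answers are "programmers write code faster" or "data analysis is more efficient". Those answers are not wrong, but they dangerously narrow our understanding of the change. In his fireside chat at Sequoia Ascent 2026, Andrej Karpathy made a sharp point: LLMs do not only speed up what already existed. They are creating new kinds of functionality, things that under the classical software paradigm either perhaps should not exist at all or were fundamentally impossible. That is not an incremental upgrade; it is a shift in the computing paradigm itself.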
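Karpathy gave three examples. The first is "menugen", an app that needs no classical code at all: an image goes in, an image comes out, and the LLM does the job natively. The second is replacing install .sh scripts with install .md skills: write the installation out in plain words and let the LLM read your system setup, debug inline, and adapt intelligently. The third is LLM knowledge bases, which compute over unstructured data from arbitrary sources in arbitrary formats, a task that was impossible in classical software because it has no deterministic algorithmic path. What do the three have in common? None of them uses AI to make an old thing run faster. Each creates something that could not be done without AI.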
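This looks like a discussion of technical detail, but it touches a deeper question: how do we define "software"? On a von Neumann machine, software has always been a fixed, executable sequence of instructions. Whatever the programming paradigm, the job was to translate human intent into steps a machine can execute. LLMs change that in a fundamental way: you describe the intent in natural language, and the model works out the context, infers the goal, and generates the execution path. It no longer "executes instructions"; it "interprets intent". That is not a gain in efficiency but a jump in the level of abstraction.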
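Skeptics will say this is just natural-language programming in new packaging, and that code generation tools have existed for years. But traditional code generation tools, early GPT-assisted coding included, were arguably closer to pattern completion: you supply a description, the tool emits code, and a human then tunes, compiles, and tests it. Karpathy's "menugen" and ".md skills" are different in kind. Their target output is no longer code but the finished function itself. The code has not been hidden; it has been dissolved. The analogy is the Industrial Revolution: the point was not to build a better hand loom but to let steam power turn cotton into cloth, which redefined the production line itself.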
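Indie developer Pieter Levels recently shared a vivid case. He vibe coded an automatic Stripe dispute responder and, for the first time in a decade, started winning disputes, including a first large one for $1,199. The core logic is not complicated: when a webhook arrives from Stripe, the system collects the user's details and in-app activity and generates a detailed evidence PDF. The key is that this kind of evidence, tailored to what a specific user actually did, was rarely worth building by hand. You would need separate logic branches for each kind of user and dispute, you would have to cover every edge case, and you might well end up with a system slower than manual handling. Levels's approach was simple: let AI write the detailed handling for different kinds of users and activity, and put all of it in the PDF sent to the bank. He used to ignore disputes and almost always lost them; this one he won.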
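The flow described above (webhook in, activity lookup, evidence document out) can be sketched in a few lines. This is a minimal illustration, not Levels's code, which the post does not show. `charge.dispute.created` is the Stripe event type for a new dispute, but the payload below is a simplified stand-in, and `UserActivity`, `ACTIVITY_BY_CHARGE`, and `build_evidence` are hypothetical names invented for this example. Rendering the text to a PDF and submitting it to Stripe are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class UserActivity:
    """Hypothetical record of what one customer did in the app."""
    user_id: str
    signed_up: date
    photos_generated: int
    last_active: date


# Hypothetical in-memory stand-in for the app database, keyed by charge ID.
ACTIVITY_BY_CHARGE = {
    "ch_123": UserActivity("u_42", date(2025, 1, 5), 3200, date(2025, 6, 1)),
}


def build_evidence(event: dict) -> Optional[str]:
    """Return an evidence summary for a dispute event, or None if not applicable."""
    if event.get("type") != "charge.dispute.created":
        return None  # ignore every other webhook event
    dispute = event["data"]["object"]
    activity = ACTIVITY_BY_CHARGE.get(dispute["charge"])
    if activity is None:
        return None  # no usage record, so nothing to argue with
    days_as_customer = (activity.last_active - activity.signed_up).days
    lines = [
        f"Dispute {dispute['id']} for ${dispute['amount'] / 100:,.2f}",
        f"Customer {activity.user_id} signed up on {activity.signed_up.isoformat()}",
        f"Active for {days_as_customer} days, last seen {activity.last_active.isoformat()}",
        f"Photos generated in the app: {activity.photos_generated}",
    ]
    return "\n".join(lines)


# Simplified example payload; amounts are in cents.
sample_event = {
    "type": "charge.dispute.created",
    "data": {"object": {"id": "dp_1", "charge": "ch_123", "amount": 119900}},
}
evidence = build_evidence(sample_event)
```

The interesting part in practice is not this skeleton but the long tail of per-user detail that goes into the evidence, which is exactly the part Levels says is well suited to vibe coding.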
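There is a deep paradox here. The strongest abilities of LLMs (understanding context, generating natural language, reasoning non-deterministically) are exactly what classical software does worst. Conversely, what classical software does best (deterministic computation, precise control, verifiability) is where LLMs are weak. Karpathy describes this capability profile as "jagged": a single model can coherently refactor a 100,000-line code base and also tell you to "walk to the car wash to wash your car". He traces the apparent contradiction to how verifiable a domain is and to economics: revenue and market size determine what the frontier labs put into their reinforcement learning training distributions. In domains with enough commercial value, such as code generation, models are trained to be extremely strong. On tasks outside that distribution, the model is, in his words, off-roading in the jungle with a machete.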
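What does this mean? It means we are entering an era of hybrid computing. Classical software has not died, and should not: it handles the work that must be deterministic, verifiable, and cheap. LLMs handle the steps that require understanding context, processing unstructured information, and generating natural language. The two are complements, not substitutes, and they are forming a new computing ecosystem in which a single task can span three paradigms at once: 1.0 (classical logic), 2.0 (machine learning and statistics), and 3.0 (LLM reasoning). The "agent-native economy" Karpathy described in his talk is, in essence, the commercial form of this hybrid ecosystem.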
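One way to picture a task that crosses all three paradigms is a support-ticket handler. The sketch below is illustrative only: the ticket format, the `ORD-` order ID pattern, the keyword weights, and every function name are assumptions made up for this example, and the LLM is passed in as a plain callable with an offline stub so the code runs without any model or API.

```python
import re
from typing import Callable, Optional

# Hypothetical order ID format used only in this example.
ORDER_ID = re.compile(r"\bORD-\d{6}\b")

# Hand-set weights standing in for a trained classifier (an assumption).
URGENCY_WEIGHTS = {"refund": 0.4, "broken": 0.3, "urgent": 0.5}


def extract_order_id(text: str) -> Optional[str]:
    """Software 1.0: deterministic, verifiable parsing."""
    match = ORDER_ID.search(text)
    return match.group(0) if match else None


def urgency_score(text: str) -> float:
    """Software 2.0 stand-in: a weighted score in place of a learned model."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    total = sum(weight for word, weight in URGENCY_WEIGHTS.items() if word in words)
    return min(1.0, total)


def handle_ticket(text: str, llm: Callable[[str], str]) -> dict:
    """Combine the three layers; the LLM only does the open-ended part."""
    prompt = f"Summarize this support ticket in one sentence:\n{text}"
    return {
        "order_id": extract_order_id(text),  # 1.0: exact and checkable
        "urgency": urgency_score(text),      # 2.0: statistical estimate
        "summary": llm(prompt),              # 3.0: natural-language reasoning
    }


def fake_llm(prompt: str) -> str:
    """Offline stub; swap in a real model client in practice."""
    return "stub summary"


ticket = "Urgent: order ORD-123456 arrived broken, I want a refund"
result = handle_ticket(ticket, fake_llm)
```

The design choice worth noting is the division of labor: anything that can be checked exactly stays in classical code, and the model is only asked for the part that has no deterministic answer.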
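The problem is that the vast majority of today's "AI applications" are still stuck in 1.0 thinking. They treat the LLM as one more API call, drop it into a conventional software architecture, and measure its value with the old metrics: lines of code generated for a coding assistant, support headcount reduced for a chatbot. There is nothing wrong with that, but it hides the real opportunity. The things that cannot be done without AI (extracting insight from knowledge bases built on arbitrary sources, generating user interfaces dynamically from context, replacing scripts with natural language for system administration) are where the paradigm shift will actually be fought.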
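Some will ask whether these new capabilities can really produce commercial value. Levels's dispute responder says yes. The deeper question is how to identify and scale functionality that only LLMs make possible. The clue Karpathy offers is the data distribution: models perform best in domains that already have heavy training-data coverage and a clear commercial return. That suggests the first "LLM-native applications" to land at scale will appear in customer service, content generation, coding assistance, and other areas with mature business models. The truly disruptive ones, though, may come from needs that were long ignored because the unstructured data behind them was too hard to handle.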
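This is also why open models such as MiMo V2.5 excite me. If a free, trillion-parameter model really is "built for real work" rather than tuned to top benchmarks, more developers can sidestep the limits of commercial APIs and go after the edge-case needs that large companies overlook. The strength of the open-source community is that it tolerates failure: the "small scenarios" a commercial company would never fund can become a seedbed for innovation there. It also means AI safety needs a fresh look. As models grow more capable and deployment gets easier, how do we make sure these new capabilities are not abused?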
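Back to the original question: what have LLMs changed? Not the fundamentals of computing. Chips are still improving along roughly Moore's-law lines, and network latency is still measured in milliseconds. What has changed is how we interact with computation, and with it the meaning of "software" itself. Software is no longer only a fixed set of instructions; it is becoming a dynamic process that interprets intent and generates a solution. The full consequences may take a decade to show. But from Karpathy's talk to Levels's practice, from open model releases to the discussions in developer communities, the outline of the new paradigm is already visible.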
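This is not merely technical progress; it is a paradigm shift. Those who treat AI as nothing more than an accelerator are missing where the future actually lies.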
Sources
- Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights:
- The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons:
- 1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image and an LLM can natively do the thing.
- 2. install .md skills instead of install .sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM". The LLM is an advanced interpreter of English and can intelligently target installation to your setup, debug everything inline, etc.
- 3. LLM knowledge bases as an example of something that was *impossible* with classical code because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc.
- I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1,2), or was fundamentally not possible before (3).
- The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs. How it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base *and* 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with verifiability of a domain, here I expand on this as having to also do with economics because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying or you're off-roading in the jungle with a machete, in relative terms. Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to...
- Last theme is the agent-native economy. The decomposition of products and services into sensors, actuators and logic (split up across all of 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors. - https://nitter.net/karpathy/status/2049903821095354523#m
- 🏆 For the first time in a decade on @Stripe I've started winning disputes with my vibe coded dispute responder
- I used to ignore disputes so I almost always lost them, now I've started winning, this one is the first big dispute for $1,199 USD!
- Whenever a dispute comes in, my site gets a webhook notice from Stripe, it then starts collecting evidence and generates a PDF with entire user's details, when they signed up, and most importantly what they did in the app
- In this case the user used the app for months, generated thousands of photos then tried to get the money back from their bank
- The evidence has to be REALLY detailed, and REALLY good, which is why it's perfect to vibe code it, you can get quite detailed with different types of users and activity on your app, and put that all in the PDF
- I'm shocked because I again I never would win disputes before
- People in US especially abuse the [ chargeback ] or [ dispute ] en masse, unlike the rest of the world, it's easily built into their banking app next to every transaction, so it's one tap to get free stuff. And why not? You get free stuff!
- It's destructive for business owners like me on many levels, if I get over 1% disputes on my account, I risk getting shutdown permanently by Stripe, Visa and MasterCard, like permanently for life, not just my business but on my personal name too, it's ruthless
- Disputes are also super expensive for business owners: you don't just pay back the amount they disputed, for every dispute you pay $30, which you only get back if you win!
- But with AI we can now create our own tools to fight back against dispute abuse and finally win! 🎉 - https://nitter.net/levelsio/status/2049847252680614105#m
- MiMo V2.5: The Free 1 Trillion Parameter Model Built For Real Work - https://www.reddit.com/r/AISEOInsider/comments/1szrfp0/mimo_v25_the_free_1_trillion_parameter_model/