Artificial intelligence
人工智能
Beyond the hype
炒作之外
ChatGPT mania may be cooling, but a serious new industry is taking shape
对ChatGPT的狂热或许正在降温,但一个严肃的新兴产业正在成型
The first wave of excitement about generative artificial intelligence (AI) was like nothing else the world had seen. Within two months of its launch in November 2022, ChatGPT had racked up 100m users. Internet searches for “artificial intelligence” surged; more than $40bn in venture capital flowed into AI firms in the first half of this year alone.
生成式人工智能引起的第一波狂热可谓史无前例。于2022年11月发布的ChatGPT,在两个月内就收割了一亿用户。互联网上“人工智能”词条的搜索量激增;今年上半年就有超过四百亿美元的风险资本流入了AI公司。
The craze for consumer experimentation has since cooled a little: ChatGPT use has fallen and fewer people are Googling “AI”. Son Masayoshi, a Japanese investor notorious for diving into already frothy markets, is thought to be interested in investing in OpenAI, ChatGPT’s creator. But a second, more serious phase is beginning. An entirely new industry centred on supercharged AI models is taking shape. Three forces will determine what it eventually looks like—and whether OpenAI stays dominant, or other players prevail.
这一场消费者实验(消费者在购买产品或使用服务时,对新事物或不同选择进行试验和探索的行为)的狂热此后有所消退:ChatGPT的使用量有所下降,上谷歌搜索“AI”的人也变少了。以一头扎进已然泡沫化的市场而闻名的日本投资人孙正义,据称有兴趣投资ChatGPT的开发公司OpenAI。但是更为严肃的第二阶段正在开始。一个以超强AI模型为核心的全新产业正在成型。这门产业最终会呈现何种形态,以及OpenAI是继续保持主导地位还是被其他公司超越,将取决于以下三股力量。
The first factor is computing power, the cost of which is forcing model-builders to become more efficient. Faced with the eye-watering costs of training and running more powerful models, for instance, OpenAI is not yet training its next big model, GPT-5, but GPT-4.5 instead, a more efficient version of its current leading product. That could give deep-pocketed rivals such as Google a chance to catch up. Gemini, the tech giant’s soon-to-be-released cutting-edge model, is thought to be more powerful than OpenAI’s current version.
第一个因素是算力,其成本正迫使模型开发者提高效率。以OpenAI为例,面对训练和运行更强大模型的高昂成本,该公司尚未训练下一代大模型GPT-5,而是在训练GPT-4.5(其当前主打产品的更高效版本)。这样一来,谷歌等财力雄厚的对手就有了追赶的机会。谷歌即将发布的尖端模型Gemini据称比OpenAI的现有版本更强大。
High computing costs have also encouraged the proliferation of much smaller models, which are trained on specific data to do specific things. Replit, a startup, has trained a model on computer code to help developers write programs, for instance. Open-source models are also making it easier for people and companies to plunge into the world of generative AI. According to a count maintained by Hugging Face, an AI firm, roughly 1,500 versions of such fine-tuned models exist.
高昂的计算成本也促使规模小得多的模型大量涌现,这类模型用特定数据训练以完成特定任务。例如,创业公司Replit用计算机代码训练了一个模型,帮助开发者编写程序。开源模型也让个人和公司更容易投身生成式AI领域。据AI公司Hugging Face的统计,此类经过微调的模型现有约1500个版本。
All these models are now scrambling for data—the second force shaping the generative-AI industry. The biggest, such as OpenAI’s and Google’s, are gluttonous: they are trained on more than 1trn words, the equivalent of over 250 English-language Wikipedias. As they grow bigger they will get hungrier. But the internet is close to being exhausted. Many model-makers are therefore signing deals with news and photography agencies. Others are racing to create “synthetic” training data using algorithms; still others are trying to work with new forms of data, such as video. The prize is a model that beats the rivals.
如今,所有这些模型都在争夺数据,这是塑造生成式AI产业的第二股力量。最大的模型,如OpenAI和谷歌的产品,对数据贪得无厌:它们的训练语料超过一万亿个单词,相当于250多个英语维基百科。模型越大,对数据的胃口也越大。但互联网上的可用数据已近枯竭。因此,许多模型开发者正与新闻机构和图片社签订协议。另一些开发者争相用算法生成“合成”训练数据;还有一些则尝试使用视频等新的数据形式。回报则是一个能击败对手的模型。
Generative AI’s hunger for data and power makes a third ingredient more important still: money. Many model-makers are already turning away from ChatGPT-style bots for the general public, and looking instead to fee-paying businesses. OpenAI, which started life in 2015 as a non-profit venture, has been especially energetic in this regard. It has not just licensed its models to Microsoft, but is setting up bespoke tools for companies including Morgan Stanley and Salesforce. Abu Dhabi plans to establish a company to help commercialise applications of Falcon, its open-source AI model.
生成式AI对数据和算力的渴求,使第三个因素变得更加重要:资金。许多模型开发者已经不再专注于ChatGPT这类面向大众的聊天机器人,转而瞄准付费的企业客户。OpenAI于2015年创立时还是一家非营利机构,如今在这方面尤为积极。它不仅将自己的模型授权给微软,还在为Morgan Stanley和Salesforce等公司搭建定制工具。阿布扎比计划成立一家公司,推动其开源AI模型Falcon的商业化应用。
Another approach is to appeal to software developers, in the hope of getting them addicted to your model and creating the network effects that are so prized in tech. OpenAI is offering tools to help developers build products using its models; Meta hopes that LLaMA, its open-source model, will help create a loyal community of programmers.
另一种做法是吸引软件开发者,希望他们对自家模型形成依赖,从而产生科技行业极为看重的网络效应。OpenAI正在提供工具,帮助开发者利用其模型构建产品;Meta则希望其开源模型LLaMA有助于培养一个忠实的程序员社群。
Who will emerge victorious? Firms like OpenAI, with its vast number of users, and Google, with its deep pockets, have a clear early advantage. But for as long as computing power and data remain constraints, the rewards for clever ways around them will be large. A model-builder with the most efficient approach, the most ingenious method to synthesise data or the most appealing pitch to customers could yet steal the lead. The hype may have cooled. But the drama is just beginning.
谁会最终胜出?拥有庞大用户群的OpenAI和财力雄厚的谷歌等公司具有明显的先发优势。但只要算力和数据仍是制约因素,能巧妙绕开这些限制的方案就会获得丰厚回报。拥有最高效的方法、最巧妙的数据合成手段或对客户最具吸引力的方案的模型开发者,仍有可能后来居上。炒作也许已经降温,但好戏才刚刚开始。