In the early morning of March 21, 2025 (Beijing time), OpenAI announced a major advance in speech technology with the release of three new models: GPT-4o Transcribe, GPT-4o Mini Transcribe, and GPT-4o Mini TTS. The models give AI agents more natural, fluent voice interaction. They are also a clear improvement over the previous generation of OpenAI audio models, including Whisper, in handling difficult audio and in producing personalized voice output.
OpenAI has also built a dedicated website to showcase these features, where users can try the models through interactive demos. If you're interested, it's well worth a look.
Introduction to OpenAI's Three New Audio Models
-
High-Performance Speech-to-Text
GPT-4o Transcribe delivers substantially higher transcription accuracy in difficult conditions such as background noise, varied accents, and changing speech rates. It was trained on very large-scale audio data, so it captures subtle differences in speech more reliably. This markedly lowers its word error rate (WER).

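[Figure: Word error rate (WER) of several mainstream models on the FLEURS dataset. Lower WER means more accurate transcription.]
[Figure: Reduction in transcription error rate achieved by the latest speech-to-text models on FLEURS.]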
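To make the WER metric concrete: WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a model's transcript into the reference transcript. That count is then divided by the number of reference words N, so WER = (S + D + I) / N. The sketch below computes this with a word-level edit distance. Lowercasing and whitespace splitting are a simplifying assumption. Published benchmarks such as FLEURS evaluations apply their own text normalization, so numbers from this sketch are not directly comparable to the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein distance.

    Normalization here (lowercase + whitespace split) is a simplification.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = minimum edits to turn the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))  # i = 0: j insertions
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # j = 0: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    # Two substitutions (jumps -> jumped, the -> a) over nine reference words
    print(f"WER = {word_error_rate(ref, hyp):.3f}")  # WER = 0.222
```

Because insertions are counted, WER can exceed 1.0 when a model adds many spurious words.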
-
Adaptable to Multiple Languages and Scenarios
The model's training data spans many languages, dialects, and real-world recordings, so it holds up well across different linguistic environments and industries. For work that demands high accuracy, such as meeting minutes, legal documents, and medical interviews, GPT-4o Transcribe is the stronger choice.
-
Lightweight Design
Using knowledge distillation and model compression, GPT-4o Mini Transcribe substantially cuts model size and compute cost while keeping accuracy high.
-
Real-Time Performance and Low Resource Usage
Because the model is smaller, it responds faster and costs less to run, balancing real-time performance against accuracy. That makes it a more flexible fit for moderate-scale transcription workloads, including mobile and embedded applications that call the API, and it lowers deployment costs.

-
Broad Application Prospects
For latency-sensitive uses such as short voice commands, real-time translation, and voice assistants, Mini Transcribe is a good first choice. It keeps accuracy high while improving the user experience.
-
Natural and Fluent Text-to-Speech
GPT-4o Mini TTS produces clear, realistic synthesized speech. By modeling characteristics of the human voice, it also makes the output sound more natural.
-
Customizable Emotions and Style
The model offers fine-grained control over tone, emotion, and delivery, so the AI can speak as a “compassionate customer service representative” or a “dramatic storyteller.” This level of customization goes well beyond earlier TTS systems.
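One way to picture this style control is the following minimal sketch. It assumes a recent version of the official `openai` Python package, where `client.audio.speech` accepts an `instructions` field for `gpt-4o-mini-tts`. The `PERSONAS` presets, their wording, the `coral` voice choice, and the helper names are our own illustrations, not something the article specifies. The request is built by a pure function, and the API call happens only when an API key is present.

```python
import os
from pathlib import Path

# Hypothetical persona presets; the instruction wording is illustrative only.
PERSONAS = {
    "support": "Speak as a compassionate customer service representative: warm, calm, and patient.",
    "storyteller": "Speak as a dramatic storyteller: expressive, with deliberate pauses and rising tension.",
}


def build_speech_request(text: str, persona: str, voice: str = "coral") -> dict:
    """Assemble keyword arguments for a text-to-speech request."""
    if persona not in PERSONAS:
        raise KeyError(f"unknown persona: {persona!r}")
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        # Free-text guidance on tone, emotion, pacing, accent, and so on
        "instructions": PERSONAS[persona],
    }


def synthesize(text: str, persona: str, out_path: Path) -> None:
    """Call the speech endpoint and stream the returned audio (MP3 by default) to a file."""
    from openai import OpenAI  # lazy import so build_speech_request works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(
        **build_speech_request(text, persona)
    ) as response:
        response.stream_to_file(out_path)


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    synthesize("Thanks for waiting, let's get this sorted out.", "support", Path("reply.mp3"))
```

Keeping request construction separate from the network call makes the persona presets easy to review and test without spending API credits.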
-
Multi-Language, Multi-Role Support
The model can generate voices of different genders, ages, and even accents. This suits customer service hotlines, audiobooks, podcasts, and similar scenarios where voice output should be tailored to the user or the content.
In summary, the three new models improve on the previous generation, including Whisper, in three areas: recognition accuracy, speed and efficiency, and emotional expressiveness and personalization. Whether you need more precise speech-to-text, efficient real-time applications across platforms, or custom voice styles, they deliver better results.

The models are now available to developers worldwide through the API, making it easy to add speech capabilities to existing applications.
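As a rough illustration of API integration, here is a minimal sketch assuming the official `openai` Python package and its `client.audio.transcriptions.create` call with the `gpt-4o-transcribe` and `gpt-4o-mini-transcribe` model names. The `choose_transcription_model` helper is hypothetical. It simply encodes the article's guidance to prefer Mini Transcribe for latency-sensitive work and the full model for accuracy-critical work. The `meeting.m4a` file name and the prompt text are placeholders.

```python
import os
from pathlib import Path
from typing import Optional


def choose_transcription_model(latency_sensitive: bool) -> str:
    """Hypothetical rule of thumb: Mini for real-time use, full model for accuracy-critical work."""
    return "gpt-4o-mini-transcribe" if latency_sensitive else "gpt-4o-transcribe"


def transcribe(audio_path: Path, latency_sensitive: bool = False, prompt: Optional[str] = None) -> str:
    """Send one audio file to the transcription endpoint and return the transcript text."""
    from openai import OpenAI  # lazy import: the helper above has no SDK dependency

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    kwargs = {"model": choose_transcription_model(latency_sensitive)}
    if prompt:
        # Optional context, e.g. domain terms or names the model should spell correctly
        kwargs["prompt"] = prompt
    with audio_path.open("rb") as audio_file:
        result = client.audio.transcriptions.create(file=audio_file, **kwargs)
    return result.text


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    # Placeholder recording of a meeting; accuracy matters more than latency here
    print(transcribe(Path("meeting.m4a"), prompt="Sinokap, ISO/IEC 27001"))
```

Switching between the two models is a one-argument change, which makes it cheap to compare their accuracy and latency on your own recordings before committing to one.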


OpenAI has also released an updated Agents SDK that simplifies turning text-based agents into voice agents. Developers can add speech interaction with just a few lines of code.

Sinokap has kept pace with AI developments, providing ChatGPT training and IT technical support across industries. We will continue to share the latest news and hands-on experience to help every sector quickly master and apply cutting-edge AI technology.

Scan the QR code to register, or contact us by email:
consulting@sinokap.com
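After rigorous assessment by nationally recognized certification bodies, Sinokap has obtained ISO/IEC 27001:2013 information security management system certification and ISO/IEC 20000-1:2018 IT service management system certification. These certifications reflect its mature management, standardized processes, extensive experience, and strong capabilities.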


The Sinokap China IT Service Team shares professional IT support tips and tricks every day, bringing you practical IT skills that make your work, study, and daily life more efficient and convenient.
Follow us
If you would like to learn more IT news and tips.
Join us
If you want to build your skills and become a professional engineer.
Contact us
If you need professional IT support and IT services in Shanghai, China.