Captions in the Editor Starter

The Editor Starter comes with a method to generate captions for video and audio assets.
It uses the OpenAI Whisper API by default.

For implementation details, refer to the source code in src/editor/captioning.

In the Editor Starter, captions are treated as a first-class item type, similar to videos, images, or audio. This allows them to be manipulated like any other layer in the timeline and on the canvas.

Setup with OpenAI Whisper (recommended)

To generate captions using OpenAI's Whisper model, add your OpenAI key to the .env file:

OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx

This enables server-side transcription, provided the /api/captions backend route is present.

When you click "Generate Captions" on a video or audio layer, the following happens:

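The audio is extracted on the client.
The audio is uploaded to /api/captions and transcribed via OpenAI (note: a 25MB limit applies).
The OpenAI response is converted to Remotion's Caption type and added to the timeline as a CaptionsItem.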
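
The conversion in the last step can be sketched as follows. This is a minimal illustration, not the Editor Starter's actual code: the `WhisperWord` shape assumes a word-level transcription response with `start` and `end` in seconds, the `Caption` type is a local mirror of the one in @remotion/captions, and `wordsToCaptions` is a hypothetical helper name.

```typescript
// Local mirror of the Caption shape from @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Assumed shape of one word entry in a word-level transcription response
// (timestamps in seconds).
type WhisperWord = { word: string; start: number; end: number };

// Hypothetical helper: converts seconds to milliseconds and prefixes every
// word except the first with a space, so tokens can be concatenated directly.
function wordsToCaptions(words: WhisperWord[]): Caption[] {
  return words.map((w, i) => {
    const startMs = Math.round(w.start * 1000);
    const endMs = Math.round(w.end * 1000);
    return {
      text: i === 0 ? w.word.trim() : ` ${w.word.trim()}`,
      startMs,
      endMs,
      // Midpoint used as a representative timestamp (an assumption of this sketch).
      timestampMs: Math.round((startMs + endMs) / 2),
      // No per-word confidence is assumed to be available.
      confidence: null,
    };
  });
}
```

Keeping the leading space inside each token means a page's text can be rebuilt by joining tokens without tracking whitespace separately.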

Editing

By default, the inspector allows users to edit the following properties of captions:

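Individual tokens
Typography: font, text color, highlighted word color, text opacity, text stroke width and color
Page duration
Timing of individual words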

Automated creation of pages

Captions are automatically split into "pages" for easier management. Pages are timed groups of words or sentences that fit nicely on screen. This is done with createTikTokStyleCaptions from the @remotion/captions package.
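
To make the idea of a page concrete, here is a simplified stand-in for time-based grouping. It is not the implementation of createTikTokStyleCaptions; `groupIntoPages`, `windowMs`, and the `Page` shape are assumptions made for this sketch, and the `Caption` type lists only the fields the sketch needs.

```typescript
// Subset of caption fields needed for grouping.
type Caption = { text: string; startMs: number; endMs: number };

// Illustrative page shape: the joined text plus the tokens it was built from.
type Page = {
  text: string;
  startMs: number;
  durationMs: number;
  tokens: { text: string; fromMs: number; toMs: number }[];
};

// Hypothetical grouping: a new page starts once a word begins more than
// `windowMs` after the first word of the current page.
function groupIntoPages(captions: Caption[], windowMs: number): Page[] {
  const pages: Page[] = [];
  let current: Caption[] = [];

  const flush = (): void => {
    if (current.length === 0) return;
    const startMs = current[0].startMs;
    const endMs = current[current.length - 1].endMs;
    pages.push({
      text: current.map((c) => c.text).join("").trim(),
      startMs,
      durationMs: endMs - startMs,
      tokens: current.map((c) => ({ text: c.text, fromMs: c.startMs, toMs: c.endMs })),
    });
    current = [];
  };

  for (const caption of captions) {
    if (current.length > 0 && caption.startMs - current[0].startMs > windowMs) {
      flush();
    }
    current.push(caption);
  }
  flush();
  return pages;
}
```

A larger window yields fewer, longer pages; a small window approaches one word per page, which suits word-by-word highlight styles.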

Limits

The default captioning method uses the OpenAI Whisper API, which has a limit of 25MB per request.

At a 16 kHz sample rate, this is about 13.4 minutes of mono audio.
By default, the Editor Starter disables the captioning feature if the audio is longer than that.
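
The relationship between the upload limit and the maximum duration can be estimated with a short calculation. The sketch below assumes uncompressed 16-bit PCM in a WAV container with a 44-byte header; neither assumption is confirmed by the source, and `maxPcmDurationSec` is a hypothetical helper. Under these assumptions the bound lands between about 13.0 and 13.7 minutes, depending on whether a megabyte is counted as 1,000,000 or 1,048,576 bytes, which brackets the figure quoted above.

```typescript
// Hypothetical helper: how many seconds of uncompressed PCM audio fit
// into `limitBytes`, after subtracting a container header.
function maxPcmDurationSec(
  limitBytes: number,
  sampleRateHz: number,
  channels: number = 1,
  bytesPerSample: number = 2, // 16-bit samples (assumption)
  headerBytes: number = 44, // canonical WAV header size (assumption)
): number {
  const bytesPerSecond = sampleRateHz * channels * bytesPerSample;
  return Math.max(0, (limitBytes - headerBytes) / bytesPerSecond);
}

// 25MB counted in decimal and in binary units.
const decimalMinutes = maxPcmDurationSec(25 * 1000 * 1000, 16000) / 60;
const binaryMinutes = maxPcmDurationSec(25 * 1024 * 1024, 16000) / 60;
console.log(decimalMinutes.toFixed(2), binaryMinutes.toFixed(2));
```

Stereo audio or a higher sample rate shrinks the allowed duration proportionally, which is why downsampling to 16 kHz mono before upload stretches the limit.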

To adjust this threshold, review the logic around MAX_DURATION_ALLOWING_CAPTIONING_IN_SEC.

Alternatives

`@remotion/whisper-web`

You can replace the OpenAI Whisper API with @remotion/whisper-web for local, in-browser transcription.
This eliminates the need for an OpenAI key and S3 fetches for transcription, but you'll still need to handle audio loading locally.

Caveats:

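Performance: Transcription runs on the CPU in the browser, which can be much slower than GPU-accelerated options such as OpenAI's cloud service.
Model size: Smaller models (e.g. 'tiny') are faster but less accurate; larger models need more memory and storage.
You need to enable cross-origin isolation for your app.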
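
Cross-origin isolation is enabled by serving the page with two response headers. The sketch below shows the header values; how they are attached depends on your server or framework, and `withCrossOriginIsolation` is a hypothetical helper rather than part of the Editor Starter.

```typescript
// The two response headers that put a page into a cross-origin isolated context.
const CROSS_ORIGIN_ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Hypothetical helper: merges the isolation headers into an existing header map
// without mutating the input.
function withCrossOriginIsolation(
  headers: Record<string, string> = {},
): Record<string, string> {
  return { ...headers, ...CROSS_ORIGIN_ISOLATION_HEADERS };
}
```

In the browser, `self.crossOriginIsolated` reports whether isolation is active. Note that `require-corp` also blocks cross-origin resources that do not opt in, so embedded assets may need their own headers.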

`@remotion/install-whisper-cpp`

You can use @remotion/install-whisper-cpp to transcribe audio on a Node.js server.

Caveats:

You are responsible for hosting and scaling the server, which can be expensive and complex.
