
transcribe()

warning

Unstable API: This package is currently experimental. We might change the API during testing and switch to a WebGPU-based backend in the future.

Transcribes pre-processed audio data (a Float32Array waveform) using WebAssembly-compiled Whisper.cpp and returns the transcription.

To transcribe an audio file, first download a model using downloadWhisperModel(), then convert the file into a 16kHz Float32Array using resampleTo16Khz().

Requirements

This package requires a page that is cross-origin isolated.
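
Browsers report cross-origin isolation through the global `crossOriginIsolated` boolean, which is normally true only when the page is served with `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp` (or `credentialless`). The following is a minimal sketch of a pre-flight check. The helper names are hypothetical and are not part of @remotion/whisper-web.

```typescript
// Minimal shape of the global scope this check relies on. Browsers expose
// `crossOriginIsolated` as a read-only boolean on windows and workers.
type IsolationScope = {crossOriginIsolated?: boolean};

// Hypothetical helper (not exported by @remotion/whisper-web).
// Returns true only if the scope explicitly reports isolation.
const isCrossOriginIsolated = (
  scope: IsolationScope = globalThis as IsolationScope,
): boolean => scope.crossOriginIsolated === true;

// Hypothetical helper: throws a descriptive error before any model
// download or transcription work is started.
const assertCrossOriginIsolated = (scope?: IsolationScope): void => {
  if (!isCrossOriginIsolated(scope)) {
    throw new Error(
      'Page is not cross-origin isolated. Check the COOP and COEP response headers.',
    );
  }
};
```

Calling `assertCrossOriginIsolated()` without arguments inspects `globalThis`; passing an object makes the check easy to exercise outside a browser.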

Example

app.ts
import {transcribe} from '@remotion/whisper-web';

const {transcription} = await transcribe({
  channelWaveform,
  model: 'tiny.en',
  onProgress: (p) => console.log(`Transcribing (${Math.round(p * 100)}%)...`),
});

console.log(transcription.map((t) => t.text).join(' '));

Arguments

channelWaveform

A Float32Array representing the mono audio waveform, resampled to 16kHz. It is typically obtained by calling resampleTo16Khz() with an audio File or Blob.

model

The Whisper model to use for transcription (e.g., 'tiny.en', 'base', 'small'). The choice determines the size, speed, and accuracy of the model. Make sure the model has been downloaded using downloadWhisperModel() before calling transcribe().

Possible values: tiny, tiny.en, base, base.en, small, small.en.

For a list of available model names, refer to the WhisperModel type exported by the package or the models supported by downloadWhisperModel().
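
When the model name comes from user input (a dropdown or a URL parameter, for example), it can help to validate it before passing it on. The sketch below mirrors the six names listed above in a local constant. `KNOWN_MODELS`, `KnownModel`, and `isKnownModel` are hypothetical names, and the package's own WhisperModel type remains the authoritative list.

```typescript
// Model names copied from the "Possible values" list above.
// This local copy is an assumption and may drift from the package's
// exported WhisperModel type.
const KNOWN_MODELS = [
  'tiny',
  'tiny.en',
  'base',
  'base.en',
  'small',
  'small.en',
] as const;

type KnownModel = (typeof KNOWN_MODELS)[number];

// Type guard: narrows an arbitrary string to one of the listed model names.
const isKnownModel = (value: string): value is KnownModel =>
  (KNOWN_MODELS as readonly string[]).indexOf(value) !== -1;
```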

language?

Default: 'auto'

Optional. The language of the audio in ISO 639-1 format (e.g., 'en', 'es', 'de'). Set to 'auto' to let Whisper detect the language automatically.

Possible values: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Castilian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Letzeburgesch, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Moldavian, Moldovan, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Panjabi, Pashto, Persian, Polish, Portuguese, Punjabi, Pushto, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Sinhalese, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Valencian, Vietnamese, Welsh, Yiddish, Yoruba, Zulu. Or their corresponding ISO 639-1 codes: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh, or 'auto'.

For a list of supported language codes, refer to the official Whisper documentation or the WhisperLanguage type exported by the package.

onProgress?

Optional. Called as the transcription progresses. The progress value is a number between 0 and 1.

const onProgress = (progress: number) => {
  console.log(`Transcription progress: ${Math.round(progress * 100)}%`);
};

onTranscriptionChunk?

Optional. Callback function that receives an array of TranscriptionItemWithTimestamp objects as transcription segments are processed. This is useful for displaying live transcription updates.
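
One way to use this callback for a live display is to accumulate the received items and re-render the joined text on every call. The sketch below makes two assumptions that the text above does not confirm: that each item has at least a `text` field (as used in the example earlier on this page), and that each call delivers only new items rather than the full list so far. `ChunkItem` and `createChunkCollector` are hypothetical names.

```typescript
// Assumed minimal item shape; the real TranscriptionItemWithTimestamp
// type also carries timing information that is not needed here.
type ChunkItem = {text: string};

// Hypothetical helper: builds a callback suitable for `onTranscriptionChunk`
// and reports the accumulated text after every chunk.
const createChunkCollector = (onUpdate: (fullText: string) => void) => {
  const items: ChunkItem[] = [];

  return {
    // Assumes each call contains only items not seen before.
    onTranscriptionChunk: (chunk: ChunkItem[]): void => {
      items.push(...chunk);
      onUpdate(items.map((item) => item.text.trim()).join(' '));
    },
    // Returns a copy so callers cannot mutate the internal list.
    getItems: (): ChunkItem[] => [...items],
  };
};
```

The returned `onTranscriptionChunk` function can be passed directly in the options object of transcribe(). If the callback turns out to deliver cumulative results, replace the `push` with an assignment.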

threads?

Optional. The number of threads to use for transcription. Defaults to 4. Using more threads can speed up transcription but increases CPU usage. The maximum allowed value is 16; requests for more are rejected.
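
Because values above 16 are rejected, a caller that derives the thread count from the machine (for example from `navigator.hardwareConcurrency`) may want to clamp it first. This is a small sketch of such a clamp. `pickThreadCount` is a hypothetical helper, and falling back to the documented default of 4 for missing or invalid input is a design choice of this sketch, not package behavior.

```typescript
// Values taken from the description above.
const DEFAULT_THREADS = 4;
const MAX_THREADS = 16;

// Hypothetical helper: maps a reported core count to a value that
// stays within the documented limit for `threads`.
const pickThreadCount = (availableCores?: number): number => {
  if (
    availableCores === undefined ||
    !isFinite(availableCores) ||
    availableCores < 1
  ) {
    // Unknown or invalid core count: use the documented default.
    return DEFAULT_THREADS;
  }

  return Math.min(MAX_THREADS, Math.floor(availableCores));
};
```

In a browser this could be used as `threads: pickThreadCount(navigator.hardwareConcurrency)`.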

logLevel?

Default: info

Type: 'trace' | 'verbose' | 'info' | 'warn' | 'error'

Determines how much information is logged to the console.

Concurrency

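Important: The transcribe() function cannot be called multiple times concurrently. If you try to start a new transcription while one is already in progress, the new call is rejected with an error. Before starting a new transcription, make sure the previous one has completed or has been handled properly.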
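
If several parts of an application may request transcriptions, one option is to funnel all calls through a small queue so that each one starts only after the previous one has settled. The sketch below is a generic promise chain and makes no assumptions about the package. `createSerialRunner` is a hypothetical helper name.

```typescript
// Hypothetical helper: returns a function that runs async tasks strictly
// one after another, in the order they were submitted.
const createSerialRunner = () => {
  // Tail of the chain; it never rejects, so one failed task
  // does not block the tasks queued after it.
  let tail: Promise<unknown> = Promise.resolve();

  return <T>(task: () => Promise<T>): Promise<T> => {
    // Start this task only after the previous one has settled.
    const result = tail.then(() => task());
    // Swallow the outcome on the internal chain only; the caller
    // still receives the original result or rejection.
    tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  };
};
```

With `const runExclusive = createSerialRunner();`, a call such as `runExclusive(() => transcribe({channelWaveform, model: 'tiny.en'}))` would wait for any earlier queued transcription instead of being rejected.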

See also