transcribe() v4.0.131

Transcribes a media file using Whisper.cpp.
You should first install Whisper.cpp, for example through installWhisperCpp().

note

This function only works with Whisper.cpp 1.5.5 or later, unless tokenLevelTimestamps is set to false.

transcribe.mjs
import path from 'path';
import {transcribe} from '@remotion/install-whisper-cpp';

const {transcription} = await transcribe({
  inputPath: '/path/to/audio.wav',
  whisperPath: path.join(process.cwd(), 'whisper.cpp'),
  whisperCppVersion: '1.5.5',
  model: 'medium.en',
  tokenLevelTimestamps: true,
});

for (const token of transcription) {
  console.log(token.timestamps.from, token.timestamps.to, token.text);
}

Options

`inputPath`

The path to the file you want to extract text from.

The file has to be a 16-bit, 16 kHz WAVE file. See Resample audio to 16kHz for more information.
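Because the input has to be a 16-bit, 16 kHz WAVE file, it can help to inspect a file's header before passing it in. The following is a minimal sketch and not part of @remotion/install-whisper-cpp: `isWhisperCompatibleWav` is a hypothetical helper, and it assumes the canonical 44-byte PCM header layout in which the `fmt ` chunk directly follows the RIFF header. Files that carry extra chunks before `fmt ` would need a real WAV parser.

```typescript
// Hypothetical helper, not part of @remotion/install-whisper-cpp.
// ASSUMPTION: canonical 44-byte PCM WAV header ("fmt " chunk at byte 12).
const readAscii = (bytes: Uint8Array, start: number, length: number): string =>
  String.fromCharCode(...Array.from(bytes.subarray(start, start + length)));

export const isWhisperCompatibleWav = (header: Uint8Array): boolean => {
  if (header.byteLength < 44) {
    return false; // too short to contain a full canonical header
  }

  const view = new DataView(header.buffer, header.byteOffset, header.byteLength);
  const isRiffWave =
    readAscii(header, 0, 4) === 'RIFF' &&
    readAscii(header, 8, 4) === 'WAVE' &&
    readAscii(header, 12, 4) === 'fmt ';

  const audioFormat = view.getUint16(20, true); // 1 = uncompressed PCM
  const sampleRate = view.getUint32(24, true); // samples per second
  const bitsPerSample = view.getUint16(34, true);

  return isRiffWave && audioFormat === 1 && sampleRate === 16000 && bitsPerSample === 16;
};
```

To use it, read the first 44 bytes of the file (for example with Node's `fs` module) and pass them to the helper. If it returns false, resample the audio before calling transcribe().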

`whisperPath`

The path to your whisper.cpp folder.
If you haven't installed Whisper.cpp yet, you can do so, for example, through installWhisperCpp() and use the same folder.

`tokenLevelTimestamps` v4.0.131

Passes the --dtw flag to Whisper.cpp to generate more accurate timestamps, which are returned under the t_dtw field.
Recommended for accurate timings, but only available in Whisper.cpp 1.5.5 or later.
Set to false if you use an older version of Whisper.cpp.

`model?`

default: base.en

Specifies the Whisper model to use for the transcription.

Possible values: tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3, large-v3-turbo.

Make sure the model you want to use exists in your whisper.cpp/models folder. You can ensure a specific model is available locally by using the downloadWhisperModel() API.

Note: large-v3-turbo only works properly with Whisper.cpp versions built in November 2024 or later and with Remotion v4.0.229 or later.

`modelFolder?`

default: whisperPath/models

If you saved Whisper models to a specific folder, pass its path here.

By default, the whisper.cpp/models folder at the location defined through whisperPath is used.

`translateToEnglish?`

default: false

Set this boolean flag to true if you want an English translation of the provided file's transcription. Make sure not to use a *.en model, as these models cannot translate other languages into English.

note

We recommend using at least the medium model to get satisfactory translation results.
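The rule about *.en models can be checked before starting a long transcription. The sketch below is illustrative only: `isEnglishOnlyModel`, `toMultilingualModel` and `checkTranslationSetup` are hypothetical helpers, not part of @remotion/install-whisper-cpp, and they rely solely on the `.en` naming convention from the model list above.

```typescript
// Hypothetical helpers, not part of @remotion/install-whisper-cpp.
// Per the documentation above, models ending in ".en" are English-only.
export const isEnglishOnlyModel = (model: string): boolean => model.endsWith('.en');

// Returns the multilingual counterpart of an English-only model name,
// e.g. "medium.en" becomes "medium". Other names are returned unchanged.
export const toMultilingualModel = (model: string): string =>
  isEnglishOnlyModel(model) ? model.slice(0, -'.en'.length) : model;

// Returns a warning when the combination cannot translate, otherwise null.
export const checkTranslationSetup = (
  model: string,
  translateToEnglish: boolean,
): string | null => {
  if (translateToEnglish && isEnglishOnlyModel(model)) {
    return `Model "${model}" is English-only; consider "${toMultilingualModel(model)}" instead.`;
  }

  return null;
};
```

A caller could run `checkTranslationSetup(model, true)` and log or throw on a non-null result before invoking transcribe().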

`printOutput?` v4.0.132

Whether to print the output of the transcription process to the console. Defaults to true.

`tokensPerItem?` v4.0.141

default: 1

The maximum number of tokens included in each transcription item.

Set this option to null to use whisper.cpp's default token grouping (useful for generating a movie-style transcription).

info

tokensPerItem can only be set when tokenLevelTimestamps is set to false.
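The interaction between tokensPerItem and tokenLevelTimestamps can be expressed as a small pre-flight check. This is a sketch under assumptions: `GroupingOptions` and `checkGroupingOptions` are hypothetical names, the check only mirrors the rule stated above, and it makes no claim about how the library itself reacts to an invalid combination.

```typescript
// Hypothetical pre-flight check mirroring the documented rule.
// It is not part of @remotion/install-whisper-cpp.
type GroupingOptions = {
  tokenLevelTimestamps: boolean;
  tokensPerItem?: number | null;
};

// Returns an error message for an invalid combination, otherwise null.
export const checkGroupingOptions = (options: GroupingOptions): string | null => {
  if (options.tokenLevelTimestamps && options.tokensPerItem !== undefined) {
    return 'tokensPerItem can only be set when tokenLevelTimestamps is false';
  }

  return null;
};

// One item per token, with DTW timestamps on each token.
export const wordLevel: GroupingOptions = {tokenLevelTimestamps: true};

// whisper.cpp's default grouping, suited to movie-style captions.
export const movieStyle: GroupingOptions = {
  tokenLevelTimestamps: false,
  tokensPerItem: null,
};
```

Either preset could be spread into the options object passed to transcribe(), alongside inputPath and whisperPath.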

`splitOnWord?` v4.0.208

Passes the --split-on-word flag to Whisper.cpp to get cleaner word-by-word output.

`language?` v4.0.142

default: null

Passes the -l flag to Whisper.cpp to specify the spoken language of the audio file.

Possible values: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Castilian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, Flemish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Letzeburgesch, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Moldavian, Moldovan, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Panjabi, Pashto, Persian, Polish, Portuguese, Punjabi, Pushto, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Sinhalese, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Valencian, Vietnamese, Welsh, Yiddish, Yoruba, Zulu. af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh or auto.

`signal?` v4.0.156

A signal from an AbortController to cancel the transcription process.
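A common use for the signal is cancelling a transcription that runs too long. The sketch below uses only the standard AbortController API. `createTimeoutSignal` is a hypothetical helper, and how transcribe() reports a cancellation to the caller is not covered here.

```typescript
// Hypothetical helper built on the standard AbortController API.
// Aborts automatically after timeoutMs unless clear() is called first.
export const createTimeoutSignal = (timeoutMs: number) => {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  return {
    // Pass this as the `signal` option.
    signal: controller.signal,
    // Abort immediately, for example when the user clicks "Cancel".
    cancel: (): void => {
      clearTimeout(timer);
      controller.abort();
    },
    // Call once the work has finished so the timer does not fire later.
    clear: (): void => clearTimeout(timer),
  };
};
```

In practice, you would create the signal with something like `createTimeoutSignal(60_000)`, pass `signal` to transcribe() with your other options, and call `clear()` once the promise settles.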

`onProgress?` v4.0.156

Listen for progress updates from the transcription process.
The progress is a number between 0 and 1.

import type {TranscribeOnProgress} from '@remotion/install-whisper-cpp';

const onProgress: TranscribeOnProgress = (progress) => {
  console.log(`Transcription progress: ${progress * 100}%`);
};

`flashAttention?` v4.0.324

Boolean. Set to true to enable flash attention.

`additionalArgs?` v4.0.324

Additional arguments to pass to whisper, as an array. The array can contain strings or string pairs, for example:

transcribe({
  ...,
  additionalArgs: ['-tdrz', ['--max-len', '1']]
})
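To make the accepted shape concrete, the sketch below types the array and shows one plausible way the mixed entries map onto a flat list of command-line arguments. `AdditionalArg` and `flattenAdditionalArgs` are hypothetical names, and the flattening step is an assumption about the intent of the option, not a description of the library's internals.

```typescript
// Hypothetical type and helper, not part of @remotion/install-whisper-cpp.
// An entry is either a standalone flag or a [flag, value] pair.
type AdditionalArg = string | [string, string];

// ASSUMPTION: each entry contributes one or two command-line arguments, in order.
export const flattenAdditionalArgs = (args: AdditionalArg[]): string[] => {
  const flat: string[] = [];
  for (const arg of args) {
    if (typeof arg === 'string') {
      flat.push(arg); // standalone flag such as '-tdrz'
    } else {
      flat.push(arg[0], arg[1]); // flag followed by its value
    }
  }

  return flat;
};

// The example array from above, typed explicitly.
export const exampleArgs: AdditionalArg[] = ['-tdrz', ['--max-len', '1']];
```

Under this reading, `flattenAdditionalArgs(exampleArgs)` yields the three arguments `-tdrz`, `--max-len` and `1`.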

Return value

`TranscriptionJson`

An object containing all the metadata and transcriptions resulting from the transcription process.

type Timestamps = {
  from: string;
  to: string;
};

type Offsets = {
  from: number;
  to: number;
};

type WordLevelToken = {
  t_dtw: number;
  text: string;
  timestamps: Timestamps;
  offsets: Offsets;
  id: number;
  p: number;
};

type TranscriptionItem = {
  timestamps: Timestamps;
  offsets: Offsets;
  text: string;
};

type TranscriptionItemWithTimestamp = TranscriptionItem & {
  tokens: WordLevelToken[];
};

type Model = {
  type: string;
  multilingual: boolean;
  vocab: number;
  audio: {
    ctx: number;
    state: number;
    head: number;
    layer: number;
  };
  text: {
    ctx: number;
    state: number;
    head: number;
    layer: number;
  };
  mels: number;
  ftype: number;
};

type Params = {
  model: string;
  language: string;
  translate: boolean;
};

type Result = {
  language: string;
};

export type TranscriptionJson<WithTokenLevelTimestamp extends boolean> = {
  systeminfo: string;
  model: Model;
  params: Params;
  result: Result;
  transcription: true extends WithTokenLevelTimestamp ? TranscriptionItemWithTimestamp[] : TranscriptionItem[];
};

For accurate timestamps, prefer the t_dtw value over offsets.
Use convertToCaptions() to apply our opinionated postprocessing to the captions.
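If you read the tokens yourself instead of using convertToCaptions(), a helper along the following lines can pick the preferred timestamp. This is a sketch with loudly stated assumptions: it assumes that t_dtw is expressed in units of 10 ms and that a value of -1 means no DTW timestamp was produced. Verify both against the output of your Whisper.cpp version. `TokenLike` and `tokenTimestampMs` are hypothetical names.

```typescript
// Minimal subset of the WordLevelToken type shown above.
type TokenLike = {
  t_dtw: number;
  text: string;
  offsets: {from: number; to: number};
};

// ASSUMPTION: t_dtw is in units of 10 ms and -1 means "no DTW timestamp".
// Falls back to offsets.from (milliseconds) when no DTW value is available.
export const tokenTimestampMs = (token: TokenLike): number =>
  token.t_dtw === -1 ? token.offsets.from : token.t_dtw * 10;
```

With these assumptions, a token with `t_dtw: 123` maps to 1230 ms, while a token with `t_dtw: -1` falls back to its `offsets.from` value.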

See also
