Coqui STT - Extract Words from Audio Files
11/02/2022, Wed
Speech to Text CLI Tool
There may be times when you are listening to a podcast but cannot make out the spelling of a word you would like to understand. A 'live-caption' feature for your podcast would show you the vocabulary you are missing. To remedy this problem, you can use a speech-to-text (STT) tool to produce a transcript of the audio.
You might also want to convert an audio file to text if you can read faster than you can listen.
The following steps show how to use Coqui STT to transcribe an audio file to text in the terminal.
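Download the Docker image:
docker pull ghcr.io/coqui-ai/stt-train
Start an interactive session in the container (--net="host" shares the host's network with the container):
docker run -it --net="host" ghcr.io/coqui-ai/stt-train:latest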
To make a host directory accessible inside the container, mount it with -v when you start the container:
docker run -v /folder/for/host:/folder/for/docker -it ghcr.io/coqui-ai/stt-train:latest
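If you prefer to script these steps, the docker run command can be assembled from Python. This is a minimal sketch, not part of Coqui STT: docker_run_cmd is a hypothetical helper that only builds the same arguments used above, including the placeholder paths.

```python
import os
import shlex


def docker_run_cmd(host_dir, container_dir,
                   image="ghcr.io/coqui-ai/stt-train:latest",
                   host_network=False):
    """Build a `docker run` argument list that mounts host_dir at container_dir."""
    cmd = ["docker", "run", "-it"]
    if host_network:
        # Equivalent to --net="host" in the earlier command.
        cmd.append("--net=host")
    # -v HOST:CONTAINER bind-mounts a host folder. A relative host path may be
    # treated as a named volume, so convert it to an absolute path first.
    cmd += ["-v", f"{os.path.abspath(host_dir)}:{container_dir}", image]
    return cmd


if __name__ == "__main__":
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(docker_run_cmd("/folder/for/host", "/folder/for/docker")))
```

Passing the returned list to subprocess.run starts the container. Using a list rather than a single string avoids shell-quoting problems with paths that contain spaces.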
Download the pre-trained model checkpoint from the v1.4.0 release page:
https://github.com/coqui-ai/STT/releases/tag/v1.4.0
Navigate to the host directory where the 'coqui-stt-1.4.0-checkpoint' folder will be stored. In that directory, create the checkpoint folder:
mkdir coqui-stt-1.4.0-checkpoint
Then, in the same directory, download huge-vocabulary.scorer from https://coqui.ai/english/coqui/v1.0.0-huge-vocab
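Before running inference, a quick check that the checkpoint folder and the scorer are where the inference command expects them can save a confusing error later. This is a minimal sketch that assumes the folder and file names used in this post; check_model_files is a hypothetical helper, and it only confirms that the checkpoint folder is non-empty, not that it holds a valid checkpoint.

```python
from pathlib import Path


def check_model_files(workdir,
                      checkpoint_dir="coqui-stt-1.4.0-checkpoint",
                      scorer="huge-vocabulary.scorer"):
    """Return a list of problems found; an empty list means the files look present."""
    base = Path(workdir)
    ckpt = base / checkpoint_dir
    problems = []
    if not ckpt.is_dir():
        problems.append(f"missing checkpoint directory: {ckpt}")
    elif not any(ckpt.iterdir()):
        # The folder exists, but nothing has been downloaded or extracted into it.
        problems.append(f"checkpoint directory is empty: {ckpt}")
    if not (base / scorer).is_file():
        problems.append(f"missing scorer file: {base / scorer}")
    return problems


if __name__ == "__main__":
    issues = check_model_files(".")
    print("\n".join(issues) if issues else "Checkpoint folder and scorer found.")
```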
Get an MP3 audio file for STT to process. You will most likely need to convert it first, because STT only accepts 16-bit WAV files as input.
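To check whether a WAV file already meets this requirement before converting it, Python's standard wave module can read the header. This is a minimal sketch: wav_format and is_stt_ready are hypothetical helpers, and the 16 kHz mono requirement is an assumption about the English model's expected input that the steps here do not state explicitly.

```python
import wave


def wav_format(path):
    """Return (bits_per_sample, sample_rate_hz, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getsampwidth() * 8, w.getframerate(), w.getnchannels()


def is_stt_ready(path, rate=16000):
    """True if the file is 16-bit mono PCM at `rate` Hz (16 kHz is an assumption)."""
    try:
        bits, sample_rate, channels = wav_format(path)
    except (wave.Error, EOFError):
        # Not a PCM WAV file (for example an MP3), so it needs converting.
        return False
    return bits == 16 and sample_rate == rate and channels == 1
```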
The following ffmpeg command converts the MP3 to a 16-bit (pcm_s16le), 16 kHz, mono WAV file:
ffmpeg -i "Gettysburg Address.mp3" -acodec pcm_s16le -ar 16000 -ac 1 "Gettysburg Address.wav"
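The conversion can also be scripted from Python, which helps when you have several files. This is a minimal sketch: ffmpeg_to_stt_wav_cmd and convert are hypothetical helpers that call the ffmpeg CLI requesting signed 16-bit PCM (pcm_s16le), one channel, and a 16 kHz sample rate, plus -y to overwrite the output without prompting. The 16 kHz default is an assumption about the model's expected input.

```python
import shlex
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def ffmpeg_to_stt_wav_cmd(src: str, dst: Optional[str] = None,
                          rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that converts `src` to 16-bit mono PCM WAV."""
    if dst is None:
        # "Gettysburg Address.mp3" -> "Gettysburg Address.wav"
        dst = str(Path(src).with_suffix(".wav"))
    return ["ffmpeg", "-y",            # -y: overwrite dst without asking
            "-i", src,
            "-acodec", "pcm_s16le",    # signed 16-bit little-endian PCM
            "-ar", str(rate),          # resample to `rate` Hz (16 kHz assumed)
            "-ac", "1",                # downmix to one (mono) channel
            dst]


def convert(src: str, dst: Optional[str] = None) -> None:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(ffmpeg_to_stt_wav_cmd(src, dst), check=True)


if __name__ == "__main__":
    # Show the command; call convert(...) to actually run it.
    print(shlex.join(ffmpeg_to_stt_wav_cmd("Gettysburg Address.mp3")))
```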
Run the single-file ('one-shot') inference command to transcribe the audio file to text:
python -m coqui_stt_training.training_graph_inference --checkpoint_dir coqui-stt-1.4.0-checkpoint --scorer_path huge-vocabulary.scorer --n_hidden 2048 --one_shot_infer 'Gettysburg Address.wav'
The output will be displayed on the console.
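To keep the transcript instead of only reading it on the console, the inference command above can be run from Python and its output saved to a text file. This is a minimal sketch that assumes it runs in the same environment (for example inside the container) where coqui_stt_training is installed. inference_cmd and transcribe_to_file are hypothetical helpers, and whatever the tool writes to stdout, which may include log lines besides the transcript, is saved as-is.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def inference_cmd(wav: str,
                  checkpoint_dir: str = "coqui-stt-1.4.0-checkpoint",
                  scorer: str = "huge-vocabulary.scorer",
                  n_hidden: int = 2048) -> List[str]:
    """Build the one-shot inference command from this post as an argument list."""
    return [sys.executable, "-m", "coqui_stt_training.training_graph_inference",
            "--checkpoint_dir", checkpoint_dir,
            "--scorer_path", scorer,
            "--n_hidden", str(n_hidden),
            "--one_shot_infer", wav]


def transcribe_to_file(wav: str, out: Optional[str] = None) -> str:
    """Run inference, save its stdout to `out` (default: WAV name + .txt), return it."""
    result = subprocess.run(inference_cmd(wav), capture_output=True,
                            text=True, check=True)
    out_path = Path(out) if out else Path(wav).with_suffix(".txt")
    # Everything printed to stdout is saved as-is (it may include log lines).
    out_path.write_text(result.stdout)
    return result.stdout


if __name__ == "__main__":
    # Show the command; call transcribe_to_file(...) to actually run it.
    print(shlex.join(inference_cmd("Gettysburg Address.wav")))
```

sys.executable is used instead of a bare python so the command runs with the same interpreter as the script.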