How to call the OpenAI Whisper model to get time information in Python

data engineering

Publish Date: 2023-09-10

The OpenAI speech-to-text API provides two endpoints, transcriptions and translations, based on OpenAI's open source large-v2 Whisper model. They can be used to:

Transcribe audio into whatever language the audio is in.
Translate and transcribe the audio into English.
File uploads are currently limited to 25 MB and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
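
Given these limits, it can help to check a file locally before uploading it. The following is a minimal sketch and not part of the API: the helper name check_audio_file and the two constants are hypothetical, and reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption.

```python
import os

# Limits taken from the text above. Treating "25 MB" as
# 25 * 1024 * 1024 bytes is an assumption.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def check_audio_file(path):
    """Return a list of problems with the file; an empty list means none were found."""
    problems = []
    # Compare the extension case-insensitively and without the leading dot
    extension = os.path.splitext(path)[1].lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported file type: {!r}".format(extension))
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        problems.append(
            "file is {} bytes, over the {} byte limit".format(size, MAX_UPLOAD_BYTES)
        )
    return problems
```

Calling check_audio_file("./upload-whisper.mp4") before sending the request would return an empty list when the file passes both checks. The check only looks at the file name and size, not at the actual audio encoding.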

Here is an example in Python:

import os
import requests

# Define the API endpoint and headers
url = "https://api.openai.com/v1/audio/transcriptions"
# Assumes your API key is stored in the OPENAI_API_KEY environment variable
open_ai_key = os.environ["OPENAI_API_KEY"]
headers = {
    "Authorization": "Bearer {}".format(open_ai_key)
}

# Location of your audio file; could be mp3, mp4, etc.
FILE_PATH = "./upload-whisper.mp4"

# Define the form parameters; 'srt' asks for subtitles with timestamps
data = {
    'model': 'whisper-1',
    'response_format': 'srt'
}

# Open the file in a with block so it is closed after the upload
with open(FILE_PATH, 'rb') as audio_file:
    files = {'file': ('test.mp4', audio_file)}
    response = requests.post(url, headers=headers, files=files, data=data)

print(response.text)

The output is:

1
00:00:00,000 --> 00:00:02,600
First, I need you to go to the front desk to sign up for work.
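
SRT output like this is plain text, so the timestamps still have to be extracted before they can be used in code. The sketch below shows one way to turn SRT text into (start, end, text) tuples with times in seconds. The names parse_srt, to_seconds, and TIMING_PATTERN are hypothetical helpers written for this illustration, and the parser only handles the simple cue layout shown above.

```python
import re

# Matches an SRT timing line such as "00:00:00,000 --> 00:00:02,600"
TIMING_PATTERN = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000.0


def parse_srt(srt_text):
    """Parse SRT text into a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        # The timing line normally follows the cue number
        for i, line in enumerate(lines):
            match = TIMING_PATTERN.search(line)
            if match:
                parts = match.groups()
                start = to_seconds(*parts[:4])
                end = to_seconds(*parts[4:])
                # Everything after the timing line is the cue text
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((start, end, text))
                break
    return cues


sample = """1
00:00:00,000 --> 00:00:02,600
First, I need you to go to the front desk to sign up for work.
"""
for start, end, text in parse_srt(sample):
    print(start, end, text)
```

With a real request, response.text from the example above could be passed to parse_srt in place of sample.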

Notice that in the code above we set response_format to “srt”, which comes with timestamps.
The transcript output format can be any one of these options: json, text, srt, verbose_json, or vtt.
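
Of these options, verbose_json also carries timing information, as JSON rather than subtitle text. The sketch below assumes the response body contains a "segments" list whose entries have "start" and "end" in seconds plus the segment "text". The helper names format_timestamp and segments_with_timestamps are hypothetical, and example_body is a hand-written stand-in, not real API output.

```python
import json


def format_timestamp(seconds):
    """Format a number of seconds as an SRT-style HH:MM:SS,mmm string."""
    total_millis = int(round(seconds * 1000))
    hours, remainder = divmod(total_millis, 3600 * 1000)
    minutes, remainder = divmod(remainder, 60 * 1000)
    secs, millis = divmod(remainder, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(hours, minutes, secs, millis)


def segments_with_timestamps(response_text):
    """Return (start, end, text) for each segment in a verbose_json response body."""
    payload = json.loads(response_text)
    return [
        (
            format_timestamp(seg["start"]),
            format_timestamp(seg["end"]),
            seg["text"].strip(),
        )
        # Assumption: segments live under the "segments" key
        for seg in payload.get("segments", [])
    ]


# Hand-written stand-in for a response body; real responses contain more fields.
example_body = json.dumps({
    "text": "First, I need you to go to the front desk to sign up for work.",
    "segments": [
        {
            "start": 0.0,
            "end": 2.6,
            "text": " First, I need you to go to the front desk to sign up for work.",
        }
    ],
})
for start, end, text in segments_with_timestamps(example_body):
    print(start, "-->", end, text)
```

To try this against the API, set response_format to verbose_json in the request and pass response.text to segments_with_timestamps.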

robot learner

https://datasciencebyexample.github.io/2023/09/10/how-to-call-openai-whisper-with-timestamp-such-as-srt-in-python/

Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: robot learner.

openai whisper
