I think live captioners typically add a modest upcharge to convert their transcriptions to VTT.
If you’re really hurting for budget, a trick I use for shorter projects is to upload the video to YouTube as unlisted and let their AI take the first crack at it. From their editing interface (when you have a Creator Studio account) you can correct the mistakes the AI made and then download an SRT or VTT file.
The quality of the transcription depends a lot on how clear the audio is and on how strong the speakers’ accents or dialects are. One good clear voice at a time, in a clean room, well-recorded, comes out of the box about 90 percent correct. With off-mic speakers, echoey rooms, and people speaking too fast or with strong accents, you get 50 to 60 percent accuracy and have to clean the rest up manually. YouTube’s AI doesn’t do punctuation or capitalization either, nor will it assign name headers as each new voice chimes in on the track… But hey, it’s a start and better than nothing. Those extra features are why live human transcriptionists are still worth what they want to charge.
There are some other free transcription systems out there, but I don’t think they do a better job, and they may not come with an easy editing interface like YouTube’s.
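If YouTube or one of the other tools only gives you an SRT and you need VTT, the conversion is simple enough to script yourself. Here is a minimal Python sketch, not a polished tool: the function name `srt_to_vtt` is my own invention, and it assumes a plain, well-formed SRT with no styling tags. It adds the `WEBVTT` header line and swaps the comma in each timestamp for a period, which are the two differences that matter for basic captions.

```python
import re

# Matches SRT timestamps such as 00:01:02,345 (comma before the milliseconds).
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert plain SRT caption text to WebVTT text (hypothetical helper)."""
    # Drop a byte-order mark if present and normalize Windows line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = []
    for line in text.split("\n"):
        # Only rewrite timing lines, so commas in the caption text are untouched.
        if "-->" in line:
            line = SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    # A WebVTT file starts with the WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(out) + "\n"
```

To use it, read your downloaded SRT file into a string, pass it to `srt_to_vtt`, and write the result to a `.vtt` file saved as UTF-8. The numbered cue lines from the SRT are left in place, since VTT accepts them as cue identifiers.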