Since machine learning doesn't map directly to human hearing, I was wondering how wasteful it is to feed something like 160 kbps speech audio to a transcription service, and how much time and compute could be saved by reducing it to something much smaller first.

I know little to nothing about this field, but as someone interested in the current state of transcription and in archival work, I was wondering what the optimal pipeline is when ripping videos from different places and extracting the audio to transcribe it programmatically.
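For a rough sense of scale (this is my own back-of-the-envelope math, not a benchmark): many speech models, Whisper among them, resample whatever you feed them down to 16 kHz mono internally, so detail above that is discarded anyway. Comparing one hour of a 160 kbps stream against one hour of raw 16 kHz 16-bit mono PCM:

```python
# Back-of-the-envelope storage comparison for one hour of speech audio.
# Assumption: the recognizer's front end works at 16 kHz, 16-bit, mono,
# as Whisper does; other services may differ.
seconds = 3600
compressed_kbps = 160                  # the original 160 kbps encoded stream
pcm_kbps = 16000 * 16 * 1 / 1000       # 16 kHz * 16 bits * 1 channel = 256 kbps

compressed_mb = compressed_kbps * seconds / 8 / 1000
pcm_mb = pcm_kbps * seconds / 8 / 1000
print(f"160 kbps stream: {compressed_mb} MB/hour")   # 72.0 MB/hour
print(f"16 kHz mono PCM: {pcm_mb} MB/hour")          # 115.2 MB/hour
```

So the 160 kbps file is already smaller than the uncompressed audio the model actually consumes; shrinking it further mostly saves upload bandwidth, not model compute. If you do want to normalize everything up front, something like `ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 out.wav` extracts and downsamples in one step.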