Azure Cognitive Services – Speech

Hi all,

Last week I was doing something really different (not entirely new, because I already knew a little bit of Java and .NET).

I guess my mindset is going the same way as Microsoft's future plans for certifications: half hardcore architect, half coding freak. I cannot help myself when there is something new, I just have to jump into it, no matter how unknown the territory I will be walking in.

Enough of the BS, let's talk tech for a sec.

Azure Cognitive Services, and specifically Speech to Text and Text to Speech.

I had an idea and Googled it (is that term already in the dictionary?): can I take files from my own self-hosted IIS or Apache server, or even from a local directory, and transfer them asynchronously to the Speech services in Azure or GCP … could this really be possible?

First I read the documentation about the Azure Speech API. It says that if you want to use a file longer than 15 seconds, you have to upload it to Blob Storage in Azure and use batch transcription.
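Just for reference, here is a minimal sketch of what that batch request looks like in PowerShell, assuming the Speech-to-Text v3.0 REST API; the key, region and the Blob SAS URL are placeholders you fill in yourself.

$key    = '<your-speech-key>'
$region = 'northeurope'
$body = @{
    contentUrls = @('https://<storage-account>.blob.core.windows.net/<container>/<file>.wav?<sas-token>')
    locale      = 'en-US'
    displayName = 'blog-batch-test'
} | ConvertTo-Json

# Create the transcription job; the response contains a self link you can poll for the result.
Invoke-RestMethod -Method Post `
    -Uri "https://$region.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions" `
    -Headers @{ 'Ocp-Apim-Subscription-Key' = $key } `
    -ContentType 'application/json' `
    -Body $body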

Spoiler alert, as a formula: "sample rate * bytes per sample * duration + 44-byte WAV header = 16 kHz * 16 bits * 600 seconds + 44 = 16000 * 2 * 600 + 44 = 19,200,044 bytes" (I will open this one up later, maybe)
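If you want to sanity-check that number, here is the same arithmetic in PowerShell (the ten-minute duration is just an example value):

$sampleRate     = 16000   # 16 kHz
$bytesPerSample = 2       # 16 bits
$seconds        = 600     # 10 minutes
$headerBytes    = 44      # standard WAV header
$sampleRate * $bytesPerSample * $seconds + $headerBytes   # 19200044 bytes, roughly 18.3 MB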

So then I started to try this and that, first with PowerShell, generating an Azure Speech API key (there are two keys generated, but only one is needed for auth).
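If you have the Az PowerShell module installed, you can list both keys of the Speech resource like this (the resource group and resource names below are just placeholders):

# Returns Key1 and Key2; either one works for authentication.
Get-AzCognitiveServicesAccountKey -ResourceGroupName 'my-speech-rg' -Name 'my-speech-resource'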

I got this one working with an under-15-second WAV file at 16 kHz. It transcribed the speech well enough, but the length limit was too short.

A hint for the PowerShell usage with Invoke-RestMethod and the API key:

You have to put the key inside curly brackets { } (a hashtable passed to -Headers) to get it working; for some reason Azure declined the request without these brackets.

Although Microsoft says in the documentation that the REST API subscription key should be passed as a plain 'key' value, nope. Just no.
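To make that concrete, here is a minimal sketch of the kind of call I mean; the key, region, file path and language parameter are placeholders, not my exact script.

$key    = '<your-speech-key>'
$region = 'northeurope'
$uri    = "https://$region.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US"

# The subscription key goes into a hashtable (the curly brackets) passed to -Headers.
$headers = @{ 'Ocp-Apim-Subscription-Key' = $key }

# Send the raw WAV bytes (16 kHz, 16-bit PCM) and print the recognized text.
$audio    = [System.IO.File]::ReadAllBytes('C:\temp\sample.wav')
$response = Invoke-RestMethod -Method Post -Uri $uri -Headers $headers `
    -ContentType 'audio/wav; codecs=audio/pcm; samplerate=16000' -Body $audio
$response.DisplayText

Besides DisplayText, the response also carries RecognitionStatus, Offset and Duration if you need more than the plain text.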

And Microsoft is a bit inconsistent too: the first picture talks about under-15-second files, but the next document says only 10 seconds.

OK, I get that this is new stuff to them too; there was Bing Speech before, but still, this is new and exciting. By the way, Google also released new functions for their Speech services. And yes, I also tried Google's own Go language for this, but that is a different story to be told, maybe.

And when you make the request, you have to use the conversation endpoint for the same region your Speech resource is in. So keep all the related stuff in Azure in the same resource group and region, or you will get funny (it wasn't funny then) errors with your queries.

Mine was northeurope.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1

Here is the list of regional endpoints for the service: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-speech-to-text

Eteenpäin sano mummo lumessa (just had to use this one; it's Finnish, and it translates to … but hey, wait, you can use Azure for that after this blog, I'm not telling)

For codeless apps you can use Azure Logic Apps or Google App Engine for these. Here is a nice write-up from Abhishek about Azure Logic Apps and batch transcription from Blob.

First you want to download the Azure Speech SDK for your language of choice, or the JavaScript package for the browser. There was a new version 1.16 release at the beginning of March, and it now supports the MP3 format like Google does.

I tried this one with WSL and the node.js libraries: https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/cognitive-services/Speech-Service/includes/how-to/speech-to-text-basics/speech-to-text-basics-javascript.md
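Getting the node.js package in place is a one-liner; the name below is the Speech SDK package on npm.

npm install microsoft-cognitiveservices-speech-sdk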

You can also use Visual Studio to build a solution to test this, but I prefer the node.js-based solution. https://github.com/Azure-Samples/Cognitive-Speech-STT-Windows

And here is a sample repository covering all the different languages: https://github.com/Azure-Samples/cognitive-services-speech-sdk

Happy speeching, all! This article will be continued in part 2.

Author: Harri Jaakkonen
