Wednesday 4 April 2018

Speech Recognition on Office 365 SharePoint

Let us look at how speech recognition can be implemented on Office 365 SharePoint portals. This article introduces the speech recognition service (speech-to-text conversion), describes the approach for SharePoint in detail, and includes code and snapshots for easier implementation.


Speech Recognition


Speech recognition captures real-time audio from the microphone and converts it to the corresponding text. This kind of interface helps in building voice-triggered apps such as chat bots.

There are two approaches to implementing speech recognition on SharePoint.
  • The first uses the Speech Recognition interfaces.
  • The other uses the Azure Bing Speech API, which is built on top of the WebSockets API. The Speech SDK is available as extensions, which can be leveraged for development.
Let us go with the first approach in this post. We are going to see how the speech recognition interfaces are integrated into SharePoint applications. In future articles, I will detail the integration of the Bing Speech API (the second approach).


Note: The modules depend upon WebRTC, so only modern browsers that can access the microphone, such as Chrome and Firefox, are supported.
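
Since support differs between browsers, it helps to check for the interface before using it. The following is a minimal sketch, not part of the original sample: the function name and its `root` parameter are hypothetical, and it relies on the fact that Chrome exposes the interface under the prefixed name webkitSpeechRecognition.

```javascript
// Returns the SpeechRecognition constructor exposed by the given global
// object, or null when the browser does not provide one.
// `root` is a hypothetical parameter; in a page you would pass `window`.
function getSpeechRecognitionCtor(root) {
  if (!root) {
    return null;
  }
  // Prefer the standard name, then fall back to the webkit-prefixed one.
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

// In a browser:
// const Ctor = getSpeechRecognitionCtor(window);
// if (!Ctor) { /* hide the record button or show a message */ }
```

A null return value is a signal to hide the voice feature instead of failing when the button is clicked.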


Web Speech API Interfaces


The Web Speech API brings voice data into web apps. It has two parts.
  • Speech Recognition Interface – Converts speech into text. (This is what we are going to look into.)
  • Speech Synthesis Interface – Converts text into speech.


Integrating into SharePoint


Create a Content Editor web part and place the following HTML in it. The HTML contains elements such as a button for starting and stopping the recording and a span for displaying the spoken text.
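
The original markup is not reproduced in this post. As a rough sketch only, the markup described above could look like the string built below. It is written as JavaScript so it stays in the same language as the script file, and the element ids (btnSpeech, spanSpeech) and the function name are hypothetical choices for illustration.

```javascript
// Hypothetical element ids; the post's original markup is not shown.
const SPEECH_BUTTON_ID = "btnSpeech";
const SPEECH_OUTPUT_ID = "spanSpeech";

// Builds the markup for the Content Editor web part: one button to
// start/stop recording and one span to display the recognized text.
function buildSpeechMarkup() {
  return [
    "<div>",
    '  <button type="button" id="' + SPEECH_BUTTON_ID + '">Start</button>',
    '  <span id="' + SPEECH_OUTPUT_ID + '"></span>',
    "</div>"
  ].join("\n");
}
```

The returned string can be pasted into the web part as plain HTML; only the two ids matter to the script.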


SpeechRecognition Interface


Next, the required script logic has to be developed. The following details the interfaces and how they are integrated into SharePoint.

Get the Speech Recognition interface and initialize an object. Then set the required properties. In our case, we are only setting the following property.
  • lang – Language of the spoken text. For US English, the language code is en-US.

The other properties that can be considered are
  • interimResults – Boolean, defaults to false. Decides whether interim results are returned.
  • maxAlternatives – Integer, defaults to 1. Maximum number of alternative transcriptions returned for each result.
  • continuous – Boolean, defaults to false. Sets whether results are returned continuously or only a single result per recognition.
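
The initialization described above can be sketched as follows. This is an illustration under assumptions, not the original script: `createRecognizer` and its `options` argument are hypothetical names, and the constructor is passed in so that either the standard or the webkit-prefixed interface can be used.

```javascript
// Creates and configures a recognition object.
// `RecognitionCtor` is the browser's SpeechRecognition constructor
// (window.SpeechRecognition or window.webkitSpeechRecognition).
// `createRecognizer` and `options` are hypothetical names for this sketch.
function createRecognizer(RecognitionCtor, options) {
  const settings = options || {};
  const recognition = new RecognitionCtor();

  // Language of the spoken text; en-US for US English.
  recognition.lang = settings.lang || "en-US";

  // Optional properties; when not supplied they keep the defaults
  // listed above (false, 1, false).
  recognition.interimResults = settings.interimResults === true;
  recognition.maxAlternatives = settings.maxAlternatives || 1;
  recognition.continuous = settings.continuous === true;

  return recognition;
}

// In a browser:
// const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
// const recognition = createRecognizer(Ctor, { lang: "en-US" });
```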

Then add the event handlers for tracking the audio. The following event handlers are used in the sample.
  • onspeechstart – Fired when the speech recognition service detects sound that it recognizes as speech.
  • onspeechend – Fired when the speech recognition service stops detecting speech.
  • onresult – Fired when the speech recognition service returns the text after recognition and processing.
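
The three handlers above can be wired up roughly as shown below. This is a sketch, not the original sample: `attachHandlers`, `extractTranscript`, and the `callbacks` object are hypothetical names. The shape of the event (an array-like `results`, where each result holds alternatives with a `transcript` string) follows the Web Speech API.

```javascript
// Joins the top alternative of every result in a SpeechRecognitionEvent.
// event.results is array-like: each result holds one or more alternatives,
// and each alternative exposes a `transcript` string.
function extractTranscript(event) {
  let text = "";
  for (let i = 0; i < event.results.length; i++) {
    text += event.results[i][0].transcript;
  }
  return text;
}

// Attaches the three handlers to a recognition object.
// `callbacks` is a hypothetical object: { onStatus(message), onText(text) }.
function attachHandlers(recognition, callbacks) {
  recognition.onspeechstart = function () {
    callbacks.onStatus("listening");
  };
  recognition.onspeechend = function () {
    callbacks.onStatus("stopped");
  };
  recognition.onresult = function (event) {
    callbacks.onText(extractTranscript(event));
  };
  return recognition;
}
```

Keeping the transcript extraction in its own function makes it easy to switch to a different alternative or to filter interim results later.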

The methods used in the sample are
  • start – Starts the speech recognition service listening to incoming audio.
  • stop – Stops the speech recognition service from listening to incoming audio.
More details on the SpeechRecognition interface can be found here.
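
With a single button, start and stop are usually combined into a toggle. The following is one possible sketch: `createToggle` is a hypothetical helper, not part of the interface. It uses the interface's onend handler to reset its state, because a recognition session can end on its own once a result has been returned.

```javascript
// Wraps a recognition object in a start/stop toggle, suitable for a single
// record button. `createToggle` is a hypothetical helper name.
function createToggle(recognition) {
  let listening = false;

  // The service can end a session by itself (for example after a result
  // in non-continuous mode), so reset the flag when that happens.
  recognition.onend = function () {
    listening = false;
  };

  return {
    // Starts or stops listening and returns the new state.
    toggle: function () {
      if (listening) {
        recognition.stop();
        listening = false;
      } else {
        recognition.start();
        listening = true;
      }
      return listening;
    },
    isListening: function () {
      return listening;
    }
  };
}
```

Tracking the state this way avoids calling start on a session that is already running.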

Script File - speechrecognizer.js


The following code snippet shows how the speech recognition service is integrated into the SharePoint portal.
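
The original script is not reproduced in this post, so what follows is a minimal sketch of what speechrecognizer.js could contain, under stated assumptions. The function name `initSpeechRecognizer` and the element ids btnSpeech and spanSpeech are hypothetical and must match whatever markup is placed in the web part. The window and document objects are passed in as parameters so the logic can also be exercised outside a browser.

```javascript
// speechrecognizer.js (sketch). Element ids are hypothetical and must
// match the button and span placed in the Content Editor web part.
function initSpeechRecognizer(root, doc) {
  const Ctor = root.SpeechRecognition || root.webkitSpeechRecognition;
  const button = doc.getElementById("btnSpeech");
  const output = doc.getElementById("spanSpeech");
  if (!Ctor || !button || !output) {
    return null; // unsupported browser or markup missing
  }

  const recognition = new Ctor();
  recognition.lang = "en-US";
  let listening = false;

  recognition.onspeechstart = function () {
    output.textContent = "Listening...";
  };
  recognition.onspeechend = function () {
    recognition.stop();
  };
  recognition.onresult = function (event) {
    // Show the top alternative of the first result.
    output.textContent = event.results[0][0].transcript;
  };
  recognition.onend = function () {
    listening = false;
    button.textContent = "Start";
  };

  button.addEventListener("click", function () {
    if (listening) {
      recognition.stop();
    } else {
      recognition.start();
      listening = true;
      button.textContent = "Stop";
    }
  });
  return recognition;
}

// In the page, call this once the web part markup has loaded:
// initSpeechRecognizer(window, document);
```

The function returns null instead of throwing when the interface or the markup is missing, so the page keeps working in browsers without speech recognition.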



The following snapshot shows the recognized speech on the Office 365 SharePoint portal.

Speech Recognition on SharePoint


Note: The above logic is applicable to SharePoint 2013, SharePoint 2016, and SharePoint Online.

In my next article, you will see how the Azure Bing Speech API can be integrated into SharePoint.