Web Speech API

The Speech API and Natural Language Processing (NLP) are emerging trends in user interface design, aiming to remove the need for clicking and typing in software and mobile applications. The Web Speech API has two aspects: Speech Recognition, and Speech Synthesis (also called text-to-speech).

Speech Recognition

In speech recognition, input is captured by a microphone and sent to a speech service, which checks it against a predefined grammar engine or grammar database. Once the input is successfully recognized, a response is sent back to the calling API and further actions can be taken as a result. When implementing the Speech API on the web, the main controller interface is SpeechRecognition, which also handles the SpeechRecognitionEvent sent from the recognition service. By default, the device's own speech recognition system is used, such as Dictation on macOS, Siri on iOS, Cortana on Windows 10, or Android Speech.

Note: On some browsers, such as Chrome, speech recognition on an HTML page requires a server-based recognition engine: audio is sent to a web service for processing, so it won't work offline.

Browser and OS Support

Web Speech API speech recognition is currently limited to Chrome for Desktop and Android. Chrome has supported it since version 33, but with prefixed interfaces, so you need to use the prefixed versions of them, e.g. webkitSpeechRecognition.
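The prefix handling above can be sketched with a small feature-detection helper. The function name getSpeechRecognition is illustrative, not part of the API; only the SpeechRecognition and webkitSpeechRecognition property names come from the platform.

```javascript
// Return whichever SpeechRecognition constructor the environment exposes.
// Prefers the unprefixed (standard) name, falls back to the webkit prefix.
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// In a browser you would call it with the window object:
//   const SpeechRecognitionCtor = getSpeechRecognition(window);
//   if (!SpeechRecognitionCtor) { /* recognition not supported */ }
```

Checking for null before constructing lets the page degrade gracefully on browsers without recognition support.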

Constructor

SpeechRecognition.SpeechRecognition()
Creates a new SpeechRecognition object.

Properties

SpeechRecognition.grammars

Returns and sets a collection of SpeechGrammar objects that represent the grammars that will be understood by the current SpeechRecognition.

SpeechRecognition.lang

Returns and sets the language of the current SpeechRecognition. If not specified, this defaults to the HTML lang attribute value, or the user agent’s language setting if that isn’t set either.

SpeechRecognition.continuous

Controls whether continuous results are returned for each recognition, or only a single result. Defaults to single (false).

SpeechRecognition.interimResults

Controls whether interim results should be returned (true) or not (false). Interim results are results that are not yet final (e.g. the SpeechRecognitionResult.isFinal property is false).

SpeechRecognition.maxAlternatives

Sets the maximum number of SpeechRecognitionAlternatives provided per result. The default value is 1.

SpeechRecognition.serviceURI

Specifies the location of the speech recognition service used by the current SpeechRecognition to handle the actual recognition. The default is the user agent’s default speech service.
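The properties above are typically set together right after construction. A minimal configuration sketch follows; the helper name configureRecognizer and the default values chosen are illustrative, while the property names (lang, continuous, interimResults, maxAlternatives) are from the API itself.

```javascript
// Apply common configuration to a SpeechRecognition instance.
// Any option not supplied falls back to a conservative default.
function configureRecognizer(recognition, options) {
  recognition.lang = options.lang || 'en-US';          // BCP 47 language tag
  recognition.continuous = Boolean(options.continuous); // keep listening after a result?
  recognition.interimResults = Boolean(options.interimResults); // deliver non-final results?
  recognition.maxAlternatives = options.maxAlternatives || 1;   // alternatives per result
  return recognition;
}

// Browser usage (assuming recognition support):
//   const recognition = new webkitSpeechRecognition();
//   configureRecognizer(recognition, { lang: 'en-GB', interimResults: true });
```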

Methods

SpeechRecognition.abort()

Stops the speech recognition service from listening to incoming audio, and doesn’t attempt to return a SpeechRecognitionResult.

SpeechRecognition.start()

Starts the speech recognition service listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition.

SpeechRecognition.stop()

Stops the speech recognition service from listening to incoming audio, and attempts to return a SpeechRecognitionResult using the audio captured so far.
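One practical detail with start(), stop(), and abort() is that calling start() on a recognizer that is already listening throws an error. A thin wrapper that tracks the listening state, sketched below, avoids that; the wrapper itself (makeController) is an illustrative pattern, not part of the API.

```javascript
// Wrap a SpeechRecognition instance so repeated start()/stop() calls are safe.
function makeController(recognition) {
  let listening = false;
  return {
    start() {
      if (!listening) { recognition.start(); listening = true; }
    },
    stop() {
      // stop() attempts to return a result from the audio captured so far.
      if (listening) { recognition.stop(); listening = false; }
    },
    abort() {
      // abort() discards the captured audio without returning a result.
      if (listening) { recognition.abort(); listening = false; }
    },
    isListening: () => listening,
  };
}
```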

Events

Listen to these events using addEventListener() or by assigning an event listener to the oneventname property of this interface.

audiostart

Fired when the user agent has started to capture audio. Also available via the onaudiostart property.

audioend

Fired when the user agent has finished capturing audio. Also available via the onaudioend property.

end

Fired when the speech recognition service has disconnected. Also available via the onend property.

error

Fired when a speech recognition error occurs. Also available via the onerror property.

nomatch

Fired when the speech recognition service returns a final result with no significant recognition. This may involve some degree of recognition, which doesn’t meet or exceed the confidence threshold. Also available via the onnomatch property.

result

Fired when the speech recognition service returns a result — a word or phrase has been positively recognized and this has been communicated back to the app. Also available via the onresult property.

soundstart

Fired when any sound — recognisable speech or not — has been detected. Also available via the onsoundstart property.

soundend

Fired when any sound — recognisable speech or not — has stopped being detected. Also available via the onsoundend property.

speechstart

Fired when sound that is recognised by the speech recognition service as speech has been detected. Also available via the onspeechstart property.

speechend

Fired when speech recognised by the speech recognition service has stopped being detected. Also available via the onspeechend property.

start

Fired when the speech recognition service has begun listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition. Also available via the onstart property.
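In practice, most of the work happens in the result event handler, where you walk the event's results list and read each result's best alternative. The extraction below is a sketch; the helper name transcriptsFrom is illustrative, while results, isFinal, and transcript are the API's own names.

```javascript
// Pull plain { text, final } pairs out of a SpeechRecognitionEvent.
// Each entry in event.results holds one or more alternatives;
// index 0 is the most confident one.
function transcriptsFrom(event) {
  const out = [];
  for (const result of event.results) {
    out.push({ text: result[0].transcript, final: result.isFinal });
  }
  return out;
}

// Browser usage:
//   recognition.addEventListener('result', (event) => {
//     console.log(transcriptsFrom(event));
//   });
```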

Speech Synthesis

Speech synthesis is the reverse process: input is taken from the keyboard (or generated by the application) and the output is spoken through an audio speaker. This concept is also known as Text-To-Speech, or TTS. The main controller interface for this part of the Web Speech API is SpeechSynthesis, along with a number of closely related interfaces for representing the text to be synthesized (known as utterances) and the different voices that can be used to speak them.

Properties of Speech Synthesis

SpeechSynthesis.paused Read only

A Boolean that returns true if the SpeechSynthesis object is in a paused state.

SpeechSynthesis.pending Read only

A Boolean that returns true if the utterance queue contains as-yet-unspoken utterances.

SpeechSynthesis.speaking Read only

A Boolean that returns true if an utterance is currently in the process of being spoken — even if SpeechSynthesis is in a paused state.

Methods

SpeechSynthesis.cancel()

Removes all utterances from the utterance queue.

SpeechSynthesis.getVoices()

Returns a list of SpeechSynthesisVoice objects representing all the available voices on the current device.

SpeechSynthesis.pause()

Puts the SpeechSynthesis object into a paused state.

SpeechSynthesis.resume()

Puts the SpeechSynthesis object into a non-paused state: resumes it if it was already paused.

SpeechSynthesis.speak()

Adds an utterance to the utterance queue; it will be spoken when any other utterances queued before it have been spoken.
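Speaking text amounts to constructing a SpeechSynthesisUtterance and passing it to speak(). The factory below is an illustrative pattern (the name makeUtterance and the chosen defaults are not part of the API); it takes the utterance constructor as a parameter so the shape can be shown without a browser.

```javascript
// Build a configured utterance from text plus optional rate/pitch/voice.
// In a browser, UtteranceCtor would be window.SpeechSynthesisUtterance.
function makeUtterance(UtteranceCtor, text, opts = {}) {
  const u = new UtteranceCtor(text);
  u.rate = opts.rate ?? 1;   // speaking speed, 1 is normal
  u.pitch = opts.pitch ?? 1; // voice pitch, 1 is normal
  if (opts.voice) u.voice = opts.voice;
  return u;
}

// Browser usage:
//   const u = makeUtterance(SpeechSynthesisUtterance, 'Hello there', { rate: 1.2 });
//   window.speechSynthesis.speak(u); // queued behind any pending utterances
```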

Events

Listen to this event using addEventListener() or by assigning an event listener to the oneventname property of this interface.

voiceschanged

Fired when the list of SpeechSynthesisVoice objects that would be returned by the SpeechSynthesis.getVoices() method has changed. Also available via the onvoiceschanged property.
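A common use of voiceschanged is picking a voice for a given language once the list is populated (on some browsers getVoices() returns an empty list until this event fires). The selection helper below is a sketch; pickVoice is an illustrative name, while lang is a real property of SpeechSynthesisVoice.

```javascript
// Return the first available voice whose language starts with the
// given prefix (e.g. 'en' matches 'en-US' and 'en-GB'), or null.
function pickVoice(voices, langPrefix) {
  return voices.find((v) => v.lang && v.lang.startsWith(langPrefix)) || null;
}

// Browser usage:
//   speechSynthesis.addEventListener('voiceschanged', () => {
//     const voice = pickVoice(speechSynthesis.getVoices(), 'en');
//   });
```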
