C++ JavaScript – How to Implement Option to Return Blob, ArrayBuffer, or AudioBuffer from window.speechSynthesis.speak()

c++ javascript

Note: I presently have no experience composing or modifying C++ code, which is the language the code I would need to change is written in.

Specifically, window.speechSynthesis.speak() does not currently provide an option to return the audio generated by the call as a Blob, ArrayBuffer, or AudioBuffer in either Chromium or Firefox. I have filed feature requests for this at both browsers: Issue 733051, Bug 1377893.

The workaround I have composed at SpeechSynthesisRecorder.js is to use navigator.mediaDevices.getUserMedia() and MediaRecorder() to record the audio output of the .speak() call from the "Monitor of Built-in Audio Analog Stereo" device, which produces the expected result in Chromium but choppy audio playback in Firefox. But why should we have to use navigator.mediaDevices.getUserMedia() and MediaRecorder() to create a copy of the generated audio, instead of having a parameter to .speak() which indicates that the generated audio should not be played but rather returned as a Blob, ArrayBuffer, AudioBuffer, or other data type?
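For reference, here is a reduced sketch of that workaround. It assumes the user grants the getUserMedia() permission and selects the monitor/loopback device (e.g. "Monitor of Built-in Audio Analog Stereo") as the input source so that the recorded stream actually carries the synthesized output; the speakAndRecord() function name is only for illustration.

    // Reduced sketch of the getUserMedia()/MediaRecorder() workaround.
    // Assumes the selected input device is the system's monitor/loopback
    // source, so that the recording contains the speechSynthesis output.
    function speakAndRecord(text) {
      return navigator.mediaDevices.getUserMedia({ audio: true })
        .then(stream => new Promise(resolve => {
          const recorder = new MediaRecorder(stream);
          const chunks = [];
          recorder.ondataavailable = e => chunks.push(e.data);
          recorder.onstop = () => {
            stream.getTracks().forEach(track => track.stop());
            resolve(new Blob(chunks, { type: recorder.mimeType }));
          };
          const utterance = new SpeechSynthesisUtterance(text);
          utterance.onend = () => recorder.stop();
          recorder.start();
          window.speechSynthesis.speak(utterance);
        }));
    }

    // The resulting Blob can then be converted to an ArrayBuffer, or decoded
    // to an AudioBuffer, provided decodeAudioData() supports the recorded
    // container/codec (typically WebM/Opus).
    speakAndRecord("hello world")
      .then(blob => blob.arrayBuffer())
      .then(arrayBuffer => new AudioContext().decodeAudioData(arrayBuffer))
      .then(audioBuffer => console.log(audioBuffer));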

The meSpeak.js library provides a rawdata option (illustrated in the sketch following this list):

  • rawdata: Do not play, return data only. The type of the returned data is derived from the value (case-insensitive) of 'rawdata':
    • 'base64': returns a base64-encoded string.
    • 'mime': returns a base64-encoded data-url (including the MIME-header).
      (synonyms: 'data-url', 'data-uri', 'dataurl', 'datauri')
    • 'array': returns a plain Array object with uint 8 bit data.
    • default (any other value): returns the generated wav-file as an ArrayBuffer (8-bit unsigned).
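For illustration, a short sketch of how that option is used, based on the documented meSpeak.js API (the configuration and voice file paths below are placeholders that depend on where the library's files are hosted):

    // Load the runtime configuration and an English voice first
    // (the file paths here are placeholders).
    meSpeak.loadConfig("mespeak_config.json");
    meSpeak.loadVoice("voices/en/en.json");

    // With rawdata set, nothing is played; the generated WAV data is returned.
    const wavArrayBuffer = meSpeak.speak("hello world", { rawdata: true });     // ArrayBuffer (default)
    const wavBase64      = meSpeak.speak("hello world", { rawdata: "base64" }); // base64-encoded string
    const wavDataUrl     = meSpeak.speak("hello world", { rawdata: "mime" });   // base64 data-URL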

There has been previous discussion about integrating the Web Speech API with the Web Audio API, for instance in RE: Interacting with WebRTC, the Web Audio API and other external sources:

2) Storing and processing text-to-speech fragments.

Rather than mandating immediate output of the synthesized audio
stream, it should be considered to introduce an "outputStream"
property on a TextToSpeech object which provides a MediaStream object.
This allows the synthesized stream to be played through the <audio>
element, processed through the Web Audio API or even to be stored
locally for caching, in case the user is using a device which is not
always connected to the internet (and when no local recognizer is
available). Furthermore, this would allow websites to store the
synthesized audio to a wave file and save this on the server, allowing
it to be re-used for user agents or other clients which do not provide
an implementation.
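To make the quoted proposal concrete, the following is a purely illustrative sketch of how such an outputStream property might be consumed; neither outputStream nor a TextToSpeech object exists in any current specification or browser.

    // Hypothetical API: "outputStream" is the MediaStream proposed above.
    // It is NOT implemented anywhere and is shown only for illustration.
    const utterance = new SpeechSynthesisUtterance("hello world");
    window.speechSynthesis.speak(utterance);

    const stream = utterance.outputStream; // hypothetical MediaStream

    // The synthesized audio could then be routed into the Web Audio API...
    const context = new AudioContext();
    context.createMediaStreamSource(stream).connect(context.destination);

    // ...or recorded to a Blob without resorting to getUserMedia().
    const recorder = new MediaRecorder(stream);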

The idea was also mentioned at Can Web Speech API used in conjunction with Web Audio API?, where it was noted:

I actually asked about adding this on the Web Speech mailing list, and
was basically told "no". To be fair to people on that mailing list, I
was unable to think of more than one or two specific use cases when
prompted.

So unless they've changed something in the past month or so, it sounds
like this isn't a planned feature.

and more recently by this user at MediaStream, ArrayBuffer, Blob audio result from speak() for recording?


Questions:

1) At this point in the development of this project or pursuit, is it more feasible for me to attempt to learn C++ and the process of submitting a patch to the browser maintainers for the described implementation; that is, to make the required changes to the browser source code myself?

2) Should I continue to ask for help from the authors of the Web Speech API at the browsers and/or from developers at large, towards completion of the described implementation?

3) Both 1) and 2)?

Best Answer

At this point in the development of this project or pursuit, is it more feasible to attempt to learn C++ and the process of submitting a patch to the browser maintainers for the described implementation; that is, to make the required changes to the browser source code myself?

It really depends on your commitment to this feature. The spec is still in draft, and you could try submitting feedback; however, the most recent feedback was presumably from you, and you have not been answered yet. There will probably have to be more discussion about how the feature is described in the spec, and somebody will have to make the modifications (if it's decided to adopt your suggested change). As you've pointed out, others have apparently asked for this feature and have been rejected.

I don't think the browsers will accept a PR if it doesn't adhere to the current version of the spec draft, so I think the spec is the first step to get browser support.

Even if you get it into the spec draft, it may take a while for the browsers to review a PR from you, and it may take a lot of effort for you to contribute because you don't know C++ and browsers are very complicated software. Once it is in the spec, it may be simpler to just wait for the browser maintainers to implement the new feature, but if you're determined to see it through, then by all means contribute the PR! You should decide for yourself whether learning C++ and understanding the browser code is worth a PR that may never be merged.

Should I continue to ask for help from the authors of the Web Speech API at the browsers and/or from developers at large, towards completion of the described implementation?

It sounds like you've already done what you can to ask for this to be implemented. You can either let it go, or you can keep trying to get your feature included in the spec. This may involve contacting the editors directly ("why hasn't my message on the mailing list been answered?" / "is the Speech API draft dead?"); it will also help to show that others want the same feature. While I agree that the feature sounds like a good idea, the editors of the draft may have reasons why it isn't. I found Kevin Ennis' message to the mailing list, and it seemed unresolved to me, but the entire mailing list seems basically dead, since your message from a month ago hasn't been answered. The draft isn't on the W3C standards track. It's really up to your determination what you do, but #2 is definitely the first step.
