Wednesday, May 7, 2014

Preparing to Code!


Currently, the Pico speech synthesis service already exists on Gonk, i.e. Firefox OS devices already have speech synthesis via the Pico engine.

After a discussion with my mentor, Eitan Isaacson, my first step in the project was to study the implementation of the Pico service, to get inspiration for the future work.

After spending some time on the service, I now understand its basic workflow. In this post I will explain, or rather document, that workflow so that it helps in the future.


For any OS, desktop support should be implemented as an nsISpeechService and an nsISpeechTaskCallback. When speak() is called on the service, it is provided with an nsISpeechTask object that has methods for everything we would want to do.

Following are the Pico-specific classes:

nsPicoService :> our main service, implements nsISpeechService.


PicoCallbackRunnable :> a runnable class that implements nsISpeechTaskCallback.

Helper classes:


PicoApi :> acts as a wrapper for us and directly interacts with the pico library.

PicoVoice :> handles the voices.


PicoSynthDataRunnable :> a runnable class that is used to send the synthesized data back to the browser.


The roles of all these classes are described in the following workflow:

The browser calls the speak method of nsPicoService with a reference to the nsISpeechTask object, along with four other parameters: the text to utter, a unique voice identifier, the speech rate and the pitch.


The service then instantiates a new PicoCallbackRunnable object by passing all these parameters, along with itself, and obtains a reference to that object.

Then, the PicoCallbackRunnable is executed on a new worker thread. In this process, all the text is fed to the engine in buffers of a specified size, and the output data from the engine is received in chunks.


These chunks are then sent to the DispatchSynthDataRunnable method, which creates a PicoSynthDataRunnable.
This runnable is then executed back on the main thread, and it sends the synthesized data back to the browser using the methods of the nsISpeechTask object passed to it.
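
To make the workflow above a bit more concrete for myself, here is a minimal standalone C++ sketch of the same pattern: text is fed to an engine in fixed-size buffers on a worker thread, and each output chunk is handed back to the "main thread" as a small runnable. This is not the Gecko code. FakeSpeechTask, FakeSynthesize, MainThreadQueue and Speak are all hypothetical names that I made up for illustration, and the "engine" simply emits one sample per input byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for nsISpeechTask: it only collects the audio it is sent.
struct FakeSpeechTask {
  std::vector<std::int16_t> audio;
  std::size_t chunksReceived = 0;

  void SendAudio(const std::vector<std::int16_t>& chunk) {
    audio.insert(audio.end(), chunk.begin(), chunk.end());
    ++chunksReceived;
  }
};

// Hypothetical stand-in for the Pico engine: one "sample" per input byte.
std::vector<std::int16_t> FakeSynthesize(const std::string& textBuffer) {
  std::vector<std::int16_t> out;
  for (unsigned char c : textBuffer) {
    out.push_back(static_cast<std::int16_t>(c));
  }
  return out;
}

// Hypothetical stand-in for dispatching runnables to the main thread.
class MainThreadQueue {
 public:
  void Dispatch(std::function<void()> runnable) {
    std::lock_guard<std::mutex> lock(mMutex);
    mQueue.push(std::move(runnable));
  }

  // Runs every queued runnable on the calling (main) thread.
  void Drain() {
    for (;;) {
      std::function<void()> runnable;
      {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mQueue.empty()) return;
        runnable = std::move(mQueue.front());
        mQueue.pop();
      }
      runnable();
    }
  }

 private:
  std::mutex mMutex;
  std::queue<std::function<void()>> mQueue;
};

// Plays the role of speak() plus the callback runnable: synthesis happens on
// a worker thread, delivery to the task happens on the calling thread.
void Speak(const std::string& text, std::size_t bufferSize,
           FakeSpeechTask& task, MainThreadQueue& mainThread) {
  if (bufferSize == 0) return;  // nothing sensible to do with empty buffers

  std::thread worker([&text, bufferSize, &task, &mainThread] {
    for (std::size_t pos = 0; pos < text.size(); pos += bufferSize) {
      // Feed one buffer of text to the engine and get one chunk back.
      std::vector<std::int16_t> chunk =
          FakeSynthesize(text.substr(pos, bufferSize));
      // Comparable in spirit to DispatchSynthDataRunnable: hand the chunk
      // to the main thread instead of touching the task from the worker.
      mainThread.Dispatch([&task, chunk] { task.SendAudio(chunk); });
    }
  });

  worker.join();       // simplified: wait for synthesis to finish
  mainThread.Drain();  // then deliver all chunks on the main thread
}
```

Calling Speak("hello world", 4, task, queue) with a fresh FakeSpeechTask and MainThreadQueue splits the text into three buffers, so the task receives three chunks. The real service delivers chunks while synthesis is still running; joining the worker before draining the queue is a simplification to keep the sketch short.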


The nsPicoService is also used for other utility functions, such as initializing the PicoApi class, registering the voices and loading/unloading the Pico engine (this is done via the PicoApi class, of course).

That is it for this week. Next week, my aim is to test the Pico service and, with that, to start on Windows.

Cheers!
