Jul 2, 2015

Writing a Festival TTS for NVDA : Lessons learned




For the last six months, I’ve been spending a lot of Saturdays trying to integrate the Festival TTS with the NVDA screen reader. This report summarizes the effort. The TL;DR is that it’s not a great idea, and I’ll be trying a flite addon next.

The Hindi voice and lexicon analyzer,


A particular Hindi voice needed to be added to NVDA. Some nice folks developed the voice files and the lexicon generator, and had them running on Linux. Since Orca has a very nice Festival integration, it was also able to use the Hindi voice files.

On the NVDA side,


There is a nice Festival addon for NVDA, with Russian language support, working with the 2010 version of NVDA. That plugin had a lot of code in C++ and Scheme, and drives the sound output itself. On comparison with the espeak integration in NVDA, I concluded that there was too much code in the existing plugin, and that it was possible to make things easier. I decided to model the addon on the way the espeak integration has been designed, and use nvwave.py (part of NVDA) to play out the wave data.
I believed that pause and cancel can be achieved only in this way, but I may be completely wrong here.

Using the nvwave.py


nvwave.py uses Python ctypes to call the Windows multimedia functions to play the wave data. With espeak, the wave data is delivered via a callback, and then played out. The source file _espeak.py implements this functionality. We’ll call this callback-based wave-data delivery "audio streaming". One interesting thing is that nvwave.py calls the Windows audio playback functions as blocking calls, i.e. the functions return only when the complete data has been played back. Now, if espeak’s synthesis (text to wave data) and this blocking playback happened in the same thread, some pauses in playback would be expected. My theory is that either espeak is just too fast, or it’s actually running synthesis in a separate thread. Another theory is that I’m just missing something completely here.
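As a sketch of what this audio streaming looks like (illustrative names only, not the real _espeak.py API): the engine pushes wave-data chunks into a registered callback as soon as each chunk is ready, instead of returning one big buffer at the end.

```python
# Sketch of espeak-style "audio streaming": the engine delivers wave
# data through a registered callback instead of returning one big
# buffer. Names are illustrative, not the real _espeak.py API.
callback = None

def set_synth_callback(fn):
    global callback
    callback = fn

def synthesize(text):
    # Engine side: produce small wave chunks and push each one to the
    # callback as soon as it is ready.
    for word in text.split():
        callback(word.encode())
    callback(None)  # signals end of stream

chunks = []
set_synth_callback(lambda c: chunks.append(c) if c is not None else None)
synthesize("stream me")
print(chunks)  # -> [b'stream', b'me']
```

The consumer (nvwave, in NVDA's case) can start playing the first chunk while later chunks are still being synthesized, which is what keeps latency low.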

Festival on Windows


One important point to know is that Festival is not meant to be used as a production TTS; it is a vehicle for implementing and experimenting with various aspects of TTS, from lexicon generation to synthesis. The main use case of the code base is to build a binary on Linux, and trying to make a DLL on Windows is like trying to push a round peg into a square hole. Fortunately, the CMU folks have provided a good way to create the makefiles within cygwin and compile using Visual C, which works great for building the speech_tools and festival binaries.

Integrating with NVDA


Once a basic Festival for Windows is there, there are two options: one is to use the client-server mode of Festival, and the second is to try to make a DLL out of Festival. I decided to first try the DLL model, so that I could leverage the current espeak
codebase. It'll be worth attempting the client-server model too, although it'll require putting some clever hooks in NVDA to manage the Festival server lifetime.

All the intelligence in Festival is implemented in the Scheme language, so I wrote a very simple C layer over the Festival API to evaluate Scheme expressions and return the result. There was another function to synthesize text to wave data. Now, when NVDA handed over a large piece of text to synthesize, it would take its own time and start playing back only after a delay, which was unacceptable.
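One way to reduce that startup delay, independent of the DLL work, is to split long input into sentences and synthesize incrementally, so playback can begin after the first sentence. A sketch of the idea (my illustration, not code from the plugin; `synth` is a stand-in for the real text-to-wave call):

```python
# Sketch: split long input into sentences so synthesis can start
# delivering audio early. "synth" is a stand-in for the real
# festival text-to-wave call.
import re

def split_sentences(text):
    # Naive sentence splitter; a real addon would use festival's own
    # tokenizer or a proper NLP splitter.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def synth_incremental(text, synth):
    # Yield wave data one sentence at a time instead of synthesizing
    # the whole text up front.
    for sentence in split_sentences(text):
        yield synth(sentence)

chunks = list(synth_incremental("First sentence. Second one! Third?",
                                synth=lambda s: s.upper()))
print(chunks)  # -> ['FIRST SENTENCE.', 'SECOND ONE!', 'THIRD?']
```

With this, the first sentence's audio is available as soon as it is synthesized, while the remaining sentences are still in the queue.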

The Audio streaming


Festival doesn’t provide an audio streaming API, so I hooked into the HTS engine (the Hindi voice file was an HTS file) to collect the wave data as it’s generated and deliver it to NVDA, just like the espeak implementation. The problem was that some post-synthesis resampling was happening inside Festival, which was skipped with this callback approach. Also, because Festival's synthesis (which is a little slow due to all the file-based data exchange and Scheme overhead) and the blocking callback were in the same thread, the voice was stretched out (like running audio at a very slow speed). The obvious thing to attempt is to run Festival in a different thread. With this approach, random memory corruptions happened, even with the Festival DLL compiled in ‘multi-threaded DLL’ mode.

The parting comments,


The performance of Festival will remain a challenge, in my opinion. This is because of the many file-based exchanges of data even within the Festival codebase, the use of the Scheme language, and calling out to the lexicon analyzer as a separate binary for each word. On most Windows machines, various anti-malware and anti-virus programs scan all file IO in real time, making it slow.
Overall, I’m not convinced that a robust synthesizer, usable by the average NVDA user, can be created this way, so I don’t plan to work on this any more.

I’ll move on to developing a flite addon with all the knowledge I’ve gained from this project. Flite is from the Festival authors, and has been designed to be embedded and used in production. It’s also gaining Indian languages slowly and steadily.

For the brave soul,


If anyone wants to pursue Festival+NVDA further, they should do so. Here are some suggestions:
1. Revisit the code base of the 2010 Festival NVDA addon. Maybe it can be reused.
2. Revisit using the Festival binary in server mode. It still won’t give audio streaming.
3. Update the C code of the festival-dll wrapper to do things in a separate thread, but deliver the callback in the main thread. I tried doing this in Python; this would attempt it in C. It would also require debugging the multi-threaded-DLL behaviour of the Festival code.

If robust audio streaming is achieved, there will be some other things left, like getting rid of the printf/couts in the Festival code, implementing pitch and rate control, etc., which are the easier part.

Jan 28, 2015

Genuine Free Windows for your Linux Box.

Spoiler: http://modern.ie has free and ready to use virtualbox images.
This page gives quick steps on how to use the images with QEMU on Arch Linux, including sound, folder sharing, and NAT (i.e. no bridge).

Making qemu disk

Download your favorite Windows version. While the single zip download is good, the problem is that the server doesn't support HTTP range requests, so it's always a full download if the connection breaks in the middle.
The `wget -i` friendly "Batch file Download" or "Grab them all with cURL" options are much better.


$ wget -i https://az412801.vo.msecnd.net/vhd/VMBuild_20141027/VirtualBox/IE11/Linux/IE11.Win8.1.For.Linux.VirtualBox.txt
$ cat IE11.Win8.1.For.Linux.VirtualBox.zip.00* > W81.zip
$ unzip W81.zip
$ tar xvf IE11\ -\ Win8.1.ova
$ qemu-img convert -f vmdk -O qcow2 IE11\ -\ Win8.1-disk1.vmdk  w81-orig.img

The image will expire in three months, so it's a good idea to keep the original image. All files except w81-orig.img can be deleted.

Running Windows under Qemu 

Armed with a copy of the disk image created above, and qemu, you are ready to roll.

$ cp w81-orig.img w81.img
$ qemu-system-x86_64 -enable-kvm -hda w81.img

This should bring up something usable. The reasons one needs to run Windows within Linux may vary. Some applications may require multiple network cards, multiple sound cards, direct USB access etc.
I'm using this setup primarily to work on the NVDA screen reader (I am working with the India team), so working network and sound are the main requirements. Also, it's a good idea to get shared folders working, so that common stuff can be kept in a host directory. This allows me to avoid cygwin and manage files from the host. Unfortunately, things like MSVC/SDK will still need to be installed, as they do not have portable variants. Also, I shy away from bridges, and NAT is good enough for my purpose.

$ qemu-system-x86_64 \
         -smp cpus=2 \
         -m 2048 \
         -soundhw ac97\
         -enable-kvm \
         -hda w81.img \
         -device e1000,netdev=user.0 \
         -netdev user,id=user.0 \
         -net nic \
         -net user,smb=/data/nvda

This should boot up Windows. I also tend to add -cdrom with the MSVC ISO for the first-time install.

This command requires qemu, samba and pulseaudio to be installed.

First time boot

Windows should boot up and log in the default user.
The audio drivers can be installed by firing up the Device Manager and selecting 'Update driver' from the context menu on the Multimedia device.
The host shared drive can be accessed by

C:> net view \\10.0.2.4
C:> dir \\10.0.2.4\qemu
C:> net use h: \\10.0.2.4\qemu /persistent:yes

Troubleshooting

If sound doesn't work properly, try emulating other cards. `qemu -soundhw help` will list the available hardware.

And lastly, the device / netdev / net-nic / net-user usage is confusing; a better command line would be highly appreciated.

http://modern.ie is doing a great service with the images, and even though they only dish out the Virtualbox images, it's easy to use the images in KVM.






Jun 7, 2014

Installing emacs24 on Debian wheezy, the proper way

A Google search for "wheezy emacs24" gives various blogs, almost all of which end up with 'make install' rather than 'dpkg -i ....deb'.
Here are some simple steps to make a deb package of emacs24 for any Debian version. The constraint is that we don't want to needlessly upgrade the rest of wheezy.

Getting the required files
First, get the source files. From the emacs24 package page, download the source packages. The .dsc file is not really needed. Get the emacs24_*.orig.tar.bz2 and emacs24_*.debian.tar.xz files.
(I'm not providing links to the files as they will soon go out of date.)


Unpacking the files

The good old 'tar' can extract both files.

tar jxvf emacs24_*.orig.tar.bz2
cd emacs24_*.orig
tar Jxvf ../emacs24_*.debian.tar.xz 

Note the 'J' option to tar to extract the 'xz' file, and 'j' to extract the bz2 file.

Installing the dependencies

We are going to use the 'debuild' wrapper. 

sudo apt-get install devscripts

Also, look at the Build-Depends: line in debian/control, and make sure those packages are installed. One can use the command 'dpkg-checkbuilddeps' to check this.

One thing I noticed is that libgnutls28-dev is not available in wheezy, so change it to libgnutls-dev, which is available in wheezy.

Do the build

This really is a single command, 'debuild', and a wait.

debuild

After some time, there should be several deb files in the directory above.
cd ..
ls -1 *deb
emacs24-lucid_24.3+1-4_amd64.deb
emacs24-lucid-dbg_24.3+1-4_amd64.deb
emacs24-nox_24.3+1-4_amd64.deb
emacs24-nox-dbg_24.3+1-4_amd64.deb
emacs24_24.3+1-4_amd64.deb
emacs24-dbg_24.3+1-4_amd64.deb
emacs24-bin-common_24.3+1-4_amd64.deb
emacs24-common_24.3+1-4_all.deb
emacs24-el_24.3+1-4_all.deb

The installation

There are multiple tools like debi/gdebi etc. but this is what you essentially need to do:

sudo apt-get install emacsen-common
sudo dpkg -i emacs24_24*deb emacs24-common*deb \
emacs24-bin-common*deb

You may want to purge the existing emacs23 from the system before installing emacs24.

The drumrolls 

emacs --version

And finally

A request: never, ever do a 'make install' as root on your Debian box. There is always a way to make a deb, which can later be uninstalled cleanly.
 

Oct 21, 2013

biz idea : crowd sourced education site

This continues my MOOC vs crowdsourced theme. The problem I'm trying to fix is that a MOOC is still like a 'hilltop' institution, where a group of people decide on the content; there is no evolution by genetic mutation, and it's very much like the closed-source software industry. In this scheme, content does not improve the way Wikipedia does.

And since most people want certification more than education, the hilltop-institution-backed MOOCs will continue to have a strong customer base, as long as certification is backed by the institution. On the other hand, a crowdsourced site (like Wikipedia) will have no certification or authenticity, but will have better quality of content and overall learning.

In my opinion, what is needed is a GitHub for nano-sized self-study modules, where anyone can fork, patch, and provide a pull request, and the content is continuously updated. Also, the packaging of nano-sized self-study units into a course, very much like Linux distribution packaging, will be needed. So things like CBSE - Physics - XII can be created out of several self-study modules which have been authored and improved by people the world over, possibly in their native languages.

This brings out another feature of MOOCs: complete reliance on video and/or audio, and these non-textual media are incompatible with my scheme. In my opinion, videos were merely the first step: taking the best teachers and broadcasting them to the world. However, it's non-trivial to improve this content once it's recorded. I'd argue that the current generation of online education is still extending old-school thought onto the new medium of the internet, rather than redesigning learning for the masses, by the masses.

Now technology and processing power have reached the level where it's possible to provide a personalized online environment suited to each individual, for a huge internet population, and to let that population improve the content. This platform will be a massive convergence of the best in text-to-speech, automatic language translation, animation, and possibly more technologies.

There are a couple of business models which can be built around this. All the crowd-generated content has to be freely accessible. The software that drives all this may or may not be open-sourced, but the data formats for the content have to be open and royalty-free. Revenue generation is possible by charging for premium courses (dedicated courses for certification).


This post has gone on too long already; I will probably spill the beans in the next entry.

Keep learning,

biz idea : excel to app ( desktop / mobile / web )

From the 1920's to the 1960's in the US, in the age of manual telephone exchanges, the job market for 'switchboard operators' was booming. Some famous editorial remarked that it looked like everyone in the US would eventually become a switchboard operator. And soon it was proven true; imagine, we actually have to dial the number we want to call!!

Something very similar has been happening to programming. The fact is that today Microsoft Excel is perhaps the easiest-to-use functional programming platform. People have built a great variety of data systems using Excel.

This post is about a thought process for creating a service where users can upload an Excel file, and download a desktop/web/mobile app for it. Let Excel be the IDE (-:
Let me call this service ExcelAsIDE (EAI) for the time being...

Flow

The app should do a true render of the formatting, formulas, charts, etc., including the dynamic data sources.
It should be as easy as it can be. Just upload the xlsx file. Configure the data sources, if any. Decide how the output cells / charts will show up on the screen. And download / run the app on the next screen. It should be possible to generate apps for all screens, CPUs, and OSes.

Technology

Hmm... the more interesting stuff. Perhaps the biggest thing this will require is a VBA compiler, and an implementation of the complete Excel API. I'm sure there is some Microsoft licensing stuff here, which will have to be worked out.
The good thing is that all the information is available and open. The Office Open XML file format, the Excel API, VBA... everything is available.




Oct 18, 2013

biz idea : come-in .. secure screen sharing site.

In the Windows world, we have several screen-sharing and remote-control services, like TeamViewer, Google Hangout screen sharing, Copilot, etc. However, no such service exists for the Unix family. The good thing is that all the client-side software is already present on the desktop, and this is purely a server-side software requirement.

This service is being called come-in for this document.

Use cases

The first use case is to allow a remote friend to log in to the machine and do some stuff. The remote friend can use screen / tmux / vnc to share a screen with the local person, or just run some magic command and exit.
 The 'Needy' is the person requiring assistance, and the 'Friend' is the one who will remotely log in to help out.
  1. Needy pings Friend on IM or in any other way.
  2. Friend shares his public key with Needy.
  3. Needy runs the come-in client, and enters Friend's public key.
  4. Friend runs the come-in client and invokes 'Connect'.
  5. Friend is logged on to Needy's machine.
An instant 1:1 session should be possible without creating a site login. The Friend's public key will be stored by the come-in client, so it needs to be entered only once.

Another use case is remote access to several machines by a group of people. These will use site logins and registration of machines. The only data stored on the server will be the mapping from groups of machines to groups of people.

Security considerations

Closed-source software and services work on the assumption that their servers are impenetrable. Moreover, they want users' data in plain text on their servers, for some evil reason.
Now, as we are not evil, we wouldn't want any user data in plain text on our servers. We work on the assumption that our servers and storage are vulnerable, and that even in that case, a user's machine connected to the server should be safe and secure.

Architecture

The come-in client is a wrapper over the ssh client. Several ports will be set up as forward / reverse proxies. Some of these ports will be client to client, whereas some may be client to server. The client-to-server connection is only used for identification. It should be noted that even if the server is malicious, it will not be able to fool the client into connecting to a non-desired client, because the client-to-client connection (the data connection) is separate from this client-to-server connection, and is encrypted end to end.

The come-in server needs two extra pieces of functionality: a 'matcher' and a 'proxy'. The come-in server could be a specially patched ssh server, or just a configuration of an ssh server along with separate proxy and matcher programs.

The come-in client will be able to wrap screen/tmux/vnc to provide a finished-product feeling.

Business Angle

The wise folks have told us that the value is discovered by customers, so I don't want to go in there.

The development of basic ssh sharing is about 100 hours of work. Server deployment can be taken as another $50 per month.

This can work on a freemium model. The 1:1 use can be free for up to an hour or so. The persistent-connection model can be paid, at something like $1 per machine and $1 per user per month. This will also require implementing MIS-like features, and some more fanciness, which is another 100 hours of work.


So, it should be possible to bring up this service in 1 month flat.


Game for it ?

Oct 16, 2013

git flow for NVDA India branch


[ Note: This information is moving to http://code.google.com/p/saksham-projects/wiki/InNvdaSource ]


NVDA is an open-source screen reader for Windows. There is a project underway right now to make sure the features required by Indian users are contributed to it. This post is an attempt to describe the process and tools.

Flow

NVDA has its own git repository. The 'master' branch is always ready for release, and all the work happens on per-ticket branches, which are based on master. The incubation branch is called 'next', on which the ticket branches linger for testing.

For their work, the India team has a repository on Bitbucket. This repo completely mirrors the official NVDA repository. It also contains the per-ticket branches that India team members are working on. Finally, it has the 'in_next' branch, which contains only the work being done by India team members.

When India team members are satisfied with the code in a particular branch, they will update the ticket with the branch / repo information, and also drop a mail to the nvda-dev mailing list about the feature. From then on,
the NVDA maintainers can start their process of incubation in next and then move the work into master for release.

This flow requires the work to be kept 'live' in two branches: the feature branch and the in_next branch. Every few weeks, it's necessary to merge master into the ticket branch, and merge the ticket branch into in_next.

Details

This section describes the conventions and git commands.
NVDA has the convention of naming a branch 't1234' for ticket #1234.
Another thought is that our branches should be named like 'in_t1234', so that they are distinct from the ones the NVDA team uses.
This helps India team members identify each other's work.
So any branch that exists in the NVDA-India repository but not in the NVDA repository has an 'in_' prefix, and serves the same purpose as the corresponding NVDA repository branch.

Setting up the Repo 

In an empty directory, run the following commands:
git init
git remote add nvda git://git.nvaccess.org/nvda.git
git remote add ours git@bitbucket.org:manish_agrawal/nvda.git
git fetch --all
git checkout -b master nvda/master
git checkout -b in_next ours/in_next
At this point, open readme.txt to see how to compile NVDA.

Starting on a ticket branch

git checkout master  
git pull
git checkout -b in_t1234
git push ours -u in_t1234

Updating work to in_next

git checkout master
git pull
git checkout in_t1234
git merge master
git commit -a
git push
git checkout in_next
git merge master
git merge in_t1234
git push

These instructions and background should be sufficient for someone to get started in contributing to NVDA via the NVDA-India repository.

Some handy git commands

To see the exact diff between two branches, use two dots (the difference between the two tips) or three dots (changes on the second branch since the common ancestor); leaving the second branch empty defaults it to the current branch.

git diff master..t1234
git diff master..
git diff master...t1234
git diff master...

To move a branch back from 'next' onto 'master', we'll have to use the old diff-and-patch mechanism:


# Get the old branch
git checkout t1234
# bring it up to date, replace nvda by the name of nvda remote
git merge remotes/nvda/next
# save the differences
git diff remotes/nvda/next... > patch.1234

# switch to master
git checkout master
# create new branch
git checkout -b in_t1234
# apply the patch
patch < patch.1234
# review
git diff
# commit, push etc..  


List of tickets contributed by India team

git branch -a | grep in_ 

Selecting base branch

Unless really required, the base branch should be master.
If some work is common to multiple tickets, a common branch should be created, rather than
duplicating code in multiple branches.
If the base branch is NOT master, then the branch should be named 'in_t1234_basebranch'.
Git does not provide a way to find the base branch later, so the naming convention is important.