VoxForge: speech recognition

This post is about a wonderful project I have been following for two to two and a half years, and about the unique opportunities and challenges it faces.

VoxForge is a unique project. It collects transcribed speech from volunteers to build free, GPL-licensed acoustic models for open-source speech recognition engines such as CMU Sphinx and Julius. The biggest opportunity it offers is to bring into digital computing the huge masses of people who are illiterate, or who simply cannot be bothered to type. Many people stay out of computing either because they don't know how to type or because they are loath to type out blogs or other written material. Good speech recognition could bring millions of them into computing, people who today cannot take advantage of distributed news and computing because of interface constraints. An interesting list of use cases can be seen in the Ubuntu wiki entry for speech recognition.

Let’s take a short detour into some statistics I stumbled upon.

India’s net penetration is 0.01%, similar to that of Sudan and Syria. Our average connection speed is 849 Kbps. China, our neighbour, has similar speeds, but more than a quarter of its population is on the web (even though the web there is tightly controlled). Numerically, we rank third, with 26% of people on connections under 256 Kbps; in this regard we are surpassed only by Syria and the UAE. In fact, the only upbeat statistic I came across is that around 5.8% of the estimated 3.5 million net users have connections of 2 Mbps and above.

These statistics are from Akamai’s State of the Internet report for the first quarter of 2010. They are relevant to VoxForge because the contributions the project needs are speech corpora, which volunteers have to upload. It would have been more interesting if the report had included upload speeds as well.

While the VoxForge project has been active, it hasn’t been as active as I would have liked. Some of the issues I see with the project are listed below; some of them stem from the poor network infrastructure of countries like India.

a. Repository on the edge of networks :- The raw speech file has to be uploaded to a server somewhere in Europe. It would be a lot easier if the project tied up with network mirror providers, so that one could upload the file to the nearest mirror.

b. Repository of language :- I don’t know if people have thought of this. From what little I know, there are at least 5-6 known varieties of English, Indian English being a case in point. It would be interesting to know whether the variety of English affects the accuracy of the acoustic models the developers are trying to build.

c. Devices and Noise :- I have been fortunate to be part of a couple of community radio events, which gave me a somewhat more nuanced understanding of voice recording. I see that there is a survey of microphones for recording, but it needs to go a little further: tell people which microphones are better or worse, along with their prices. It should also explain noise. Many of my friends design low-noise desktops (a niche market), and the cheapest such systems cost around Rs 50,000. This is again a barrier.

d. Processing :- If you look at the uploads page, there is a stage called Processing. It would be good if they explained what this processing involves.

e. Size :- Nowhere on the site is there an average, or even a rough idea, of how big a single raw audio file would be. A Gutenberg story could serve as a reference point, with an average over the people who read that same story, so that one at least knows that, for example, a 5-minute story might take 100 MB or whatever it turns out to be. Listing the pain (upload time) at different connection speeds would also tell people what to expect.

f. FTP :- FTP implementations themselves need to be looked at. For example, this idea at Ubuntu describes the issue as it currently stands.

g. Social Media :- The site and the idea need to reach many more people. The project should use social media and blogs to reach people who have the necessary equipment and low-latency, high-bandwidth connections.
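Point e asks how big a raw recording is and how long it takes to upload. The arithmetic is simple enough to sketch: uncompressed PCM size is sample rate × bytes per sample × channels × duration, and ideal upload time is that size in bits divided by the link speed. This is a rough sketch under stated assumptions, not VoxForge's own figures: 16-bit mono audio at 16 kHz or 48 kHz (two common settings for speech, chosen here as assumptions), the WAV header ignored, and the Akamai download speeds quoted above treated as if they were upload speeds, which the report does not actually give.

```python
# Rough size and upload-time estimate for one uncompressed (PCM WAV) recording.
# ASSUMPTIONS: 16-bit mono audio; sample rates of 16 kHz and 48 kHz are
# illustrative choices; the ~44-byte WAV header is ignored; upload speed is
# ASSUMED equal to the quoted download speed, with the link fully used.

def raw_audio_bytes(seconds: float, sample_rate_hz: int,
                    bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size in bytes of uncompressed PCM audio: rate x bytes/sample x channels x duration."""
    return int(seconds * sample_rate_hz * (bits_per_sample // 8) * channels)


def upload_seconds(size_bytes: int, link_kbps: float) -> float:
    """Ideal transfer time: bits to send divided by link speed (1 Kbps = 1000 bit/s)."""
    return size_bytes * 8 / (link_kbps * 1000)


if __name__ == "__main__":
    story_seconds = 5 * 60  # the 5-minute story from point e
    # Speeds from the Akamai figures quoted above (download speeds, see assumption).
    speeds_kbps = {"256 Kbps": 256, "849 Kbps (India average)": 849, "2 Mbps": 2000}
    for rate in (16_000, 48_000):
        size = raw_audio_bytes(story_seconds, rate)
        print(f"{rate // 1000} kHz, 16-bit mono: {size / 1e6:.1f} MB")
        for label, kbps in speeds_kbps.items():
            print(f"  at {label}: {upload_seconds(size, kbps) / 60:.1f} min")
```

Under these assumptions a 5-minute story comes to about 9.6 MB at 16 kHz, taking roughly 5 minutes to upload at 256 Kbps, and about 28.8 MB at 48 kHz, taking roughly 15 minutes at the same speed. Real uploads would be slower, since upload bandwidth is usually lower than download bandwidth and protocol overhead is not counted.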

All in all, it is a great project with great potential, but it needs to drastically improve how it goes about things. I would really love to see more activity on the site.

2 thoughts on “Voxforge-speech recognition”

  1. Nice post! If anyone took up the Hindi, Tamil or Indian English part of VoxForge, that would be just amazing.

    > It would have been good if they were to share what the processing is all about.

    Each recording is checked for content (unfortunately, manually) and placed into the repository. Indeed, it would be nice to drop this processing stage altogether.
