There is another set of standards, vXML (Voice XML) and ccXML (Call Control XML), that has also come on the scene recently. Whilst these standards are useful for what they do, they do not answer all the problems that face us. For example, when a vXML application interacts with a speech recognizer, there doesn’t seem to be a clean way of specifying which grammar to use. This matters most when the grammar is big: you probably need just a “lever” that tells the recognizer to load a particular grammar, rather than sending the entire grammar over the wire.
The solution I saw in a couple of implementations was to pass out-of-band data or use some other handshake protocol to do this. Obviously, this is a less than desirable situation to be in. If one wants to leverage the full power of voice, one has to do more than build standalone voice applications that have difficulty interacting with internal and external applications.
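
To make the “lever” idea a little more concrete, here is a minimal sketch in Python. It is only an illustration of the general pattern (preload the big grammar once, then refer to it by a short handle on each call). The `GrammarRegistry` class, the `grammar_handle` field and the `cities-v1` name are all made up for this example; none of them come from vXML, ccXML or any real recognizer's API.

```python
# Hypothetical sketch: these names are invented for illustration and do not
# belong to vXML, ccXML, or any actual speech recognizer interface.
import json


class GrammarRegistry:
    """Recognizer-side store of grammars that were loaded ahead of time."""

    def __init__(self):
        self._grammars = {}

    def preload(self, handle, grammar_text):
        # Done once (for example at deployment time), not on every call.
        self._grammars[handle] = grammar_text

    def activate(self, request_json):
        # The request carries only the handle, never the grammar body.
        request = json.loads(request_json)
        handle = request["grammar_handle"]
        if handle not in self._grammars:
            raise KeyError(f"unknown grammar handle: {handle}")
        return self._grammars[handle]


def build_activation_request(handle):
    """Application side: ask the recognizer to use an already-loaded grammar."""
    return json.dumps({"grammar_handle": handle})


# A stand-in for a large grammar, such as a long list of city names.
big_grammar = "\n".join(f"city_{i}" for i in range(10_000))

registry = GrammarRegistry()
registry.preload("cities-v1", big_grammar)

request = build_activation_request("cities-v1")
active = registry.activate(request)

print(len(request), "bytes sent per call instead of", len(big_grammar))
```

The point of the sketch is only that the per-call message stays tiny no matter how large the grammar grows. How the handle would be agreed upon between the application and the recognizer is exactly the part that, as far as I can tell, ends up being done out of band.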