Process Modeling Meets Voice Interaction
Talking to Model Processes
«Alexa, turn on Netflix».
Nowadays, anyone, even without technical knowledge, can talk to a smart voice assistant and feel that technology is part of their life. Assistants like Alexa can play music, provide information, offer news and sports results, tell you the weather, control your smart home, and even order products online.
Two main factors contribute to the success of companies like Amazon in our daily life. On one hand, coverage: they provide a new interaction paradigm that anyone can use, regardless of age, race, or religion. On the other hand, predictability: one can anticipate the speech recognition of pre-defined sentences and plan the expected reaction ahead of time. These reactions need not be simple; for instance, sentences like «Alexa, play Nothing Else Matters», «Alexa, ask Uber to request a ride», or «Alexa, turn on the light» will trigger very different reactions.
In Stanley Kubrick’s 1968 movie «2001: A Space Odyssey», Dave interacts fluently with HAL 9000, and the computer is able to reason and react accordingly (sometimes in ways that do not follow Dave’s intentions). I remember the first time I watched Kubrick’s movie, back in the mid-nineties: I was sure that the technology of the future would be like this. So far, though, human-machine voice interaction technology has not lived up to Kubrick’s vision.
There is still a long way to go before we have a computer with fully gadget-free interaction capabilities like HAL 9000. However, the time has come to bring voice interaction beyond leisure applications. At Process Talks we have developed a process modeling platform that, among other novel features, lets modelers model simply by telling our voice assistant their intentions. Watch this video, where we show how easy it is to use voice interaction in Process Talks:
Speaking to Process Talks is as simple as sending a voice note in chat applications like WhatsApp: just press and hold the microphone button and speak in one of our supported languages, all without leaving the collaborative modeling session. Your voice commands then travel safely encrypted through the network and are handed to our reasoner engine, which decides the best way to fit them into your process model.
Understanding human language via voice brings many new challenges compared to text. At Process Talks, we are still working on improving our speech recognition. One of the biggest challenges for speech recognition is capturing open input, i.e., speech input with undefined fragments where users can say anything they think of. In process modeling, the clearest example of open input is activity names: process modeling is a widespread practice, which makes it very difficult to anticipate the kind of language people will use when describing activities. This is why we currently sidestep the problem by assuming that elements like activities have been added previously. Voice interaction then refers to them by their identifying number: a predictable voice snippet that our speech recognizer can easily understand.

Notice how, for Alexa, utterances like «Nothing Else Matters» are not really open input, as the speech recognition technology has been pre-trained to understand this popular three-word sequence, corresponding to the amazing song by Metallica. Still, Metallica alone already has around 151 songs, so how can Alexa understand all of them? The trick is simple: people only remember perhaps 5-10 Metallica songs, so speech recognition training covers just a selection of the most popular ones.
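To illustrate the idea of closed, predictable commands (this is an illustrative sketch, not our actual engine; the command phrasings and intent names are hypothetical), a recognizer can match utterances against a small fixed grammar where activities are referenced only by their identifying number, so no open-ended activity name ever needs to be understood:

```python
import re

# Hypothetical closed command grammar: every variable part is a number,
# which is a predictable snippet for the speech recognizer.
COMMANDS = [
    (re.compile(r"connect activity (\d+) to activity (\d+)"), "connect"),
    (re.compile(r"delete activity (\d+)"), "delete"),
    (re.compile(r"add a gateway after activity (\d+)"), "add_gateway"),
]

def parse_command(utterance: str):
    """Map a recognized utterance to an (intent, activity_ids) pair.

    Returns None for anything outside the closed grammar, i.e. the
    open input the recognizer is not trained to handle.
    """
    text = utterance.lower().strip()
    for pattern, intent in COMMANDS:
        match = pattern.fullmatch(text)
        if match:
            return intent, [int(g) for g in match.groups()]
    return None

print(parse_command("Connect activity 3 to activity 5"))  # ('connect', [3, 5])
print(parse_command("call the supplier"))                 # None (open input)
```

The design choice mirrors the trade-off described above: by constraining what can be said, recognition becomes highly reliable, at the cost of requiring elements to exist (and be numbered) before they can be referenced by voice.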
We plan to explore how voice interaction can augment modelers when they model alone, but especially in collaborative sessions arising from a discovery workshop. Currently we support voice commands in English and Spanish; depending on adoption and demand, we may incorporate other languages as well.
A complete, professional implementation of voice-driven modeling for BPMN 2.0 could reach unprecedented adopters. For example, with such a technology, people with visual or motor impairments would be able to create BPMN diagrams seamlessly for the first time. The opportunities are countless.
Do you dare to give it a try? Schedule a demo with us at this link.
Photo by Stephen Harlan on Unsplash