An Expressive Conversational-Behavior Generation Model for Advanced Interaction within Multimodal User Interfaces

Matej Rojc and Izidor Mlakar
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia

Series: Computer Science, Technology and Applications
BISAC: COM014000



Details

The aim of this book is to present a flexible and efficient algorithm and a novel system for the planning, generation, and realization of conversational behavior (co-verbal behavior). Such behavior is best described as a set of meaningful body movements that are synchronized, in terms of prosody, with the accompanying speech. The movements and shapes generated as co-verbal behavior represent a contextual link between a repertoire of independent motor skills (shapes, movements, and poses that a conversational agent can reproduce and execute) and the intent/meaning of the spoken sequences (context).

The actual intent/meaning of spoken content is identified through language-dependent linguistic markers and prosody. The knowledge databases used to determine the intent/meaning of text are built on linguistic analysis and on the classification of text into semiotic classes and subclasses, achieved through the annotation of multimodal corpora according to the proposed EVA annotation scheme. The scheme allows for capturing features at a functional (context-dependent) as well as at a descriptive (context-independent) level. The functional level captures high-level features that describe the correlation between speech and co-verbal behavior, whereas the descriptive level allows us to capture and define body poses and shapes independently of verbal content and at high resolution.
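To make the distinction between the two annotation levels concrete, the following is a minimal sketch, in Python, of how a single annotated gesture unit might separate a context-dependent functional record from a context-independent descriptive record. All class and field names here are hypothetical illustrations, not the actual EVA annotation scheme.

```python
from dataclasses import dataclass, field

@dataclass
class FunctionalAnnotation:
    """Context-dependent level: links the gesture to the co-occurring speech."""
    semiotic_class: str      # e.g. "deictic", "iconic", "symbolic" (illustrative labels)
    semiotic_subclass: str   # finer-grained label within the class
    word_span: tuple         # (start_word_index, end_word_index) in the utterance
    prominence: float        # prosodic prominence of the aligned speech segment

@dataclass
class DescriptiveAnnotation:
    """Context-independent level: the body pose/shape itself, no verbal content."""
    body_part: str                              # e.g. "right_arm", "head"
    pose: dict = field(default_factory=dict)    # joint -> rotation, hand shape, etc.
    duration_ms: int = 0

@dataclass
class GestureUnit:
    functional: FunctionalAnnotation
    descriptive: list   # one DescriptiveAnnotation per involved body part
```

In this view, the descriptive records can be reused across utterances to build a repertoire of movements and shapes, while the functional records tie individual occurrences to specific speech.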

The annotation scheme therefore not only interlinks speech and gesture at a semiotic level, but also serves as a basis for the creation of a context-independent repertoire of movements and shapes. In this book, the process of generating co-verbal behavior is divided into two phases. The first phase deals with the classification of intent and its synchronization with the verbal content and prosody. The second phase then transforms the planned and synchronized behavior into a co-verbal animation performed by an embodied conversational agent (ECA). In order to extrapolate intent from arbitrary text sequences, the behavior-formulation algorithm deduces meaning/intent with regard to the semiotic intent. Furthermore, the algorithm considers the linguistic features of arbitrary, un-annotated text and selects primitive gestures based on semiotic nuclei, as identified by semiotic classification and further modulated by the prosodic features of the speech to be generated by a general text-to-speech (TTS) system.
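As a rough illustration of the first phase, the sketch below shows how un-annotated tokens might be mapped to gesture events: a semiotic classifier marks nuclei, a primitive gesture is drawn from the repertoire for each nucleus, and predicted prosody modulates its timing and intensity. The functions and the repertoire lookup are hypothetical stand-ins assumed for the example, not the book's actual algorithm or API.

```python
# Hypothetical mapping from semiotic nuclei to primitive gestures in the repertoire.
GESTURE_REPERTOIRE = {
    ("deictic", "pointing"): "point_right_hand",
    ("iconic", "size"): "spread_both_hands",
}

def formulate_behavior(tokens, classify_semiotic_nucleus, predict_prosody):
    """Phase 1 (toy version): map un-annotated text to a plan of gesture events."""
    plan = []
    prosody = predict_prosody(tokens)  # per-token prominence/duration estimates
    for i, token in enumerate(tokens):
        nucleus = classify_semiotic_nucleus(token, tokens)  # (class, subclass) or None
        if nucleus is None:
            continue
        gesture = GESTURE_REPERTOIRE.get(nucleus)
        if gesture is None:
            continue
        plan.append({
            "word_index": i,
            "gesture": gesture,
            # stroke amplitude and timing modulated by the predicted prosody
            "intensity": prosody[i]["prominence"],
            "duration_ms": prosody[i]["duration_ms"],
        })
    return plan
```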

The output of the behavior-formulation phase is represented as a hierarchical procedure encoded in XML format, together with a speech sequence generated by the TTS system. The procedural description is event-oriented and represents a well-defined structure of consecutive movements of body parts, as well as of body parts moving in parallel. The second phase of the novel architecture transforms these procedural descriptions into a series of coherent animations of the individual parts of the articulated embodied conversational agent. In this regard, a novel ECA-based realization framework named EVA-framework is also presented. It supports real-time realization of procedural animation descriptions and plans on multi-part mesh-based models, using skeletal animation, blend-shape animation, and the animation of predefined (pre-recorded) animated segments. This book therefore covers the complete design and implementation of an expressive model for the generation of co-verbal behavior, which is able to transform un-annotated text into a speech-synchronized series of animated sequences.
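The sketch below illustrates what such an event-oriented procedural description might look like, and how a realization pass could flatten it into per-body-part animation events. The XML element and attribute names are illustrative assumptions only; they are not the actual EVA-Script vocabulary used in the book.

```python
import xml.etree.ElementTree as ET

# Hypothetical procedural plan: a sequence containing one move and one parallel block.
PLAN = """
<behavior>
  <sequence>
    <move part="head" shape="nod" start_ms="0" duration_ms="400"/>
    <parallel>
      <move part="right_arm" shape="point_right_hand" start_ms="400" duration_ms="600"/>
      <move part="face" shape="raise_brows" start_ms="400" duration_ms="300"/>
    </parallel>
  </sequence>
</behavior>
"""

def realize(node, schedule):
    """Phase 2 (toy version): flatten the plan into per-body-part animation events."""
    if node.tag == "move":
        schedule.append((node.get("part"), node.get("shape"),
                         int(node.get("start_ms")), int(node.get("duration_ms"))))
    else:  # <behavior>, <sequence>, <parallel> simply group their children here
        for child in node:
            realize(child, schedule)
    return schedule

events = realize(ET.fromstring(PLAN), [])
for part, shape, start, dur in events:
    print(f"{part}: {shape} @ {start} ms for {dur} ms")
```

A real realizer would, of course, schedule parallel blocks concurrently and drive skeletal and blend-shape animation on the ECA model rather than printing events.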
(Imprint: Nova)

Preface
pp. vii-xiv

Chapter 1
An Expressive Model for Generation of Co-Verbal Behavior
pp. 1-18

Chapter 2
A System for Generation of Co-Verbal Conversational Behavior
pp. 19-30

Chapter 3
The Annotation and Description of Co-Verbal Behavior
pp. 31-50

Chapter 4
Markup Language for Specification of Shapes, Poses, and Behavior of Conversational Agents - EVA-Script -
pp. 51-60

Chapter 5
Description of Motoric Capabilities of Conversational Agent
pp. 61-80

Chapter 6
Correlating Shapes and Semiotic Intent
pp. 81-92

Chapter 7
Algorithm for Automatic Generation of Expressive Co-Verbal Behavior
pp. 93-152

Chapter 8
The Framework for Realization of the Co-Verbal Behavior on Conversational Agent - EVA-framework -
pp. 153-168

Chapter 9
The Expressive Model and Multimodal User Interfaces
pp. 169-188

Chapter 10
The Embodied Conversational Agent EVA
pp. 189-214

References
pp. 215-228

Index
pp. 229-234

This book has been reviewed by the following people:

Anna Esposito, Professor, Seconda Università di Napoli, Dipartimento di Psicologia, Italy

Bruno Apolloni, Professor, Dipartimento di Informatica, University of Milano, Milano, Italy

Igor S. Pandžić, Professor, University of Zagreb, Croatia

This book is intended for researchers and students working on co-verbal behavior generation, multimodal interfaces, annotation schemes, embodied conversational agents, text-to-speech systems, web technologies, and artificial bodies. It is also relevant for IT/IoT developers, especially in the fields of human-machine interaction, user-centric interfaces, multimodal user interfaces, IPTV, smart-home platforms, and platforms and interfaces for assisted living.
