53rd IETF - Control of ASR and TTS Servers BOF (cats)

Thursday, March 21 at 1300-1500

CHAIRS: Eric Burger <eburger@snowshore.com>
        David Oran <oran@cisco.com>

Mailing Lists:
 Send "subscribe mrcp" in the body to "majordomo@snowshore.com".

  o Agenda Bashing
  o Requirements/Problem Statement / Overview / Q&A
  o Open Issues
      o Scope of the work
      o What are special security issues for ASR/TTS control?
  o Draft Charter Discussion

This BOF will examine protocols to support distributed media
processing of audio streams.  There are multiple IETF protocols for
establishment and termination of media sessions (SIP, SDP), and media
record and playback (RTSP).  The focus of this BOF is to develop
protocols to support Automated Speech Recognition (ASR) and rendering
text into audio, a.k.a. Text-to-Speech (TTS). The BOF will only focus
on the distributed control of ASR and TTS servers.

Many multimedia applications can benefit from having ASR and TTS processing
available as a distributed, network resource.  To date, there are a number
of proprietary protocols for ASR and TTS control in the net.  Several
IETF drafts have been floated as well.  The existing solutions (under
the name Media Resource Control Protocol) have some serious
deficiencies.  In particular, they mix the semantics of existing
protocols yet are close enough to other protocols as to be confusing
to the implementer.  The confusion may reflect a lack of clarity in the
requirements and architecture, so this BOF (and potential working
group) will start from a careful discussion of scope and requirements.

The proposed work will not include distributed speech
recognition (DSR), as exemplified by the ETSI Aurora project.
The work proposed for ASR/TTS is part of a control plane for
speech applications, and hence complements data plane work such as DSR.
The latter seeks to improve recognition performance while reducing
bandwidth requirements, compared to using conventional vocoders. The
control protocol is in fact insensitive to the choice of data plane
encodings for recognition and speech synthesis, in the same way that
signaling protocols such as SIP and H.323 work identically for any data
plane media encoding.

The proposed working group will develop an informational RFC detailing the
architecture and requirements for distributed ASR and TTS control. The
working group will then examine existing media-related protocols, especially
RTSP, for suitability as a protocol for carriage of ASR and TTS control.
Then, the working group will propose extensions to existing protocols or the
development of new protocols, as appropriate, to meet the requirements
specified in the informational RFC.

The protocol will assume RTP carriage of media. Assuming session-oriented
media transport, the protocol will use SDP to describe the session.
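
As an illustration only, the kind of SDP session description implied by
the paragraph above can be sketched as follows.  This is a minimal
sketch, not part of any proposed protocol: the function name
build_sdp_offer, the session name, the address 192.0.2.10, the port
49170, and the choice of PCMU audio are all assumptions made for the
example.

```python
def build_sdp_offer(client_ip: str, rtp_port: int, session_id: int = 0) -> str:
    """Return a minimal SDP description for a single RTP audio stream.

    Hypothetical helper for illustration; all values are assumed.
    """
    lines = [
        "v=0",                                               # SDP protocol version
        f"o=- {session_id} {session_id} IN IP4 {client_ip}",  # origin
        "s=ASR/TTS media session",                           # session name (assumed)
        f"c=IN IP4 {client_ip}",                             # connection address
        "t=0 0",                                             # unbounded session time
        f"m=audio {rtp_port} RTP/AVP 0",                     # one audio stream over RTP
        "a=rtpmap:0 PCMU/8000",                              # payload type 0 = PCMU, 8 kHz
    ]
    # SDP lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


print(build_sdp_offer("192.0.2.10", 49170))
```

Only the "m=" and "a=rtpmap" lines name the audio encoding, so a
different codec could presumably be described without touching the
control protocol itself.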

The proposed work will not re-create functionality available in other protocols,
such as SIP or SDP. The working group will bring any requirements for changes
of existing protocols, with the possible exception of RTSP, to the appropriate
IETF working group for consideration. This working group will explore
modifications to RTSP, if required, but must participate in the current
revision work on RTSP to assure that a reasonable framework exists for
those changes.

The proposed working group will coordinate with the SIPPING, MMUSIC and
AVT working groups, and will ensure architectural consistency with VPIM
and other applications working groups in a more general sense.  In
addition, the proposed working group will coordinate so that there is
consistency (or at least good dialogue) with speech-related efforts by
the W3C, such as its Multimodal Interaction activity.

The intention is to disband the working group within one year of chartering.

Required Reading for the BOF:


Additional Strawman Documents: