Journal article, Speech Communication, Year: 2001

Linguistic documents synchronizing sound and text

Abstract

The goal of the LACITO linguistic archive project is to conserve and to make available for research recorded and transcribed oral traditions and other linguistic materials in (mainly) unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses simple, TEI-inspired XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented at the levels of, roughly, the sentence and the word, and annotation is associated with different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, etc. Time alignment is at the sentence (and optionally the word) level. To minimize in-house development and maintenance, the project uses standard software to the extent possible. Marked-up data is processed using widely available XML/XSL/XSLT/XQL software tools, and displayed using standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time-alignment, (2) a Java applet which enables standard browsers to access time-aligned speech, (3) XSL stylesheets which determine "views" on the data, and (4) a simple CGI interface permitting the user to choose documents and views and to enter queries. The paper describes these elements in detail. Current objectives are further development of the annotation with a view to linguistic research beyond simple browsing, and of a querying system (using a standard XML query processor) to exploit the annotated material.
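
As an illustration of the layered annotation described in the abstract, the following is a minimal sketch in Python, not the project's actual schema or tooling. The element and attribute names (`TEXT`, `S`, `W`, `FORM`, `TRANSL`, `GLOSS`, `start`, `end`) and the sample content are hypothetical placeholders chosen for this example. It assumes metadata at the text level, a free translation and time offsets (in seconds) at the sentence level, and glosses at the word level, and shows how a time position in the recording could be mapped back to its sentence.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag names, attributes, and content are placeholders
# for illustration only, not the LACITO archive's real format.
SAMPLE = """\
<TEXT id="demo">
  <HEADER><TITLE>Placeholder text</TITLE></HEADER>
  <S id="s1" start="0.00" end="2.35">
    <FORM>aaa bbb</FORM>
    <TRANSL>free translation 1</TRANSL>
    <W><FORM>aaa</FORM><GLOSS>gloss-a</GLOSS></W>
    <W><FORM>bbb</FORM><GLOSS>gloss-b</GLOSS></W>
  </S>
  <S id="s2" start="2.35" end="4.10">
    <FORM>ccc</FORM>
    <TRANSL>free translation 2</TRANSL>
    <W><FORM>ccc</FORM><GLOSS>gloss-c</GLOSS></W>
  </S>
</TEXT>
"""


def sentences(xml_text):
    """Return one record per sentence with its time span, translation, and glossed words."""
    root = ET.fromstring(xml_text)
    records = []
    for s in root.iter("S"):
        # Word-level annotation: (form, gloss) pairs in document order.
        words = [(w.findtext("FORM"), w.findtext("GLOSS")) for w in s.findall("W")]
        records.append({
            "id": s.get("id"),
            "start": float(s.get("start")),
            "end": float(s.get("end")),
            "form": s.findtext("FORM"),          # sentence-level transcription
            "translation": s.findtext("TRANSL"),  # sentence-level free translation
            "words": words,
        })
    return records


def sentence_at(records, t):
    """Return the sentence whose [start, end) interval contains time t, or None."""
    for r in records:
        if r["start"] <= t < r["end"]:
            return r
    return None


if __name__ == "__main__":
    recs = sentences(SAMPLE)
    hit = sentence_at(recs, 1.0)
    print(hit["id"], hit["translation"], hit["words"])
```

Keeping the time offsets on the sentence element means that a viewer only needs the sentence identifier to play the matching stretch of audio; word-level alignment, which the abstract describes as optional, could be added the same way by putting offsets on the word elements.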

Domains

Linguistics
Main file
speechCom33.pdf (639.95 KB)

Dates and versions

hal-00005544 , version 1 (22-06-2005)

Identifiers

  • HAL Id : hal-00005544 , version 1

Cite

Michel Jacobson, Boyd Michailovsky, John B. Lowe. Linguistic documents synchronizing sound and text. Speech Communication, 2001, 33, p. 79-96. ⟨hal-00005544⟩