Linguistic documents synchronizing sound and text

Michel Jacobson; Boyd Michailovsky; John B. Lowe

Journal Articles Speech Communication Year : 2001



Michel Jacobson

Function : Author
PersonId : 4226
IdHAL : michel-jacobson
ORCID : 0000-0003-2310-5692
IdRef : 099367637

Langues et civilisations à tradition orale

Boyd Michailovsky

Function : Author
PersonId : 957269

Langues et civilisations à tradition orale

John B. Lowe

Function : Author

Langues et civilisations à tradition orale

Abstract

The goal of the LACITO linguistic archive project is to conserve and to make available for research recorded and transcribed oral traditions and other linguistic materials in (mainly) unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses simple, TEI-inspired XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented at the levels of, roughly, the sentence and the word, and annotation is associated with different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, etc. Time alignment is at the sentence (and optionally the word) level. To minimize in-house development and maintenance, the project uses standard software to the extent possible. Marked-up data is processed using widely available XML/XSL/XSLT/XQL software tools, and displayed using standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time alignment, (2) a Java applet which enables standard browsers to access time-aligned speech, (3) XSL stylesheets which determine "views" on the data, and (4) a simple CGI interface permitting the user to choose documents and views and to enter queries. The paper describes these elements in detail. Current objectives are further development of the annotation with a view to linguistic research beyond simple browsing, and of a querying system (using a standard XML query processor) to exploit the annotated material.
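To make the idea of sentence-level time alignment concrete, here is a minimal Python sketch of how a text marked up at the text, sentence, and word levels can be queried for the sentence being played at a given moment, and rendered as a simple interlinear view. The element names (TEXT, S, AUDIO, FORM, TRANSL, W), the sample content, and the helper functions `load_sentences`, `sentence_at`, and `interlinear` are hypothetical, chosen for illustration; they are not the project's published schema or code, which uses XSL stylesheets and a Java applet for display and playback.

```python
import bisect
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample document. Element/attribute names and content are
# illustrative assumptions, not the archive's actual schema or data.
SAMPLE = """\
<TEXT id="demo">
  <HEADER><TITLE>Demo text</TITLE></HEADER>
  <S id="s1">
    <AUDIO start="0.00" end="2.40"/>
    <FORM>ka mu</FORM>
    <TRANSL>he went</TRANSL>
    <W><FORM>ka</FORM><TRANSL>3SG</TRANSL></W>
    <W><FORM>mu</FORM><TRANSL>go</TRANSL></W>
  </S>
  <S id="s2">
    <AUDIO start="2.40" end="5.10"/>
    <FORM>na ti</FORM>
    <TRANSL>and came back</TRANSL>
    <W><FORM>na</FORM><TRANSL>and</TRANSL></W>
    <W><FORM>ti</FORM><TRANSL>return</TRANSL></W>
  </S>
</TEXT>
"""

@dataclass
class Sentence:
    sid: str
    start: float   # start time in seconds (sentence-level alignment)
    end: float     # end time in seconds
    form: str      # transcription of the whole sentence
    transl: str    # free translation (sentence level)
    words: list    # (form, gloss) pairs (word level)

def load_sentences(xml_text: str) -> list:
    """Collect every S element with its time span, translation, and word glosses."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("S"):
        audio = s.find("AUDIO")
        # "FORM"/"TRANSL"/"W" paths match direct children only, so word-level
        # FORM elements are not confused with the sentence-level one.
        words = [(w.findtext("FORM", ""), w.findtext("TRANSL", ""))
                 for w in s.findall("W")]
        sentences.append(Sentence(
            sid=s.get("id"),
            start=float(audio.get("start")),
            end=float(audio.get("end")),
            form=s.findtext("FORM", ""),
            transl=s.findtext("TRANSL", ""),
            words=words,
        ))
    sentences.sort(key=lambda x: x.start)
    return sentences

def sentence_at(sentences: list, t: float):
    """Return the sentence whose [start, end) interval contains time t, or None."""
    starts = [s.start for s in sentences]
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and sentences[i].start <= t < sentences[i].end:
        return sentences[i]
    return None

def interlinear(sentence: Sentence) -> str:
    """Render word forms over their glosses, then the free translation."""
    widths = [max(len(f), len(g)) for f, g in sentence.words]
    line1 = "  ".join(f.ljust(w) for (f, _), w in zip(sentence.words, widths))
    line2 = "  ".join(g.ljust(w) for (_, g), w in zip(sentence.words, widths))
    return f"{line1.rstrip()}\n{line2.rstrip()}\n'{sentence.transl}'"

if __name__ == "__main__":
    sents = load_sentences(SAMPLE)
    current = sentence_at(sents, 3.0)  # e.g. the player is at 3.0 s
    print(current.sid)
    print(interlinear(current))
```

A binary search over sorted start times keeps the lookup cheap even for long recordings; the same lookup could be done at the word level when word-level alignment is present.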

Keywords

speech annotation; field linguistics

Domains

Linguistics

Main file

speechCom33.pdf (639.95 KB)


https://hal.science/hal-00005544

Submitted on : Wednesday, June 22, 2005-3:32:53 PM

Last modification on : Tuesday, April 2, 2024-3:48:04 PM

Long-term archiving on: Thursday, April 1, 2010-9:44:58 PM

Dates and versions

hal-00005544 , version 1 (22-06-2005)

Identifiers

HAL Id : hal-00005544 , version 1

Cite

Michel Jacobson, Boyd Michailovsky, John B. Lowe. Linguistic documents synchronizing sound and text. Speech Communication, 2001, 33, p. 79-96. ⟨hal-00005544⟩


Collections

CNRS UNIV-PARIS3 INALCO LACITO CAMPUS-AAR AAI ASIES_ET_PACIFIQUE

