STRAIGHT information (This page is obsolete. New page is here.)

[Wakayama University]

STRAIGHT is a versatile speech manipulation tool invented by Hideki Kawahara while he was at ATR. A series of refinements and further developments was carried out in the "Auditory Brain Project" under the CREST program sponsored by JST.

This CD-ROM version of the page has been modified to meet the educational requirements of the EDUCATION ARENA of Eurospeech'03, Geneva.

Visualizations of underlying principles

STRAIGHT is based on a simple channel VOCODER. It decomposes input speech signals into source parameters and spectral parameters. Successive refinements of each parameter extraction procedure enable the total system to resynthesize high-quality speech. These refinements provide educational case studies of how abstract signal processing concepts are applied to process real-world signals. (Note: Visualizations are encoded in DivX format. Please check the following URLs for players/decoders or a list of software.)

Extended pitch synchronous analysis (visualization: 115MB)
Conventional spectral analysis methods fail to represent the smooth nature of voiced sounds because of interference caused by periodic excitation. This visualization illustrates how pitch-synchronous analysis is extended to selectively eliminate the periodicity-related interference.
Instantaneous frequency and its use in wavelet based fundamental frequency estimation (visualization: 121MB)
The usual definition of period is not appropriate for signals with changing periodicity, such as speech signals. Instantaneous frequency is an extension of the frequency concept, defined as the speed of phase change. This visualization illustrates how the instantaneous frequency concept relates to the sinusoidal components of complex sounds and how wavelet analysis provides a means to uniquely extract the fundamental component. By definition, the instantaneous frequency of the fundamental component is the fundamental frequency (F0) of the speech signal under study.
Group delay and minimum phase component for representing source characteristics (visualization: 143MB)
The usual phase representation is sometimes misleading for understanding source characteristics. Group delay, defined as the negative derivative of phase along the frequency axis, is a more intuitive representation because it can be read as the centroid of energy at each frequency. Combining this interpretation with the concept of minimum phase provides a very instructive interpretation of speech excitation. It illustrates, in a visually intuitive manner, when vocal tract excitation takes place at each frequency. The excitation source signal of STRAIGHT is designed based on this interpretation and approximation.
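
The instantaneous frequency idea described above can be tried out with a small sketch. This is not STRAIGHT's code. It assumes a single complex Gabor wavelet (the names gabor_filter, instantaneous_frequency, cycles and span are hypothetical) centred near the expected fundamental of a synthetic three-harmonic tone, and it measures the phase advance per sample of the filter output. STRAIGHT's own F0 extraction is more elaborate than this single-band illustration.

```python
import cmath
import math


def gabor_filter(signal, fs, fc, cycles=3.0, span=4.0):
    """Convolve a real signal with a complex Gabor wavelet centred at fc Hz.

    Only samples where the wavelet fits entirely inside the signal are
    returned, so the output is shorter than the input.
    """
    sigma = cycles / (2.0 * math.pi * fc)   # Gaussian envelope width in seconds
    half = int(span * sigma * fs)           # truncate the envelope at span * sigma
    kernel = [
        math.exp(-0.5 * (k / (fs * sigma)) ** 2)
        * cmath.exp(2j * math.pi * fc * k / fs)
        for k in range(-half, half + 1)
    ]
    out = []
    for n in range(half, len(signal) - half):
        acc = 0j
        for i, h in enumerate(kernel):
            acc += signal[n - (i - half)] * h   # y[n] = sum_k x[n - k] h[k]
        out.append(acc)
    return out


def instantaneous_frequency(band, fs):
    """Phase advance between neighbouring samples, converted to Hz."""
    return [
        cmath.phase(b * a.conjugate()) * fs / (2.0 * math.pi)
        for a, b in zip(band, band[1:])
    ]


# Synthetic "voiced" test tone (an assumption for this demo):
# fundamental at 120 Hz plus two weaker harmonics.
fs = 8000.0
f0 = 120.0
x = [
    sum(amp * math.cos(2.0 * math.pi * f0 * h * n / fs)
        for h, amp in ((1, 1.0), (2, 0.5), (3, 0.3)))
    for n in range(800)
]

# The wavelet is deliberately centred slightly off (110 Hz): the filter only
# has to isolate the fundamental, the phase speed then gives its frequency.
band = gabor_filter(x, fs, fc=110.0)
inst_f = instantaneous_frequency(band, fs)
print(min(inst_f), max(inst_f))
```

Even though the wavelet is centred at 110 Hz, the printed values should stay close to the true 120 Hz, because the frequency is read from the phase of the isolated component rather than from the filter position.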

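The reading of group delay as an energy centroid, mentioned in the group delay item above, can be checked numerically. The sketch below is an illustration only, not STRAIGHT's excitation design. It uses the standard identity that the group delay at each DFT bin equals Re{DFT(n x[n]) / DFT(x[n])}, which avoids phase unwrapping. The function names and the two test signals are assumptions chosen for this example.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform, adequate for short signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(x):
    """Group delay in samples at each DFT bin.

    Equals minus the derivative of phase with respect to angular frequency,
    computed as Re{DFT(n * x[n]) / DFT(x[n])}. Bins where the spectrum is
    zero are undefined and would raise ZeroDivisionError.
    """
    spectrum = dft(x)
    ramped = dft([t * v for t, v in enumerate(x)])
    return [(r / s).real for r, s in zip(ramped, spectrum)]


# 1) An impulse delayed by 5 samples: every frequency "arrives" at sample 5.
impulse = [0.0] * 16
impulse[5] = 1.0
print(group_delay(impulse))

# 2) A short decaying pulse starting at sample 2 (spectrum has no zeros).
pulse = [0.0, 0.0, 1.0, 0.5, 0.25] + [0.0] * 11
delays = group_delay(pulse)
power = [abs(c) ** 2 for c in dft(pulse)]

# Energy-weighted average of the group delay over frequency ...
weighted_delay = sum(d * p for d, p in zip(delays, power)) / sum(power)
# ... compared with the centroid of energy on the time axis.
time_centroid = (
    sum(t * v * v for t, v in enumerate(pulse)) / sum(v * v for v in pulse)
)
print(weighted_delay, time_centroid)
```

The two printed centroid values agree, which is the sense in which group delay locates the energy of each frequency component in time.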

Affine transformation
The basic application of STRAIGHT is to modify the fundamental frequency, the frequency axis and the time axis of a speech sample independently and proportionally. The following examples illustrate the results of these basic manipulations.
Arbitrary transformation
Transformations applicable to STRAIGHT parameters are not necessarily proportional. Nonlinear and non-stationary transformations of parameters are allowed as long as they do not violate the physical feasibility of the modified representations.
Auditory morphing of speech sounds
Auditory morphing transforms one speech example into another in a parameterized manner.
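
The manipulations listed above can be sketched on toy parameter tracks. The following is a minimal illustration under stated assumptions, not STRAIGHT's interface: an F0 track and a spectral envelope are represented as plain Python lists, all frames are assumed voiced, and every function name here is hypothetical. The morphing step assumes the two F0 tracks are already time-aligned and interpolates them on a logarithmic frequency axis, which is a choice made for this sketch.

```python
import math


def interp(values, pos):
    """Linearly interpolate `values` at fractional index `pos` (clamped)."""
    pos = min(max(pos, 0.0), len(values) - 1.0)
    i = min(int(pos), len(values) - 2)
    frac = pos - i
    return (1.0 - frac) * values[i] + frac * values[i + 1]


def scale_f0(f0_track, ratio):
    """Proportional F0 change: multiply every frame by `ratio`."""
    return [f * ratio for f in f0_track]


def stretch_time(track, ratio):
    """Proportional time-axis change: ratio > 1 makes the track longer."""
    n_out = int(round((len(track) - 1) * ratio)) + 1
    return [interp(track, t / ratio) for t in range(n_out)]


def scale_frequency_axis(envelope, ratio):
    """Proportional frequency-axis change: bin k reads the old bin k / ratio."""
    return [interp(envelope, k / ratio) for k in range(len(envelope))]


def morph_f0(f0_a, f0_b, rate):
    """Blend two time-aligned F0 tracks; rate 0 gives A, rate 1 gives B."""
    return [
        math.exp((1.0 - rate) * math.log(a) + rate * math.log(b))
        for a, b in zip(f0_a, f0_b)
    ]


f0_track = [100.0, 110.0, 120.0]            # Hz, one value per frame
envelope = [0.0, 1.0, 2.0, 3.0, 4.0]        # toy spectral envelope of one frame

print(scale_f0(f0_track, 1.5))              # raise pitch by a factor of 1.5
print(stretch_time(f0_track, 2.0))          # twice as slow
print(scale_frequency_axis(envelope, 2.0))  # stretch the spectral shape upward
print(morph_f0([100.0], [400.0], 0.5))      # halfway between two speakers
```

Because the three proportional operations act on separate parameters, they can be applied independently or in any combination, which mirrors the independence described in the text.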


Notes on availability

Essential components of STRAIGHT are patented by ATR and JST. When a company is interested in using STRAIGHT as a research tool, a written agreement between JST and the company is necessary for the company to get access to the code and the technical documents. (Please contact the JST office for details.) Non-commercial research and/or educational institutes should contact the webmaster.

Last update: 12 October, 2003