STRAIGHT information (This page is obsolete. New page is here.)
[Wakayama University]
STRAIGHT is a versatile speech manipulation tool invented by Hideki
Kawahara when he was at ATR.
A series of refinements and developments was carried out in the
"Auditory Brain Project" under the CREST program sponsored by JST.
This CD-ROM version of the page has been modified to meet the educational requirements of the EDUCATION ARENA at Eurospeech'03, Geneva.
Visualizations of underlying principles
STRAIGHT is based on a simple channel VOCODER.
It decomposes input speech signals into
source parameters and spectral parameters.
Successive refinements on each parameter
extraction procedure enable the total system to
resynthesize high-quality speech.
These refinements provide educational case studies on
how abstract signal processing concepts are applied to
process real-world signals.
(Note: Visualizations are encoded in DivX format.
Please check the following URLs for players/decoders or a list of software.)
- Extended pitch synchronous analysis
(visualization: 115MB)
- Usual spectral analysis methods fail to represent the smooth nature of voiced sounds
because of interference due to periodic excitation.
This visualization illustrates how pitch synchronous analysis is extended to
selectively eliminate the periodicity-related interference.
- Instantaneous frequency and its use in wavelet based fundamental frequency estimation
(visualization: 121MB)
- The usual definition of period is not relevant for signals with changing periodicity, such as speech signals.
Instantaneous frequency extends the concept of frequency by defining it as the speed of
phase change.
This visualization illustrates how instantaneous frequency relates to the sinusoidal components
of complex sounds and how wavelet analysis provides a means to uniquely extract the fundamental component.
By definition, the instantaneous frequency of the fundamental component is the fundamental frequency
(F0) of the speech signal under study.
- Group delay and minimum phase component for representing source characteristics
(visualization: 143MB)
- The usual phase representation is sometimes misleading for understanding source characteristics.
Group delay, defined as the negative derivative of phase with respect to frequency, is a more intuitive representation
when it is taken as the centroid of energy in time at each frequency.
Combining this interpretation with the concept of minimum phase provides
a very instructive interpretation of speech excitation.
It illustrates when
vocal tract excitation takes place at each frequency in a visually intuitive manner.
The excitation source signal of STRAIGHT is designed based on this
interpretation and approximation.
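Two of the concepts visualized above can be tried numerically. The following is a minimal sketch, not STRAIGHT's implementation: it assumes a complex Gabor kernel as the analysing wavelet, a synthetic three-harmonic test signal, and a two-sample pulse. The function names (`gabor_bandpass`, `instantaneous_frequency`, `group_delay`) and all parameter values are hypothetical choices made for illustration.

```python
import cmath
import math


def gabor_bandpass(x, fs, fc, cycles=3.0):
    """Filter x with a complex Gabor kernel centred near fc (Hz).

    The kernel shape and `cycles` are illustrative assumptions,
    not the analysing wavelet used in STRAIGHT.
    """
    sigma = cycles / (2.0 * math.pi * fc)      # envelope width in seconds
    half = int(4.0 * sigma * fs)               # truncate the kernel at 4 sigma
    kernel = [
        math.exp(-0.5 * (k / (fs * sigma)) ** 2)
        * cmath.exp(2j * math.pi * fc * k / fs)
        for k in range(-half, half + 1)
    ]
    # Keep only output samples where the kernel fully overlaps the signal.
    return [
        sum(kernel[k + half] * x[n - k] for k in range(-half, half + 1))
        for n in range(half, len(x) - half)
    ]


def instantaneous_frequency(z, fs):
    """Speed of phase change of a complex signal, in Hz."""
    return [
        cmath.phase(z[n + 1] * z[n].conjugate()) * fs / (2.0 * math.pi)
        for n in range(len(z) - 1)
    ]


def dft_bin(x, k):
    """Single bin of the discrete Fourier transform of x."""
    size = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / size)
               for n, v in enumerate(x))


def group_delay(x, k):
    """Group delay in samples at DFT bin k.

    Uses the identity -d(phase)/d(omega) = Re(DFT{n * x[n]} / DFT{x[n]}).
    """
    return (dft_bin([n * v for n, v in enumerate(x)], k) / dft_bin(x, k)).real


# Synthetic "voiced" signal: fundamental at 120 Hz plus two weaker harmonics.
fs = 8000.0
f0 = 120.0
x = [sum(math.cos(2.0 * math.pi * h * f0 * n / fs) / h for h in (1, 2, 3))
     for n in range(800)]

# The filter is deliberately centred off the true F0 (110 Hz, a rough guess).
z = gabor_bandpass(x, fs, fc=110.0)
f_inst = instantaneous_frequency(z, fs)
print("mean instantaneous frequency (Hz):", sum(f_inst) / len(f_inst))

# Two equal samples at n = 5 and n = 6: the energy is centred at 5.5 samples.
pulse = [0.0] * 32
pulse[5] = pulse[6] = 1.0
print("group delay (samples):", [group_delay(pulse, k) for k in (1, 4, 8)])
```

With the values assumed here, the mean instantaneous frequency should come out close to the 120 Hz fundamental even though the filter is centred at 110 Hz, and the group delay of the two-sample pulse is 5.5 samples at each tested bin, the midpoint of its energy in time.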
Demonstrations
- Affine transformation
- The basic application of STRAIGHT is to modify the fundamental
frequency, the frequency axis and the time axis of a speech
sample independently in a proportional fashion. The following
examples illustrate the results of these basic manipulations.
- Japanese sentence (spoken by a female speaker)[original]
- Arbitrary transformation
- Transformations applicable to STRAIGHT parameters are not
necessarily proportional. Nonlinear and non-stationary transformations
of parameters are allowed as long as they do not violate the physical
feasibility of the modified representations.
- Auditory morphing of speech sounds
- Auditory morphing transforms one speech example into
another speech example in a parameterized manner.
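The manipulations demonstrated above can be sketched on a toy parameter set. The code below is only an illustration under stated assumptions: F0 is taken to be a per-frame list in Hz, a spectral envelope is a list of positive values on a uniform frequency grid, and morphing is done by log-domain interpolation without any time or frequency alignment of the two examples. None of the function names come from the STRAIGHT code base.

```python
import math


def interp(values, pos):
    """Linear interpolation of a sequence (length >= 2) at fractional index pos."""
    pos = min(max(pos, 0.0), len(values) - 1.0)   # clamp to the valid range
    i = min(int(pos), len(values) - 2)
    frac = pos - i
    return (1.0 - frac) * values[i] + frac * values[i + 1]


def scale_f0(f0, ratio):
    """Proportional change of the fundamental frequency."""
    return [f * ratio for f in f0]


def stretch_time(track, ratio):
    """Resample a per-frame track so that it lasts `ratio` times as long."""
    n_out = max(2, round(len(track) * ratio))
    step = (len(track) - 1) / (n_out - 1)
    return [interp(track, m * step) for m in range(n_out)]


def warp_frequency(envelope, ratio):
    """Stretch a spectral envelope along the frequency axis by `ratio`."""
    return [interp(envelope, k / ratio) for k in range(len(envelope))]


def morph(a, b, rate):
    """Blend two positive parameter tracks in the log domain.

    rate = 0 returns a, rate = 1 returns b (log interpolation is an
    assumption of this sketch).
    """
    return [math.exp((1.0 - rate) * math.log(u) + rate * math.log(v))
            for u, v in zip(a, b)]


# Hypothetical parameters for two short utterances.
f0_a = [100.0, 110.0, 120.0]
f0_b = [200.0, 220.0, 240.0]
envelope = [0.0, 1.0, 2.0, 3.0, 4.0]

print(scale_f0(f0_a, 1.5))            # raise pitch by a factor of 1.5
print(stretch_time(f0_a, 2.0))        # same contour over twice as many frames
print(warp_frequency(envelope, 2.0))  # envelope features move up in frequency
print(morph(f0_a, f0_b, 0.5))         # halfway between the two contours
```

Because each function touches only one parameter, the three proportional manipulations can be applied independently, which is the property the affine demonstration relies on. A nonlinear or time-varying transformation would replace the constant `ratio` with a function of frame or frequency index.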
References
- Hideki Kawahara, Ikuyo Masuda-Katsuse and Alain de Cheveigne:
Restructuring speech representations using a pitch-adaptive time-frequency
smoothing and an instantaneous-frequency-based F0 extraction:
Possible role of a repetitive structure in sounds, Speech Communication,
27, 3-4, pp.187-207 (1999). [The EURASIP Best-Paper Award 1998/99]
(draft)
- Hideki Kawahara, Haruhiro Katayose, Alain de Cheveigne, Roy
D. Patterson: Fixed Point Analysis of Frequency to Instantaneous
Frequency Mapping for Accurate Estimation of F0 and Periodicity,
Proc. EUROSPEECH'99, Volume 6, pp.2781-2784 (1999).
(PDF)
- Hideki Kawahara, Yoshinori Atake and Parham Zolfaghari: Accurate
vocal event detection method based on a fixed-point to weighted
average group delay, ICSLP-2000, Beijing, pp.664-667, 2000.
(PDF)
- Parham Zolfaghari, Yoshinori Atake, Kiyohiro Shikano, Hideki
Kawahara: Investigation of analysis and synthesis parameters
of STRAIGHT by subjective evaluation, ICSLP-2000, Beijing
- H. Kawahara and P. Zolfaghari: Systematic F0 glitches around
vowel nasal transitions, EUROSPEECH'2001, pp.2459-2462, 2001.
- H. Kawahara, Jo Estill and O. Fujimura: Aperiodicity extraction
and control using mixed mode excitation and group delay manipulation
for a high quality speech analysis, modification and synthesis
system STRAIGHT, MAVEBA 2001, Sept.13-15, Firenze, Italy, 2001.
(PDF)
- Hisami Matsui and Hideki Kawahara:
Auditorily motivated elastic spectral distance and its application
to emotional morphing of portrayal speech, FIRST PAN-AMERICAN/IBERIAN
MEETING ON ACOUSTICS, 2-6 December 2002, Cancun, 3pSC11.
- Hideki Kawahara and Hisami Matsui: AUDITORY MORPHING BASED
ON AN ELASTIC PERCEPTUAL DISTANCE METRIC IN AN INTERFERENCE-FREE
TIME-FREQUENCY REPRESENTATION, Proc. ICASSP'2003, vol.I, pp.256-259,
2003.
(PDF)
- Hideki Kawahara: Exemplar-based Voice Quality Analysis and Control
using a High Quality Auditory Morphing Procedure based on STRAIGHT,
VOQUAL'03, ISCA Tutorial and Research Workshop, Geneva, August 27-29, 2003, pp.109-114.
(PDF)
Notes on availability
Essential components of STRAIGHT are patented by ATR and JST.
When a company is interested in using STRAIGHT as a research tool,
a written agreement between JST and the company is necessary for
the company to get access to the code and the technical documents.
(Please contact the JST office
for details.) For non-commercial research and/or educational institutes,
please contact
the web-master.
Last update: 12 October, 2003
web-master