A clustering experiment of the spectra and the spectral changes of speech to extract phonemic features

Katsuhiko Shirai, Kazunori Mano

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

As a step towards phoneme identification, a method of clustering speech spectra and spectral changes is discussed. In this technique, two kinds of acoustic features are defined for each analysis frame. The first, called the Level 1 feature, describes the spectral contour of the frame and is represented by LPC cepstral coefficients. The second, called the Level 2 feature, describes the spectral change within the frame and is defined as the difference between the LPC cepstral coefficients derived from the first half and the second half of the frame. The phonemic feature of each frame is defined as a triplet of phonemic names. The Level 1 and Level 2 acoustic features are calculated from 800 V, VV, CV and VCV (vowel, vowel-vowel, consonant-vowel, vowel-consonant-vowel) syllables uttered by one male speaker and clustered with a vector quantizer (VQ) design algorithm. This VQ design method is based on that of Linde, Buzo and Gray (1980), slightly modified to take into account the frame labels belonging to each cluster. As a result, each frame is characterized by its Level 1 and Level 2 cluster numbers (centroid numbers). The relation between the cluster numbers and the phonemic feature was investigated. It was found that the number of different phonemic labels corresponding to each cluster was fewer than five. Of the 5503 resulting clusters, that is, the combinations of Level 1 and Level 2 codes (centroid numbers) that actually occurred, 4428 had only one kind of label.
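
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses the plain Linde-Buzo-Gray splitting procedure without the label-aware modification mentioned above, whose details the abstract does not give. The squared Euclidean distance, the sign convention of the Level 2 difference, the parameters `eps` and `iters`, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def level2_feature(cep_first_half, cep_second_half):
    # Level 2: change between the cepstra of the two half-frames.
    # The sign convention (second minus first) is an assumption.
    return [b - a for a, b in zip(cep_first_half, cep_second_half)]


def sq_dist(u, v):
    # Squared Euclidean distance (assumed distortion measure).
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(vec, codebook):
    # Index of the closest centroid, i.e. the cluster (code) number.
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg(vectors, size, eps=0.01, iters=20):
    # Plain LBG by binary splitting; `size` is assumed to be a power of two.
    codebook = [mean(vectors)]
    while len(codebook) < size:
        # Split every centroid into two slightly perturbed copies.
        codebook = [[c + s for c in cent] for cent in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign each vector to its nearest centroid, then re-estimate.
            cells = defaultdict(list)
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            codebook = [mean(cells[k]) if cells[k] else codebook[k]
                        for k in range(len(codebook))]
    return codebook


def labels_per_code_pair(codes1, codes2, labels):
    # Collect the distinct phonemic labels seen for each (Level 1, Level 2) pair.
    table = defaultdict(set)
    for c1, c2, lab in zip(codes1, codes2, labels):
        table[(c1, c2)].add(lab)
    return table
```

Given per-frame Level 1 and Level 2 vectors, one codebook would be trained per level with `lbg`, each frame coded with `nearest`, and `labels_per_code_pair` used to count how many occurring code pairs carry exactly one label, which is the kind of statistic reported at the end of the abstract.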

Original language: English
Pages (from-to): 279-290
Number of pages: 12
Journal: Signal Processing
Volume: 10
Issue number: 3
DOI: 10.1016/0165-1684(86)90105-2
Publication status: Published - 1986
Externally published: Yes

Keywords

  • Clustering method
  • phoneme identification

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

A clustering experiment of the spectra and the spectral changes of speech to extract phonemic features. / Shirai, Katsuhiko; Mano, Kazunori.

In: Signal Processing, Vol. 10, No. 3, 1986, p. 279-290.

@article{33a92bc9847645d195ded1d3e685ca08,
title = "A clustering experiment of the spectra and the spectral changes of speech to extract phonemic features",
abstract = "As a step towards phoneme identification, a method of clustering speech spectra and spectral changes is discussed. In this technique, two kinds of acoustic features are defined for each analysis frame. The first, called the Level 1 feature, describes the spectral contour of the frame and is represented by LPC cepstral coefficients. The second, called the Level 2 feature, describes the spectral change within the frame and is defined as the difference between the LPC cepstral coefficients derived from the first half and the second half of the frame. The phonemic feature of each frame is defined as a triplet of phonemic names. The Level 1 and Level 2 acoustic features are calculated from 800 V, VV, CV and VCV (vowel, vowel-vowel, consonant-vowel, vowel-consonant-vowel) syllables uttered by one male speaker and clustered with a vector quantizer (VQ) design algorithm. This VQ design method is based on that of Linde, Buzo and Gray (1980), slightly modified to take into account the frame labels belonging to each cluster. As a result, each frame is characterized by its Level 1 and Level 2 cluster numbers (centroid numbers). The relation between the cluster numbers and the phonemic feature was investigated. It was found that the number of different phonemic labels corresponding to each cluster was fewer than five. Of the 5503 resulting clusters, that is, the combinations of Level 1 and Level 2 codes (centroid numbers) that actually occurred, 4428 had only one kind of label.",
keywords = "Clustering method, phoneme identification",
author = "Katsuhiko Shirai and Kazunori Mano",
year = "1986",
doi = "10.1016/0165-1684(86)90105-2",
language = "English",
volume = "10",
pages = "279--290",
journal = "Signal Processing",
issn = "0165-1684",
publisher = "Elsevier",
number = "3",

}

TY - JOUR

T1 - A clustering experiment of the spectra and the spectral changes of speech to extract phonemic features

AU - Shirai, Katsuhiko

AU - Mano, Kazunori

PY - 1986

Y1 - 1986

N2 - As a step towards phoneme identification, a method of clustering speech spectra and spectral changes is discussed. In this technique, two kinds of acoustic features are defined for each analysis frame. The first, called the Level 1 feature, describes the spectral contour of the frame and is represented by LPC cepstral coefficients. The second, called the Level 2 feature, describes the spectral change within the frame and is defined as the difference between the LPC cepstral coefficients derived from the first half and the second half of the frame. The phonemic feature of each frame is defined as a triplet of phonemic names. The Level 1 and Level 2 acoustic features are calculated from 800 V, VV, CV and VCV (vowel, vowel-vowel, consonant-vowel, vowel-consonant-vowel) syllables uttered by one male speaker and clustered with a vector quantizer (VQ) design algorithm. This VQ design method is based on that of Linde, Buzo and Gray (1980), slightly modified to take into account the frame labels belonging to each cluster. As a result, each frame is characterized by its Level 1 and Level 2 cluster numbers (centroid numbers). The relation between the cluster numbers and the phonemic feature was investigated. It was found that the number of different phonemic labels corresponding to each cluster was fewer than five. Of the 5503 resulting clusters, that is, the combinations of Level 1 and Level 2 codes (centroid numbers) that actually occurred, 4428 had only one kind of label.

KW - Clustering method

KW - phoneme identification

UR - http://www.scopus.com/inward/record.url?scp=0022697802&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0022697802&partnerID=8YFLogxK

U2 - 10.1016/0165-1684(86)90105-2

DO - 10.1016/0165-1684(86)90105-2

M3 - Article

VL - 10

SP - 279

EP - 290

JO - Signal Processing

JF - Signal Processing

SN - 0165-1684

IS - 3

ER -