As a step toward phoneme identification, a method of clustering speech spectra and spectral changes is discussed. In this technique, two kinds of acoustic features are defined for each analysis frame. The first, called the Level 1 feature, describes the spectral contour of the frame and is represented by LPC cepstral coefficients. The second, called the Level 2 feature, describes the spectral change within the frame and is defined as the difference between the LPC cepstral coefficients derived from the first half and the second half of the frame. The phonemic feature of each frame is defined as a triplet of phoneme names. The Level 1 and Level 2 acoustic features are calculated from 800 V, VV, CV, and VCV (vowel, vowel-vowel, consonant-vowel, vowel-consonant-vowel) syllables uttered by one male speaker and are clustered with a vector quantizer (VQ) design algorithm. The VQ design method is based on that of Linde, Buzo and Gray (1980), slightly modified to take into account the frame labels belonging to each cluster. As a result, each frame is characterized by its Level 1 and Level 2 cluster numbers (centroid numbers). The relation between the cluster numbers and the phonemic feature was investigated. The number of different phonemic labels corresponding to each cluster was found to be less than five. Of the 5503 resulting clusters, that is, the combinations of Level 1 and Level 2 codes (centroid numbers) that actually occur, 4428 had only one kind of label.
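The clustering and label-counting steps described in the abstract can be illustrated with a minimal sketch. It assumes the per-frame feature vectors are already available as NumPy arrays, and it implements only the plain Linde-Buzo-Gray splitting procedure, not the label-aware modification the abstract mentions. The function names `lbg_codebook`, `quantize`, and `labels_per_joint_code`, the perturbation size `eps`, and the iteration count `n_iter` are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from collections import defaultdict


def quantize(data, codebook):
    """Return, for each row of data, the index of the nearest centroid."""
    # Squared Euclidean distance between every frame and every centroid.
    dist = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def lbg_codebook(data, n_codes, eps=1e-3, n_iter=20):
    """Plain LBG codebook design by binary splitting.

    Assumption: n_codes is a power of two. The label-aware modification
    described in the abstract is NOT implemented here.
    """
    data = np.asarray(data, dtype=float)
    # Start from a single centroid: the global mean.
    codebook = data.mean(axis=0, keepdims=True)
    while codebook.shape[0] < n_codes:
        # Split every centroid into a slightly perturbed pair.
        codebook = np.vstack([codebook + eps, codebook - eps])
        # Refine with nearest-neighbour assignment and centroid update.
        for _ in range(n_iter):
            codes = quantize(data, codebook)
            for k in range(codebook.shape[0]):
                members = data[codes == k]
                if len(members) > 0:  # leave empty cells unchanged
                    codebook[k] = members.mean(axis=0)
    return codebook


def labels_per_joint_code(codes1, codes2, labels):
    """Collect the set of phonemic labels seen for each (Level 1, Level 2) code pair."""
    table = defaultdict(set)
    for c1, c2, lab in zip(codes1, codes2, labels):
        table[(int(c1), int(c2))].add(lab)
    return table
```

Under these assumptions, one codebook would be designed for the Level 1 vectors and another for the Level 2 vectors. Each frame is then quantized with both, and the number of code pairs carrying exactly one label is `sum(len(s) == 1 for s in table.values())`.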
- Clustering method
- Phoneme identification
ASJC Scopus subject areas
- Control and Systems Engineering
- Signal Processing
- Computer Vision and Pattern Recognition
- Electrical and Electronic Engineering