Recording script design for corpus-based TTS system based on coverage of various phonetic elements

Mitsuaki Isogai, Hideyuki Mizuno, Kazunori Mano

Research output: Conference contribution

13 Citations (Scopus)

Abstract

This paper describes a new recording script generation method for creating speech databases for corpus-based TTS systems. The method is efficient owing to two features: (1) it uses a 2-stage algorithm that generates the recording script while considering the balance of triphone, syllable, and morpheme elements; (2) it can control the types of phonetic elements included in the recording script via the weight coefficients of those elements. An evaluation shows that the 2-stage algorithm is effective in raising the coverage of phonetic elements and that the method yields a recording script containing various phonetic elements. A preference test shows that changing the selection criteria influences the quality of the synthesized speech. The same test also shows that, when generating a task-specific recording script, it is better to take account of morpheme-based elements than syllable-based elements.
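
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of weighted, coverage-driven sentence selection in general, not the authors' 2-stage algorithm. It assumes a simple greedy heuristic: repeatedly pick the candidate sentence whose not-yet-covered elements have the largest weighted sum. The function name `greedy_script`, the `type:label` encoding of elements, and the example weights are all hypothetical.

```python
def greedy_script(candidates, weights, max_sentences):
    """Greedy weighted-coverage selection (illustrative assumption only).

    candidates: dict mapping a sentence id to the set of elements it contains,
                each element written as "type:label" (hypothetical encoding).
    weights:    dict mapping an element type ("tri", "syl", "mor") to a weight.
    """
    covered = set()
    script = []
    remaining = dict(candidates)

    def gain(units):
        # Weighted count of the elements this sentence would newly cover.
        return sum(weights.get(u.split(":", 1)[0], 0.0) for u in units - covered)

    while remaining and len(script) < max_sentences:
        best = max(remaining, key=lambda s: gain(remaining[s]))
        if gain(remaining[best]) <= 0:
            break  # no remaining sentence adds weighted coverage
        script.append(best)
        covered |= remaining.pop(best)
    return script, covered


# Toy data: three candidate sentences with made-up elements.
candidates = {
    "s1": {"tri:a-k-a", "syl:ka", "mor:aka"},
    "s2": {"tri:a-k-a", "syl:ka"},
    "s3": {"tri:k-i-t", "syl:ki", "mor:kita"},
}
weights = {"tri": 1.0, "syl": 0.5, "mor": 2.0}
script, covered = greedy_script(candidates, weights, max_sentences=2)
```

In this toy run the sentence `s2` is never chosen, because everything it contains is already covered once `s1` is selected. Raising or lowering an entry in `weights` changes which element types drive the selection, which is the kind of control the abstract attributes to the weight coefficients.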

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Volume: I
ISBN (Print): 0780388747, 9780780388741
DOI: https://doi.org/10.1109/ICASSP.2005.1415110
Publication status: Published - 2005
Externally published: Yes
Event: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Philadelphia, PA
Duration: 18 Mar 2005 to 23 Mar 2005


Fingerprint

phonetics
Speech analysis
recording
syllables
evaluation
coefficients

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Isogai, M., Mizuno, H., & Mano, K. (2005). Recording script design for corpus-based TTS system based on coverage of various phonetic elements. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. I). [1415110] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2005.1415110

@inproceedings{75d26a761b52486f909ad30d57b09601,
title = "Recording script design for corpus-based TTS system based on coverage of various phonetic elements",
abstract = "This paper describes a new recording script generation method that can create speech databases for corpus-based TTS systems. This method is efficient due to its two features; (1) It has a 2-stage algorithm to generate the recording script with consideration of the balance of triphone, syllable and morpheme elements. (2) It can control types of phonetic elements included in the recording script via the weight coefficients of the phonetic elements. An evaluation shows that the 2-stage algorithm is effective in raising the coverage of phonetic elements and that this method yields a recording script containing various phonetic elements. A preference test shows that changing the selection criteria influences the quality of the synthesized speech. The same test also shows that it is better to take account of morpheme-based elements than syllable-based elements in generating a task-specific recording script.",
author = "Mitsuaki Isogai and Hideyuki Mizuno and Kazunori Mano",
year = "2005",
doi = "10.1109/ICASSP.2005.1415110",
language = "English",
isbn = "0780388747",
volume = "I",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}
