CADCC Mandarin Chinese Spontaneous Conversational Dialogue Corpus | Key Laboratory of Phonetics and Speech Science (语音与言语科学重点实验室)

Resource name (Chinese and English)
CADCC-汉语普通话自然口语对话语料库
CADCC-Chinese Annotated Dialogue and Conversation Corpus

Resource overview

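The CADCC spontaneous dialogue corpus consists of spontaneous conversational speech recordings and their dialogue transcripts. It is suited to research on spontaneous speech, speech recognition engineering, and advanced Mandarin Chinese teaching.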

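To keep the speech fully spontaneous, no restrictions were placed on what the speakers talked about, so the corpus reflects the features of spontaneous spoken Chinese in real-world settings. The speech was recorded in a professional recording environment by selected speakers of standard Mandarin, in 12 dialogue units with two speakers each. The audio files are stored as high-quality 16 kHz, 16-bit, mono WAV, and the corpus totals about 1.6 GB. The transcripts were produced by manual annotation and are highly reliable.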

CADCC (Chinese Annotated Dialogue and Conversation Corpus) is a speech corpus designed for linguistic and phonetic research, speech recognition engineering, and advanced Mandarin Chinese teaching. It consists of speech sound files and the corresponding text files.

To preserve the naturalness of the spontaneous speech and faithfully reflect its phonetic features, no restrictions were placed on the recording content. All speakers speak standard Mandarin, and the corpus was recorded as spontaneous conversation. It contains 12 conversations (about 1.2 GB), and all text has been transcribed manually.

The recording format is 16 kHz, 16-bit, mono WAV.
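The stated recording format can be checked programmatically before the audio is used. Below is a minimal sketch built on Python's standard `wave` module. It assumes the corpus files are ordinary PCM WAV files, and the function names `check_wav_format` and `pcm_bytes_per_hour` are hypothetical, not part of the corpus distribution.

```python
import wave

# Format stated in the corpus description: 16 kHz, 16-bit, mono WAV.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(source):
    """Return the duration in seconds if `source` (a path or binary file object)
    is a 16 kHz, 16-bit, mono WAV file; otherwise raise ValueError."""
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    expected = (EXPECTED_RATE_HZ, EXPECTED_SAMPLE_WIDTH_BYTES, EXPECTED_CHANNELS)
    if (rate, width, channels) != expected:
        raise ValueError(
            f"{source}: got {rate} Hz, {8 * width}-bit, {channels} channel(s)"
        )
    return frames / rate


def pcm_bytes_per_hour():
    """Raw PCM data rate for this format, excluding WAV header overhead:
    16,000 samples/s x 2 bytes x 1 channel = 32,000 bytes/s."""
    return EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS * 3600
```

At this format, one hour of audio occupies about 115.2 MB of raw sample data. To total the duration of a local copy, something like `sum(check_wav_format(p) for p in pathlib.Path("CADCC").rglob("*.wav"))` would work, where the `CADCC` directory name is a hypothetical placeholder for wherever the files are stored.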