Media conversion technologies such as speech recognition and speech synthesis have attracted considerable attention in recent years. They are applied in various human communication tools on smartphones and personal computers, such as language learning systems. However, learners still suffer from a loss of naturalness in pronunciation because of mother tongue interference. Although learners can recognize that their speech differs from the trainer's, they still cannot pinpoint precisely which part of their utterance is wrong. We indicate these differences by visualizing a learner's incorrect and correct pronunciation with a speech-to-animated-text visualization tool. In this study, we focused on the media conversion process between speech prosodic information and animated text information, using the Analytic Hierarchy Process (AHP) as the mapping method. Pairwise comparisons between attributes of speech and text information were conducted and evaluated by native and nonnative speakers of Japanese. We investigated the differences between native and nonnative speakers' perspectives and determined the ideal matching between attributes of speech and text information.
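To illustrate the AHP mapping step mentioned above, the following is a minimal sketch of how priority weights can be derived from a pairwise comparison matrix. The attribute names (pitch, duration, intensity) and the matrix values are hypothetical placeholders, not the paper's actual data; the weights are computed with the standard geometric-mean approximation, and Saaty's consistency ratio checks that the comparisons are acceptably consistent (CR < 0.1).

```python
import math

def ahp_weights(matrix):
    """AHP priority weights from a pairwise comparison matrix,
    using the geometric-mean (row products) approximation."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]  # geometric mean of each row
    total = sum(gm)
    return [g / total for g in gm]  # normalize so weights sum to 1

def consistency_ratio(matrix, weights):
    """Saaty's consistency ratio CR = CI / RI,
    with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    # Approximate the principal eigenvalue lambda_max from A*w
    lam = sum(
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ) / n
    ci = (lam - n) / (n - 1)
    ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's random index table (partial)
    return ci / ri

# Hypothetical 3x3 matrix comparing prosodic attributes
# (e.g. pitch vs. duration vs. intensity) on Saaty's 1-9 scale.
m = [
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
]
w = ahp_weights(m)
print([round(x, 3) for x in w])  # → [0.637, 0.258, 0.105]
```

In an actual study, one such matrix would be collected from each evaluator (native or nonnative), and the resulting weight vectors compared to find the speech attributes that best map onto animated text attributes.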