Effects of text length on lexical diversity measures

Using short texts with less than 200 tokens

Rie Koizumi, Yo In'nami

Research output: Contribution to journal › Article

Abstract

Despite the importance of lexical diversity (LD) in L2 speaking and writing performance, LD measures are known to be affected by the number of words analyzed in a text. This study aims to identify LD measures that are less affected by text length and can therefore be used to analyze short L2 texts (50-200 tokens). We compared the type-token ratio, Guiraud index, Maas, measure of textual lexical diversity (MTLD), D, and HD-D to assess how susceptible each is to text length. Spoken texts of 200 tokens from 38 lower-intermediate L2 English learners were divided into segments of 50-200 tokens, and the impact of text length was examined. We found that MTLD was less affected by text length across most ranges, but was somewhat affected across the 50-150 and 50-200 token ranges. We further observed low correlations between equal-sized texts of up to 100 tokens. These results suggest that MTLD can be used with texts of more than 100 tokens and compared between 100- and 200-token texts. We also showed that D and HD-D produced similar results, indicating that the two measures are comparable.
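To make the compared measures concrete, the sketch below computes five of them in Python. It is a minimal illustration, not the authors' software: it assumes commonly cited definitions (TTR = types/tokens; Guiraud = types/√tokens; Maas a² = (log N − log V)/(log N)², where lower values mean higher diversity; MTLD with a 0.72 factor threshold averaged over forward and backward passes; HD-D with a 42-token sample, reported on a TTR-like scale), and the study's exact settings may differ. D (vocd) is omitted because it relies on random sampling and curve fitting. The function names, the `by_length` helper (which simply takes the first n tokens, an assumed segmentation rather than the study's procedure), and the synthetic demo text are all hypothetical.

```python
import math
import random
from collections import Counter


def ttr(tokens):
    """Type-token ratio: number of types divided by number of tokens."""
    return len(set(tokens)) / len(tokens)


def guiraud(tokens):
    """Guiraud's index (root TTR): types divided by the square root of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def maas(tokens):
    """Maas a^2 = (log N - log V) / (log N)^2; lower values mean higher diversity.
    Undefined for a single token, because log N = 0."""
    n, v = len(tokens), len(set(tokens))
    return (math.log(n) - math.log(v)) / math.log(n) ** 2


def _mtld_pass(tokens, threshold=0.72):
    """One directional MTLD pass: count segments whose running TTR falls to the threshold."""
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        types.add(tok)
        count += 1
        if len(types) / count <= threshold:
            factors += 1  # a full factor is complete; start a new segment
            types, count = set(), 0
    if count > 0:
        # Credit the unfinished final segment with a partial factor.
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(tokens) / factors if factors > 0 else float("nan")


def mtld(tokens, threshold=0.72):
    """MTLD: mean of forward and backward passes (NaN if no factor accumulates)."""
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(list(reversed(tokens)), threshold)
    return (forward + backward) / 2


def hdd(tokens, sample_size=42):
    """HD-D: for each type, the hypergeometric probability that it appears at least
    once in a random draw of `sample_size` tokens without replacement; the summed
    probabilities are divided by `sample_size` (an expected TTR for such samples)."""
    n = len(tokens)
    if n < sample_size:
        raise ValueError(f"HD-D needs at least {sample_size} tokens, got {n}")
    total = math.comb(n, sample_size)
    prob_sum = sum(
        1 - math.comb(n - freq, sample_size) / total
        for freq in Counter(tokens).values()
    )
    return prob_sum / sample_size


MEASURES = {"TTR": ttr, "Guiraud": guiraud, "Maas": maas, "MTLD": mtld, "HD-D": hdd}


def by_length(tokens, lengths=(50, 100, 150, 200)):
    """Score the first n tokens for each n (an illustrative segmentation choice)."""
    return {n: {name: f(tokens[:n]) for name, f in MEASURES.items()} for n in lengths}


if __name__ == "__main__":
    # Synthetic 200-token stream with Zipf-like frequencies (illustration only).
    rng = random.Random(0)
    vocab = [f"w{i}" for i in range(120)]
    tokens = rng.choices(vocab, weights=[1 / (r + 1) for r in range(120)], k=200)
    for n, scores in by_length(tokens).items():
        print(n, {name: round(value, 3) for name, value in scores.items()})
```

Running the demo typically shows TTR falling steadily as the prefix grows, which is the kind of length sensitivity the study examines; with real learner transcripts, the same loop lets one check how much each measure shifts between 50-, 100-, 150-, and 200-token segments.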

Original language: English
Pages (from-to): 522-532
Number of pages: 11
Journal: System
Volume: 40
Issue number: 4
DOI: 10.1016/j.system.2012.10.017
Publication status: Published, December 2012

Keywords

  • D
  • Guiraud index
  • Maas
  • Measure of textual lexical diversity
  • Type-token ratio

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Linguistics and Language

Cite this

Effects of text length on lexical diversity measures : Using short texts with less than 200 tokens. / Koizumi, Rie; In'nami, Yo.

In: System, Vol. 40, No. 4, 12.2012, p. 522-532.

Koizumi, Rie ; In'nami, Yo. / Effects of text length on lexical diversity measures : Using short texts with less than 200 tokens. In: System. 2012 ; Vol. 40, No. 4. pp. 522-532.

@article{e3bd37eaeae046fe9d38558443041f4b,
title = "Effects of text length on lexical diversity measures: Using short texts with less than 200 tokens",
keywords = "D, Guiraud index, Maas, Measure of textual lexical diversity, Type-token ratio",
author = "Rie Koizumi and Yo In'nami",
year = "2012",
month = "12",
doi = "10.1016/j.system.2012.10.017",
language = "English",
volume = "40",
pages = "522--532",
journal = "System",
issn = "0346-251X",
publisher = "Elsevier Limited",
number = "4",

}
