Text data compression ratio as a text attribute for a language-independent text art extraction method

Tetsuya Suzuki, Kazuyuki Hayashi

Research output: Conference contribution

3 Citations (Scopus)

Abstract

Text-based pictures called text art are often used in Web pages, email text, and so on. They enrich expression in text data, but they can be noise when the text data is processed. For example, they can be an obstacle for text-to-speech software and natural language processing. Text art extraction methods, which detect the areas of text art in given text data, help to solve such problems. Previously proposed text art extraction methods, however, do not work well for text data containing more than one natural language because they assume that a specific natural language is used in the text data. In a previous paper, we proposed a text art extraction method for multiple natural languages. That method uses an attribute based on successive occurrences of the same two characters, which captures the characteristic that the same characters often appear successively in text art. In this paper, we replace that attribute in our extraction method with two data compression ratios of the text data, namely the compression ratio under Run Length Encoding (RLE) and that under LZ77. Our experiments show that our extraction method with the RLE compression ratio works better than both the method with the LZ77 compression ratio and our previous extraction method.
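
As an illustration of the two attributes named in the abstract, the following Python sketch computes an RLE-based and an LZ77-based compression ratio for a line of text. It is not the authors' implementation: the function names, the size accounting (2 symbols per RLE run, 3 symbols per LZ77 token), the 255-character search window, and the definition of the ratio as compressed size divided by original size are all assumptions made for this example. The abstract does not say how the ratios are fed into the extraction method, so that step is left out.

```python
def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encode text as (character, run length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs


def rle_ratio(text: str) -> float:
    """Compressed size / original size; each run is counted as 2 symbols (assumption)."""
    if not text:
        return 1.0
    return 2 * len(rle_encode(text)) / len(text)


def lz77_tokens(text: str, window: int = 255) -> list[tuple[int, int, str]]:
    """Greedy LZ77 parse into (offset, match length, next character) triples."""
    tokens: list[tuple[int, int, str]] = []
    i = 0
    while i < len(text):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # The match may run past position i (overlapping copy), but it
            # always leaves one character to serve as the literal next_char.
            while i + k < len(text) - 1 and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        tokens.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_ratio(text: str, window: int = 255) -> float:
    """Compressed size / original size; each token is counted as 3 symbols (assumption)."""
    if not text:
        return 1.0
    return 3 * len(lz77_tokens(text, window)) / len(text)


if __name__ == "__main__":
    art_line = "+--------------------+"    # hypothetical text art border
    prose_line = "The quick brown fox."    # hypothetical natural-language line
    for line in (art_line, prose_line):
        print(f"{line!r}: RLE {rle_ratio(line):.2f}, LZ77 {lz77_ratio(line):.2f}")
```

Under these assumptions, a lower ratio means the line compresses better. Lines with long runs of one character, which the abstract identifies as typical of text art, score lower than ordinary prose under both measures, and neither measure depends on which natural language the surrounding text is written in.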

Original language: English
Title of host publication: 2010 5th International Conference on Digital Information Management, ICDIM 2010
Pages: 513-518
Number of pages: 6
DOI: https://doi.org/10.1109/ICDIM.2010.5664648
Publication status: Published - 2010
Event: 2010 5th International Conference on Digital Information Management, ICDIM 2010 - Thunder Bay, ON
Duration: 5 Jul 2010 to 8 Jul 2010

Other

Other: 2010 5th International Conference on Digital Information Management, ICDIM 2010
City: Thunder Bay, ON
Period: 5 Jul 2010 to 8 Jul 2010

Fingerprint

Data compression ratio
Electronic mail
Websites

ASJC Scopus subject areas

  • Information Systems

Cite this

Suzuki, T., & Hayashi, K. (2010). Text data compression ratio as a text attribute for a language-independent text art extraction method. In 2010 5th International Conference on Digital Information Management, ICDIM 2010 (pp. 513-518). [5664648] https://doi.org/10.1109/ICDIM.2010.5664648

Text data compression ratio as a text attribute for a language-independent text art extraction method. / Suzuki, Tetsuya; Hayashi, Kazuyuki.

2010 5th International Conference on Digital Information Management, ICDIM 2010. 2010. p. 513-518 5664648.

Research output: Conference contribution

Suzuki, T & Hayashi, K 2010, Text data compression ratio as a text attribute for a language-independent text art extraction method. in 2010 5th International Conference on Digital Information Management, ICDIM 2010., 5664648, pp. 513-518, 2010 5th International Conference on Digital Information Management, ICDIM 2010, Thunder Bay, ON, 5 Jul 2010. https://doi.org/10.1109/ICDIM.2010.5664648
Suzuki T, Hayashi K. Text data compression ratio as a text attribute for a language-independent text art extraction method. In 2010 5th International Conference on Digital Information Management, ICDIM 2010. 2010. p. 513-518. 5664648 https://doi.org/10.1109/ICDIM.2010.5664648
Suzuki, Tetsuya ; Hayashi, Kazuyuki. / Text data compression ratio as a text attribute for a language-independent text art extraction method. 2010 5th International Conference on Digital Information Management, ICDIM 2010. 2010. pp. 513-518
@inproceedings{7726a3a45db54840a4cfcd17dc983c73,
title = "Text data compression ratio as a text attribute for a language-independent text art extraction method",
author = "Tetsuya Suzuki and Kazuyuki Hayashi",
year = "2010",
doi = "10.1109/ICDIM.2010.5664648",
language = "English",
isbn = "9781424475728",
pages = "513--518",
booktitle = "2010 5th International Conference on Digital Information Management, ICDIM 2010",

}

TY - GEN

T1 - Text data compression ratio as a text attribute for a language-independent text art extraction method

AU - Suzuki, Tetsuya

AU - Hayashi, Kazuyuki

PY - 2010

Y1 - 2010

UR - http://www.scopus.com/inward/record.url?scp=78650943868&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650943868&partnerID=8YFLogxK

U2 - 10.1109/ICDIM.2010.5664648

DO - 10.1109/ICDIM.2010.5664648

M3 - Conference contribution

AN - SCOPUS:78650943868

SN - 9781424475728

SP - 513

EP - 518

BT - 2010 5th International Conference on Digital Information Management, ICDIM 2010

ER -