Comparison of two ASCII art extraction methods: A run-length encoding based method and a byte pattern based method

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Text-based pictures called ASCII art are often used in Web pages, email text, and similar media. They enrich expression in text data, but they can act as noise for natural language processing, and large ASCII art is distorted on small display devices. ASCII art extraction methods, which detect the areas of ASCII art in given text data, make it possible to ignore ASCII art or replace it with other strings. Our research group and another research group independently proposed two different ASCII art extraction methods: a run-length encoding based method and a byte pattern based method, respectively. Both methods use text classifiers constructed by machine learning algorithms, but they use different attributes of the text. In this paper, we compare the two methods in ASCII art extraction experiments where the training and testing texts are in English and Japanese. Our experimental results show that the two methods perform comparably when the training and testing texts are in the same set of languages, but the run-length encoding based method works better than the byte pattern based method when they are in different sets of languages.
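As a rough illustration of why run-length encoding can help separate ASCII art from ordinary prose, the sketch below encodes a line of text into runs of repeated characters and derives one simple attribute from it. The abstract does not state which attributes the run-length encoding based method feeds to its classifier, so the function names `run_length_encode` and `rle_ratio` and the ratio itself are assumptions made for illustration only, not the paper's feature set.

```python
from itertools import groupby


def run_length_encode(line: str) -> list[tuple[str, int]]:
    """Collapse each run of identical consecutive characters into (char, length)."""
    return [(ch, len(list(group))) for ch, group in groupby(line)]


def rle_ratio(line: str) -> float:
    """Hypothetical attribute: number of runs divided by number of characters.

    Lines dominated by repeated characters (common in ASCII art) give a low
    ratio; ordinary prose gives a ratio close to 1. An empty line is treated
    as 1.0 by convention (an assumption of this sketch).
    """
    if not line:
        return 1.0
    return len(run_length_encode(line)) / len(line)


if __name__ == "__main__":
    for sample in ["+--------+", "Hello world"]:
        print(repr(sample), run_length_encode(sample), rle_ratio(sample))
```

Because the attribute depends only on how characters repeat, not on which characters occur, it is plausibly less tied to a particular language than raw byte values, which is in keeping with the cross-language result reported above.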

Original language: English
Title of host publication: Proceedings of the IASTED International Conference on Computational Intelligence, CI 2015
Publisher: Acta Press
Pages: 269-276
Number of pages: 8
ISBN (Electronic): 9780889869752
DOIs: https://doi.org/10.2316/P.2015.827-026
Publication status: Published - 2015
Event: 2015 IASTED International Conference on Computational Intelligence, CI 2015 - Innsbruck, Austria
Duration: 2015 Feb 16 to 2015 Feb 17

Other

Other: 2015 IASTED International Conference on Computational Intelligence, CI 2015
Country: Austria
City: Innsbruck
Period: 15/2/16 to 15/2/17

Fingerprint

Testing
Electronic mail
Learning algorithms
Learning systems
Websites
Classifiers
Display devices
Processing
Experiments

Keywords

  • ASCII art
  • Information extraction
  • Natural language processing
  • Pattern recognition

ASJC Scopus subject areas

  • Computational Mechanics
  • Artificial Intelligence

Cite this

Suzuki, T. (2015). Comparison of two ASCII art extraction methods: A run-length encoding based method and a byte pattern based method. In Proceedings of the IASTED International Conference on Computational Intelligence, CI 2015 (pp. 269-276). Acta Press. https://doi.org/10.2316/P.2015.827-026

@inproceedings{3c81c75ccfbc47cc8729ba2962b14d48,
title = "Comparison of two ASCII art extraction methods: A run-length encoding based method and a byte pattern based method",
abstract = "Text based pictures called ASCII art are often used in Web pages, email text and so on. They enrich expression in text data, but they can be noise for natural language processing and large ASCII arts are deformed in small display devices. We can ignore ASCII arts in text data or replace them with other strings by ASCII art extraction methods, which detect areas of ASCII arts in a given text data. Our research group and another research group independently proposed two different ASCII art extraction methods, which are a run-length encoding based method and a byte pattern based method respectively. Both of the methods use text classifiers constructed by machine learning algorithms, but they use different attributes of text. In this paper, we compare the two methods by ASCII art extraction experiments where training text and testing text are in English and Japanese. Our experimental results show that the two methods are competitive if training text and testing text are in a same set of languages, but the run-length encoding based method works better than the byte pattern based method if training text and testing text are in different sets of languages.",
keywords = "ASCII art, Information extraction, Natural language processing, Pattern recognition",
author = "Tetsuya Suzuki",
year = "2015",
doi = "10.2316/P.2015.827-026",
language = "English",
pages = "269--276",
booktitle = "Proceedings of the IASTED International Conference on Computational Intelligence, CI 2015",
publisher = "Acta Press",

}

TY - GEN

T1 - Comparison of two ASCII art extraction methods

T2 - A run-length encoding based method and a byte pattern based method

AU - Suzuki, Tetsuya

PY - 2015

Y1 - 2015

AB - Text based pictures called ASCII art are often used in Web pages, email text and so on. They enrich expression in text data, but they can be noise for natural language processing and large ASCII arts are deformed in small display devices. We can ignore ASCII arts in text data or replace them with other strings by ASCII art extraction methods, which detect areas of ASCII arts in a given text data. Our research group and another research group independently proposed two different ASCII art extraction methods, which are a run-length encoding based method and a byte pattern based method respectively. Both of the methods use text classifiers constructed by machine learning algorithms, but they use different attributes of text. In this paper, we compare the two methods by ASCII art extraction experiments where training text and testing text are in English and Japanese. Our experimental results show that the two methods are competitive if training text and testing text are in a same set of languages, but the run-length encoding based method works better than the byte pattern based method if training text and testing text are in different sets of languages.

KW - ASCII art

KW - Information extraction

KW - Natural language processing

KW - Pattern recognition

UR - http://www.scopus.com/inward/record.url?scp=85015616989&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015616989&partnerID=8YFLogxK

U2 - 10.2316/P.2015.827-026

DO - 10.2316/P.2015.827-026

M3 - Conference contribution

AN - SCOPUS:85015616989

SP - 269

EP - 276

BT - Proceedings of the IASTED International Conference on Computational Intelligence, CI 2015

PB - Acta Press

ER -