A calculation cost reduction method for a log-likelihood maximization in word2vec

Sakuya Nakamura, Masaomi Kimura

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Word2vec models learn from text data and assign distributed representations to words. These representations are vectors that capture the meanings of the words, which makes word2vec models useful for Natural Language Processing (NLP). However, updating a model when new data are added is difficult because generating a word2vec model takes a long time. This calculation time has become an impediment to analyzing text data that contains many unknown words. It is caused by the computational time required to calculate the likelihood function. The purpose of this study was to speed up the training of the Continuous Bag-of-Words model (CBOW), which is one of the word2vec models, by reducing the calculation cost of the likelihood function. The likelihood function in CBOW is expressed using a softmax function, which requires a huge amount of computational time. In this paper, a sigmoid function replaces the softmax function as an approximated likelihood function, because the sigmoid function can reproduce the characteristic change of the likelihood function in CBOW.
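The cost difference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the sigmoid approximation is constructed, so the code below only contrasts a full-softmax log-likelihood, which scores every vocabulary word, with a log-sigmoid of the target word's score alone, which needs a single dot product. The names `W_in`, `W_out`, `cbow_hidden`, and the sizes `V` and `D` are assumptions made for this example.

```python
import math
import random


def cbow_hidden(W_in, context_ids):
    """CBOW hidden vector: the average of the context words' input vectors."""
    dim = len(W_in[0])
    n = len(context_ids)
    return [sum(W_in[c][k] for c in context_ids) / n for k in range(dim)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax_log_likelihood(h, W_out, target):
    """log p(target | context) with a full softmax.

    Every row of W_out is scored, so the cost grows with the vocabulary size.
    """
    scores = [dot(row, h) for row in W_out]
    m = max(scores)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_norm


def sigmoid_log_likelihood(h, W_out, target):
    """log sigmoid(score of the target word) only.

    One dot product and no normalisation over the vocabulary.
    Illustrative stand-in, not the paper's exact approximated likelihood.
    """
    s = dot(W_out[target], h)
    # Numerically stable log(sigmoid(s)) for either sign of s
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))


random.seed(0)
V, D = 200, 16  # hypothetical vocabulary size and embedding dimension
W_in = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]

h = cbow_hidden(W_in, [3, 7, 11, 42])
ll_softmax = softmax_log_likelihood(h, W_out, target=5)
ll_sigmoid = sigmoid_log_likelihood(h, W_out, target=5)
print(ll_softmax, ll_sigmoid)
```

In this sketch the softmax version performs `V` dot products per training example, while the sigmoid version performs one, which is the kind of saving the abstract attributes to replacing the softmax.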

Original language: English
Title of host publication: ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing
Editors: Hui Yu
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781861376664
DOIs: https://doi.org/10.23919/IConAC.2019.8895214
Publication status: Published - 2019 Sep
Event: 25th IEEE International Conference on Automation and Computing, ICAC 2019 - Lancaster, United Kingdom
Duration: 2019 Sep 5 to 2019 Sep 7

Publication series

Name: ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing

Conference

Conference: 25th IEEE International Conference on Automation and Computing, ICAC 2019
Country: United Kingdom
City: Lancaster
Period: 19/9/5 to 19/9/7

Fingerprint

Cost reduction
Processing

Keywords

  • CBOW
  • Component
  • Computational time
  • Softmax
  • Training acceleration
  • Word2Vec

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Control and Optimization

Cite this

Nakamura, S., & Kimura, M. (2019). A calculation cost reduction method for a log-likelihood maximization in word2vec. In H. Yu (Ed.), ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing [8895214] (ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/IConAC.2019.8895214

@inproceedings{f233ea8b884f47f7bea015a817594394,
title = "A calculation cost reduction method for a log-likelihood maximization in word2vec",
abstract = "Word2vec models learn from text data and assign distributed representations to words. These representations are vectors that capture the meanings of the words, which makes word2vec models useful for Natural Language Processing (NLP). However, updating a model when new data are added is difficult because generating a word2vec model takes a long time. This calculation time has become an impediment to analyzing text data that contains many unknown words. It is caused by the computational time required to calculate the likelihood function. The purpose of this study was to speed up the training of the Continuous Bag-of-Words model (CBOW), which is one of the word2vec models, by reducing the calculation cost of the likelihood function. The likelihood function in CBOW is expressed using a softmax function, which requires a huge amount of computational time. In this paper, a sigmoid function replaces the softmax function as an approximated likelihood function, because the sigmoid function can reproduce the characteristic change of the likelihood function in CBOW.",
keywords = "CBOW, Component, Computational time, Softmax, Training acceleration, Word2Vec",
author = "Sakuya Nakamura and Masaomi Kimura",
year = "2019",
month = "9",
doi = "10.23919/IConAC.2019.8895214",
language = "English",
series = "ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
editor = "Hui Yu",
booktitle = "ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing",

}

TY - GEN

T1 - A calculation cost reduction method for a log-likelihood maximization in word2vec

AU - Nakamura, Sakuya

AU - Kimura, Masaomi

PY - 2019/9

Y1 - 2019/9

AB - Word2vec models learn from text data and assign distributed representations to words. These representations are vectors that capture the meanings of the words, which makes word2vec models useful for Natural Language Processing (NLP). However, updating a model when new data are added is difficult because generating a word2vec model takes a long time. This calculation time has become an impediment to analyzing text data that contains many unknown words. It is caused by the computational time required to calculate the likelihood function. The purpose of this study was to speed up the training of the Continuous Bag-of-Words model (CBOW), which is one of the word2vec models, by reducing the calculation cost of the likelihood function. The likelihood function in CBOW is expressed using a softmax function, which requires a huge amount of computational time. In this paper, a sigmoid function replaces the softmax function as an approximated likelihood function, because the sigmoid function can reproduce the characteristic change of the likelihood function in CBOW.

KW - CBOW

KW - Component

KW - Computational time

KW - Softmax

KW - Training acceleration

KW - Word2Vec

UR - http://www.scopus.com/inward/record.url?scp=85075783079&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85075783079&partnerID=8YFLogxK

U2 - 10.23919/IConAC.2019.8895214

DO - 10.23919/IConAC.2019.8895214

M3 - Conference contribution

AN - SCOPUS:85075783079

T3 - ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing

BT - ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing

A2 - Yu, Hui

PB - Institute of Electrical and Electronics Engineers Inc.

ER -