Detecting click fraud in online advertising

A data mining approach

Richard Oentaryo, Ee Peng Lim, Michael Finegold, David Lo, Feida Zhu, Clifton Phua, Eng Yeow Cheu, Ghim Eng Yap, Kelvin Sim, Minh Nhut Nguyen, Kasun Perera, Bijay Neupane, Mustafa Faisal, Zeyar Aung, Wei Lee Woon, Wei Chen, Dhaval Patel, Daniel Berrar

Research output: Contribution to journal (Article)

19 Citations (Scopus)

Abstract

Click fraud, the deliberate clicking on advertisements with no real interest in the product or service offered, is one of the most daunting problems in online advertising. Building an effective fraud detection method is thus pivotal for online advertising businesses. We organized a Fraud Detection in Mobile Advertising (FDMA) 2012 Competition, opening the opportunity for participants to work on real-world fraud data from BuzzCity Pte. Ltd., a global mobile advertising company based in Singapore. In particular, the task is to identify fraudulent publishers who generate illegitimate clicks, and distinguish them from normal publishers. The competition was held from September 1 to September 30, 2012, attracting 127 teams from more than 15 countries. The mobile advertising data are unique and complex, involving heterogeneous information, noisy patterns with missing values, and a highly imbalanced class distribution. The competition results provide a comprehensive study on the usability of data mining-based fraud detection approaches in practical settings. Our principal findings are that features derived from fine-grained time-series analysis are crucial for accurate fraud detection, and that ensemble methods offer promising solutions to highly imbalanced nonlinear classification tasks with mixed variable types and noisy/missing patterns. The competition data remain available for further studies at http://palanteer.sis.smu.edu.sg/fdma2012/.
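As a rough illustration of what "features derived from fine-grained time-series analysis" could look like, the sketch below aggregates a click log into per-publisher statistics over fixed time buckets. It is not the method of the authors or of any competition team: the function name `click_features`, the `(publisher_id, unix_timestamp)` record layout, the 60-second bucket width, and the chosen statistics are all assumptions made for this example, and the sample clicks are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def click_features(clicks, bucket_seconds=60):
    """Summarise each publisher's click stream over fixed-width time buckets.

    `clicks` is an iterable of (publisher_id, unix_timestamp) pairs.
    This record layout is hypothetical, not the FDMA 2012 schema.
    """
    # publisher_id -> bucket index -> number of clicks in that bucket
    buckets = defaultdict(lambda: defaultdict(int))
    for publisher_id, timestamp in clicks:
        buckets[publisher_id][int(timestamp) // bucket_seconds] += 1

    features = {}
    for publisher_id, per_bucket in buckets.items():
        counts = list(per_bucket.values())  # only buckets with at least one click
        features[publisher_id] = {
            "total_clicks": sum(counts),
            "active_buckets": len(counts),
            "mean_clicks_per_bucket": mean(counts),
            "std_clicks_per_bucket": pstdev(counts),
            "max_clicks_per_bucket": max(counts),
        }
    return features


# Invented sample data: pub_a clicks in a burst, pub_b clicks sparsely.
clicks = [
    ("pub_a", 0), ("pub_a", 5), ("pub_a", 10), ("pub_a", 70),
    ("pub_b", 0), ("pub_b", 3600),
]
features = click_features(clicks)
```

Per-publisher rows of this kind could then be passed to a classifier; the abstract reports that ensemble methods looked promising for that step, but the specific models are not described in this record.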

Original language: English
Pages (from-to): 99-140
Number of pages: 42
Journal: Journal of Machine Learning Research
Volume: 15
Publication status: Published - 2014
Externally published: Yes

Fingerprint

Data mining
Marketing
Industry

Keywords

  • Ensemble learning
  • Feature engineering
  • Fraud detection
  • Imbalanced classification

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Cite this

Oentaryo, R., Lim, E. P., Finegold, M., Lo, D., Zhu, F., Phua, C., ... Berrar, D. (2014). Detecting click fraud in online advertising: A data mining approach. Journal of Machine Learning Research, 15, 99-140.

@article{0bb872bdb2374c5ca2d2e0337b92b7c6,
title = "Detecting click fraud in online advertising: A data mining approach",
abstract = "Click fraud, the deliberate clicking on advertisements with no real interest in the product or service offered, is one of the most daunting problems in online advertising. Building an effective fraud detection method is thus pivotal for online advertising businesses. We organized a Fraud Detection in Mobile Advertising (FDMA) 2012 Competition, opening the opportunity for participants to work on real-world fraud data from BuzzCity Pte. Ltd., a global mobile advertising company based in Singapore. In particular, the task is to identify fraudulent publishers who generate illegitimate clicks, and distinguish them from normal publishers. The competition was held from September 1 to September 30, 2012, attracting 127 teams from more than 15 countries. The mobile advertising data are unique and complex, involving heterogeneous information, noisy patterns with missing values, and a highly imbalanced class distribution. The competition results provide a comprehensive study on the usability of data mining-based fraud detection approaches in practical settings. Our principal findings are that features derived from fine-grained time-series analysis are crucial for accurate fraud detection, and that ensemble methods offer promising solutions to highly imbalanced nonlinear classification tasks with mixed variable types and noisy/missing patterns. The competition data remain available for further studies at http://palanteer.sis.smu.edu.sg/fdma2012/.",
keywords = "Ensemble learning, Feature engineering, Fraud detection, Imbalanced classification",
author = "Richard Oentaryo and Lim, {Ee Peng} and Michael Finegold and David Lo and Feida Zhu and Clifton Phua and Cheu, {Eng Yeow} and Yap, {Ghim Eng} and Kelvin Sim and Nguyen, {Minh Nhut} and Kasun Perera and Bijay Neupane and Mustafa Faisal and Zeyar Aung and Woon, {Wei Lee} and Wei Chen and Dhaval Patel and Daniel Berrar",
year = "2014",
language = "English",
volume = "15",
pages = "99--140",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",

}
