If you are interested in text mining, this is a good data set to start with. It is a collection of SMS text messages, one message per line, each classified by a human as either spam or ham (ham is a legitimate message).
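To get a feel for the structure, here is a minimal Python sketch of reading such a file and counting the labels. It assumes each line holds the label, a tab, and then the message text; that layout is not described on this page, so check it against the file you download. The three sample lines and the function names `parse_line` and `read_messages` are made up for illustration and are not taken from the data set.

```python
import io
from collections import Counter

# Made-up sample lines in the assumed "label<TAB>message" layout.
# These are NOT real records from the data set.
SAMPLE = (
    "ham\tAre we still meeting for lunch today?\n"
    "spam\tYou have won a prize! Reply WIN to claim now.\n"
    "ham\tRunning late, see you in ten minutes.\n"
)


def parse_line(line):
    """Split one line into (label, text) at the first tab only."""
    label, text = line.rstrip("\n").split("\t", 1)
    return label, text


def read_messages(handle):
    """Return a list of (label, text) pairs, skipping blank lines."""
    return [parse_line(line) for line in handle if line.strip()]


# io.StringIO stands in for an open file; with the real data you would
# pass open(path, encoding="utf-8") instead (path is up to you).
messages = read_messages(io.StringIO(SAMPLE))
label_counts = Counter(label for label, _ in messages)
print(label_counts)
```

Splitting on the first tab only keeps any later tabs inside the message text, which matters if a message happens to contain one.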

Tiago A. Almeida, Jose Maria Gomez Hidalgo. SMS Spam Collection Data Set. Part of the UCI Machine Learning Repository. Available at https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection.

This Recommendation was added to the website on 2018-04-19 and was last modified on 2020-02-29.