IDENTIFICATION OF ANONYMOUS USERS IN TWITTER
Computer Science and Engineering, MSc Program, 2012
Assoc. Prof. Yücel Saygin (Thesis Supervisor), Assoc. Prof. Berrin Yanikoğlu, Asst. Prof. Hüsnü Yenigün, Assoc. Prof. Mehmet Ercan Nergiz, Assoc. Prof. Tonguç Ünlüyurt
Date & Time: August 2nd, 2012 – 11:00
Place: FENS L063
Users may have multiple profiles when writing comments, blogs, and tweets on the web. While some of these profiles reveal the user's true identity, others are created under pseudonyms. Pseudonyms are especially important in countries with oppressive governments, where activists write pseudonymous tweets or Facebook messages. In these countries, if government officials discover that a person is among the activists, the consequences may be serious: the activist may be imprisoned, or his or her life may even be put in danger. Pseudonyms may provide a sense of anonymity; however, the writing patterns of an author can provide clues that can be used to link the pseudonymous account to the public account. More specifically, one can extract features from text whose author is known and use them to build a model that predicts whether a given (supposedly) anonymous text belongs to that author or not. In this work, we first demonstrate that a person can be identified as being part of a group by using his/her tweets. We used Twitter since it is a popular platform, but the problem is not specific to Twitter. We show that an adversary can build a classifier from the public tweets of known users and use it to match those users with pseudonymous Twitter accounts. We use a simple vector-space model with tf-idf weights to represent documents and a Naive Bayes classifier with a cosine similarity measure. Through experiments with real data, we show that the problem of matching public and pseudonymous accounts exists in Twitter. We also provide a formalism to describe the problem and, based on this formalism, a solution to protect the privacy of individuals who would like to stay anonymous when writing tweets.
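The matching step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it covers only the tf-idf vector-space representation and the cosine similarity measure, leaves out the Naive Bayes component, and uses a smoothed idf formula chosen for convenience. The function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank_authors`), the account names, and the toy tweets are all hypothetical and were invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for every term.

    `documents` is a list of token lists, one per known author profile.
    The smoothing (add one to numerator and denominator, then add 1.0)
    is an assumption of this sketch, not a formula taken from the thesis.
    """
    n = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Represent a token list as a sparse tf-idf vector (term -> weight)."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_authors(known_tweets, anonymous_tweets):
    """Rank known authors by similarity to a pseudonymous account.

    Each author's public tweets are concatenated into one document, and
    the pseudonymous tweets are concatenated into a query document.
    """
    profiles = {a: tokenize(" ".join(ts)) for a, ts in known_tweets.items()}
    idf = build_idf(list(profiles.values()))
    query = tfidf_vector(tokenize(" ".join(anonymous_tweets)), idf)
    scores = {a: cosine(tfidf_vector(toks, idf), query) for a, toks in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only.
known = {
    "alice": ["honestly the bus was late again smh",
              "honestly cannot believe this weather smh"],
    "bob": ["Great match tonight, what a goal!",
            "What a goal by the striker, great game"],
}
anonymous = ["honestly this protest coverage is thin smh"]

ranking = rank_authors(known, anonymous)
for author, score in ranking:
    print(f"{author}: {score:.3f}")
```

In this toy setting the pseudonymous text shares habitual words with only one public profile, so that profile ranks first. With real data, an adversary would need many more tweets per account and a held-out evaluation before drawing any conclusion from such a ranking.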