Friday, October 16, 2009

Email obfuscation is broken

A year and a half ago, I was working on a project where I had to parse students' and professors' homepages to mine interesting information from them.
One common pattern I found on these pages, from new grad students' to emeritus professors', was the way they obfuscated their email ids.
Though they were on the right track in trying to prevent spam bots from harvesting their emails, the way they did it was as insecure as protecting a house by fastening the windows but leaving the door wide open. One should remember that people who write spam bots are not foolish. They earn a lot of money by collecting emails and selling them to the people who do the actual spamming. These trivial tricks won't stop them by any means.

There are three popular methods of obfuscation, each progressively more difficult for a spam bot:
1) Just replacing the '@' and '.' characters, like email_id AT server DOT edu.
2) Encoding semantics, like "first 7 characters of last name" AT server DOT edu.
3) Placing an image of the email id.

Any good spam bot can easily extract the correct email id if method (1) or (2) is used. It just needs a list of the common patterns that people use, plus some string substitutions and educated guesses.
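To show how little effort method (1) costs a bot, here is a minimal Python sketch of the string substitution. The two patterns and the function name `deobfuscate` are my own illustrative assumptions, not taken from any real harvester, which would carry a much longer list of variants.

```python
import re

# Hypothetical, deliberately short pattern lists (assumption for illustration).
# They cover "AT" / "(at)" / "[at]" and "DOT" / "(dot)" / "[dot]" surrounded by spaces.
AT_PATTERN = re.compile(r"\s+(?:AT|\(at\)|\[at\])\s+", re.IGNORECASE)
DOT_PATTERN = re.compile(r"\s+(?:DOT|\(dot\)|\[dot\])\s+", re.IGNORECASE)


def deobfuscate(candidate):
    """Turn a short obfuscated candidate string back into a plain address."""
    candidate = AT_PATTERN.sub("@", candidate)
    candidate = DOT_PATTERN.sub(".", candidate)
    return candidate


print(deobfuscate("email_id AT server DOT edu"))
```

This is meant to run on a short candidate string pulled from the page, not on whole paragraphs, since the case-insensitive match would also hit the ordinary word "at". Method (2) needs the extra step of looking up the person's name on the same page, which is the "educated guess" part.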
Method (3) is far more secure on its own, since the bot cannot read the address from the page text, but unfortunately it is like the fastened window, and the open door is the URL of the webpage itself.
Typically, students' and professors' pages have URLs with a definite pattern, like cs.university.edu/~emailid or people.college.edu/emailid or emailid.school.college.edu. As you can see, almost all of these URLs have the email id encoded in them. After all, this is how the page itself was created automatically.
From the URL, it is easy to guess the email id. For example, for the third URL, the email would be either emailid@college.edu or emailid@school.college.edu.
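The guess from the URL can be sketched in a few lines of Python as well. The function name `guess_emails` and the rule of trying every host suffix with at least two labels are my own assumptions for illustration. They cover only the three URL shapes mentioned above.

```python
from urllib.parse import urlparse


def guess_emails(url):
    """Return candidate addresses for a homepage URL (illustrative sketch)."""
    # urlparse only fills in the hostname when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    labels = (parsed.hostname or "").split(".")
    segments = [s for s in parsed.path.split("/") if s]

    if segments:
        # cs.university.edu/~emailid or people.college.edu/emailid
        user = segments[0].lstrip("~")
        domain_labels = labels
    else:
        # emailid.school.college.edu
        user, domain_labels = labels[0], labels[1:]

    # Try the longest host suffix first, then shorter ones, down to two labels.
    return [
        user + "@" + ".".join(domain_labels[i:])
        for i in range(len(domain_labels) - 1)
    ]


print(guess_emails("emailid.school.college.edu"))
```

A bot does not need to know which candidate is right. It can simply add all of them to its list and let the bounces sort themselves out.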

So, what is the way out? The answer is that there is no way out until the schools themselves change the URL naming scheme of the pages they assign to their staff and students.
Till then, the best option is to put up an image of the email id and, in addition, route the emails via Gmail, hoping that Gmail will block the spam automatically.

About Me
Hi, I am Balaji, a software engineer who sleeps 9 hours every day.