Rangatira: A discourse on spamming

Spam is unsolicited commercial email: something that I, at least, don't want to receive. It eats operating (and capital) costs, because spam gateways are needed to filter it out, and it takes time to go through it all.

The history of spam dates back to 1978, when a sales representative from DEC sent a mailing to 600 users of ARPANET. Because of technical limitations, however, only about half of them received it.

The motivation for spamming over the internet is that it is so cheap: you can reach millions of people all over the world at relatively low cost, and the more recipients you have, the greater the chance that someone buys. There is always some small percentage who think the marketed stuff is great for them and would not have known about it otherwise. Thus even 0.01% of millions of spam recipients can make it worthwhile. Spammers also don't bother to target their advertising much. However, it is good to note that the originator of spam weighs cost against benefit as well: whether the automation works, how much labour is needed, and so on.

So my hobby site (roughly 80 000 registered users and 1.5-2.0 million hits per month, which is quite good for a single-server site), online since the 1990s on various generations of software and hardware, started to get spam, and I thought I would write up the measures I have gone through to stop it (so far). Yes, it is not a big site like ovi.com/yahoo.com/google.com, but the same principles apply.

The post article feature. This feature lets a user submit an article to the site's main page as news. It is separate from the discussion forum posts, which also exist. All submissions go to moderation before publication; moderation means accepting and publishing the article, maybe after some feedback first. This specific feature started to receive spam, funnily enough right after the site was opened to search engines, with hundreds of submissions:

- it can be an administrative hurdle to delete them all; I need to go to the db level unless I code a web interface for it. In a corporate environment that would have meant asking a db admin to get involved, maybe even starting a (waterfall) project for it. In this case, since I am also the DBA, I can make changes on every layer.

- the above means that it costs me extra work to prevent spam and to remove it from the site, doubly so if I am already doing something else and need to come back to fix this.

- a better approach would be to block spam from coming in in the first place. The only way to achieve this completely is to not have a site at all.

- however, a dedicated person cannot be stopped, so the key is to make posting more expensive for spammers without making things harder for normal users. The site is also high profile, which means it will probably lure more spam in the future.

Prevention can also be viewed through game theory: going harshly after the individual behind the spam makes spamming a nuisance for them, as the risk of getting caught and the time spent trying to spam both rise. This is similar to a player in game theory who does not care about monetary benefit but just wants to win. Allowing spam (or paying ransom) rewards it and so encourages it. This is also the reason why one should make it hard for attacks to have an impact (patching alone does not make it hard). Ideally one could go public with this, like Facebook has done with its successful endeavours in catching spammers, which show that it will go after people and sue them; thus it might not be in a spammer's best interest to try to spam through Facebook, because the risks are rising.

Each step described below is, done individually, not a big thing to bypass, but together they make scripting quite hard and not worth the effort. On top of that, some individual admin actions can create the thought that it is better to go elsewhere (e.g. if the admin actively bans an IP address or changes a user's password, the miscreant has to do some laborious tasks and also knows that he is being hunted constantly).

Before being allowed to post an article, users need to register on the site and also be at level 1.

By default you are at level 0. This means waiting before you can do anything other than read, so there is no immediate ability to spam, and it costs the spammer time.

Spammers harvest email addresses to send spam to, and there is no reason to show anyone's email address. The site has a built-in internal messaging system, similar to e.g. Facebook's, so an address is shown only if you are level 2 or above, which generally means you are a trusted contributor. This also reduces the audience that any such spam could reach, which limits its impact, and it helps to keep harvesters at bay.
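
The level rules described so far can be sketched in a few lines of Python. This is only an illustration, not the site's actual code: the `User` class, the constant names and both function names are made up for the example, and it assumes that "level 2 or above" refers to the level of the person viewing the address.

```python
from dataclasses import dataclass

# Hypothetical names for the levels described in the text.
LEVEL_NEW = 0       # default after registration: read only
LEVEL_APPROVED = 1  # may post articles
LEVEL_TRUSTED = 2   # may see other users' email addresses


@dataclass
class User:
    username: str
    email: str
    level: int = LEVEL_NEW  # every new account starts at level 0


def can_post_article(user):
    """Only logged-on users at level 1 or above may submit articles."""
    return user is not None and user.level >= LEVEL_APPROVED


def visible_email(viewer, owner):
    """Return the owner's address only to viewers at level 2 or above."""
    if viewer is not None and viewer.level >= LEVEL_TRUSTED:
        return owner.email
    return None  # everyone else has to use the internal messaging system
```

Because every check goes through the level, a freshly registered account can do nothing but read until an admin raises it, which is exactly the waiting cost described above.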

Spammers also filled the personal info fields with spam. So I took those fields away; the only things required for registration are username, password and email.

The registration form has a captcha. I got suspicious that it was being broken automatically, though this is not confirmed, so I created a multi-colour captcha with more transparency in the colours and lengthened it. At least the registration attempts that looked script-based in the logs lessened.

To make scripting harder, the post article page tells visitors to register and carries a link to http://127.0.0.1; a script that follows the link ends up hitting its own machine and gets DoSed.

For active spammers posting blindly, I just changed the account's password, meaning they have to create a new account, write their stuff again and also wait until I bump their level. That is not so cost effective from the spammer's point of view, and it also gives the mental image that someone is "fighting" against the spammer, which is important too. Similarly, the best way to fight graffiti is to clean it away as fast as you can.

IP address blocked: more work to find a working IP, and thus more time and cost.

Hid some functions that store user input, like post article and downloads, unless the user is logged on and at level 1. This gives an audit trail and costs the spammer more time.

Spammers started mirroring the site, so I blocked them at the class A network level, made downloads require registration and being logged on, and reduced the cookie validity time, meaning a miscreant needs to do active work in order to mirror the site.
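
Blocking a single address and blocking a whole class A sized range can both be expressed with Python's standard `ipaddress` module. The sketch below is an assumption about how such a check could look at the application level; the listed addresses are placeholders, and `BLOCKED_NETWORKS` and `is_blocked` are names invented for the example.

```python
import ipaddress

# Hypothetical blocklist; these addresses are placeholders, not the real ones.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),  # one banned address
    ipaddress.ip_network("10.0.0.0/8"),      # a whole class A sized range
]


def is_blocked(remote_addr):
    """Return True if the client address falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return True  # refuse anything that is not a parsable address
    return any(addr in net for net in BLOCKED_NETWORKS)
```

A /8 block is a blunt instrument, since it also shuts out any legitimate users in that range, so it is worth reviewing such entries from time to time.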

Requiring a login and a level meant they need to wait.

As the scripting was apparently hardcoded, I renamed the script it used, and the new one can now be seen only if you are approved at level 1.

Preventing reposting from the same IP for 60 seconds.
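
A 60-second repost limit per IP needs little more than a dictionary of timestamps. The following is a minimal in-memory sketch, assuming a single server process; `RepostLimiter` and its `clock` parameter are hypothetical names, and a real site would more likely keep the timestamps in its database.

```python
import time

REPOST_INTERVAL = 60.0  # seconds, as stated in the text


class RepostLimiter:
    """Allow at most one accepted post per IP within the interval."""

    def __init__(self, interval=REPOST_INTERVAL, clock=time.monotonic):
        self.interval = interval
        self.clock = clock     # injectable so the limiter can be tested
        self._last_post = {}   # ip -> time of the last accepted post

    def allow(self, ip):
        now = self.clock()
        last = self._last_post.get(ip)
        if last is not None and now - last < self.interval:
            return False       # too soon: reject without resetting the timer
        self._last_post[ip] = now
        return True
```

The dictionary grows with every new address, so a long-running process would also need to purge old entries now and then.
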
Requiring valid email addresses (checking for the existence of MX records for the domain).
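
The MX check can be sketched as below. Python's standard library has no MX lookup, so this example assumes the third-party dnspython package for the real query and keeps the lookup injectable; `email_domain`, `lookup_mx_dnspython` and `has_valid_mail_domain` are names made up for the illustration, and the address parsing is deliberately simplistic.

```python
def email_domain(address):
    """Return the lowercased domain part of an address, or None if malformed."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or "." not in domain:
        return None
    return domain.lower()


def lookup_mx_dnspython(domain):
    """Look up MX hosts with dnspython (assumed to be installed)."""
    import dns.resolver  # imported lazily so the rest works without it
    try:
        answers = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []  # domain does not exist or has no MX records
    # Timeouts and other resolver errors propagate to the caller.
    return [str(rdata.exchange) for rdata in answers]


def has_valid_mail_domain(address, lookup_mx=lookup_mx_dnspython):
    """True if the address parses and its domain publishes at least one MX."""
    domain = email_domain(address)
    if domain is None:
        return False
    return len(lookup_mx(domain)) > 0
```

This only shows that the domain can receive mail at all; it says nothing about whether the mailbox itself exists.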

Checking the mail domains against blacklists (Spamhaus etc.).
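
DNS-based blacklists are queried by prepending the name to be checked to the list's zone and resolving the result: an answer means "listed", no answer means "not listed". The sketch below assumes the Spamhaus domain blocklist zone `dbl.spamhaus.org` and simplifies the return-code handling, so the list's own documentation and usage terms should be checked before relying on it; `is_blacklisted` and its parameters are hypothetical names.

```python
import socket

# Assumed zone: the Spamhaus domain blocklist. Other lists use other zones.
DEFAULT_ZONE = "dbl.spamhaus.org"


def is_blacklisted(domain, zone=DEFAULT_ZONE, resolve=socket.gethostbyname):
    """Return True if the mail domain appears to be listed in the zone."""
    query = f"{domain.rstrip('.')}.{zone}"
    try:
        answer = resolve(query)
    except socket.gaierror:
        return False  # name does not resolve: the domain is not listed
    # Simplification: listings are answered from 127.0.0.0/8, and this
    # sketch assumes 127.255.x.x answers are error codes, not listings.
    return answer.startswith("127.") and not answer.startswith("127.255.")
```

Note that a failed lookup is treated as "not listed" here, so a DNS outage lets registrations through rather than blocking everyone.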

Cookie lifetime reduced: extra work to log in again (not a big thing in itself, but together with all the others it becomes costly).
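
A shorter cookie lifetime is a one-line change wherever the session cookie is issued. As a rough sketch with Python's standard `http.cookies` module: the cookie name `session`, the function name and the 15-minute value are all assumptions for the example, since no actual lifetime is given here.

```python
from http.cookies import SimpleCookie

SESSION_MAX_AGE = 15 * 60  # hypothetical lifetime in seconds


def session_cookie_header(session_id, max_age=SESSION_MAX_AGE):
    """Build the value of a Set-Cookie header for a short-lived session."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["max-age"] = max_age  # browser drops it after this
    cookie["session"]["httponly"] = True    # not readable from page scripts
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()
```

The browser-side lifetime is only advisory, so the server should also expire the session itself after the same period.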

These are all on the application level, in addition to existing measures. Of course there is also a firewall blocking things on the TCP/IP layer, which reduces the load on the application layer too, but that is a different story. It is good to know what each layer can and cannot do, instead of treating everything as either an "application problem" or a "network problem".

It has been quiet for a while on the spam front, and so far there have been no complaints about the site from ordinary users. Less administration is needed now, so the proactive effort brings cost savings for the future.

I hope this gave some ideas of problem-solution thinking. Yes, more complex stuff could be included, the standard answer being some Bayesian filtering, but that does not make spamming costly; it just tries to block it based on some rules. Of course one could debate sending costs, but sending is cheap for a spammer. Check the ideas in OpenBSD spamd for increasing the cost, although with botnets the cost is relative.

Saturday, January 08, 2011
