25/02/06
More anti-spam regex updating
Recently there seems to have been a minor swarm of a new style of spam - insurance spam. A small addition to the regex should take care of it: spam advertising car, travel, motor, life or medical insurance should now all be blocked.
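For illustration only, here is a minimal sketch of one way to catch those five insurance types. The pattern and the function name looks_like_insurance_spam are hypothetical. The real regex is private and almost certainly differs.

```php
<?php
// Hypothetical pattern for illustration only: the post's real regex is private.
// Matches "<type> insurance" for the five types mentioned, case-insensitively.
function looks_like_insurance_spam(string $text): bool
{
    $pattern = '~\b(?:car|travel|motor|life|medical)\s+insurance\b~i';
    return preg_match($pattern, $text) === 1;
}

// Example usage:
var_dump(looks_like_insurance_spam('Cheap CAR insurance quotes here')); // bool(true)
var_dump(looks_like_insurance_spam('Great post about travel photos'));  // bool(false)
```

Note that "\s+" only allows whitespace between the two words, so "car-insurance" or "car_insurance" would slip past this sketch.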
I'd also noticed before that spammers were encoding the comment title on trackbacks, using "&#" numeric character codes for the vowels. Recently they've started doing the same to the 'blog name', so that has been updated as well:
$blog_name = preg_replace("~&(#?)(48|49|5[0-7]|6[4-9]|[78][0-9]|9[07-9]|1[01][0-9]|12[0-2]);~e", "chr('\\2')", $blog_name);
Now any standard English letter (or digit, or "@") encoded this way in a blog name will be swapped back to the real character.
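The "e" modifier used above evaluates the replacement as PHP code. It was deprecated in PHP 5.5 and removed in PHP 7, so on current PHP the same replacement needs preg_replace_callback(). A minimal sketch follows, keeping the pattern unchanged. The wrapper name decode_blog_name is hypothetical.

```php
<?php
// Same pattern as above, without the removed "/e" modifier. It matches "&#NN;"
// (the "#" is optional) for digits 0-9 (48-57), "@" (64), A-Z (65-90) and
// a-z (97-122); codes 91-96 (punctuation) are deliberately left alone.
function decode_blog_name(string $blog_name): string
{
    $decoded = preg_replace_callback(
        '~&(#?)(48|49|5[0-7]|6[4-9]|[78][0-9]|9[07-9]|1[01][0-9]|12[0-2]);~',
        function (array $m): string {
            return chr((int) $m[2]); // $m[2] holds the decimal character code
        },
        $blog_name
    );
    return $decoded ?? $blog_name; // fall back to the input on a regex error
}

echo decode_blog_name('Ch&#101;&#97;p &#73;nsur&#97;nc&#101;'), "\n"; // Cheap Insurance
```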
As a final note, it's rather strange, but I've actually received hits from searches for "anti-spam regex". As much as I like the idea of being some important spam-fighter, the complete regex is staying private so that it doesn't become widely used and the spammers don't learn to get around it.
Edit: Typical - my anti-spam filter blocked my trackback to the last anti-spam post. Oh well, at least I know it's working!
Edit 2: Ten hours later, four comments had got through. It turns out that "\b" (word boundary) treats a hyphen as a word boundary, but not an underscore! That minor oversight is now fixed, and insurance-scam spam should now be fairly well blocked (until they decide to advertise some other form of insurance, or try Pratchett's Inn-Sewer-Ants!)
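The "\b" behaviour described in Edit 2 can be seen in a small sketch. "\b" sits between a word character (letters, digits and underscore) and a non-word character, so an underscore does not create a boundary. The lookaround version is only one possible fix and an assumption, not necessarily what was done to the private regex. Both function names are hypothetical.

```php
<?php
// "\b" matches between a word character ([A-Za-z0-9_]) and a non-word character.
// A hyphen is a non-word character, so "\binsurance" matches in "car-insurance";
// an underscore is a word character, so it does not match in "car_insurance".
function matches_with_b(string $text): bool
{
    return preg_match('~\binsurance\b~i', $text) === 1;
}

// One possible fix (an assumption): treat only letters and digits as part of
// a word, so "_" acts as a separator just like "-".
function matches_with_lookarounds(string $text): bool
{
    return preg_match('~(?<![a-z0-9])insurance(?![a-z0-9])~i', $text) === 1;
}

foreach (['car-insurance', 'car_insurance', 'reinsurance'] as $s) {
    printf("%-14s \\b: %d  lookarounds: %d\n", $s, matches_with_b($s), matches_with_lookarounds($s));
}
```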