Thoughts on Creating a Profanity Filter

Recently I’ve been working on a front-end profanity filter. It would make more sense to have the filter run on the server side and return an accepted or rejected response, but since my strength is in AS3, I’ve decided to do it this way. There are two parts I want to explain: the first is the reason for a profanity filter, and the second is the actual code. You can check out the demo here.

Why

If you have a site that lets users submit text and you want to control what is displayed, you’re going to need some sort of moderation system. There are two main options. The first is a profanity filter: a script that looks for certain words or word patterns and allows or blocks them based on its parameters. The strength here is that it is automatic, but the weakness is that it can’t catch everything, due to new slang, misspellings, context, and creative punctuation. Additionally, a filter that tries to block too much might end up blocking perfectly appropriate submissions. For example, on a website that lets users submit their favorite books, the filter might block “My Big Hairy Dick,” but it might also block “Dick Tracy” just because it picked up on the word “dick.” Context is a major weakness, which brings us to the second option: a live moderator. The strength of having a live person approve and reject submissions is that they can easily see past misspellings, have a better sense of new slang, and, most of all, can interpret context. The downside is that content cannot go live until it is approved, and if there are lots of submissions, that means more work for the moderator, meaning more time and more money.

My solution is to use both. First, a profanity filter is not enough, unless you don’t really care what gets by. In most cases, clients want to make sure nothing offensive is posted to their site, both in terms of vulgarities and brand defacement. The trick with writing a profanity filter is not necessarily blocking offensive words; it’s allowing appropriate ones. For example, consider when to allow the word “ass.” You can block the word on its own, but what about ass tied to other words, such as asshole, assface, asshat, etc.? You can combine ass with many nouns to create something absurdly offensive. And you cannot just block all instances of ass+word, because then you block assistance, assets, assist, etc. A well-written profanity filter should fire few to no false positives.

But if there’s a person moderating content, then why bother at all with a front-end profanity filter? The reason is to make that person’s job easier. The profanity filter’s job is to cut down on the number of submissions by rejecting the without-a-doubt unacceptable ones. If a site gets thousands of submissions and one person needs to go through them one by one and approve each, that can take a long time. But if we can reject a handful from the start, that makes the job much easier. The profanity filter should never “assume” that something is offensive; if something might be offensive (such as the “Dick” example above), then let it pass, and the moderator will decide based on its context. I worked on a site that used this technique (in that case it was just a word list and a PHP script that matched literals), and we tracked the number of times users clicked submit and the number of times the profanity filter returned true (meaning the submission was rejected). The result was that over one third of submissions were blocked by a very basic filter, which gives me reason to believe that having both systems is definitely a good idea. Now, as in the previous case, the filter could just be a gigantic word list containing every variation of certain words, but it’s more fun to try to make a “smarter” filter that can take care of leetspeak and word variations on its own.

How

Rather than looking for exact words, the filter I wrote looks more or less for word patterns. This was a good case to dive into some regular expressions. The filter has three word lists (which can be set by the user, as this content should be external). The first list contains words to flag no matter what: words such as “fuck” that, for the purposes of the site, are never appropriate on their own or as part of any other word. The second is a list of words to flag only when written on their own, so we detect “ass” but not “assist” or “assface.” I also test for plurals and other suffixes automatically, so not only would “shit” be matched, but “shits,” “shitting,” etc. The third is an allow list: words that are exceptions to the previous list. For instance, block “crap” and “crappy” but don’t block “craps,” since craps is a casino game.
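To make that concrete, here is a minimal sketch of how the three lists might be wired up, along with the word-boundary trick for the standalone list. The variable names are mine for illustration; as mentioned, the real lists would be loaded from external content rather than hard-coded:

    // Hypothetical list setup -- in practice these would be loaded
    // from an external file rather than hard-coded.
    var blockAlways:Array     = ["fuck"];                // flag anywhere, even inside other words
    var blockStandalone:Array = ["ass", "shit", "crap"]; // flag only as whole words
    var allowList:Array       = ["craps"];               // exceptions to the standalone list

    // Standalone words get word boundaries, so "ass" matches but "assist" doesn't:
    var standalone:RegExp = new RegExp("\\b" + blockStandalone[0] + "\\b", "i");
    trace(standalone.test("kiss my ass")); // true
    trace(standalone.test("assistance"));  // false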

The first step is converting the word lists into regular expression patterns. Each word from the profanity list is split up so that various characters can be replaced. For instance, asshole can be written as a$$hole. So rather than matching /asshole/, we replace letters with alternations, ending up with something that looks like this: /a(s|\$)(s|\$)hole/ (note that the $ must be escaped, since it is otherwise a metacharacter). When the regular expression is run, the ignore-case flag is included so that capitals and lowercase are matched automatically. However, this does not work for accented characters (such as É and Ü), which I will say more about later.
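As a rough sketch of that substitution step (the look-alike table here is my own guess at the kind of map the filter would use, not the original’s):

    // Expand each letter of a word into a class of look-alike characters,
    // then compile the pattern with the ignore-case flag.
    var lookAlikes:Object = { a: "a@4", s: "s\\$5", i: "i1!", o: "o0", e: "e3" };

    function buildPattern(word:String):RegExp {
        var src:String = "";
        for (var i:int = 0; i < word.length; i++) {
            var ch:String   = word.charAt(i);
            var subs:String = lookAlikes[ch];
            // character classes like [s\$] sidestep the need for (s|\$) groups
            src += subs ? "[" + subs + "]" : ch;
        }
        return new RegExp(src, "i");
    }

    trace(buildPattern("asshole").test("A$$hole")); // true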

In order to test for plurals and other suffixes, we add an optional ending to the pattern. This looks something like (s|ing|er)? and would test for [word], [word]s, [word]ing, and [word]er. We can also try to look for [word]y, but first we must check whether the y can simply be appended. In some cases, usually when the last character is a certain letter (b, d, f, g, l, m, n, p, r, t, v, z) and the character before it is a vowel, we duplicate that last letter, so “shit” becomes “shitty” instead of just “shity.” Of course, this is a very broad rule and won’t work in every case, which again is why there is a person moderating. Additionally, in rare cases adding a suffix will create a valid word. This is where that third list comes into play, where we can make sure we don’t match “craps” for crap.
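Here is a rough sketch of that suffix logic, under the assumption that the doubled consonant is allowed before any of the suffixes and not just y (the real class may be stricter):

    // Append an optional suffix group to a stem, doubling the final
    // consonant when the stem ends in a consonant preceded by a vowel.
    function withSuffixes(stem:String):String {
        var last:String = stem.charAt(stem.length - 1);
        var prev:String = stem.charAt(stem.length - 2);
        if (stem.length >= 2
            && "bdfglmnprtvz".indexOf(last) != -1
            && "aeiou".indexOf(prev) != -1) {
            // optionally doubled final consonant: shit -> shitty, shitting
            return stem + "(s|" + last + "?(ing|er|y))?";
        }
        return stem + "(s|ing|er)?";
    }

    trace(withSuffixes("shit")); // shit(s|t?(ing|er|y))?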

Once our word list is built, it’s simply a matter of running the different RegEx patterns on the given text. Even with large bodies of text, this is still relatively fast, although I’m sure it would be much faster on the server side. However, I don’t think there would be many cases where users submit huge paragraphs of text. The class I wrote has two options: a normal validate function and a quickValidate function. The quick one simply keeps running patterns, and if it gets a match, it terminates and returns true. The other validate function runs through each word and keeps track of the matches. It then returns a result object that contains the validation status (true or false) as well as three arrays: a list of matches as they appear in the submitted text, the start and end index values for those matches, and a list of matches as defined in the filter list. This is useful if you need to do any syntax highlighting or anything beyond simple validation.
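Here is a sketch of what those two functions might look like. The names quickValidate and validate come from the description above, but the loop and the result-object fields are my own guesses at the shape, not the actual API:

    // Fast path: bail out on the first hit.
    function quickValidate(text:String, patterns:Array):Boolean {
        for each (var p:RegExp in patterns) {
            if (text.search(p) != -1) return true;
        }
        return false;
    }

    // Full path: collect every match plus its position and source pattern.
    function validate(text:String, patterns:Array):Object {
        var result:Object = { matched: false, words: [], indices: [], listWords: [] };
        for each (var p:RegExp in patterns) {
            var m:Object = p.exec(text);
            while (m != null) {
                result.matched = true;
                result.words.push(m[0]);                               // as typed by the user
                result.indices.push([m.index, m.index + m[0].length]); // start and end
                result.listWords.push(p.source);                       // as defined in the filter
                if (!p.global) break; // without the g flag, exec would loop forever
                m = p.exec(text);
            }
        }
        return result;
    }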

One issue I ran into while building my demo involved accented characters. Whenever there was an unusual character, it would throw off the index positions after it. It seems that, for one reason or another, certain characters are considered to take up more than one index position according to RegEx. For instance, the em dash character is interpreted to take up three index positions, so if there is an em dash at index 5 and the letter “a” at index 6, RegEx will report the letter “a” at index 8 instead. The TextField class does not think these characters take up extra positions, so, as in the previous example, if you tried to highlight the letter “a” based on the returned index position, it would be three characters off (or, if it were at the end, it would throw an error). One workaround is to run another RegEx right before validation that matches all of these accented characters and replaces them with similar-looking letters (or just with an asterisk or something else). This is done only for the RegEx test; the displayed text is left alone. This lets you get the correct index values, and it also makes É match E, since the ignore-case flag doesn’t ignore accents.
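A sketch of that normalization pass might look like the following. Each replacement swaps a single character for a single character, which is what keeps the index positions aligned with the TextField (the table here is just a sample, nowhere near exhaustive):

    // Swap accented characters and wide dashes for plain look-alikes
    // before testing; the displayed text itself is never changed.
    function normalizeForTest(text:String):String {
        return text.replace(/[ÉÈÊË]/g, "E")
                   .replace(/[éèêë]/g, "e")
                   .replace(/[ÜÛÙ]/g, "U")
                   .replace(/[üûù]/g, "u")
                   .replace(/[—–]/g, "-"); // em and en dashes also throw off indices
    }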

I created a demo that shows the matching through highlighting and also lets you edit the word lists. Keep in mind, this could have more uses than just profanity filtering.

7 thoughts on “Thoughts on Creating a Profanity Filter”

  1. Not sure if you’re looking into that, but one thing I also did when I had to implement a similar script in the past was to remove all non-alphanumeric characters and whitespace from the string before testing it. That eliminates other forms of filter obfuscation; i.e., “S_H_I_T” or “S H I T” wouldn’t be caught otherwise.

    (Of course, this also means a few false positives: “Give me cash. It is good.” would also be caught.)

    1. Yeah, I had thought of that, but as you mentioned, it can create false positives. I think what I should do next is remove all underscores, hyphens, and characters like that (see the sketch after these comments).

  2. Any thoughts about abstracting out the characters that could be replaced by other characters, so you just have a dictionary or something saying “a” can be replaced with “@” or “4,” “s” can be replaced with “$,” etc.? Seems like that could make the profanity list easier to maintain, though it might slow down the checking…

  3. I tried to create a filter for my site and realized I was spending more time on the filter than on building the site! I ended up using WebPurify; it made life a lot easier!

  4. I have found that profanity filters are often “too active,” and too much gets blocked because of that. You can miss a lot of good conversation that way. Some people can’t produce a thought without the use of “colorful adjectives.” That makes it bad not just for them, but for everyone, because you never get to read what they are saying.
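For what it’s worth, a narrow version of the separator-stripping idea from the first comment thread above might look like this: drop underscores, hyphens, and dots between letters, but leave spaces alone, which catches “S_H_I_T” without fusing “cash. It” into a match. This is a sketch under that assumption, not a tested rule:

    // Remove deliberate separators between letters before testing.
    function stripSeparators(text:String):String {
        return text.replace(/([a-z])[_\-.]+(?=[a-z])/gi, "$1");
    }

    trace(stripSeparators("S_H_I_T"));                   // SHIT
    trace(stripSeparators("Give me cash. It is good.")); // unchanged: the dot precedes a space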
