Hello.

This is a multi-part question. I have researched each aspect of it before posting, but I am posting here in case anyone can point out something I might have overlooked.

  • I am building a profanity filter to run when a form is submitted. I assume it must run before the data is actually written to the database. I am thinking the word list (a separate include file containing an array) should be included on the same page, above the DB connection code. This I need to make sure about.

  • Also, if there is profanity in the form fields on submission, I must notify the user. My thought is that if the filter returns true for an offending word (from the array, for any particular field), I would reload the form page with the offending fields emptied and display a jQuery message saying something to the effect of: "Please use appropriate content!" This part is confusing to me.

  • If the profanity filter catches a bad word in the form before submitting data to the DB, is it best to refresh/redirect the form/page like this? If so, where does this particular code go? Beneath the profanity filter code (Above the DB connection code) by using an AND OR to cycle through the array, for example?

I hope I am explaining this correctly and clearly.

Note: I have no code written for this yet to provide as an example for my question.

Thank you in advance for any advice or pointers in the right direction!
Matthew

I don't know if my approach suits your situation, but what I normally do is put the filter just before the data is inserted into the database, and redirect back to the form with an error code that tells the form to display the error message.

As for the filter itself, I put the list of words into an array and check the input with:

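// Returns true if $input contains any word from $array_text (case-insensitive).
function contains_profanity($input, $array_text) {
    foreach ($array_text as $text) {
        // Match the raw input; keep htmlentities() for output escaping only.
        if (stripos($input, $text) !== false) {
            return true;
        }
    }
    return false; // reached only after every word has been checked
}
$has_profanity = contains_profanity($_POST['comment'], $array_text);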
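
To tie the two parts of this reply together, here is a minimal sketch of the "check, then redirect back with an error code" flow. The file names form.php and thanks.php, the "error" query key, and both function names are assumptions made up for illustration, not anything from the original poster's site.

```php
<?php
// Hypothetical names throughout: form.php, thanks.php, and the "error" query key.

/** Returns true if $text contains any entry of $words (case-insensitive substring match). */
function has_banned_word(string $text, array $words): bool
{
    foreach ($words as $word) {
        if (stripos($text, $word) !== false) {
            return true;
        }
    }
    return false;
}

/** Decides where to send the user after a submission, before any database work. */
function redirect_target(array $post, array $words): string
{
    foreach ($post as $value) {
        if (is_string($value) && has_banned_word($value, $words)) {
            // An error code in the query string lets form.php show the message.
            return 'form.php?error=profanity';
        }
    }
    return 'thanks.php';
}

// In the form handler this would run above the database code, e.g.:
// $target = redirect_target($_POST, $banned_words);
// if ($target !== 'thanks.php') { header('Location: ' . $target); exit; }
// ... otherwise open the DB connection and insert the row ...
```

Keeping the decision in a plain function that returns a URL makes it easy to test without sending real headers.
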
Member Avatar for diafol

Profanity filters don't really work, as users will find a way around them: fcking hell, etc. It's the intent that should be policed, not the word, IMO. Some may find the correct spelling offensive, but those who are offended by profanity will be equally offended by a misspelt offering; otherwise they're just spelling snobs. Filtering all profane words with all their common misspellings is pointless (IMO again), and you run the risk of false positives. Perhaps FCKEditor would be a problem, or the misspelling of "counting" by dropping the 'o'.
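
One way to cut down on the FCKEditor kind of false positive is to match whole words only instead of substrings. The sketch below is an illustration under that assumption; the function name is hypothetical, and it does nothing about an innocent typo that happens to produce a real banned word.

```php
<?php
/**
 * Returns true if $text contains a banned entry as a whole word.
 * \b is a word boundary and the "i" flag makes the match case-insensitive.
 */
function has_banned_whole_word(string $text, array $words): bool
{
    if ($words === []) {
        return false;
    }
    // preg_quote() escapes any regex metacharacters in each list entry.
    $alternatives = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/i';
    return preg_match($pattern, $text) === 1;
}
```

With "fck" on the list, "I use FCKEditor" passes because there is no word boundary after "FCK", while "what the fck" is still caught.
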

However, they are in use in many forums, including this one, and they do prevent a lot of crap getting through.

You could do a client-side filter in jQuery and then repeat the filter on the server. This could prevent submission in the first place and lessen frustration for the user. Alternatively, it could be Ajaxed, where the filter runs on the server without the user being aware of it. Client-side offensive-word filters are there for all to see in the JS, so a determined user could read the filter and post to his malicious heart's content.
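
For the Ajax variant, the server side could be as small as the sketch below: a function that builds the JSON reply for a check request. The endpoint name check.php, the "ok" key, and the function name are assumptions for illustration only, and the real submit handler would still need to repeat the check.

```php
<?php
/**
 * Builds the JSON body a hypothetical check endpoint could return to an Ajax call.
 * "ok" is true when no entry of $words occurs in $text (case-insensitive).
 */
function profanity_check_json(string $text, array $words): string
{
    $clean = true;
    foreach ($words as $word) {
        if (stripos($text, $word) !== false) {
            $clean = false;
            break;
        }
    }
    return json_encode(['ok' => $clean]);
}

// In the endpoint script (the name check.php is an assumption):
// header('Content-Type: application/json');
// echo profanity_check_json($_POST['comment'] ?? '', $banned_words);
```

Because the word list never leaves the server, the page's JavaScript only ever sees the true/false answer.
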

A clear policy is a good idea with maybe a short message below the posting box. The ability for other users to report a message would also be useful. My 2p.


diafol:

A couple of questions if you don't mind...

  • If I am storing all of the profanity words in an array in an include file, which is called before the data is submitted to the DB, how will anyone know how to find or view it in order to see how to bypass the filter? I do not wish to hard-code them directly into the page, where the source can easily be viewed via right-click.
  • I was planning initially on doing this in PHP. Why do you suggest jQuery? (I am not very familiar with jQuery at this point.) What are its benefits compared to just using a PHP array to cycle through the forbidden words, remove them from the form, and show a message asking the user to correct the fields?
  • I do understand that some malicious people may attempt to bypass the filter by misspelling curse words, but I am hoping to catch the most general, common, and obvious words. I have a list of about 300 prohibited words to cycle through at this point.

Thanks much,
Matthew

Member Avatar for diafol

A couple of questions if you don't mind...

No prob. That's why we're here.

  1. Keep the array in PHP and nobody will see it.
  2. No need to use jQuery at all.
  3. That's fine. Prevention is better than cure. You'll never catch all of them anyway. Just saying that you need to police intent and not purely rely on automated filters.
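
As a rough sketch of point 1: the include file can simply return the array, and because PHP executes it on the server and it prints nothing, a visitor never sees its contents. The file name banned_words.php and the loader function are hypothetical.

```php
<?php
/**
 * Loads the word list from a PHP file that returns an array.
 * The path and file name are assumptions for this sketch.
 */
function load_banned_words(string $path): array
{
    $words = require $path;
    if (!is_array($words)) {
        throw new RuntimeException('Word list file must return an array.');
    }
    return $words;
}

// banned_words.php (hypothetical) would contain only:
//     <?php
//     return ['word1', 'word2'];
//
// Usage above the DB connection code:
//     $banned_words = load_banned_words(__DIR__ . '/banned_words.php');
```

Storing the file outside the web root, where the host allows it, is an extra precaution rather than a requirement.
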

Honestly, it's pretty pointless. I'd say that it's best to just ban those that swear if it means that much. I personally don't filter any profanity on any of my websites because I believe in a free internet. I silence those that swear to offend.

FÜ etc.
In UTF-8 there are 1408 (thus far) ways to write eff ewe see kay and get past filters.
If you try to catch all possible forms of all common profanities, you end up with a gigabyte-sized array, and processing will take so long your site will shut down.

As said before, it does not really work; even with a good filter, policy and policing are required.
This comes up often in classes: students seem to enjoy finding homographs for swear words.
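
Rather than listing every variant in the array, one partial mitigation is to fold a few look-alike characters to plain letters before matching. The sketch below is only an illustration: the function name is hypothetical, and the table is a tiny sample that comes nowhere near covering the homographs described above.

```php
<?php
/**
 * Folds a few look-alike characters to plain lowercase letters before matching.
 * Use the result only for the filter check, never for storage or display.
 */
function fold_lookalikes(string $text): string
{
    // Illustrative sample only; real homograph coverage would need far more entries.
    $map = [
        'Ü' => 'u', 'ü' => 'u',
        '@' => 'a', '$' => 's',
        '0' => 'o', '1' => 'i', '3' => 'e',
    ];
    // strtr() with an array replaces every occurrence of each key.
    return strtolower(strtr($text, $map));
}
```

The folded string is then passed to whatever filter function is in use, so the word list itself stays at a manageable size.
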
