Wednesday, April 1, 2009

One example of why a government web filter would suck

Look at smaller examples to see how bigger ones work.

In many workplaces, internet filters are put in place to prevent access to things that shouldn't be allowed at work.  Obviously if you are going to look at certain things on a work computer you are an idiot, but some other 'time wasters' get blocked as well, things like Facebook and MySpace.  At the end of the day, if you really feel the need to look at something blocked, whether you believe it is rightfully blocked or not, you can view it on your personal internet connection when you get home.

Anyway, the interesting thing is the false positives that happen.  The other day I was googling some semi-work-related stuff to understand stress as an engineering/geological idea.  This led me to the Wikipedia page on Buckling, which, oddly, appeared to be blocked by work's filter.

I couldn't figure this out, but someone who had seen the filter rules said there was something in there to block "uck.in".  Looking up uck.in when I got home, I saw that this is a site which provides lists of anonymous proxies, so understandably the kind of site you would want to block at work (where such proxies would allow you to bypass all of work's filters).  However, the filter for "uck.in" does not just block the hostname.  It blocks any URL with "uck.in" appearing anywhere in the address, be it the host, the path, or part of the query string.
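To make the difference concrete, here is a rough Python sketch of "block it anywhere in the address" versus "block only the hostname".  I never saw the actual filter configuration, so the function names and the example.com URL are made up for illustration, and this is only my guess at the behaviour, not how the real product works.

```python
from urllib.parse import urlsplit

BLOCKED_HOST = "uck.in"  # the proxy-list site mentioned above

def blocked_anywhere(url: str) -> bool:
    """Block if the text appears anywhere in the URL (roughly what the work filter seemed to do)."""
    return BLOCKED_HOST in url

def blocked_by_host(url: str) -> bool:
    """Block only when the hostname is the site itself or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return host == BLOCKED_HOST or host.endswith("." + BLOCKED_HOST)

# Hypothetical URLs: the last one only contains "uck.in" inside its path.
for url in ["http://uck.in/",
            "http://www.uck.in/list",
            "http://example.com/files/duck.ini"]:
    print(url, blocked_anywhere(url), blocked_by_host(url))
```

The host-only check still catches the proxy site and its subdomains, but leaves alone an unrelated page that merely has those characters somewhere in its path.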

Confused about why uck.in would match buckling?  The filter list uses regular expressions.  In a regular expression, a period is a wild card for ANY character.  So the pattern will match "uck.in" as well as "uckain" "uckbin" "ucklin" "uck?in" "uck/in" "uck5in" etc... you get the idea.  Buckling contains "ucklin", so it gets blocked.
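If you want to see it for yourself, here is a small Python sketch of the unescaped period versus an escaped one.  The filter at work is not written in Python as far as I know, and the exact Wikipedia address and the uck5in URL are just examples I picked; the point is only how the two patterns behave.

```python
import re

loose = re.compile(r"uck.in")    # unescaped period: matches ANY single character there
strict = re.compile(r"uck\.in")  # escaped period: matches only a literal "."

urls = [
    "http://uck.in/",                         # the site that was meant to be blocked
    "http://en.wikipedia.org/wiki/Buckling",  # illustrative address for the Buckling page
    "http://example.com/uck5in",              # made-up URL for the "uck5in" case
]

for url in urls:
    # search() looks for the pattern anywhere in the string, like the filter did
    print(url,
          "loose:", bool(loose.search(url)),
          "strict:", bool(strict.search(url)))
```

With the period escaped, only the first URL matches.  It is a one-character fix.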

A few days later, xkcd put up this comic: http://xkcd.com/537/ but the image would not display.  What was going on?  The image was called duckling.png.  There's that nasty regex again, better block it!

So obviously there was a flaw in the filtering.  We raised a ticket with the group in charge of the filter, pointing out how they had mistyped the pattern and were blocking extra content.  The response was "thanks, we'll take that as a suggestion", which effectively means "we can't be bothered fixing it, since you can't show that you need a blocked site for work purposes".

So the annoying part?  If one day during my day-to-day job I hit a URL with that pattern in it, how do I know if I need access to it?  It might have the exact answer to a problem I need to solve, or it could be a bogus link trying to sell a product (this is what happens when googling for solutions).  It would be very hard to show that I do need access when I obviously can't know in advance what's behind that URL.

So I can see the same thing happening with a government filter.  They get overzealous and block extra content, but they have no motivation to fix it, so why would they?  Once something is falsely blocked, how can you prove it is erroneously blocked when you cannot access it in the first place to prove there is no objectionable content there?

There are many other examples of why a government filter would suck; this is just one.