I'm interested in learning about data validation, especially in Python.

Python already has several common idioms for data validation.

There are several built-in functions that inspect data. For instance, isinstance(object, classinfo) checks whether the given object is an instance of the class or type (or tuple of types) given in classinfo.
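A small sketch of what such a check might look like in practice. The function name describe and its labels are made up for illustration; only isinstance itself comes from Python.

```python
def describe(value):
    """Return a short label for the kind of value received."""
    # bool is a subclass of int, so it has to be tested before the numeric check.
    if isinstance(value, bool):
        return "bool"
    # classinfo may be a tuple of types; any match counts.
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "text"
    return "other"


print(describe(3))       # number
print(describe("3"))     # text
print(describe(True))    # bool
```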

One idiom is that it is easier to ask forgiveness than permission (EAFP), which basically means it is better to try something and catch the errors than to check in advance that the data is valid. The main language feature that supports this idiom is the try statement and its associated clauses.
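As a rough illustration of the try statement in this role, here is a sketch that simply attempts the conversion and deals with the failure afterwards. The name parse_age is hypothetical, and returning None on failure is just one possible design.

```python
def parse_age(text):
    """Convert text to an int, or return None if it is not a whole number."""
    try:
        # Just attempt the conversion instead of inspecting the text first.
        return int(text)
    except (TypeError, ValueError):
        # TypeError: text was None or another unsupported type.
        # ValueError: text was a string that does not spell an integer.
        return None


print(parse_age("42"))    # 42
print(parse_age("abc"))   # None
```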

The try statement can be very bulky to write. I wonder whether it would be possible to write a function wrapper to simplify such checks, and whether such a wrapper would be worthwhile or appropriate.
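One way such a wrapper could look is a decorator that turns selected exceptions into a default value. This is only a sketch: fallback and to_int are invented names, and whether hiding exceptions this way is appropriate depends on the program, since it can also mask real bugs.

```python
import functools


def fallback(default, *exceptions):
    """Decorator factory: return `default` if the call raises one of `exceptions`."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exceptions:
                # Only the listed exception types are swallowed; others propagate.
                return default
        return wrapper
    return decorator


@fallback(None, TypeError, ValueError)
def to_int(text):
    return int(text)


print(to_int("7"))   # 7
print(to_int("x"))   # None
```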

In addition, it can only catch the general errors that Python itself raises. Sometimes technically valid input is still invalid according to the design specification. I've read that users can create their own errors and exceptions. I don't know whether that is true, but if it is, that would seem the most natural thing to do.
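For what it's worth, Python does let you define your own exceptions by subclassing Exception or one of its subclasses. A minimal sketch, assuming a made-up rule that a percentage must lie between 0 and 100 (OutOfRangeError and check_percentage are hypothetical names):

```python
class OutOfRangeError(ValueError):
    """Raised when a value is well-formed but outside the range the design allows."""


def check_percentage(value):
    """Return value as a float, enforcing the 0..100 design rule."""
    number = float(value)  # malformed text raises the built-in ValueError here
    if not 0 <= number <= 100:
        raise OutOfRangeError(f"{number} is not between 0 and 100")
    return number


try:
    check_percentage("150")
except OutOfRangeError as err:
    print("rejected:", err)
```

Because OutOfRangeError subclasses ValueError in this sketch, callers that already catch ValueError would catch it too.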

However, another option might be to create custom data validation functions. I'm not talking about the poorly coded "if x != 1 and x != 2 and x != 3", but powerful yet general functions to ensure that data is valid.
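To make the idea of general validation functions a bit more concrete, here is one possible sketch built from small functions that can be combined. The names one_of, in_range and all_of are invented for this example and are not from any library.

```python
def one_of(*allowed):
    """Build a validator that accepts only the listed values."""
    def validator(value):
        return value in allowed
    return validator


def in_range(low, high):
    """Build a validator that accepts values with low <= value <= high."""
    def validator(value):
        return low <= value <= high
    return validator


def all_of(*validators):
    """Build a validator that passes only if every given validator passes."""
    def validator(value):
        # all() stops at the first failing check.
        return all(check(value) for check in validators)
    return validator


def is_int(value):
    return isinstance(value, int)


is_small = one_of(1, 2, 3)
is_valid_score = all_of(is_int, in_range(1, 10))

print(is_small(2))          # True
print(is_valid_score(11))   # False
```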

Here's the Wikipedia entry for data validation.

http://en.wikipedia.org/wiki/Data_validation

What are your thoughts?

The Pmw megawidgets toolkit for tkinter implements a mechanism for data validation. For example the background color of an entry field turns pink if you enter bad input. Although this module is old, it could be a good starting point for experimenting with data validation. You can create your own validators, and you can also browse the pmw source code to understand how it's implemented, which could give you ideas to develop your own system.

the poorly coded "if x != 1 and x != 2 and x != 3"

Why yes, that is poorly coded. Good thing this is Python!

>>> x = 5
>>> x != 1 and x != 2 and x != 3
True
>>> x not in [1, 2, 3]
True

Data validation is extremely important for code that is distributable.
If you're writing scripts for you and only you that will never be used by anybody else, then you can skip this; however, when you have other users, you should consider that they did not sit with you while you designed your program. They might completely miss the concept of what you're trying to achieve. If they provide improper input, it is highly likely that your program will crash.

There are also going to be malicious users. These 1337 h4x0rs could intentionally try to either crash the system distributing your code or piggyback through your code into the guts of your system, causing all kinds of havoc.

Just my $0.02

If data is truly coming from an untrusted source, then validate it to death as soon as the data appears in your program, but don't sprinkle checks throughout your source. Assume fellow programmers know what they are doing: don't add 'defensive code' to check arguments, for example; assume that the arguments to functions are fair. This will allow for the later use of duck typing, for example, and it is best to treat fellow programmers as competent.

- Paddy.

When validating, you want to include, not exclude. So you want a list of data that is acceptable, and the input has to be in the list. That way, when another unit or type of data is introduced it is automatically an error. If your program excludes, then the new unit would be automatically included, since you did not explicitly exclude it, which is not a good idea.
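A short sketch of the include-only approach, using units as the example. The set ALLOWED_UNITS and the function validate_unit are hypothetical; the point is only that anything not listed is rejected by default.

```python
ALLOWED_UNITS = {"mm", "cm", "m"}  # hypothetical list of acceptable units


def validate_unit(unit):
    """Return unit unchanged if it is on the allowed list, else raise ValueError."""
    if unit not in ALLOWED_UNITS:
        # A newly introduced unit lands here until someone explicitly allows it.
        raise ValueError(f"unsupported unit: {unit!r}")
    return unit


print(validate_unit("cm"))   # cm
```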

Does anyone have a good synonym for exclude or include? This post could certainly use one.

Does anyone have a good synonym for exclude or include? This post could certainly use one.

Permit and deny might be good synonyms for include and exclude in this instance.

This will allow for the later use of duck typing for example, and it is best to treat fellow programmers as competent.

- Paddy.

Duck typing. If it looks like a duck and quacks like a duck, it's a duck. When validating data using duck typing, you perform a set of actions to test whether the data acts a certain way. Unlike checking whether the data is an instance of a certain class, this allows developer- and user-created objects to work so long as they operate like a certain object.

...When validating data using duck typing, you perform a set of actions to test whether the data acts a certain way.

Hi, actually you *don't* check. You assume and use the data as if it is of the correct type. Python's run-time checks will raise an exception if it is not, and the programmer's knowledge of the program should ensure correct calls.

- Paddy.
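To illustrate the point about not checking, here is a small sketch (count_lines is a made-up name). The function never inspects the type of its argument; it just iterates, so files, in-memory streams and plain lists all work, and an unsuitable argument fails with Python's own exception.

```python
import io


def count_lines(source):
    """Count lines in anything that can be iterated line by line."""
    # No isinstance check: any iterable is accepted as-is.
    return sum(1 for _ in source)


print(count_lines(io.StringIO("a\nb\n")))   # 2
print(count_lines(["x", "y", "z"]))         # 3

try:
    count_lines(42)  # an int is not iterable
except TypeError as err:
    print("Python raised:", err)
```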
