Custom named tuples for record storage

hughesadam_87

This code was inspired by several discussions on the forum, and I'm especially indebted to PyTony for all of his help with these aspects.

The motivation behind this code is quite simple. I wanted a robust, light and general way of storing CSV data that could be used over and over again and that had the following properties:

The user sets fields and default values, and field typecasting is then applied automatically.
The data is stored in a regular, lightweight type (in this case a mutable namedtuple).
The data is managed by a custom dictionary object.

The mutable named tuple class was written by another author (http://code.activestate.com/recipes/500261/). It is created through the factory function "recordtype" and can be used much like a standard namedtuple. The recordtype function at the top of the listing is that source code; in practice, one would usually save it in a separate module and import it.

The user merely has to change strict_fields to hold whatever fields and datatypes are to be stored. This variable stores the types implicitly (each field's type is the type of its default value), so the user does not have to declare them separately. Once it is set, a mutable namedtuple is generated, and from that the program makes a subclass called Record, which is effectively a mutable namedtuple with extra typechecking.
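As a rough illustration of how a default value can double as a type declaration, here is a small standalone sketch. It is written for Python 3 (the listing below is Python 2), and the helper name recast and the field_types mapping are made up for this example; they are not part of the snippet.

```python
from collections import OrderedDict

# Hypothetical fields: each default value doubles as the type declaration.
strict_fields = OrderedDict([
    ('Name', 'unnamed'),
    ('Age', 18),
    ('Income', 43000.0),
    ('FamilyMembers', []),
])

# The type of each field is simply the type of its default.
field_types = OrderedDict((name, type(default))
                          for name, default in strict_fields.items())


def recast(field, value):
    """Return value converted to the field's type, or raise TypeError."""
    fieldtype = field_types[field]
    if isinstance(value, fieldtype):
        return value
    try:
        return fieldtype(value)
    except (ValueError, TypeError):
        raise TypeError('%s=%r could not be recast to %s'
                        % (field, value, fieldtype.__name__))


print(recast('Age', '35'))      # the string is converted to an int
print(recast('Income', 32000))  # the int is converted to a float
```

Note that this kind of recasting is permissive: anything the target type's constructor accepts will pass, which is the "automagic" behavior discussed in the comments.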

Record objects then behave similarly to named tuples, which are exceedingly easy to interface to file I/O and the sqlite modules (http://docs.python.org/library/collections.html#collections.namedtuple_).
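To show what that file I/O interface looks like, here is a minimal Python 3 sketch using a plain collections.namedtuple and the csv module, in the style of the namedtuple documentation. The Person type and the sample rows are invented for the example, and io.StringIO stands in for a real file.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type for the example.
Person = namedtuple('Person', 'Name Age Income')

# Stand-in for an open CSV file so the example is self-contained.
csv_file = io.StringIO('Adam,30,32000.0\nBill,41,51000.0\n')

# Each row is a list of strings; _make builds one record per row.
people = [Person._make(row) for row in csv.reader(csv_file)]

for person in people:
    print(person.Name, person.Age)
```

Every field comes back as a string here, which is exactly the gap that the typecasting in Record is meant to fill.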

The RecordManager class is a custom dictionary that is responsible for storing all Record objects. This class has built-in settings for stringency and typechecking of the records. Under stringent settings, the user can define one or several object types which the RecordManager will accept (for now, only Record objects keyed by strings are accepted). Under lax settings, any valid key-value pair can be stored.

The section under if __name__ == '__main__' at the end of the listing contains the test cases.

Again, I want to reiterate that the purpose of this program is for the user to rip it off, redefine the strict_fields variable, and then add any custom methods to the Record and RecordManager classes that are useful for analysis.

TrustyTony commented: Good effort, even has little too much automagic things +12
__all__ = ['recordtype']

import sys
from textwrap import dedent
from keyword import iskeyword
from collections import namedtuple, OrderedDict

def recordtype(typename, field_names, verbose=False, **default_kwds):
    '''Returns a new class with named fields. Credit: http://code.activestate.com/recipes/500261/

    @keyword field_defaults: A mapping from (a subset of) field names to default
        values.
    @keyword default: If provided, the default value for all fields without an
        explicit default in `field_defaults`.

    >>> Point = recordtype('Point', 'x y', default=0)
    >>> Point.__doc__           # docstring for the new class
    'Point(x, y)'
    >>> Point()                 # instantiate with defaults
    Point(x=0, y=0)
    >>> p = Point(11, y=22)     # instantiate with positional args or keywords
    >>> p[0] + p.y              # accessible by name and index
    33
    >>> p.x = 100; p[1] =200    # modifiable by name and index
    >>> p
    Point(x=100, y=200)
    >>> x, y = p               # unpack
    >>> x, y
    (100, 200)
    >>> d = p.todict()         # convert to a dictionary
    >>> d['x']
    100
    >>> Point(**d) == p        # convert from a dictionary
    True
    '''
    # Parse and validate the field names.  Validation serves two purposes,
    # generating informative error messages and preventing template injection attacks.
    if isinstance(field_names, basestring):
        # names separated by whitespace and/or commas
        field_names = field_names.replace(',', ' ').split()        
    field_names = tuple(map(str, field_names))  
    if not field_names:
        raise ValueError('Records must have at least one field')
    for name in (typename,) + field_names:
        if not min(c.isalnum() or c=='_' for c in name):
            raise ValueError('Type names and field names can only contain '
                             'alphanumeric characters and underscores: %r' % name)
        if iskeyword(name):
            raise ValueError('Type names and field names cannot be a keyword: %r'
                             % name)
        if name[0].isdigit():
            raise ValueError('Type names and field names cannot start with a '
                             'number: %r' % name)

    seen_names = set()
    for name in field_names:
        if name.startswith('_'):
            raise ValueError('Field names cannot start with an underscore: %r'
                             % name)
        if name in seen_names:
            raise ValueError('Encountered duplicate field name: %r' % name)
        seen_names.add(name)
    field_defaults = default_kwds.pop('field_defaults', {})
    if 'default' in default_kwds:
        default = default_kwds.pop('default')
        init_defaults = tuple(field_defaults.get(f,default) for f in field_names)
    elif not field_defaults:
        init_defaults = None
    else:
        default_fields = field_names[-len(field_defaults):]
        if set(default_fields) != set(field_defaults):
            raise ValueError('Missing default parameter values')
        init_defaults = tuple(field_defaults[f] for f in default_fields)
    if default_kwds:
        raise ValueError('Invalid keyword arguments: %s' % default_kwds)
    # Create and fill-in the class template

    numfields = len(field_names)
    argtxt = ', '.join(field_names)
    reprtxt = ', '.join('%s=%%r' % f for f in field_names)
    dicttxt = ', '.join('%r: self.%s' % (f,f) for f in field_names)
    tupletxt = repr(tuple('self.%s' % f for f in field_names)).replace("'",'')
    inittxt = '; '.join('self.%s=%s' % (f,f) for f in field_names)
    itertxt = '; '.join('yield self.%s' % f for f in field_names)
    eqtxt   = ' and '.join('self.%s==other.%s' % (f,f) for f in field_names)
    template = dedent('''
        class %(typename)s(object):
            '%(typename)s(%(argtxt)s)'

            __slots__  = %(field_names)r

            def __init__(self, %(argtxt)s):
                %(inittxt)s

            def __len__(self):
                return %(numfields)d

            def __iter__(self):
                %(itertxt)s

            def __getitem__(self, index):
                return getattr(self, self.__slots__[index])

            def __setitem__(self, index, value):
                return setattr(self, self.__slots__[index], value)

            def todict(self):
                'Return a new dict which maps field names to their values'
                return {%(dicttxt)s}

            def __repr__(self):
                return '%(typename)s(%(reprtxt)s)' %% %(tupletxt)s

            def __eq__(self, other):
                return isinstance(other, self.__class__) and %(eqtxt)s

            def __ne__(self, other):
                return not self==other

            def __getstate__(self):
                return %(tupletxt)s

            def __setstate__(self, state):
                %(tupletxt)s = state
    ''') % locals()
    # Execute the template string in a temporary namespace
    namespace = {}
    try:
        exec template in namespace
        if verbose: print template
    except SyntaxError, e:
        raise SyntaxError(e.message + ':\n' + template)
    cls = namespace[typename]
    cls.__init__.im_func.func_defaults = init_defaults
    # For pickling to work, the __module__ variable needs to be set to the frame
    # where the named tuple is created.  Bypass this step in environments where
    # sys._getframe is not defined (Jython for example).
    if hasattr(sys, '_getframe') and sys.platform != 'cli':
        cls.__module__ = sys._getframe(1).f_globals['__name__']
    return cls

### Define my fields, types and default values in one go, all of which are passed to the Record class
### Types will be set permanently based on default value.
strict_fields=OrderedDict(
    (('Name',str('unnamed')), ('Age',int(18)),\
     ('Income',float(43000)), ('FamilyMembers',list())))


class Record(recordtype('Record', strict_fields.keys(), verbose=False, field_defaults=strict_fields)):
    ''' recordtype is a factory function that returns a class very similar to namedtuple, except that it is 
        mutable and also understands default values natively. Otherwise, it behaves the same way as a namedtuple:
        it is lightweight and very easy to subclass and interface to file input/output.  
        This class, Record, subclasses the mutable namedtuple and enforces typecasting and typechecking
        on all fields.  All types and defaults are automatically inferred from the strict_fields global.'''

    def __init__(self, **kwargs):
        ''' Initializes all values based on __init__ inherited from MutableNamedTuple.  SetAttr will
        be called by default on each kwarg so I don't need to call _check from here.'''
        super(Record, self).__init__(**kwargs)
        print 'Creating %s'% self.__repr__()

    def __setattr__(self, attr, value):
        ''' Performs additional validation each time an attribute is set.'''
        value=self._check(attr, value)
        super(Record, self).__setattr__(attr, value)

    def _check(self, attr, value, warnings=True):
        ''' This ensures all set attributes may be recast into their default types as defined in the strict_fields
        variable.  The program will attempt to recast types (for example, it will convert str(90) to int(90) if possible).
        If an attempt to recast results in an error, the program will raise a TypeError.

        "Warnings":  Option to print a notice each time a variable is successfully recast.  

        In the future, this
        can get really in depth, for example prohibiting the python-legal list to str conversion 
        (eg str(['hi']) = "['hi']") and really specifying the behavior of certain advanced recasts.'''

        attrfield=strict_fields[attr]
        fieldtype, argtype=type(attrfield), type(value)
        if not isinstance(value, fieldtype): #If type mismatch
            try:
                newvalue=fieldtype(value)  #Try recast
            except (ValueError,TypeError):   #Recast failed
                raise TypeError("Argument %s=%r (%s) could not be recast to %s" % (attr, value, argtype, fieldtype))
            else:          #Recast successful
                if warnings:
                    print ('Recasting Attr. %s = %s (%s) to %s (%s)')%(attr, value, argtype, newvalue, fieldtype)
                return newvalue  #Return the recast value whether or not warnings are printed
        return value

    def get_key_value(self):
        ''' Return record as a key value pair for easy input into a **kwargs of a dictionary.
            Can customize this for many use cases.  For now, will just key by Name_age'''
        return ( ('%s_%s')%(self.Name, self.Age), self)

class ManagerError(ValueError):
    '''Custom error for failed typechecking in RecordManager class'''
    def __init__(self, entry):
        self.entry=entry
    def __str__(self): 
        return 'RecordManager requires Record object input, not %r (%s)' % (self.entry, type(self.entry))

class RecordManager(dict):
    ''' Dictionary subclass used to manage records.  Keys can be automatically assigned or passed in manually.
        typecheck: Gives the user the option to be very pedantic and make sure all records are strictly of the
        Record type.  It's good to have the option; however, easy enough to turn off and be more Pythonic.'''

    def __init__(self, **kwargs): 
        ''' **kwargs allow for flexible creation as I will demonstrate at runtime.  
            There is a special keyword argument called "typecheck" which if True will stringently
            enforce all values in dictionary are of the Record class.
            Sigh, in Python3 I'd be able to give typecheck a default value (eg typecheck=True) and
            still have **kwargs, but this is not possible in my 2.7 version.  Therefore, I have to make
            it a parameter.'''
        typecheck=False #Default value, may be overridden if found in kwargs
        if 'typecheck' in kwargs.keys():
            assert type(kwargs['typecheck'])==bool  #Replace with better exception later
            typecheck=kwargs.pop('typecheck')  #Set and pop
        self.typecheck=typecheck
        self.update(**kwargs) 

    __getattr__ = dict.__getitem__  

        ### OVERWRITING THIS MESSES STUFF UP ###
    def __setattr__(self, attr, value):
        ''' Allow for attribute access of values with optional typechecking.  This is called anytime a user
            defines a new attribute, unless we are actually setting the typechecking attribute itself.'''

        if attr != 'typecheck':
            if self.typecheck:
                self._check_me(value)
        self.__dict__[attr] = value

    def __setitem__(self, key, value):
        super(RecordManager, self).__setitem__(key, value)


    def update(self, **kwargs):
        ''' The user passes Records in as keyword arguments (**kwargs), which also allows a dictionary
            with custom keys to be unpacked in.  Automatically generated keys are available separately
            through Record.get_key_value().  
            If typecheck is on, all entries will be screened.'''

        ### Kwargs lets user pass dictionary with custom keys    
        for key in kwargs:
            if self.typecheck:
                self._check_me(kwargs[key])
            self[key] = kwargs[key]

    def _check_me(self, value):
        '''Typecheck a value to ensure it is of the Record class.  Maybe pedantic; however, will be helpful
           to reduce errors for new users in the codebase who are unfamiliar with good input habits'''
        stricttype=Record  #Can make this a list if more than one type are acceptable
        if not isinstance(value, stricttype):  
            raise ManagerError(value)        

    def __str__(self):  
        ''' Custom printout, just for fun'''
        return '\n'.join('%s = %s Custom info here' % v for v in self.items())

####### TESTING ####
if __name__ == '__main__':	
    #####------- Initiate Records -------######
    print '\n Making some records manually'
    record1=Record()  #Default attributes use
    record2=Record(Name='Adam') #Manually set one attribute, 
    record3=Record(Name='Bill', Age=30, Income=32000.0, FamilyMembers=['Regina', 'Betty'] )

    #####------- Set Values -------######
    print '\nWe can get or set access information through attribute or index lookup'
    print record2[0], record2.Name
    print '\nChanging name to Brutus'
    record2.Name='Brutus'
    print record2[0], record2.Name

    #####------- Take semi-bad input like a boss -------######
    print '\nSmart enough to handle some mistyped inputs...'
    record4=Record(Name=7000, Age=str(35), Income=32000, FamilyMembers=tuple(['Sandy', 'Jessy']))

    #####------- Populate the manager class (manual keys) -------######
    print '\nOk, lets put all these records into the RecordManager storage object'
    print '\nI will pass some through by assigning them keys "r1, r2, r3"'
    all_records=RecordManager(r1=record1, r2=record2, r3=record3, typecheck=False)

    #####------- Populate the manager class (automatic keys)-------######
    (k,v)=record1.get_key_value() #This works just sep out key and value
    print '\nAdding a record with an automatically generated key, %s \n'% k
    all_records[k]=v
    print k, v

    #####------- Demonstrate how turning off typechecking allows flexible input######
    print '\nBecause I set typecheck to False, the dictionary doesnt care what type the values are.'
    print '\nI will overwrite the value by attribute'
    all_records.r1='Just a string' #Allowed because typecheck is off
    print all_records.r1, 'new value for r1'


    #####------- Demonstrate how turning on typechecking forces stringent input######
    all_records.typecheck=True
    print '\nBecause I set typecheck to True, the dictionary errors when I pass a non-Record object for a value.'
    print '\nI will try to overwrite the value by attribute'
    try:
        all_records.r1='Just a string' 
    except ManagerError, e:
        print 'Rejected:', e


    #####------- Extra (comment out above section) ##########
    print '\nJust an example of such bad input that the Record object cannot be created'
    r5=Record(Age='string input')
TrustyTony 888 pyMod Team Colleague Featured Poster

This is not about your own code, but I find this too hackish

 if not min(c.isalnum() or c=='_' for c in name):
        raise ValueError('Type names and field names can only contain '
                         'alphanumeric characters and underscores: %r' % name)

I would prefer to use something like (untested)

    if any(not c.isalnum() and c != '_' for c in name):
        raise ValueError('Type names and field names can only contain '
                         'alphanumeric characters and underscores: %r' % name)

Also, I would prefer to use elif for the various ValueError alternatives, to stress that at most one of them will be raised.

hughesadam_87 54 Junior Poster

Also, I would prefer to use elif for the various ValueError alternatives, to stress that at most one of them will be raised.

Can you show an example? I'm a bit confused about this.

TrustyTony 888 pyMod Team Colleague Featured Poster

I only mean that those ifs cannot be simultaneously True, as the function exits with the first ValueError.
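A sketch of what the elif version might look like, pulled out into a standalone helper for illustration. The name validate_name is hypothetical (the recipe does these checks inline), and the behavior is meant to match the separate ifs, since each branch raises anyway.

```python
from keyword import iskeyword


def validate_name(name):
    """Raise ValueError for an invalid type or field name (hypothetical helper)."""
    if not all(c.isalnum() or c == '_' for c in name):
        raise ValueError('Type names and field names can only contain '
                         'alphanumeric characters and underscores: %r' % name)
    elif iskeyword(name):
        raise ValueError('Type names and field names cannot be a keyword: %r'
                         % name)
    elif name[0].isdigit():
        raise ValueError('Type names and field names cannot start with a '
                         'number: %r' % name)


validate_name('Income')      # passes silently
try:
    validate_name('class')   # a keyword, so the second branch fires
except ValueError as err:
    print(err)
```

The elif chain changes nothing functionally; it only signals to the reader that the branches are mutually exclusive.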

hughesadam_87 54 Junior Poster

So, I noticed the actual namedtuple class has a _make method that will return a namedtuple when one passes in an iterable.

        @classmethod
        def _make(cls, iterable, new=tuple.__new__, len=len):
            'Make a new %(typename)s object from a sequence or iterable'
            result = new(cls, iterable)
            if len(result) != %(numfields)d:
                raise TypeError('Expected %(numfields)d arguments, got %%d' %% len(result))
            return result \n

So, it can be used for example:

>>> t = [11, 22]
>>> Point._make(t)
Point(x=11, y=22)

I added this method to the code snippet above, but I get an error:

TypeError: tuple.__new__(DomainCDD): DomainCDD is not a subtype of tuple

I suppose this makes sense since the record class is mutable, so I replaced tuple.__new__ with object.__new__.

But I get an attribute error when I try to pass in my iterables.

AttributeError: Query

This attribute error is being tripped by something in the make function. I don't know how to get _make to work in this case.
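One likely explanation, offered as a sketch rather than a diagnosis of the exact code: object.__new__ creates the instance without running __init__, so the __slots__ fields are never assigned, and the first read of a field (for example in __repr__) raises AttributeError. A _make that goes through the normal constructor avoids this. The Point class below is a hand-written Python 3 stand-in for a class generated by recordtype, not the recipe's own output.

```python
class Point(object):
    """Hand-written stand-in for a class generated by recordtype('Point', 'x y')."""
    __slots__ = ('x', 'y')

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Point(x=%r, y=%r)' % (self.x, self.y)

    @classmethod
    def _make(cls, iterable):
        """Build an instance from a sequence or iterable by calling __init__."""
        values = tuple(iterable)
        if len(values) != len(cls.__slots__):
            raise TypeError('Expected %d arguments, got %d'
                            % (len(cls.__slots__), len(values)))
        return cls(*values)


p = Point._make([11, 22])
print(p)

# Bypassing __init__ leaves the slots unset, so reading one raises AttributeError.
bare = object.__new__(Point)
try:
    bare.x
except AttributeError as err:
    print('AttributeError:', err)
```

In the recordtype template, the equivalent change would be to have _make return cls(*iterable) instead of calling a __new__ directly.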

TrustyTony 888 pyMod Team Colleague Featured Poster

Also, the test I mentioned has been changed in namedtuple in collections:

if not all(c.isalnum() or c=='_' for c in name):

This is basically the same as what I suggested.
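For anyone who wants to convince themselves that the three spellings agree, here is a quick Python 3 check. The function names and sample names are made up for the comparison.

```python
def invalid_via_min(name):
    # Original recipe: min() over booleans is False if any character is bad.
    return not min(c.isalnum() or c == '_' for c in name)


def invalid_via_any(name):
    # Suggested rewrite: true if any character is neither alphanumeric nor '_'.
    return any(not c.isalnum() and c != '_' for c in name)


def invalid_via_all(name):
    # Form used by collections.namedtuple.
    return not all(c.isalnum() or c == '_' for c in name)


for name in ['Age', 'family_members', 'bad-name', 'x y']:
    results = (invalid_via_min(name), invalid_via_any(name), invalid_via_all(name))
    print(name, results)
```

The any and all forms are related by De Morgan's law, and min works only because False sorts before True. One difference: min raises ValueError on an empty name, while any and all do not.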

I think you should not call your class a tuple, as it is mutable and does not inherit from tuple.

hughesadam_87 54 Junior Poster

Ya you're right, I had considered that as well. The author of this code actually released a more official version in 2011. Despite the time I put into working with this, I think my best bet would be to make a true namedtuple subclass which typechecks the data fields, passed in the same way I did above. Yes, it's not mutable, but if mutability becomes a concern, I can use the class above as a stand-in. Unfortunately, collections.namedtuple isn't a class; rather, it is a factory function, so it's harder to go in and add features. I'll get around to doing it though.
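A minimal sketch of what such a namedtuple subclass could look like, assuming Python 3 and keyword-only construction as in Record above. This is an illustration of the idea, not the later snippet mentioned in this thread; the fields are the same example fields used earlier, minus the list.

```python
from collections import OrderedDict, namedtuple

# Field names mapped to defaults; each default's type is the enforced type.
strict_fields = OrderedDict([('Name', 'unnamed'), ('Age', 18), ('Income', 43000.0)])


class Record(namedtuple('Record', list(strict_fields))):
    """Immutable record that recasts values to the types of the defaults."""
    __slots__ = ()

    def __new__(cls, **kwargs):
        values = []
        for field, default in strict_fields.items():
            value = kwargs.pop(field, default)
            fieldtype = type(default)
            if not isinstance(value, fieldtype):
                try:
                    value = fieldtype(value)
                except (ValueError, TypeError):
                    raise TypeError('%s=%r could not be recast to %s'
                                    % (field, value, fieldtype.__name__))
            values.append(value)
        if kwargs:
            raise TypeError('Unexpected fields: %s' % ', '.join(sorted(kwargs)))
        return super().__new__(cls, *values)


print(Record())
print(Record(Name=7000, Age='35', Income=32000))
```

One caveat with this approach: namedtuple's own _make and _replace build instances through tuple.__new__, so they skip this __new__ and therefore skip the type checks.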

hughesadam_87 54 Junior Poster

Actually, I think I've come up with the best solution for this...

I wrote a wrapper function to return a namedtuple with optional defaults and recasting. I would like to post it as a separate code snippet; I think it is different enough from this one to warrant it. Do you mind?

TrustyTony 888 pyMod Team Colleague Featured Poster

Go ahead, why not! You can just add a link to this other snippet in the description.

TrustyTony 888 pyMod Team Colleague Featured Poster

I do not, however, much like the way you are using the global variable strict_fields. What if you have two classes with fields of the same name, for example 'key', but different types? Also, I would eliminate _check_me and inline the condition.
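One way to avoid the global, sketched here under the assumption of Python 3: pass the fields to a factory so that each class keeps its own name-to-default mapping in a closure. The factory name typed_record and the Person and Setting classes are hypothetical, invented to show two classes that both have a 'key' field with different types.

```python
import copy


def typed_record(typename, fields):
    """Hypothetical factory: build a mutable record class from (name, default) pairs.

    Each default's type becomes the enforced type of that field, and the
    mapping lives in a closure instead of a module-level global.
    """
    names = tuple(name for name, _ in fields)
    defaults = dict(fields)

    class TypedRecord(object):
        __slots__ = names

        def __init__(self, **kwargs):
            unknown = set(kwargs) - set(names)
            if unknown:
                raise TypeError('Unknown fields: %s' % ', '.join(sorted(unknown)))
            for name in names:
                # Copy defaults so mutable ones (lists) are not shared.
                setattr(self, name, kwargs.get(name, copy.copy(defaults[name])))

        def __setattr__(self, attr, value):
            if attr not in defaults:
                raise AttributeError(attr)
            fieldtype = type(defaults[attr])
            if not isinstance(value, fieldtype):
                try:
                    value = fieldtype(value)
                except (ValueError, TypeError):
                    raise TypeError('%s=%r could not be recast to %s'
                                    % (attr, value, fieldtype.__name__))
            object.__setattr__(self, attr, value)

        def __repr__(self):
            body = ', '.join('%s=%r' % (n, getattr(self, n)) for n in names)
            return '%s(%s)' % (typename, body)

    TypedRecord.__name__ = typename
    return TypedRecord


Person = typed_record('Person', [('key', 0), ('Name', 'unnamed')])
Setting = typed_record('Setting', [('key', ''), ('Value', 0.0)])

print(Person(key='7'))   # 'key' is an int field here
print(Setting(key=7))    # and a str field here
```

The same closure idea would let RecordManager take its accepted type as a constructor argument, with the isinstance test inlined where the value is stored.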
