User Generated Forms with WTForms

18 Apr 2017

As part of my past work with the Office for National Statistics (ONS), I worked with the survey runner team to add features to their Electronic Questionnaire, which enabled the business to take regular surveys electronically via the web. During my final months there, I took it upon myself to tackle what was seen as one of the project's major pieces of technical debt: its use of a custom form renderer. This was identified early on as a feature that shared a lot in common with existing form libraries, but our renderer couldn't easily be isolated from the rest of the system, so the work was pushed onto a debt backlog after a possible approach had been prototyped.

Due to the hard work of the entire team, we'd made some strategic changes to the system that meant this was possible in my later months. WTForms was the library we wanted to use, but we needed a number of features which weren't documented or supported out of the box. Chief among these was the ability to create forms dynamically, informed by a JSON description.

Dynamic form definition

WTForms generally expects forms to be defined declaratively as classes (a metaclass collects the fields behind the scenes), as per the following example from their docs:

from wtforms import Form, StringField, validators

class MyForm(Form):
    first_name = StringField(u'First Name', validators=[validators.input_required()])
    last_name  = StringField(u'Last Name', validators=[validators.optional()])

I found it easier to think of these definitions as 'schemas' for form classes. Typically you'd use them to describe a class which doesn't change, so WTForms' reliance on them led to some problems when describing our dynamic forms.

When the instantiated form is passed to Jinja and rendered as part of a template helper call, appropriate inputs will be rendered for each field definition, with names corresponding to the attribute names (first_name, last_name). We can then take the POST request's form data and pass it as an argument to our form class to validate it during a submission.

def register(request):
    form = MyForm(request.POST)
    if request.method == 'POST' and form.validate():
        # User stands in for the application's own model class
        user = User()
        user.first_name = form.first_name.data
        user.last_name = form.last_name.data
        user.save()
        redirect('register')
    return render_response('register.html', form=form)

The approach taken for the ONS was to read in a JSON description from a file and set field attributes dynamically, within a function, upon what was essentially an empty form class. I've simplified the actual project code below to remove detail outside the example.

def get_answer_fields(question, data):
    answer_fields = {}
    for answer in question['answers']:
        name = answer.get('label') or question.get('title')
        answer_fields[answer['id']] = get_field(answer, name)
    return answer_fields

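def generate_form(json_for_page, data):

    # Define a fresh, empty form class on every call so that fields from
    # one page never leak into the form for another
    class QuestionnaireForm(Form):
        pass

    answer_fields = {}

    for question in SchemaHelper.get_questions(json_for_page):
        answer_fields.update(get_answer_fields(question, data))

    # Attach each field to the class, keyed by its unique answer id
    for answer_id, field in answer_fields.items():
        setattr(QuestionnaireForm, answer_id, field)

    # MultiDict (from werkzeug.datastructures) provides the multi-value
    # dict interface WTForms expects for submitted form data
    if data:
        form = QuestionnaireForm(MultiDict(data))
    else:
        form = QuestionnaireForm()

    return form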

Here you see that we retrieve details of the JSON structure for a survey page, and maintain a map of answer fields to be set on the class. The answer ids within the page structure will always be unique, and each one is used as the key mapping to its field. In this way, the resultant form is a collection of fields driven dynamically by our JSON description.
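To show the underlying Python mechanism in isolation, here is a minimal sketch that runs without WTForms. StubField, build_form_class and the JSON layout are all hypothetical stand-ins of mine rather than the project's code; the only point is that setattr on a freshly created class gives one attribute per answer id.

```python
import json


class StubField:
    """Hypothetical stand-in for a WTForms field; stores a label and a type."""

    def __init__(self, label, field_type):
        self.label = label
        self.field_type = field_type


def build_form_class(page_json):
    """Create a new class and attach one attribute per answer id."""
    description = json.loads(page_json)

    # A new class per call keeps one page's fields separate from another's
    class DynamicForm:
        pass

    for question in description["questions"]:
        for answer in question["answers"]:
            # Fall back to the question title when an answer has no label
            label = answer.get("label") or question.get("title")
            setattr(DynamicForm, answer["id"], StubField(label, answer["type"]))

    return DynamicForm


# Assumed JSON layout, for illustration only
page = """
{"questions": [
  {"title": "Trading period",
   "answers": [{"id": "period_from", "type": "date", "label": "From"},
               {"id": "period_to", "type": "date"}]}
]}
"""

form_class = build_form_class(page)
```

Each call returns a separate class, so two pages with different answers never share fields.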

Validating Dynamic Forms

As well as the fields on our forms being customisable, the validators used upon them were too. This meant, for instance, that the same field could be optional or required, and that the messages used for validation needed to be informed by the survey's JSON description.

Below you can see an example of the message for a DateRequired validator being updated based on the message from the schema. You can also see an example of modifying the behaviour of the validator, based on whether or not the form being validated includes a day field. This allows the same validator to be used for different form representations of dates.

class DateRequired(object):
    def __init__(self, message=None):
        if not message:
            message = error_messages['MANDATORY']
        self.message = message

    def __call__(self, form, field):
        if hasattr(form, 'day'):
            if not form.day.data and not form.month.data and not form.year.data:
                raise validators.StopValidation(self.message)
        else:
            if not form.month.data and not form.year.data:
                raise validators.StopValidation(self.message)

In more complex situations, it was necessary to pass custom data to our validators to later check against. You can see below, when validating custom date ranges, we pass the 'to' date and initialise the validator with it, before the call to the validator is made with the form (in this case a subform for just a date).

class DateRangeCheck(object):
    def __init__(self, to_field_data=None, messages=None):
        self.to_field_data = to_field_data
        if not messages:
            messages = error_messages
        self.messages = messages

    def __call__(self, form, from_field):

        if form.day and form.month and form.year and self.to_field_data:
            to_date_str = "{:02d}/{:02d}/{}".format(int(self.to_field_data['day'] or 0), int(self.to_field_data['month'] or 0),
                                                    self.to_field_data['year'] or '')
            from_date_str = "{:02d}/{:02d}/{}".format(int(form.day.data or 0), int(form.month.data or 0),
                                                      form.year.data or '')

            from_date = datetime.strptime(from_date_str, "%d/%m/%Y")
            to_date = datetime.strptime(to_date_str, "%d/%m/%Y")

            date_diff = to_date - from_date

            if date_diff.total_seconds() == 0:
                raise validators.ValidationError(self.messages['INVALID_DATE_RANGE_TO_FROM_SAME'])
            elif date_diff.total_seconds() < 0:
                raise validators.ValidationError(self.messages['INVALID_DATE_RANGE_TO_BEFORE_FROM'])
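Stripped of the WTForms plumbing, the comparison at the heart of that validator can be sketched as a plain function. This is an illustration under my own assumptions: check_date_range is a hypothetical name, and the date parts are taken to arrive as strings in dicts keyed by day, month and year, much as posted form data would.

```python
from datetime import datetime


def check_date_range(from_parts, to_parts):
    """Return an error key for an invalid range, or None if the range is valid."""
    from_date = datetime(int(from_parts["year"]), int(from_parts["month"]),
                         int(from_parts["day"]))
    to_date = datetime(int(to_parts["year"]), int(to_parts["month"]),
                       int(to_parts["day"]))

    if to_date == from_date:
        return "INVALID_DATE_RANGE_TO_FROM_SAME"
    if to_date < from_date:
        return "INVALID_DATE_RANGE_TO_BEFORE_FROM"
    return None


same = check_date_range({"day": "1", "month": "4", "year": "2017"},
                        {"day": "1", "month": "4", "year": "2017"})
reversed_range = check_date_range({"day": "2", "month": "4", "year": "2017"},
                                  {"day": "1", "month": "4", "year": "2017"})
valid = check_date_range({"day": "1", "month": "4", "year": "2017"},
                         {"day": "18", "month": "4", "year": "2017"})
```

Returning a key rather than raising keeps the sketch independent of WTForms; in a real validator the key would be looked up in the messages and raised as a ValidationError.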

We actually found it necessary to go one step further in that we needed to halt validation for forms in surveys which were considered optional. The solution was an OptionalForm, allowing empty forms to be considered valid if they had no content.

class OptionalForm(object):
    """
    Allows completely empty form and stops the validation chain from continuing.
    Will not stop the validation chain if any one of the fields is populated.
    """
    field_flags = ('optional',)

    def __call__(self, form, field):
        empty_form = True

        for formfield in form:
            has_raw_data = hasattr(formfield, 'raw_data')

            is_empty = has_raw_data and len(formfield.raw_data) == 0
            is_blank = has_raw_data and len(formfield.raw_data) >= 1 \
                and isinstance(formfield.raw_data[0], string_types) and not formfield.raw_data[0]

            # By default we'll receive empty arrays for values not posted, so need to allow empty lists
            empty_field = True if is_empty else is_blank

            empty_form &= empty_field

        if empty_form:
            raise validators.StopValidation()
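The emptiness test is easier to reason about on its own. The sketch below is a simplified restatement, not the project code: is_blank and form_is_empty are hypothetical helpers that operate directly on lists of raw values instead of on WTForms field objects.

```python
def is_blank(raw_data):
    """True if raw_data is an empty list or its first value is an empty string."""
    if len(raw_data) == 0:
        return True
    return isinstance(raw_data[0], str) and not raw_data[0]


def form_is_empty(raw_values):
    """True only when every field's raw data is blank."""
    return all(is_blank(raw) for raw in raw_values)


# One list of raw values per field, as a form post might supply them
nothing_posted = form_is_empty([[], [""], [""]])
year_posted = form_is_empty([[], [""], ["2017"]])
```

A single populated field is enough to make the form non-empty, which is what lets the remaining validators run as normal.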

Final Thoughts

One of the main pain points with WTForms adoption was the lack of form-level validation supported out of the box. We had, for instance, a number of custom forms which fell outside the generic examples above, where it was necessary to override the validate method on the form itself. This added complexity to the form that I felt belonged in a validator, defined in a separate class like the field-level ones WTForms supports. Adding the option of form-level validators would be great; it would cover most of the bases and seems to be an oft-requested feature.

I felt that the use of WTForms really helped create a far more structured approach to form design, creation and validation within our project. Getting to the point where we could isolate and shift it over to a common library was a real achievement.

Tagged python, wtforms

Processing Camera RAWs with OpenImageIO and Python

15 Sep 2014

I recently discovered the library OpenImageIO, an awesome tool for reading and writing image files. What makes this of particular interest is the sheer variety of image files supported (BMP, Cineon, JPG, JPEG-2000, GIF, DPX, OpenEXR, Targa, TIFF, as well as a variety of camera raw formats) and the fact that it can perform image transformations upon them very easily. Given it's designed for use in media and VFX environments, it sounds like it will be useful for the type of work I've previously been involved with in stop motion.

Additionally, it comes with Python bindings, so you can do all of this without having to learn C++ if, like me, it's been 10 years since you last used it. It also means not having to resort to the command-line tools I've often used in the past for similar transformations.

Mac Installation

I'm on a Mac and, as such, brew is my weapon of choice for installing libraries. Unfortunately, openimageio is only available through a tap, and an old version at that. I managed to compile it outside of brew, but found that I couldn't get Python to read the bindings correctly. I wanted to install a 1.5 version of the library, so I found the best solution was to download the formula, modify it and tap homebrew/science to get the libraries it depends on. Your local openimageio formula takes precedence over the tapped version, so it won't get overwritten.

wget -O /usr/local/Library/Formula/openimageio.rb https://raw.githubusercontent.com/Homebrew/homebrew-science/master/openimageio.rb
brew tap homebrew/science

You can edit the formula at this point if required, to use a more recent version.

brew install openimageio

If all goes to plan, brew should install openimageio along with all dependencies. Additionally, I've edited my $DYLD_LIBRARY_PATH to include the path to the installation (/usr/local/Cellar/openimageio/1.5.3dev/lib).

Reading and Processing RAWs

OpenImageIO uses the ImageBuf class to handle representation and manipulation of images in memory and functions of the ImageBufAlgo class to transform them. It's best to demonstrate through some examples.

import OpenImageIO as oiio

# Read a camera raw, crop and write out to a tiff
buf = oiio.ImageBuf("Dino_001_01_X1_0066.cr2")
cropped = oiio.ImageBuf()
oiio.ImageBufAlgo.crop(cropped, buf, oiio.ROI(1208, 4901, 814, 2385))
cropped.write("cropped.tiff")

# Create a new larger buffer and paste the crop into it, vertically centred
extended = oiio.ImageBuf(oiio.ImageSpec (3693, 2077, 3, oiio.FLOAT))
oiio.ImageBufAlgo.paste(extended, 0, 253, 0, 0, cropped)

# Create a new buffer, resize the extended image to 1920x1080 and add some text
resized = oiio.ImageBuf(oiio.ImageSpec (1920, 1080, 3, oiio.FLOAT))
oiio.ImageBufAlgo.resize(resized, extended)
oiio.ImageBufAlgo.render_text(resized, 1300, 1030, "00066.cr2", 50, "Arial")
oiio.ImageBufAlgo.render_text(resized, 1600, 1030, "00:00:02:18", 50, "Arial")
resized.write("final.jpg")

That ultimately takes the following lovely shot of a dino from RAW form to one with extended borders and burnt-in information.

What's great about this is that we've used a single library to perform this series of transformations and we've not had to break out to use commandline operations to do so, instead manipulating them from the comfort of python.

Chaining commands

From using various javascript libraries (jQuery, Underscore), I've become used to the simplicity of being able to quickly chain a whole bunch of commands together. For that reason, I wrapped a number of the OpenImageIO ImageBufAlgo methods up into a single class OiioChain so that it's possible to perform the same set of above transformations using a somewhat more concise syntax. You can find it on github.

from oiio_chain import OiioChain

chain = OiioChain("Dino_001_01_X1_0066.cr2")

chain.crop(1208, 4901, 814, 2385)\
    .extend(3693, 2077).resize(1920, 1080)\
    .text(1300, 1030, "00066.cr2").text(1600, 1030, "00:00:02:18")\
    .write("final.jpg")
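For anyone unfamiliar with the pattern, chaining only needs each method to return the object it was called on. The sketch below is not the OiioChain implementation and makes no OpenImageIO calls; OperationChain is a hypothetical class that simply records the requested steps, to show the shape of such a wrapper.

```python
class OperationChain:
    """Hypothetical chainable wrapper; records steps instead of editing pixels."""

    def __init__(self, source):
        self.source = source
        self.steps = []

    def crop(self, xbegin, xend, ybegin, yend):
        self.steps.append(("crop", xbegin, xend, ybegin, yend))
        return self  # returning self is what allows the next call to be chained

    def resize(self, width, height):
        self.steps.append(("resize", width, height))
        return self

    def write(self, filename):
        self.steps.append(("write", filename))
        return self


chain = OperationChain("input.cr2")\
    .crop(0, 100, 0, 50)\
    .resize(50, 25)\
    .write("out.jpg")
```

In a real wrapper each method would perform the image operation on an internal buffer before returning self.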

Displaying Images with Image Viewer

OpenImageIO even comes with its own built-in image viewer, to handle quick display of images or to incorporate viewing capabilities into your own software. You can access this from the terminal by using the executable "iv".

I'm really enjoying experimenting with OpenImageIO and am currently trying it out in a service which performs on-demand storage and transformation of media, which I'll write about at a later date. Currently, there's no means of processing stills into videos, for which further tools would be required, but I've heard word that work is ongoing to implement libav capabilities.

There's such a huge amount that's possible with the library which I've not covered and it should definitely be at the top of your list if you're looking at handling RAW media for your own projects.

Tagged openimageio, python

Splitting a date range in Python

01 Jul 2014

Dates are one of those annoying things that shouldn't be, but regularly are, difficult in web apps. I used the following two functions in a recent page to neatly break a date range into distinct segments as part of an analytics app I'm currently working on. Hopefully someone else will find them helpful.

import datetime, calendar

# Find the delta between two dates based on a desired number of ranges
def datedelta(startdate, enddate, no_of_ranges):
    start_epoch = calendar.timegm(startdate.timetuple())
    end_epoch = calendar.timegm(enddate.timetuple())

    date_diff = end_epoch - start_epoch

    step = date_diff / no_of_ranges

    return datetime.timedelta(seconds=step)

datedelta lets me create the timedelta for each segment between two dates, based on a desired number of segments.

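# Edit 18/07/2014 - I realised dates needed the hrs, mins and secs correctly
# adjusted to beginning / end of day
def datespan(startdate, enddate, delta=datetime.timedelta(days=1)):
    currentdate = startdate

    while currentdate + delta < enddate:
        todate = (currentdate + delta).replace(hour=23, minute=59, second=59)

        yield currentdate, todate

        currentdate += delta
        # datetime objects are immutable, so the result of replace() must be
        # assigned back. Snapping to midnight assumes delta is at least one
        # day; with a shorter delta the loop would never advance.
        currentdate = currentdate.replace(hour=0, minute=0, second=0)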

I can then pass the delta into datespan above, which returns an iterable I can then use.

# Get timedeltas based on splitting the range by 10
delta = datedelta(startdate, enddate, 10)

for from_datetime, to_datetime in datespan(startdate, enddate, delta):
    print(from_datetime, to_datetime)
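As a worked example with concrete values, here is a self-contained Python 3 sketch. It reuses the datedelta calculation, but split_range is a hypothetical helper of my own that skips the beginning / end of day adjustment and just returns equal, contiguous segments.

```python
import calendar
import datetime


def datedelta(startdate, enddate, no_of_ranges):
    """Length of one segment when the range is split into no_of_ranges parts."""
    start_epoch = calendar.timegm(startdate.timetuple())
    end_epoch = calendar.timegm(enddate.timetuple())
    return datetime.timedelta(seconds=(end_epoch - start_epoch) / no_of_ranges)


def split_range(startdate, enddate, no_of_ranges):
    """Return a list of (from, to) pairs covering the range in equal steps."""
    delta = datedelta(startdate, enddate, no_of_ranges)
    return [(startdate + i * delta, startdate + (i + 1) * delta)
            for i in range(no_of_ranges)]


start = datetime.datetime(2014, 7, 1)
end = datetime.datetime(2014, 7, 11)

# Ten days split five ways gives five two-day segments
segments = split_range(start, end, 5)
```

Because each segment starts exactly where the previous one ends, this variant suits cases where the ranges must not overlap.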

The result forms part of the following d3 chart:

Tagged coding, python