Friday, September 5, 2008

Designing API for fragile content

Just as with any other profession, programming involves quite a bit of monotonous and repetitive work. Interesting problems do come up, of course, but rarely enough that encountering one always brings a smile to my face. One of my deep interests in the programming profession is API design, so it would be fair to say that when I recently encountered a tricky API design puzzle I got pretty excited.

I was tasked with building a forms-based editor in Eclipse for an XML file with a certain schema. I started out by extending the XML source editor that's part of WTP. That gave me the source view tab for my editor, and I could access the XML DOM that the source editor exposed. Any changes I made to the DOM would propagate to the source buffer. That's a pretty good start, but I did not want my forms UI working directly with the DOM. I don't know if the DOM API has any fans, but I am certainly not one of them. I didn't want my UI code getting cluttered with it.

Ok, easy enough. Just take the DOM and wrap it in an API custom-created for the schema. Many of the elements in this particular document schema are tightly-typed. There are integers, class names, file paths, etc. My first cut at the API used these types in the getters and setters...

    Integer getMinDuration();
    void setMinDuration( Integer minDuration );

That works well enough when content is well-formed, but this is an XML file that's edited directly by users. Handling of malformed content is very important. Let's say that the min-duration element is found, but its content cannot be parsed as an integer. The only option that the above API left me was to return null. That might be acceptable in some cases, but it produces a rather poor user experience in the context of an editor. The text field that would be bound to this property would be blank, forcing the user to either type in a new value or switch to the source view in order to fix the existing value. What I wanted to do was show the malformed value in the text field together with a problem decoration so that the user can see and fix it easily. Ok, so let's augment the API a bit...

    Integer getMinDuration();
    String getMinDurationUnparsed();
    void setMinDuration( Integer minDuration );
    void setMinDuration( String minDuration );

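
To make the difference between the two getters concrete, here is a minimal sketch. It is not the actual editor code: the TaskModel class is hypothetical, and it simply holds the raw element text in a field instead of reading it from the DOM.

```java
// Hypothetical stand-in for the DOM-backed model: it only holds the raw element text.
class TaskModel {
    // Raw text of the min-duration element, or null if the element is absent.
    private String minDurationText;

    TaskModel(String minDurationText) {
        this.minDurationText = minDurationText;
    }

    // Typed view: null when the element is absent or its text is not an integer.
    Integer getMinDuration() {
        if (minDurationText == null) {
            return null;
        }
        try {
            return Integer.valueOf(minDurationText.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Raw view: whatever is in the document, malformed or not.
    String getMinDurationUnparsed() {
        return minDurationText;
    }

    // Typed setter for callers that already hold a parsed value.
    void setMinDuration(Integer minDuration) {
        this.minDurationText = (minDuration == null) ? null : minDuration.toString();
    }

    // Raw setter for callers (such as a text field) that only have a string.
    void setMinDuration(String minDuration) {
        this.minDurationText = minDuration;
    }
}
```

With the element text set to something like "3o", the typed getter returns null while the unparsed getter still returns the text, so a text field has something to display next to the problem decoration.
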
That's better, but min duration has a default value and only positive integers are valid. A bit more API augmentation was in order...

    Integer getMinDuration();
    String getMinDurationUnparsed();
    Integer getMinDurationDefault();
    void setMinDuration( Integer minDuration );
    void setMinDuration( String minDuration );
    IStatus validateMinDuration();

Now I had enough information in the API to build the UI that I needed, but the API was starting to smell a bit. That's six methods for one element in the schema that has dozens of elements. There has to be a better way to structure this API. After some head-scratching, I decided to try returning a surrogate object from the getter method instead of the actual value. The surrogate would handle parsing, default values and validation...

    IntegerValue getMinDuration();
    void setMinDuration( Integer minDuration );
    void setMinDuration( String minDuration );

    class IntegerValue {
        String getString();
        String getString( boolean useDefault );
        Integer getParsedValue();
        Integer getParsedValue( boolean useDefault );
        IStatus validate();
    }

The getMinDuration() method always returns a non-null surrogate object. The caller then decides which aspect of the value to query. The IntegerValue class supplies default validation logic for handling unparsable content, but additional validation can be added. For instance, in this case only integers greater than zero are valid. Since a range is a pretty common constraint, I made the IntegerValue constructor take the min and max values (in addition to the raw string value of the property and the default value). More complicated validation scenarios can be handled by subclassing the IntegerValue class.

Note that only the getter deals with the surrogate object. I wanted to keep the surrogate objects immutable so that they can be handled in a manner similar to basic value types without worrying about synchronization. When setting a value, you have either a raw value (it can't be parsed, or the code in question doesn't want to deal with parsing it) or a tightly-typed value. An overloaded setter method takes care of both of these scenarios.

As you can imagine, it was simple at this point to extend this pattern to other types. I created a base class for all value types, which made it possible for some code to handle a variety of types without knowing what they actually are. A good example of this is the text field data binding code. Since any value can be retrieved and set as a string, any value can be bound to a text field.

    abstract class Value<T> {
        String getString();
        String getString( boolean useDefault );
        T getParsedValue();
        T getParsedValue( boolean useDefault );
        IStatus validate();
    }

    class IntegerValue extends Value<Integer> {
        ...
    }

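
For readers who want to see how the pieces could fit together, here is one possible sketch of the base class and the integer subclass. It is an illustration under several assumptions, not the code from the actual editor: SimpleStatus is a hypothetical stand-in for Eclipse's IStatus so that the example compiles on its own, the parse and validateParsed hooks are invented for the sketch, the constructor argument order is a guess, and the choices that the no-argument getters use the default, that malformed text yields null rather than the default, and that an absent element counts as valid are all mine.

```java
// Hypothetical stand-in for Eclipse's IStatus so the sketch compiles without Eclipse.
final class SimpleStatus {
    static final SimpleStatus OK = new SimpleStatus(true, "OK");
    final boolean ok;
    final String message;

    private SimpleStatus(boolean ok, String message) {
        this.ok = ok;
        this.message = message;
    }

    static SimpleStatus error(String message) {
        return new SimpleStatus(false, message);
    }
}

// Immutable surrogate: all fields are final and there are no setters.
abstract class Value<T> {
    private final String raw;      // text found in the document; null if the element is absent
    private final T defaultValue;  // null if the property has no default

    protected Value(String raw, T defaultValue) {
        this.raw = raw;
        this.defaultValue = defaultValue;
    }

    // Hypothetical hook: subclasses return null when the text cannot be parsed.
    protected abstract T parse(String text);

    // Hypothetical hook for type-specific constraints such as ranges.
    protected SimpleStatus validateParsed(T parsed) {
        return SimpleStatus.OK;
    }

    // Assumption: the no-argument getters fall back to the default.
    String getString() { return getString(true); }

    String getString(boolean useDefault) {
        if (raw != null) return raw;
        return (useDefault && defaultValue != null) ? String.valueOf(defaultValue) : null;
    }

    T getParsedValue() { return getParsedValue(true); }

    T getParsedValue(boolean useDefault) {
        // Assumption: malformed text yields null, not the default.
        if (raw != null) return parse(raw);
        return useDefault ? defaultValue : null;
    }

    SimpleStatus validate() {
        // Assumption: an absent element is valid.
        if (raw == null) return SimpleStatus.OK;
        T parsed = parse(raw);
        if (parsed == null) return SimpleStatus.error("Cannot parse \"" + raw + "\"");
        return validateParsed(parsed);
    }
}

class IntegerValue extends Value<Integer> {
    private final int min;
    private final int max;

    IntegerValue(String raw, Integer defaultValue, int min, int max) {
        super(raw, defaultValue);
        this.min = min;
        this.max = max;
    }

    @Override
    protected Integer parse(String text) {
        try {
            return Integer.valueOf(text.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    @Override
    protected SimpleStatus validateParsed(Integer parsed) {
        if (parsed < min || parsed > max) {
            return SimpleStatus.error(parsed + " is outside the range " + min + ".." + max);
        }
        return SimpleStatus.OK;
    }
}
```

In this sketch, binding code that only knows it has a Value can put getString() into the text field and use validate() to decide whether to show a problem decoration, without caring which concrete type it is holding.
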
I actually ended up using the same pattern even for properties that were strings by creating a StringValue class. Even though there is no parsing involved, the benefit of having consistent access to default value handling and validation made it worth it. So what do you think?

3 comments:

Eric Rizzo said...

Did you get to integrating with the JFace Data Binding framework? It seems there is at least some overlap between that and what you've done...

Konstantin Komissarchik said...

I did not end up integrating with the JFace Data Binding framework, even though one of the goals of designing this API was to facilitate richer data binding. It turned out that there was no advantage to integration, since I would not have been able to re-use any of the pre-built bindings that are in the framework. I had to write binding code from scratch so that I could make it aware of the additional information exposed by this API (the default values and validation).

Renat Zubairov said...

IMHO that's exactly a use case for EMF. You have a domain model expressed as XML Schema. You just need to convert it to ECORE (automatically) and then generate a model. An EMF generated model has a notion of features, which are actually attributes of the classifier (for XML Schema, those are simple types). And you can already distinguish between two states of each EMF feature: set and unset. That means a complex object reference can be null, can be set to some value, or can be unset. Features also have default values and can be nullable or not.
Did I mention that when you generate ECORE from XML Schema, EMF can serialize/deserialize EMF models to XML documents according to the schema?