XML Validation in Mono and .Net
XML is a fairly standard way of storing data these days. Through the use of schemas you can define data files that are readable by humans (not in the way a story is, but in that a person can see and understand every character), processable by computers, properly structured and easy to validate. Unfortunately, while trying to use some of the more complex parts of XML validation, I ran into a number of confusing and awkward situations. The following is a record of some of those issues.
What are keys and keyrefs?
Before I get into the detail, not everyone will be familiar with keys and keyrefs. Whereas a simple schema may use xs:ID and xs:IDREF as the types of attributes, which are unique within the document and refer to any ID within the document respectively, a more complex schema can define keys and keyrefs. Each one uses a pair of XPath expressions. For a key, they define the context of uniqueness and the attributes or elements that must be unique within it. For a keyref, they define the location of the reference and the attributes or elements that refer to some other ID. If you have multiple elements with IDs that are supposed to be locally unique rather than globally unique then keys and keyrefs are a much better option.
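To make the idea concrete, here is a minimal sketch of a key and a keyref in use. The schema is not the one from my project: the element names (rack, slot, link), the constraint names (slotKey, linkRef) and the KeyrefDemo class are all made up for illustration. It assumes the standard System.Xml.Schema classes and collects validation messages in a list rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical example: every element and constraint name here is invented.
public static class KeyrefDemo
{
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='rack'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='slot' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:attribute name='id' type='xs:string' use='required'/>
          </xs:complexType>
        </xs:element>
        <xs:element name='link' minOccurs='0' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:attribute name='to' type='xs:string' use='required'/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Key: within one rack, every slot/@id must be unique. -->
    <xs:key name='slotKey'>
      <xs:selector xpath='slot'/>
      <xs:field xpath='@id'/>
    </xs:key>
    <!-- Keyref: every link/@to must match a slot/@id in the same rack. -->
    <xs:keyref name='linkRef' refer='slotKey'>
      <xs:selector xpath='link'/>
      <xs:field xpath='@to'/>
    </xs:keyref>
  </xs:element>
</xs:schema>";

    // Validates the document against the schema above and returns
    // the messages of any validation events that were raised.
    public static List<string> Validate(string xml)
    {
        var messages = new List<string>();

        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        // Identity constraints (key/keyref) are part of the default flags;
        // the flag is set explicitly here only to make the intent obvious.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessIdentityConstraints;
        settings.ValidationEventHandler += (sender, e) => messages.Add(e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // reading the document drives validation
        }
        return messages;
    }
}
```

Calling KeyrefDemo.Validate with a document whose link points at an existing slot should return an empty list, while a link to a slot that isn't declared should produce at least one message. The exact wording of that message depends on the framework.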
Validation warnings - beware of being over-eager
The first problem I encountered with Microsoft's .Net framework was caused by over-eagerness on my part to validate documents very strictly. To do this I added the XmlSchemaValidationFlags.ReportValidationWarnings flag to the validation settings. Unfortunately, the method I set as my ValidationEventHandler basically just threw a wrapped exception. This didn't appear to cause a problem in Mono, but on Windows .Net was rejecting all of my documents that had additional attributes in a declared namespace, even though I had set processContents to "lax" on the <xs:anyAttribute> element. These were only warnings, though, so either removing the flag or handling warnings without throwing an exception stopped Windows complaining.
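One way to keep warnings switched on without rejecting documents is to check the severity of each event before deciding what to do with it. The following is only a sketch of that idea, not my actual handler: the ValidationCollector class and its members are hypothetical names, and wrapping the error in an InvalidOperationException is an arbitrary choice. Note that the enum member is spelled ReportValidationWarnings.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper: records warnings and only throws for real errors.
public sealed class ValidationCollector
{
    public readonly List<string> Warnings = new List<string>();

    // Matches the ValidationEventHandler delegate signature.
    public void Handle(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Warning)
        {
            // Keep the warning for later inspection, but carry on reading.
            Warnings.Add(e.Message);
            return;
        }

        // Errors still stop the load; the original exception is preserved.
        throw new InvalidOperationException(
            "Document failed schema validation", e.Exception);
    }

    // Builds reader settings that report warnings to Handle.
    public XmlReaderSettings CreateSettings(XmlSchemaSet schemas)
    {
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += Handle;
        return settings;
    }
}
```

Pass the settings from CreateSettings to XmlReader.Create and read the document to the end. Afterwards the Warnings list holds whatever the framework chose to warn about, which, as described above, may differ between Mono and .Net.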
Cryptic messages with keyrefs
Some of the messages that Mono produces for XML validation can be a bit cryptic, both in terms of working out what the error was and whether you're getting the failure you expect. When I was first testing, having made some failed attempts at keyrefs before, I received this message on a test for a keyref that referenced a non-existent value:
XmlSchema error: Target key not found. Related schema item: SourceUri: file:///….xsd, Line 20, Position 3.
If you only read the start of the error then it appears that the document is being validated correctly and that the value can't be found, but in fact it is the key referred to by the keyref's "refer" attribute that can't be found.
If you have a valid schema and attempt to test a keyref to a non-existent key then the message you should get from Mono is
XmlSchema error: Invalid identity constraints were found. Referenced key was not found: line 17, position 6 XML Line 31, Position 3.
In this case "invalid identity constraints" means that the referenced key value didn't exist. The line numbers are provided, but only in the message. Coming straight after the previous message, this one could certainly have been clearer, but now that I know what I'm looking for I can validate documents.
Validation messages and a lack of information
Being a mixed-language developer (I work day-to-day in Java and spend my evenings using C#) I sometimes notice interesting differences between the languages. Sometimes they're good, but more often than not they're bad. The latest one, which may or may not hold true in general, is that C# exceptions seem to be less specific and useful than Java exceptions. The APIs that I'm used to at work give me useful information, but the C# XML exceptions seem to hide it in the message content, and even then it is only there on some platforms (see the inconsistency section below).
When your validation fails (which it should, if you're unit testing your file loader/parser and document handling) you get an XmlSchemaValidationException. The supporting data includes a line number and column. For key/keyref validation errors, this unfortunately seems to be the line number and column of the end of the file, which is not very useful for pinpointing the error. The position is logged in the message, but since it is generally bad practice to show API exception messages directly to users (they're often meaningless to them), the information you need to give them is either inaccessible (Mono) or not included at all (.Net).
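For reference, this is roughly how the supporting data can be read without touching the message text. It is a sketch under some assumptions: no ValidationEventHandler is registered, so the reader throws on the first error, and the ValidationFailure and PositionedValidation types are hypothetical names of my own. LineNumber and LinePosition are the standard properties inherited from XmlSchemaException. As described above, for key/keyref errors they may not point where you would hope.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical container for the details worth showing to a user.
public sealed class ValidationFailure
{
    public int Line;
    public int Column;
    public string Detail; // raw framework message, for logs rather than users
}

public static class PositionedValidation
{
    // Returns null when the document validates, otherwise the position
    // that the exception exposes programmatically.
    public static ValidationFailure TryValidate(TextReader xml, XmlSchemaSet schemas)
    {
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        // With no ValidationEventHandler set, the first validation error
        // is thrown as an XmlSchemaValidationException.

        try
        {
            using (var reader = XmlReader.Create(xml, settings))
            {
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlSchemaValidationException ex)
        {
            var failure = new ValidationFailure();
            failure.Line = ex.LineNumber;
            failure.Column = ex.LinePosition;
            failure.Detail = ex.Message;
            return failure;
        }
    }
}
```

Keeping the raw message in a separate field means it can still go to a log file while the user only sees the line and column.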
Inconsistency with exception messages
It's hardly surprising, given that the two frameworks are developed by different groups of people, but the .Net framework and the Mono framework use different exception messages for errors in exactly the same part of the validation. Unfortunately, combined with the lack of information in exceptions mentioned above, this makes unit testing very difficult. You can't rely on testing the data carried by the exception, because it isn't there or isn't specific, and you can't rely on testing the exception message, because then your tests are tied to a single platform and will fail for anyone running them on the other framework, even when validation is behaving correctly. The only solution I've found so far is to just hope that checking for an expected exception is enough and that you're getting it for the right reasons.
As an example, if a keyref references a key that doesn't exist then Mono will give you an exception with a message of
XmlSchema error: Invalid identity constraints were found. Referenced key was not found: line 17, position 6 XML Line 31, Position 3.
whereas .Net just gives you
The key sequence 'slot2' in Keyref fails to refer to some key.
It's a win for Mono in that the message tells you where the problem is, but a loss in that its second position, the one the exception lets you access, is the end of the file, whereas .Net gives you the actual position of the incorrect reference.
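In practice, "checking for an expected exception" can be as small as the helper below. This is a sketch rather than my real test code: ValidationAssert is a hypothetical name, it uses no particular test framework, and it assumes the loader lets XmlSchemaValidationException propagate unwrapped. It deliberately ignores the message, the line number and the column, since none of those are consistent between Mono and .Net.

```csharp
using System;
using System.Xml.Schema;

// Hypothetical test helper that checks only the exception type.
public static class ValidationAssert
{
    // Returns true only if the action throws XmlSchemaValidationException.
    // Any other exception type propagates to the caller as a test failure.
    public static bool Throws(Action load)
    {
        try
        {
            load();
        }
        catch (XmlSchemaValidationException)
        {
            return true;
        }
        return false;
    }
}
```

The cost of this design is exactly the one described above: the test passes for any validation failure, not just the keyref one you meant to provoke.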
Mono-specific issues with validation
Finally, at the time of writing (June 2010) there are a couple of bugs in Mono that mean that validation doesn't work correctly. The way in which you use XML and validation will affect how important these bugs are to you.
The less serious one for me is that SchemaInfo.Validity always returns NotKnown. This may break validation code that checks it, but my code parses the document itself and so assumes that no exceptions means a valid document.
The more serious one for me is that using the XmlSchemaValidationFlags.ReportValidationWarnings value in the validation settings means that Mono will validate the correctness of tags but fail to validate keyrefs. Warnings are only warnings, but if you'd like to know about them then you'd hope to be able to do so and still keep full validation.
In conclusion, you need to be quite careful as to what you do with XML validation. Some of the messages are a little more cryptic than they need to be, most errors could probably give you more information than they do at a programmatic level, and there are a number of cross-platform inconsistencies in both messages and data. With that in mind, though, it is possible to use XML schemas and keep your XML quite tightly validated on both Windows with Microsoft's .Net framework and on any platform that Mono supports.