Validity and the DTD

<<< There is no way, just by looking at the document, that you can answer questions like:

The only way to answer these questions and find out whether the tags have been used in a valid manner is to have some other information that describes which combinations of tags, attributes, and values are correct. You need some sort of external specification. Such specifications are usually designed in great detail before people create documents that follow them. We could use English to write a specification for the weather report:

A weather <report> consists of a <datestamp>, <station>, <temperature>, and <wind> element (in that order).

The <station> element must have fullname and abbrev attributes, and contain both <latitude> and <longitude> elements.

The <temperature> element must contain (in this order) a <min>, <max>, <forecast-low> and <forecast-high>.

Finally, the <wind> element must contain a <speed> and may contain a <direction> element. (If the speed is zero, a direction is not necessary.)
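To see what checking these four rules by hand involves, here is a minimal Python sketch that uses only the standard library. The sample report and all of its values (station name, abbreviation, readings) are invented for illustration, and the helper names child_tags and check_report are hypothetical. The sketch also assumes that <speed> comes before <direction>, which the English rule does not actually state.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample report; every value below is invented for illustration.
SAMPLE = """<report>
  <datestamp>2001-06-15T10:30:00</datestamp>
  <station fullname="San Jose" abbrev="KSJC">
    <latitude>37.3618</latitude>
    <longitude>-121.9290</longitude>
  </station>
  <temperature>
    <min>12</min>
    <max>24</max>
    <forecast-low>11</forecast-low>
    <forecast-high>26</forecast-high>
  </temperature>
  <wind>
    <speed>0</speed>
  </wind>
</report>"""


def child_tags(element):
    """Return the tag names of an element's direct children, in document order."""
    return [child.tag for child in element]


def check_report(xml_text):
    """Return a list of rule violations; an empty list means all four rules hold."""
    report = ET.fromstring(xml_text)
    if report.tag != "report":
        return ["root element must be <report>"]
    problems = []

    # Rule 1: a fixed sequence of four children.
    if child_tags(report) != ["datestamp", "station", "temperature", "wind"]:
        problems.append("<report> must contain datestamp, station, temperature, wind in order")

    # Rule 2: two required attributes, plus latitude and longitude (order not stated).
    station = report.find("station")
    if station is not None:
        for name in ("fullname", "abbrev"):
            if name not in station.attrib:
                problems.append(f"<station> is missing the {name} attribute")
        if sorted(child_tags(station)) != ["latitude", "longitude"]:
            problems.append("<station> must contain latitude and longitude")

    # Rule 3: four temperature readings in a fixed order.
    temperature = report.find("temperature")
    if temperature is not None:
        if child_tags(temperature) != ["min", "max", "forecast-low", "forecast-high"]:
            problems.append("<temperature> must contain min, max, forecast-low, forecast-high in order")

    # Rule 4: speed is required, direction is optional (assumed to follow speed).
    wind = report.find("wind")
    if wind is not None:
        if child_tags(wind) not in (["speed"], ["speed", "direction"]):
            problems.append("<wind> must contain speed, optionally followed by direction")

    return problems


print(check_report(SAMPLE))  # prints []
```

Every rule had to be translated into code by a person, and any change to the specification means changing the program as well.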

Of course, computers can't interpret this English specification, so we have to make a more rigorous, machine-readable version. The most common form of such a specification is called a DTD (Document Type Definition).

That's the purpose of the

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

which you so often see in the source of HTML pages. It tells a validating program which version of the HTML Document Type Definition to check the document against.

A Document Type Definition lists all the elements and attributes that may appear in a document and the contexts in which they are valid. Here's part of the DTD for the weather report markup language:

   <!ELEMENT station (latitude, longitude)>
   <!ATTLIST station
                fullname CDATA #REQUIRED
                abbrev CDATA #REQUIRED>
   <!ELEMENT temperature (min, max, forecast-low, forecast-high)>
   <!ELEMENT min (#PCDATA)>
   <!ELEMENT max (#PCDATA)>
   <!ELEMENT forecast-low (#PCDATA)>
   <!ELEMENT forecast-high (#PCDATA)>
   <!ELEMENT wind (speed, direction?)>
   <!ELEMENT speed (#PCDATA)>
   <!ELEMENT direction (#PCDATA)>

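#PCDATA (parsed character data) means that an element contains text rather than child elements, and CDATA means that an attribute's value is a string of character data. The ? after direction marks that element as optional.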

Though not as easy to read as the English, the DTD is very compact, and does the job it's intended to do; it lets computer programs verify that a document uses only an approved set of tags and that the tags are used in the proper context.
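One way to picture how a program can use such declarations is to treat each content model as a pattern over the sequence of child tag names. The Python sketch below is a hand translation of the DTD fragment into regular expressions, not a real DTD parser or a full validator. The tables CONTENT_MODELS and REQUIRED_ATTRIBUTES and the function validate are hypothetical names, and the entries for latitude and longitude are an assumption, since the fragment does not show their declarations.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD fragment, rewritten by hand as regular
# expressions over a space-separated list of child tag names.
# A (#PCDATA) element is modeled as having no child elements at all.
CONTENT_MODELS = {
    "station": r"latitude longitude",
    "temperature": r"min max forecast-low forecast-high",
    "wind": r"speed( direction)?",
    "min": r"",
    "max": r"",
    "forecast-low": r"",
    "forecast-high": r"",
    "speed": r"",
    "direction": r"",
    # Assumption: not declared in the fragment shown, taken to be (#PCDATA).
    "latitude": r"",
    "longitude": r"",
}

# Attributes declared #REQUIRED in the ATTLIST.
REQUIRED_ATTRIBUTES = {"station": ("fullname", "abbrev")}


def validate(element):
    """Return a list of error messages for an element and all its descendants."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        errors.append(f"<{element.tag}> is not a declared element")
    else:
        children = " ".join(child.tag for child in element)
        if re.fullmatch(model, children) is None:
            errors.append(f"<{element.tag}> has invalid content: [{children}]")
    for name in REQUIRED_ATTRIBUTES.get(element.tag, ()):
        if name not in element.attrib:
            errors.append(f"<{element.tag}> is missing required attribute {name}")
    # Recurse so that every element in the subtree is checked.
    for child in element:
        errors.extend(validate(child))
    return errors


calm = ET.fromstring("<wind><speed>0</speed></wind>")
print(validate(calm))  # prints []
```

Unlike hand-written checks, the rules here live in data tables, so a change to the markup language means editing the tables rather than the checking logic. A real validating parser does considerably more, such as rejecting undeclared attributes and text inside element-only content.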

If the XML rules are considered as punctuation, then the DTD serves the function of a spelling list and grammar reference. >>>
