XML Overview
Tuesday, May 20, 2003
This article is a very brief and general discussion of XML and how you can use it.
XML stands for eXtensible Markup Language. But it’s not actually a markup language; it’s a scheme for creating markup languages.
Yes, I know, this still makes no sense, but bear with me.
The point of marking up a document, whether in HTML for the web or in another way for print publications, is to identify the different pieces of information. You designate what is a header or a section title and what is a complete paragraph, and so on. Other humans can usually figure out what is what from context, but (and I’m sure you’ve heard it a thousand times) computers are stupid. They need to have these things pointed out to them.
So once you’ve got the information marked up, what then?
Once your document is marked up in a way the computer can understand, you can use the computer to manipulate it. If you’re working with HTML, you can use Cascading Style Sheets to change how the elements display by default, making the page more attractive or easier to use. Proper markup in electronic books not only tells the reader software how to display the text but also marks places within the document. For instance, if each chapter is labeled, you can go directly to the beginning of any of them. You can also mark footnotes so that a reader can jump to a note and then back to where they were.
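To make the e-book example concrete, here’s a minimal Python sketch. It assumes a made-up book markup in which each `<chapter>` carries an `id` and a `<title>`, and footnotes are `<note>` elements. Those names are invented for illustration and aren’t taken from any real e-book format. Because the chapters are labeled, a program can find every one of them and build a linked table of contents. That is the kind of information reader software could use to jump straight to a chapter.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical book markup: the element names (book, chapter, title, para,
# note) are invented for this example.
BOOK = """<book>
  <chapter id="ch1">
    <title>Getting Started</title>
    <para>Some text.<note id="n1">A footnote.</note></para>
  </chapter>
  <chapter id="ch2">
    <title>Going Further</title>
    <para>More text.</para>
  </chapter>
</book>"""


def table_of_contents(book):
    """Return (id, title) pairs for every labeled chapter, in document order."""
    return [(ch.get("id"), ch.findtext("title")) for ch in book.iter("chapter")]


def toc_as_html(book):
    """Turn the chapter labels into HTML links a reader could click to jump."""
    return "\n".join(
        f'<a href="#{escape(chapter_id)}">{escape(title)}</a>'
        for chapter_id, title in table_of_contents(book)
    )


def footnote_ids(book):
    """Footnotes are marked too, so they can be found the same way."""
    return [note.get("id") for note in book.iter("note")]


if __name__ == "__main__":
    book = ET.fromstring(BOOK)
    print(toc_as_html(book))
    print(footnote_ids(book))
```

The program never has to guess where a chapter starts. It just asks for every `chapter` element, which is only possible because the markup says so explicitly.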
And XML is?
A standard syntax that allows you to define your own markup code. Think of a MARC record. Each piece of information or field is identified individually. The end result is a file that isn’t very readable for humans but which can be manipulated and displayed many different ways by computers and even traded back and forth between systems and applications that understand the identification codes.
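Here’s a rough Python sketch of the MARC comparison. It assumes an invented `<record>` vocabulary that is not MARCXML or any real cataloging schema, and the book and author are placeholders. Each field is tagged individually, so the same record can be displayed two different ways without retyping anything.

```python
import xml.etree.ElementTree as ET

# An invented record vocabulary, loosely modeled on the idea of MARC fields.
# The title and author are placeholders, not a real catalog entry.
RECORD = """<record>
  <title>An Introduction to Markup</title>
  <author>Doe, Jane</author>
  <subject>Markup languages</subject>
  <subject>Cataloging</subject>
</record>"""


def citation(record):
    """Display the record as a short citation."""
    authors = " and ".join(a.text for a in record.findall("author"))
    return f"{authors}. {record.findtext('title')}."


def catalog_card(record):
    """Display the same fields as labeled catalog lines."""
    lines = [f"TITLE: {record.findtext('title')}"]
    lines += [f"AUTHOR: {a.text}" for a in record.findall("author")]
    lines += [f"SUBJECT: {s.text}" for s in record.findall("subject")]
    return "\n".join(lines)


if __name__ == "__main__":
    record = ET.fromstring(RECORD)
    print(citation(record))  # Doe, Jane. An Introduction to Markup.
    print(catalog_card(record))
```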
XML is a method of creating your own markup code to describe your particular set of documents. A Document Type Definition (DTD) is the authority file that states the nature, scope and structure of your language. With the DTD, you can validate your documents and make sure they follow the rules you have set. As long as your XML is valid, any program that understands XML will be able to read it, and any program that knows your DTD will know what each piece of information is.
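In a DTD, the rule “a record holds one title, then one or more authors, then any number of subjects” would be written as `<!ELEMENT record (title, author+, subject*)>`. Python’s standard-library XML parsers don’t validate against a DTD (that’s a job for a separate validating tool). So this sketch uses an invented `record` vocabulary and checks that one content rule by hand, just to show what “following the rules” means.

```python
import re
import xml.etree.ElementTree as ET

# The DTD rule this sketch imitates (hypothetical vocabulary):
#   <!ELEMENT record (title, author+, subject*)>
# The same content model, written as a regular expression over child tag names.
RECORD_MODEL = re.compile(r"title(,author)+(,subject)*")


def is_valid_record(record):
    """Return True if <record>'s children follow the content model above."""
    if record.tag != "record":
        return False
    child_tags = ",".join(child.tag for child in record)
    return RECORD_MODEL.fullmatch(child_tags) is not None


GOOD = "<record><title>T</title><author>A</author><subject>S</subject></record>"
BAD = "<record><author>A</author><title>T</title></record>"  # title out of order

if __name__ == "__main__":
    print(is_valid_record(ET.fromstring(GOOD)))  # True
    print(is_valid_record(ET.fromstring(BAD)))   # False
```

A real DTD also constrains attributes and what text each element may contain. This only checks the order and count of the child elements.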
You still haven’t told me why I care.
Say you have a set of large Word documents. And The Boss wants them posted to the website in HTML for browsing and as PDFs for printing. Oh, and a summary document to be inserted in the newsletter. And these all have to be kept up to date. If you mark them all up in XML, you can transform the XML files into HTML and PDF and extract the needed elements for the summary document. After that, you only have to make changes to the XML files and repeat the transformations. It’s a lot of work to set up, and not really worth it right now unless you have a lot of information that needs to stay available in the future or is needed in multiple formats, but XML is starting to come into its own. The newest version of FileMaker, for example, can import and export XML natively, and the upcoming version of MS Word will be able to export XML (though the feature looks to be available only to corporate users).
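Here’s a minimal Python sketch of the “one source, many outputs” idea. It assumes an invented `<report>` vocabulary filled with placeholder text. The usual tool for this job is XSLT (see the last link under Further Reading). Python’s standard library has no XSLT processor, so the transformation is written by hand here. PDF output, which would need a separate tool, is left out.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented report markup with placeholder text.
REPORT = """<report>
  <title>Annual Report</title>
  <summary>One-paragraph overview of the report.</summary>
  <section>
    <heading>Circulation</heading>
    <para>Details about circulation.</para>
  </section>
  <section>
    <heading>Programs</heading>
    <para>Details about programs.</para>
  </section>
</report>"""


def to_html(report):
    """Transform the report into a simple HTML fragment for the website."""
    parts = [f"<h1>{escape(report.findtext('title'))}</h1>"]
    for section in report.findall("section"):
        parts.append(f"<h2>{escape(section.findtext('heading'))}</h2>")
        for para in section.findall("para"):
            parts.append(f"<p>{escape(para.text or '')}</p>")
    return "\n".join(parts)


def newsletter_summary(report):
    """Extract just the pieces the newsletter needs."""
    headings = ", ".join(s.findtext("heading") for s in report.findall("section"))
    return f"{report.findtext('title')}: {report.findtext('summary')} (Covers: {headings})"


if __name__ == "__main__":
    report = ET.fromstring(REPORT)
    print(to_html(report))
    print(newsletter_summary(report))
```

When the content changes, you edit only the XML and rerun the script. Both outputs are regenerated from the same source, so they can’t drift apart.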
Further Reading
I think the best way to understand something is to see it from several different viewpoints, so try these for different descriptions.
- A List Apart: What the Hell is XML?
- What’s this XML stuff, anyway?
- XML - Defining A Technology - Part One of Six
- A Child’s Garden of XML
- Using XML
- How XML Accommodates Human-Authored Content
- Validating XML: A Pretty Complete Primer
- Transformers: Using XSLT to Transform XML
Tutorials — laura
