Introduction to XML

Scott Currie

Since Biml is based on XML, it helps to start with a strong foundation in XML syntax. This walkthrough will teach you everything you need to know about XML to get started using Biml.

published 08.06.15

last updated 10.01.15


Part of lesson Intro to XML.

Introduction

Since Biml is a dialect of XML, the first step in learning Biml is to learn XML.

eXtensible Markup Language (XML) is a general-purpose text format that can be used to store and transmit almost any type of information. After it was first standardized by the W3C in 1998, XML became widely popular for a variety of applications, including data storage, data interchange and transmission, and even programming.

The structure of an XML document is straightforward. It consists of elements whose tags are surrounded with angle brackets ('<' and '>'). Each of these elements may have any number of child elements and any number of embedded attributes, as long as each attribute name appears only once on that element. Elements and child elements tend to be used to store complex information with many properties and relationships. Attributes store simple values such as strings, numbers, and dates. The only other thing you need to know upfront is that every XML document has exactly one root element, which contains all of the others.

Syntax

Let's take a brief look at the syntax of these XML elements and attributes. This is among the simplest possible XML documents:

<root></root>

Note how there is a single element called root. The root element begins with the <root> tag and is closed by the </root> tag. Since the root element has no children, we can optionally use a more compact syntax for the exact same XML document:

<root />

Note that in this case we use the '/>' delimiter to specify that the element is "self-closing," meaning that a single tag both begins and closes the element. Again, this syntax is optional and can only be used if the element has no children and no text content.
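
If you'd like to convince yourself that the two forms really are the same document, here is a minimal sketch. It assumes Python and its standard-library xml.etree.ElementTree parser, which is just one convenient choice; the describe helper is our own name, not part of any library.

```python
import xml.etree.ElementTree as ET


def describe(element):
    """Summarize what the parser saw: tag name, attributes, text, child count."""
    return (element.tag, dict(element.attrib), element.text, len(element))


long_form = ET.fromstring("<root></root>")
short_form = ET.fromstring("<root />")

print(describe(long_form))   # ('root', {}, None, 0)
print(describe(short_form))  # ('root', {}, None, 0)
```

Once parsed, nothing records which of the two spellings was used, which is why the choice is purely a matter of readability.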

But what if the element does have children? How do we write that? Here is an example where we add 3 child elements to the root element:

<root>
    <parent></parent>
    <parent></parent>
    <parent></parent>
</root>
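
The rules so far (exactly one root element, and every start tag matched by an end tag with the same name) are what a parser checks before it does anything else. The sketch below assumes Python's standard-library xml.etree.ElementTree parser; the is_well_formed helper and the sample labels are ours, chosen only for illustration. The last sample also shows that tag names are case-sensitive.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "one root, matched tags": "<root><parent></parent></root>",
    "two root elements": "<root></root><root></root>",
    "missing end tag": "<root><parent></root>",
    "end tag with different case": "<root></Root>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; the other three each break one of the rules above.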

Now we can start to see why XML became so popular. Once you know a small number of formatting rules, you can create arbitrarily complex documents. Let's continue down that path by adding some grandchildren:

<root>
    <parent>
        <child></child>
        <child></child>
    </parent>
    <parent>
        <child></child>
        <child></child>
        <child></child>
    </parent>
    <parent></parent>
</root>

These are all elements. To add some additional configuration information, we'd also like to store properties on each of these elements, which is what attributes are for. Let's start with name and age attributes for every parent and child element and an additional favoriteToy attribute for child elements:

<root>
    <parent name="John" age="35">
        <child name="Jack" age="11" favoriteToy="Video Game Console"></child>
        <child name="Jill" age="9" favoriteToy="Plush Animal"></child>
    </parent>
    <parent name="Jane" age="46">
        <child name="Sarah" age="16" favoriteToy="Video Game Console"></child>
        <child name="Helen" age="8" favoriteToy="Blocks"></child>
        <child name="James" age="6" favoriteToy="Small Plastic Animals"></child>
    </parent>
    <parent name="Chance" age="24"></parent>
</root>
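
To see how a program would consume this structure, here is a small sketch that reads a shortened copy of the document above. It assumes Python's standard-library xml.etree.ElementTree; the variable names are our own.

```python
import xml.etree.ElementTree as ET

document = """<root>
    <parent name="John" age="35">
        <child name="Jack" age="11" favoriteToy="Video Game Console"></child>
        <child name="Jill" age="9" favoriteToy="Plush Animal"></child>
    </parent>
    <parent name="Chance" age="24"></parent>
</root>"""

root = ET.fromstring(document)

# Map each parent's name to the names of that parent's children.
family = {
    parent.get("name"): [child.get("name") for child in parent.findall("child")]
    for parent in root.findall("parent")
}
print(family)  # {'John': ['Jack', 'Jill'], 'Chance': []}

# Attribute values always arrive as strings, so numbers need converting.
john = root.find("parent")
print(int(john.get("age")) + 1)  # 36
```

Notice that the nesting of elements becomes the nesting of the data, while attributes behave like simple named values on each element.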

Elements can also store text directly within themselves. This feature is normally used for big blocks of text. Let's add a FavoritePoem element for John. Remember that not all child elements and attributes are necessarily required, so there is no need to add favorite poems for everyone.

<root>
    <parent name="John" age="35">
        <child name="Jack" age="11" favoriteToy="Video Game Console"></child>
        <child name="Jill" age="9" favoriteToy="Plush Animal"></child>
        <FavoritePoem name="Roses">
            Roses are red.
            Violets are blue.
            I love Biml.
            How about you?
        </FavoritePoem>
    </parent>
    <parent name="Jane" age="46">
        <child name="Sarah" age="16" favoriteToy="Video Game Console"></child>
        <child name="Helen" age="8" favoriteToy="Blocks"></child>
        <child name="James" age="6" favoriteToy="Small Plastic Animals"></child>
    </parent>
    <parent name="Chance" age="24"></parent>
</root>

Now that we have a complete XML document with all the information we want to store, let's not forget to clean up the syntax using optional self-closing tags for empty elements.

<root>
    <parent name="John" age="35">
        <child name="Jack" age="11" favoriteToy="Video Game Console" />
        <child name="Jill" age="9" favoriteToy="Plush Animal" />
        <FavoritePoem name="Roses">
            Roses are red.
            Violets are blue.
            I love Biml.
            How about you?
        </FavoritePoem>
    </parent>
    <parent name="Jane" age="46">
        <child name="Sarah" age="16" favoriteToy="Video Game Console" />
        <child name="Helen" age="8" favoriteToy="Blocks" />
        <child name="James" age="6" favoriteToy="Small Plastic Animals" />
    </parent>
    <parent name="Chance" age="24" />
</root>
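
One detail worth seeing in practice is that text inside an element includes the line breaks and indentation we typed for layout. The sketch below assumes Python's standard-library xml.etree.ElementTree and works on a shortened copy of the document; the variable names are ours.

```python
import xml.etree.ElementTree as ET

document = """<root>
    <parent name="John" age="35">
        <child name="Jack" age="11" favoriteToy="Video Game Console" />
        <FavoritePoem name="Roses">
            Roses are red.
            Violets are blue.
            I love Biml.
            How about you?
        </FavoritePoem>
    </parent>
    <parent name="Chance" age="24" />
</root>"""

root = ET.fromstring(document)
john, chance = root.findall("parent")

# The raw text keeps every newline and space between the two tags.
raw_text = john.find("FavoritePoem").text

# Strip the layout whitespace to recover the four lines of the poem.
poem_lines = [line.strip() for line in raw_text.strip().splitlines()]
print(poem_lines)

# Optional elements are simply absent when they were not written.
print(chance.find("FavoritePoem"))  # None
```

Whether that extra whitespace matters depends on the program reading the document, so it is worth knowing that it is there.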

Comments

Sometimes you need to add descriptive comments to your code. Alternatively, you might want to disable some code without having to delete it (in case you need it again or want it for reference). Comments are the way to do this, and XML provides syntax for creating comments. Comment blocks begin with <!-- and end with -->. XML comments cannot be nested. That is, you cannot place a comment inside another comment, because the character sequence -- is not allowed inside the body of a comment. Let's add a few comments to our XML document:

<root>
    <parent name="John" age="35">
        <child name="Jack" age="11" favoriteToy="Video Game Console" />
        <child name="Jill" age="9" favoriteToy="Plush Animal" />
        <FavoritePoem name="Roses">
            Roses are red.
            Violets are blue.
            I love Biml.
            How about you?
        </FavoritePoem>
    </parent>
    <!-- Jane does not have a favorite poem -->
    <parent name="Jane" age="46">
        <child name="Sarah" age="16" favoriteToy="Video Game Console" />
        <!-- Helen prefers the type of block that snaps together -->
        <child name="Helen" age="8" favoriteToy="Blocks" />
        <child name="James" age="6" favoriteToy="Small Plastic Animals" />
    </parent>
    <!--<parent name="Chance" age="24" />-->
</root>
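
Here is a quick sketch of what commenting out an element means to a program that reads the file. It assumes Python's standard-library xml.etree.ElementTree with its default settings, which discard comments while parsing; the document is a shortened copy of the one above.

```python
import xml.etree.ElementTree as ET

document = """<root>
    <!-- Jane does not have a favorite poem -->
    <parent name="Jane" age="46">
        <!-- Helen prefers the type of block that snaps together -->
        <child name="Helen" age="8" favoriteToy="Blocks" />
    </parent>
    <!--<parent name="Chance" age="24" />-->
</root>"""

root = ET.fromstring(document)

# Only Jane is found: the commented-out Chance element never reaches the tree.
parent_names = [parent.get("name") for parent in root.findall("parent")]
print(parent_names)  # ['Jane']

# A comment placed inside another comment is rejected by the parser.
try:
    ET.fromstring("<root><!-- outer <!-- inner --> --></root>")
    nested_comment_accepted = True
except ET.ParseError:
    nested_comment_accepted = False
print(nested_comment_accepted)  # False
```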

Escaping Text Data

You've probably noticed that XML syntax relies on a handful of special characters, specifically '<', '>', '&', and the quote characters that surround attribute values. What if you need to include these characters in your own text? XML parsers are strict about this: a '<' or '&' in text is always read as the start of markup, so the parser has no way to tell that '<' is being used as a less than operator in a SQL query rather than as the start tag of a child element. (The '=' that separates an attribute name from its value is not a problem. It can appear in text and in attribute values as-is.)

There are two ways to include these characters safely. The first is to escape each character individually. In place of the special character, you can use the escaped equivalent according to the following mapping:

  • < becomes &lt;
  • > becomes &gt;
  • & becomes &amp;
  • " becomes &quot;
  • ' becomes &apos;
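
In practice you rarely apply this mapping by hand, because most XML libraries ship a helper for it. As a hedged sketch, the example below assumes Python's standard-library xml.sax.saxutils.escape function; the filter element and its where attribute are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

condition = 'Total < 100 AND Note = "R&D"'

# escape() handles &, < and > by default; extra mappings are passed as a dict.
escaped = escape(condition, {'"': "&quot;"})
print(escaped)  # Total &lt; 100 AND Note = &quot;R&amp;D&quot;

# The parser reverses the escaping, so the program sees the original text.
element = ET.fromstring(f'<filter where="{escaped}" />')
print(element.get("where") == condition)  # True
```

The escaped form only exists in the file. Anything that reads the document through a parser gets the original characters back.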

Let's say that we wanted to change the text of John's favorite poem. Instead of "I love Biml," we'd like to say "I <3 Biml" (i.e. I "heart" Biml). Here's how that would look with escaping:

<root>
    <parent name="John" age="35">
        <child name="Jack" age="11" favoriteToy="Video Game Console" />
        <child name="Jill" age="9" favoriteToy="Plush Animal" />
        <FavoritePoem name="Roses">
            Roses are red.
            Violets are blue.
            I &lt;3 Biml.
            How about you?
        </FavoritePoem>
    </parent>
    <!-- Jane does not have a favorite poem -->
    <parent name="Jane" age="46">
        <child name="Sarah" age="16" favoriteToy="Video Game Console" />
        <!-- Helen prefers the type of block that snaps together -->
        <child name="Helen" age="8" favoriteToy="Blocks" />
        <child name="James" age="6" favoriteToy="Small Plastic Animals" />
    </parent>
    <!--<parent name="Chance" age="24" />-->
</root>

While it's a bit awkward to read, if I only have a few escaped characters, it's not a big deal. What if I have many? Is there a way to declare that an entire block of text should be exempt from special character processing? Yes! It's called CDATA, which is short for Character DATA. You start a CDATA block with <![CDATA[ and you end it with ]]>. The only thing a CDATA block cannot contain is the ]]> sequence itself. Using this syntax, let's take a look at our final XML document:

<root>
    <parent name="John" age="35">
        <child name="Jack" age="11" favoriteToy="Video Game Console" />
        <child name="Jill" age="9" favoriteToy="Plush Animal" />
        <FavoritePoem name="Roses"><![CDATA[
            Roses are red.
            Violets are blue.
            I <3 Biml.
            How about you?
        ]]></FavoritePoem>
    </parent>
    <!-- Jane does not have a favorite poem -->
    <parent name="Jane" age="46">
        <child name="Sarah" age="16" favoriteToy="Video Game Console" />
        <!-- Helen prefers the type of block that snaps together -->
        <child name="Helen" age="8" favoriteToy="Blocks" />
        <child name="James" age="6" favoriteToy="Small Plastic Animals" />
    </parent>
    <!--<parent name="Chance" age="24" />-->
</root>
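
Escaping and CDATA are two spellings of the same content, and a parser hands back identical text for both. Here is a minimal sketch, again assuming Python's standard-library xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

escaped_form = ET.fromstring("<FavoritePoem>I &lt;3 Biml.</FavoritePoem>")
cdata_form = ET.fromstring("<FavoritePoem><![CDATA[I <3 Biml.]]></FavoritePoem>")

print(escaped_form.text)                     # I <3 Biml.
print(escaped_form.text == cdata_form.text)  # True

# Inside CDATA nothing is interpreted, so an escape sequence stays literal.
literal = ET.fromstring("<FavoritePoem><![CDATA[&lt;3]]></FavoritePoem>")
print(literal.text)  # &lt;3
```

The last line is the flip side of the convenience: once you are inside a CDATA block, you should type the characters themselves rather than their escaped equivalents.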

XML Schemas and Validation

We've already covered everything you really need to know about XML to get started writing Biml, but one further topic is a useful supplement. One of the reasons that XML became so popular is that it supports a mechanism to externally specify the structure of a document. This is called an XML schema, and it usually lives in a file with an .xsd file extension. For example, the XML schema for Biml is stored in a file called Biml.xsd in your Mist or BIDSHelper installation directory.

XML schemas enable a developer to mandate that only certain elements can be children of certain other elements, which attributes are acceptable for a given element, which attributes are required vs. optional, the acceptable values for attributes, and much more. If you write an XML document that doesn't follow the rules of its XML schema, validation fails and the problem is reported as an error.
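
To make the idea concrete, here is a deliberately simplified stand-in for what a schema validator does. It is not XSD and does not use a real schema file: the RULES table is a hypothetical, hand-written description of our family document, and find_problems is our own helper. It assumes only Python's standard-library xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for the family document; a real schema would be an XSD file.
RULES = {
    "root": {"children": {"parent"}, "required": set(), "optional": set()},
    "parent": {"children": {"child", "FavoritePoem"},
               "required": {"name", "age"}, "optional": set()},
    "child": {"children": set(),
              "required": {"name", "age"}, "optional": {"favoriteToy"}},
    "FavoritePoem": {"children": set(), "required": {"name"}, "optional": set()},
}


def find_problems(element):
    """Return a list of rule violations for this element and its descendants."""
    rule = RULES[element.tag]
    present = set(element.attrib)
    problems = []
    for missing in sorted(rule["required"] - present):
        problems.append(f"<{element.tag}> is missing required attribute '{missing}'")
    for extra in sorted(present - rule["required"] - rule["optional"]):
        problems.append(f"<{element.tag}> has unexpected attribute '{extra}'")
    for child in element:
        if child.tag not in rule["children"]:
            problems.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        else:
            problems.extend(find_problems(child))
    return problems


valid = ET.fromstring('<root><parent name="John" age="35">'
                      '<child name="Jack" age="11" /></parent></root>')
invalid = ET.fromstring('<root><parent name="John">'
                        '<pet name="Rex" /></parent></root>')

print(find_problems(valid))    # []
print(find_problems(invalid))
```

Both documents are well-formed XML. Only the first one follows the rules, which is exactly the distinction a schema adds on top of the basic syntax.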

Another benefit of XML schemas is that they allow XML code editors to automatically provide IntelliSense (autocompletion) and other features that make it much easier to write and consume XML code.

Among the more powerful features of XML schemas is that you can actually use more than one per XML document. To do so, you declare XML namespaces that include the schema definitions from one or more XSDs. The details of that are beyond the scope of this lesson, since Biml generally only uses a single XSD with a single namespace. So why did I bring it up? When you see the Biml root node in future lessons, it will look something like this:

<Biml xmlns="http://schemas.varigence.com/Biml.xsd" />

That xmlns attribute declares the default XML namespace for the entire document, which corresponds to the Biml XML schema. That's all we need to do to declare that our XML file is actually a Biml file and subject to Biml validation rules. Neat!
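
If you're curious what the namespace declaration changes for a program reading the file, here is a small sketch. It assumes Python's standard-library xml.etree.ElementTree, which reports namespaced tags in the form {namespace}name; the Placeholder child element is a hypothetical name used only to show that children inherit the default namespace.

```python
import xml.etree.ElementTree as ET

BIML_NAMESPACE = "http://schemas.varigence.com/Biml.xsd"

with_namespace = ET.fromstring(
    f'<Biml xmlns="{BIML_NAMESPACE}"><Placeholder /></Biml>'
)
without_namespace = ET.fromstring("<Biml />")

print(with_namespace.tag)     # {http://schemas.varigence.com/Biml.xsd}Biml
print(with_namespace[0].tag)  # {http://schemas.varigence.com/Biml.xsd}Placeholder
print(without_namespace.tag)  # Biml
```

The two root elements share a local name, but only the first is a Biml element as far as the tooling is concerned.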

Conclusion

There is much more to XML than we can cover in a short lesson, but the reality is that you don't need to know any more than what's in this lesson to become an expert Biml developer.

To ensure that you've absorbed everything you need to know, try taking the brief quiz below to test your knowledge before moving on to the next lesson.
