Sunday, 13 April 2014

The main building blocks of XML

XML is a markup language. It resembles HTML in that content is enclosed within pairs of opening and closing tags. However, XML differs from HTML in one important respect: XML does not predefine a set of tags with fixed meanings. In XML you can define your own tags, whereas in HTML you can only use the tags that HTML defines.

The first building block of XML we will examine is the element. A typical XML file contains many elements, each made up of tags wrapped around a piece of data.

For example, let’s say we have an XML file of products. It will comprise supplier codes, prices, descriptions, colours and so on. The colour could be shown as follows in an XML file:

<colour>red</colour>

The two tags above are the start tag and the end tag. The end tag contains a "/" straight after its opening angle bracket, which marks it as the closing tag.

The start tag, the end tag and the content in between together form an element.

Elements are also referred to by their name, which is also called their type. The element above is a colour element.
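To make the parts of an element concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The choice of Python and of this module is an assumption made for illustration; any XML parser would show the same thing.

```python
import xml.etree.ElementTree as ET

# Parse a one-element document like the colour example above.
element = ET.fromstring("<colour>red</colour>")

print(element.tag)   # the name (type) of the element
print(element.text)  # the content between the start and end tags
```

The tag attribute holds the element name and the text attribute holds the content found between the two tags.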

It is advisable to follow some simple rules and guidelines when naming your elements. The primary rule is that names should be descriptive and indicate the subject matter within the element. This gives your XML file substance and makes it easy for other people to follow and understand.

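As an example, an element that contains serial numbers is more appropriately named “serial_numbers” than “numbers”, which is too vague to mean much to anybody. Note that a name cannot contain spaces, and a purely numeric name such as “56879” is not only meaningless but also invalid.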

There are also some strict naming rules that you must follow. A valid name begins with a letter, an underscore or a colon, followed by any number of letters, digits, hyphens, underscores, full stops or colons. In practice colons are best avoided in your own names because they are reserved for namespaces.
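The naming rule can be sketched as a small check in Python. This is a simplified, ASCII-only approximation: it assumes a name starts with a letter, underscore or colon and continues with letters, digits, hyphens, underscores, full stops or colons, whereas the real XML rule also accepts many non-ASCII letters. The function name is_valid_name is hypothetical.

```python
import re

# Simplified, ASCII-only approximation of the XML name rule (an assumption
# for illustration; real XML names may also contain non-ASCII letters).
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

def is_valid_name(name: str) -> bool:
    """Return True if the whole string matches the simplified name rule."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["serial_numbers", "serial numbers", "56879", "_id", "item-2"]:
    print(candidate, is_valid_name(candidate))
```

A name with a space or a leading digit is rejected, while names using underscores or hyphens after the first character pass.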

Be aware also that XML is case sensitive, so "Colour" and "colour" are two different names. As yet there is no formal convention on whether you should use upper case, lower case or mixed case.
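Case sensitivity is easy to demonstrate with a parser. The sketch below, again assuming Python's xml.etree.ElementTree, checks whether a document is well formed; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Start and end tags match exactly.
print(is_well_formed("<colour>red</colour>"))
# Start and end tags differ only in case, so they do not match.
print(is_well_formed("<Colour>red</colour>"))
```

Because the names differ in case, the parser treats the second document as having a mismatched end tag and rejects it.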

The text of an XML document is divided into markup and character data. Character data is the information stored in the document, and markup is the tags and other syntax that give it structure.

XML accepts nearly all Unicode characters: not only the 26 letters of the English alphabet and the digits 0-9 but also, for example, all 33 letters of the Russian Cyrillic alphabet.

You can read and process XML data with software known as an XML parser. The parser reads the document character by character and builds a representation of the data. Text that the parser scans for markup in this way is known as parsed character data, or PCDATA.

If you want certain data to bypass this step, i.e. you do not want it to be parsed for markup, you can place it in a CDATA section. The parser does not interpret anything inside a CDATA section as markup and instead passes it through as literal text.
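A short sketch shows what a CDATA section does in practice. It assumes Python's xml.etree.ElementTree as the parser, and the description element and its content are made up for the example.

```python
import xml.etree.ElementTree as ET

# The angle brackets and ampersand inside the CDATA section would normally
# be treated as markup; here they are kept as ordinary text.
document = "<description><![CDATA[Use <b> for bold & <i> for italics]]></description>"
element = ET.fromstring(document)

print(element.text)  # the literal text from the CDATA section
print(len(element))  # number of child elements found inside
```

The text comes back unchanged and no child elements are created, because nothing inside the CDATA section was read as a tag.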

The next building block we will look at is the attribute. Attributes associate name/value pairs with an element. As an illustration, a name element might carry attributes for the first name and the last name.
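As a rough illustration, the sketch below reads attributes from a name element with Python's xml.etree.ElementTree. The attribute names first and last and their values are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Attributes are written as name="value" pairs inside the start tag.
element = ET.fromstring('<name first="Ada" last="Lovelace">Ada Lovelace</name>')

print(element.attrib)        # all attributes as a dictionary
print(element.get("first"))  # a single attribute looked up by name
```

Attributes live in the start tag, separate from the content of the element, which is still available through element.text.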

When declared in a DTD, an attribute is given one of three kinds of type: a string type, one of the tokenized types, or an enumerated type.