Intro to XML and JSON (7 Part Series)
In the previous article in this series, we used XML to define a “shirt” item (or, as one would say in XML, “element”) with some keys like “color” and “fabric” and “washable.”
Let’s learn how to fill in some values for those keys.
(See the earlier articles in this series if you need a refresher on what I mean by items, keys, and values.)
Warning: if you paste these examples into an online XML "beautifier," only ever use sample data. Never put your company's confidential data into a stranger's web site.
Warning: I shouldn't really say "fill in some values" as if our keys didn't have any.
Technically, all of our keys in the previous article's examples already had values.
- Some of them were just blank values.
- Others had some other item/element as their value.
- (Think about the examples where an element named "washable" was nested inside of an element named "fabric" which was nested inside of an element named "shirt." From a computer's perspective, the value for the "shirt" element's "fabric" key wasn't really blank. The value for the "fabric" key was "an element called 'washable.'")
Nevertheless, when I say "values" for keys like "color" or "fabric" or "washable," you're probably thinking about plain text phrases like "blue" / "red" / "leather" / "yes" / "no."
So let's talk about those.
The way we fill in plain-text values for "keys" depends on which of XML's 2 approaches to defining "key-value pairs" we are using.
In the previous article in this series, we saw that every element's (item's) name functions as a "key" for the element's "parent" and can have a corresponding value.
To set that value, we can type 1 piece of plain text (like "leather" or "blue") between the 2 tags that indicate the boundaries of the element whose name we are using as a "key."
The only other thing that can go "inside" the element besides its (optional) plain-text "value" is more elements.
Here's a really simple element with a plain-text value:
<Shape> Rectangle </Shape>
And here's one with a single plain-text value, but also some nested elements:
<Shape>
    Rectangle
    <Side>1</Side>
    <Side>2</Side>
    <Side>3</Side>
    <Side>4</Side>
</Shape>
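To see how a program reads this kind of element, here is a minimal sketch using Python's built-in "xml.etree.ElementTree" module (the same "ElementTree" package mentioned later in this article). The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# The element-nesting example: one plain-text value, then nested elements.
shape_xml = """<Shape>
    Rectangle
    <Side>1</Side>
    <Side>2</Side>
    <Side>3</Side>
    <Side>4</Side>
</Shape>"""

shape = ET.fromstring(shape_xml)

# .text holds everything between <Shape> and its first child, including
# the surrounding whitespace and line breaks, so strip() it before use.
shape_value = shape.text.strip()
print(shape_value)  # Rectangle

# Each nested <Side> element has its own plain-text value.
side_values = [side.text for side in shape.findall("Side")]
print(side_values)  # ['1', '2', '3', '4']
```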
Please put your "plain-text values" before any "nested elements," for readability by humans.
The following XML contains the same pieces of data as the previous example, but it is very rude to any human eyes trying to read your XML because it buries the word "Rectangle" in a sea of punctuation:
<Shape>
    <Side>1</Side>
    <Side>2</Side>
    Rectangle
    <Side>3</Side>
    <Side>4</Side>
</Shape>
Below is even more XML that a computer can read without difficulty.
(It would probably be interpreted by a computer as plaintext "Best EverRectangle," possibly with a line break between "Ever" and "Rectangle.")
However, it is incredibly rude to humans:
<Shape>
    <Side>1</Side>
    <Side>2</Side>
    Best Ever
    <Side>3</Side>
    Rectangle
    <Side>4</Side>
</Shape>
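Rudeness to humans can make code harder to write, too. Here is a sketch, assuming Python's built-in ElementTree module, of where that module puts the scattered words: text that follows a child element's closing tag is stored in that child's ".tail", not in the parent's ".text". Other XML tools may model this text differently.

```python
import xml.etree.ElementTree as ET

# The "rude" example: the words are scattered between the <Side> elements.
rude_xml = """<Shape>
    <Side>1</Side>
    <Side>2</Side>
    Best Ever
    <Side>3</Side>
    Rectangle
    <Side>4</Side>
</Shape>"""

shape = ET.fromstring(rude_xml)

# Only whitespace sits between <Shape> and its first child, so .text is blank.
print(repr(shape.text.strip()))  # ''

# ElementTree stores text that follows a child's closing tag in that child's
# .tail, so the words end up attached to the <Side> elements before them.
tails = [side.tail.strip() for side in shape]
print(tails)  # ['', 'Best Ever', 'Rectangle', '']

# Gathering all the text that sits directly inside <Shape> takes extra work.
direct_text = " ".join(
    piece.strip()
    for piece in [shape.text] + [side.tail for side in shape]
    if piece and piece.strip()
)
print(direct_text)  # Best Ever Rectangle
```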
In XML, it's quite common to skip defining a plaintext value for an element.
Often, there simply isn't anything a person could type that would be useful.
For example, well-formed XML always has exactly 1 outermost element.
Therefore, people who are writing data without a natural single outermost element often just surround everything in an arbitrary tagset and give it a name like "root" or "RootElement."
Is there really anything you could meaningfully type to give a "value" to the word "RootElement" that would concisely describe the entire XML dataset?
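Here is a small sketch, assuming Python's built-in ElementTree module, of the single-outermost-element rule in action. The car strings are placeholders.

```python
import xml.etree.ElementTree as ET

# Two cars with no outermost element: this is not well-formed XML.
no_root = "<Car>First car</Car><Car>Second car</Car>"

try:
    ET.fromstring(no_root)
    parsed_without_root = True
except ET.ParseError as error:
    parsed_without_root = False
    # The exact wording of the message comes from the underlying parser.
    print("Rejected:", error)

# Wrapping everything in one arbitrary outer tagset fixes the problem.
root = ET.fromstring("<RootElement>" + no_root + "</RootElement>")

# The wrapper has no plain-text value of its own; it just holds elements.
print(root.text)  # None
print(len(root))  # 2
```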
Another reason an element might have no plaintext value is that it represents a real-world object whose "essence" is hard to capture in a plaintext value.
For example, you could describe a fleet of cars like this:
<RootElement>
    <Car> First car's Vehicle Identification Number here </Car>
    <Car> Second car's Vehicle Identification Number here </Car>
</RootElement>
But let's think about these cars philosophically.
The above code implies that the conceptual "items" that you've given names of "car" truly are their Vehicle Identification Numbers.
- Q: Is a car really its VIN?
- A: I don't think so. I think a car is a heavy chunk of steel taking up space in the real world.
See how there isn't really a plain-text value that captures what a car is?
Similarly ... if we had XML describing a closet full of shirts ... is there really a single thing we could write that would sum up the essence of what each "shirt" truly is?
Maybe we should just leave it at "shirt" and only define plaintext values for keys like "color":
<Closet>
    <Shirt>
        <Color> Blue </Color>
    </Shirt>
    <Shirt>
        <Color> Red </Color>
    </Shirt>
</Closet>
Similarly, here's a better way to write the XML for our fleet of cars:
<RootElement>
    <Car>
        <VIN> First car's Vehicle Identification Number here </VIN>
    </Car>
    <Car>
        <VIN> Second car's Vehicle Identification Number here </VIN>
    </Car>
</RootElement>
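Reading the VINs back out with Python's built-in ElementTree module might look like the sketch below. Notice that each "Car" element's own text is just whitespace; the useful values live one level down.

```python
import xml.etree.ElementTree as ET

# The "better" fleet: each car's VIN lives in its own <VIN> element.
fleet_xml = """<RootElement>
    <Car>
        <VIN> First car's Vehicle Identification Number here </VIN>
    </Car>
    <Car>
        <VIN> Second car's Vehicle Identification Number here </VIN>
    </Car>
</RootElement>"""

root = ET.fromstring(fleet_xml)
cars = root.findall("Car")

# A <Car> has no meaningful plain-text value of its own.
car_values = [car.text.strip() for car in cars]
print(car_values)  # ['', '']

# findtext() returns the .text of the first matching child element.
vins = [car.findtext("VIN").strip() for car in cars]
for vin in vins:
    print(vin)
```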
In the previous article in this series, we saw that additional "keys" for an element (item) can go inside the opening "tag" defining that element.
To set a value for one of these attribute keys, we can type 1 piece of plain text (like "leather" or "blue") between the quotation marks to the right of the "=" that follows the key's name.
<Shirt Color="Blue"> </Shirt>
Or, for short:
<Shirt Color="Blue"/>
Remember that key-value pairs using the "attributes" notation must have unique keys (one element can't have 2 attributes with the same name), and their values can't nest any further.
Therefore, we can't use "attributes" to specify all the shirts in a closet, or all the cars in a fleet.
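Here is a short sketch, assuming Python's built-in ElementTree module, showing that the long form and the self-closing "for short" form of an element carry the same attribute, and that a repeated attribute key is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# The long form and the short (self-closing) form of the same element.
long_form = ET.fromstring('<Shirt Color="Blue"> </Shirt>')
short_form = ET.fromstring('<Shirt Color="Blue"/>')

# Both elements carry the same attribute key-value pair.
print(long_form.get("Color"), short_form.get("Color"))  # Blue Blue
print(long_form.attrib == short_form.attrib)  # True

# Attribute keys must be unique, so a second "Color" is rejected outright.
try:
    ET.fromstring('<Shirt Color="Blue" Color="Red"/>')
    duplicate_allowed = True
except ET.ParseError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```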
In the previous article of this series, I warned you not to mix the two approaches confusingly, but it's perfectly normal to use both "element nesting" and "attributes" within any one piece of XML-formatted data if it makes sense. For example:
<Closet>
    <Shirt Color="Blue"/>
    <Shirt Color="Red"/>
</Closet>
<RootElement>
    <Car VIN="First car's Vehicle Identification Number here"> </Car>
    <Car VIN="Second car's Vehicle Identification Number here"> </Car>
</RootElement>
Or, for short:
<RootElement>
    <Car VIN="First car's Vehicle Identification Number here"/>
    <Car VIN="Second car's Vehicle Identification Number here"/>
</RootElement>
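With the attributes approach, a program fetches values with ".get()" instead of ".text". Here is a minimal sketch with Python's built-in ElementTree module, covering both the closet and the fleet:

```python
import xml.etree.ElementTree as ET

# Element nesting for the list of shirts, an attribute for each color.
closet = ET.fromstring('<Closet> <Shirt Color="Blue"/> <Shirt Color="Red"/> </Closet>')
colors = [shirt.get("Color") for shirt in closet.findall("Shirt")]
print(colors)  # ['Blue', 'Red']

# Element nesting for the list of cars, an attribute for each car's VIN.
fleet_xml = """<RootElement>
    <Car VIN="First car's Vehicle Identification Number here"/>
    <Car VIN="Second car's Vehicle Identification Number here"/>
</RootElement>"""
root = ET.fromstring(fleet_xml)

# Attribute values come back through .get(), already free of extra whitespace.
vins = [car.get("VIN") for car in root.findall("Car")]
for vin in vins:
    print(vin)
```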
Or, diving deep into a particularly complicated shirt:
<Shirt>
    <Color> Blue </Color>
    <Color> Red </Color>
    <Color> Green </Color>
    <Fabric Washable="No"> Leather </Fabric>
    <Fabric Washable="Yes"> Cotton </Fabric>
</Shirt>
Note that with the way I described this shirt, you have to look 2 different places to size up whether you can wash it!
I certainly won't argue that storing "washable" as a property of each fabric, rather than as a property of the shirt overall, was a good design choice.
But be prepared to notice such issues when you read XML. You might run into real-world XML full of frustrating design choices like that.
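In code, the same "2 different places" problem shows up as extra work. This sketch, assuming Python's built-in ElementTree module, reads the complicated shirt. The rule that a shirt counts as washable only if every one of its fabrics is washable is my own assumption for illustration, not something the XML says.

```python
import xml.etree.ElementTree as ET

shirt_xml = """<Shirt>
    <Color> Blue </Color>
    <Color> Red </Color>
    <Color> Green </Color>
    <Fabric Washable="No"> Leather </Fabric>
    <Fabric Washable="Yes"> Cotton </Fabric>
</Shirt>"""

shirt = ET.fromstring(shirt_xml)

# Colors are element values, so they come from .text.
colors = [color.text.strip() for color in shirt.findall("Color")]

# Each fabric mixes both approaches: its name is the element's .text,
# while its washability is an attribute read with .get().
fabrics = {
    fabric.text.strip(): fabric.get("Washable")
    for fabric in shirt.findall("Fabric")
}

# Because "Washable" is stored per fabric, answering "can I wash this
# shirt?" means checking every fabric (assumed rule: all must be "Yes").
shirt_is_washable = all(value == "Yes" for value in fabrics.values())

print(colors)             # ['Blue', 'Red', 'Green']
print(fabrics)            # {'Leather': 'No', 'Cotton': 'Yes'}
print(shirt_is_washable)  # False
```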
If you get to write XML from scratch, you get to make design choices based on how your data will be used.
For example, the best way to represent "washable" would be your decision to make.
- Should "washable" be part of the shirt or part of each fabric?
- Should "washable" be written as a standalone element or as a mere attribute?
You'll be trying to optimize:
- fetching/modifying the values using code
- human readability
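To make the "fetching/modifying" side of that trade-off concrete, here is a sketch of two hypothetical designs for a shirt-level "washable" key, one as an attribute and one as a nested element, using Python's built-in ElementTree module. Both designs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical design A: "washable" as an attribute of the shirt.
shirt_a = ET.fromstring('<Shirt Washable="No"/>')

# Hypothetical design B: "washable" as a nested element of the shirt.
shirt_b = ET.fromstring("<Shirt><Washable>No</Washable></Shirt>")

# Fetching the value looks different for each design.
print(shirt_a.get("Washable"))       # No
print(shirt_b.findtext("Washable"))  # No

# So does modifying it: set() for an attribute, assigning .text for an element.
shirt_a.set("Washable", "Yes")
shirt_b.find("Washable").text = "Yes"

# Serialize both shirts back to XML text to see the result.
print(ET.tostring(shirt_a, encoding="unicode"))
print(ET.tostring(shirt_b, encoding="unicode"))
```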
All organization of information involves making judgment calls trading flexibility against simplicity.
There's no wrong answer if you plan for your own needs!
When it comes to writing software to process XML that someone else already wrote, you don't really get to choose anything.
In that case, you simply have to be able to recognize which approach the XML's author chose for storing key-value pairs.
- Q: Why?
- A: Largely because tools and programming languages that help you read XML use different commands for extracting values written in each of the two styles. Recognizing the difference will save you a lot of hassle!
For example, in Python's "ElementTree" package:
- The ".text" property gives you the plaintext value written between an element's two tags.
- The ".get('key_name_here')" method gives you the plaintext value of an attribute.
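Here is a minimal sketch of both commands on one element that uses both approaches. It also shows why mixing them up is a hassle: asking for the wrong kind of value doesn't raise an error, it just quietly returns None.

```python
import xml.etree.ElementTree as ET

# One element that uses both approaches at once.
fabric = ET.fromstring('<Fabric Washable="Yes"> Cotton </Fabric>')

# .text: the plain text written between the opening and closing tags.
print(fabric.text.strip())  # Cotton

# .get('key_name_here'): the plain-text value of an attribute.
print(fabric.get("Washable"))  # Yes

# Looking in the wrong place returns None instead of raising an error.
print(fabric.get("Cotton"))     # None (there is no attribute named "Cotton")
print(fabric.find("Washable"))  # None (there is no child element named "Washable")
```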
The way you fill in a plain-text value for a "key-value" pair on an "element" depends on which of 2 approaches you took to defining the "key-value pair":
- using items' names as keys for their "parent" items
- giving items attributes
- When using items' names as keys for their "parent" items, you type some text between the two "tags" in the "tagset" defining the element.
- Preferably at the top so people don't hate you.
- It's very common to leave elements without plaintext values between their tags, since elements are often used simply to "hold more elements."
- When using attributes, you type some quoted text after the "