To Talk About ARML, We Must First Discuss AR...
Most of us are pretty familiar with AR, or augmented reality, even if we don't realize it. Those Snapchat and Instagram filters that have been so incredibly popular as of late? Augmented reality. The brief but overwhelming invasion of Pokémon Go? Augmented reality. We see implementations of augmented reality in many aspects of our technical lives, yet the technology is still in its relative infancy. Its practical applications have yet to be fully realized, although we are getting closer to understanding the full potential of the tech. Most believe we're going to see the largest growth in the mobile sector. That may well be true, and it's certainly where we've seen the most action, but I think the true frontier of augmented reality is the car windshield. My first car was a 1992 Cutlass S Class Supreme International Edition (cherry red), a real hunk of junk that nevertheless had a working head-up display (HUD), an absolute novelty in 1992, even if all it could do was project my speedometer onto the windshield. The technology stuck with me, and I've been dreaming of seeing it expanded ever since. Augmented reality is the breakthrough that will totally recontextualize how we drive (assuming, of course, that we are still driving our own cars in a few decades). Imagine Google Maps displayed in real time on your windshield, painting virtual arrows on the road ahead to guide you to your destination. Cool stuff! But I digress...
ARML stands for Augmented Reality Markup Language, and it is a language for describing virtual objects and where they should appear in a 3D scene. ARML was developed within the Open Geospatial Consortium (OGC), a group formed in 1994 to create open standards for geospatial data. ARML consists of two main components:
- XML, to describe virtual objects and the real-world locations where they are supposed to be displayed.
- ECMAScript bindings, for event handling and dynamic access to the properties of virtual objects.
ARML has three main pillars into which its logic is largely divided. These are Features, Visual Assets, and Anchors. Features are the physical objects in the world that are to be augmented. Visual Assets are the virtual objects that are going to be inserted into the world. Anchors are the connections between the physical and virtual objects, which allow the two to 'interact'.
Features are a carryover from GML, or Geography Markup Language, another XML grammar for describing spatial relationships. Features are physical objects within the environment (not the environment itself, which is described separately as geometry). Features are the objects with which the virtual objects will interact, such as a table, a building, or even a person. Features always have at least one Anchor.
Anchors do largely what their name implies: they are the connections that tie virtual content to the location of a Feature. There are four main types of Anchors. Geometries describe the location of an object through fixed coordinates, such as latitude, longitude, and altitude. An example, pulled from Wikipedia, is the Wiener Riesenrad, whose position is given by the coordinate pair 48.216622 16.395901.
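To make that concrete, here's a sketch of a Feature with a Geometry anchor in an ARML 2.0 document, using the Riesenrad coordinates above. The model URL is a placeholder and the element details are simplified from the spec's examples, so treat this as illustrative rather than copy-paste ready:

```xml
<!-- Illustrative ARML 2.0 document: a Feature for the Wiener Riesenrad,
     anchored to fixed coordinates with a Geometry anchor. -->
<arml xmlns="http://www.opengis.net/arml/2.0"
      xmlns:gml="http://www.opengis.net/gml/3.2"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <ARElements>
    <Feature id="riesenrad">
      <name>Wiener Riesenrad</name>
      <anchors>
        <!-- Geometry anchor: a fixed latitude/longitude position -->
        <Geometry id="riesenradPosition">
          <gml:Point gml:id="riesenradPoint" srsDimension="2">
            <gml:pos>48.216622 16.395901</gml:pos>
          </gml:Point>
          <assets>
            <!-- Visual Asset drawn at the anchor (placeholder model URL) -->
            <Model id="riesenradModel">
              <href xlink:href="https://example.com/models/riesenrad.zip" />
            </Model>
          </assets>
        </Geometry>
      </anchors>
    </Feature>
  </ARElements>
</arml>
```

Notice the three pillars nesting into each other: the Feature wraps the Anchor, and the Anchor carries the Visual Assets it should display.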
Trackables are the next type of Anchor. These are the tracking methods employed to keep track of Features, usually implemented using a device's cameras or sensors, whether that's the multitude of sensors and cameras attached to a HoloLens headset or the camera and accelerometer in your phone. Trackables can also employ software techniques, such as 3D mapping or face tracking. Because these systems are made up of multiple interlocking parts, ARML splits them in two: a Tracker is the code implemented in order to track one or more Trackables, and a Trackable is the pattern that the Tracker is seeking. Here we can see some of the sensors that Microsoft has in place in order to use Trackables:
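In markup, the Tracker/Trackable split looks something like the sketch below. The tracker URI, marker image, and overlay image are all hypothetical placeholders, and the exact element structure should be checked against the ARML 2.0 spec:

```xml
<!-- Illustrative sketch: a Tracker (the tracking code) and a Trackable
     (the pattern it looks for -- here, a hypothetical marker image). -->
<Tracker id="imageTracker">
  <uri xlink:href="https://example.com/trackers/imageTracker" />
</Tracker>

<Trackable id="posterAnchor">
  <config>
    <!-- which Tracker to use, and what pattern it should search for -->
    <tracker xlink:href="#imageTracker" />
    <src>https://example.com/markers/poster.jpg</src>
  </config>
  <assets>
    <!-- shown whenever the marker is detected in the camera feed -->
    <Image id="overlay">
      <href xlink:href="https://example.com/images/overlay.png" />
    </Image>
  </assets>
</Trackable>
```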
The next type of Anchor is called RelativeTo. RelativeTo Anchors, as you can guess from the name, place an object relative to another Anchor or to the user's position. These Anchors allow users to move through a 'scene', getting closer to or further away from virtual objects being rendered in the space. The last type of Anchor is the ScreenAnchor, which is probably the most familiar kind of Anchor to any of us who grew up playing video games. ScreenAnchors fix content to the screen itself, which can be used for status bars and information feeds. Master Chief's health and shield displays are examples of what ScreenAnchors produce:
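Here are rough sketches of those last two Anchor types. The referenced anchor id, the offset, and the HUD markup are all made up for illustration, and the element details are simplified from the spec:

```xml
<!-- RelativeTo: place an object one meter along the x-axis
     from another anchor (hypothetical id "someAnchor"). -->
<RelativeTo id="oneMeterOver">
  <ref xlink:href="#someAnchor" />
  <gml:Point gml:id="offsetPoint">
    <gml:pos>1 0 0</gml:pos>
  </gml:Point>
</RelativeTo>

<!-- ScreenAnchor: pin a 2D asset to the display itself,
     e.g. a status readout, rather than to a point in the world. -->
<ScreenAnchor id="statusBar">
  <assets>
    <Label id="healthReadout">
      <src><![CDATA[<div class="hud">Shields: 100%</div>]]></src>
    </Label>
  </assets>
</ScreenAnchor>
```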
The last main component of ARML is Visual Assets. These are simply the virtual objects projected onto the world through the AR display. They can be text, images, or even HTML, and they can be manipulated in a variety of ways, such as rotation or scaling; anything that could be done to manipulate a traditional polygonal model can be applied to a Visual Asset. Visual Assets are the bread and butter of AR. They are the Pikachu dancing on the street, awaiting capture:
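As a final sketch, here's roughly how a couple of Visual Assets with a transformation might look inside an anchor's assets block. The model URL is a placeholder, and the Scale/Text details are loose interpretations of the spec's examples rather than verified markup:

```xml
<!-- Illustrative Visual Assets: a text label and a 3D model
     scaled to half size (placeholder model URL). -->
<assets>
  <Text id="caption">
    <src>Wild Pikachu appeared!</src>
    <style>font-size:18px; color:yellow;</style>
  </Text>
  <Model id="pikachuModel">
    <href xlink:href="https://example.com/models/pikachu.zip" />
    <Scale>
      <x>0.5</x>
      <y>0.5</y>
      <z>0.5</z>
    </Scale>
  </Model>
</assets>
```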