Transforming JSON

By Michael Kay on November 13, 2017 at 01:02p.m.

In my [conference paper at XML Prague](https://www.saxonica.com/papers/xmlprague-2016mhk.pdf) in 2016 I examined a couple of use cases for transforming JSON structures using XSLT 3.0. The overall conclusion was not particularly encouraging: the easiest way to achieve the desired results was to convert the JSON to XML, transform the XML, and then convert it back to JSON.

Unfortunately this study came too late to get any new features into XSLT 3.0. However, I've been taking another look at the use cases to see whether we could design language extensions to handle them, and this is looking quite encouraging.

Use case 1: bulk update

We start with the JSON document

[ {
  "id": 3, "name": "A blue mouse", "price": 25.50,
  "dimensions": {"length": 3.1, "width": 1.0, "height": 1.0},
  "warehouseLocation": {"latitude": 54.4, "longitude": -32.7 }},
  {
  "id": 2, "name": "An ice sculpture", "price": 12.50,
  "tags": ["cold", "ice"],
  "dimensions": {"length": 7.0, "width": 12.0, "height": 9.5 },
  "warehouseLocation": {"latitude": -78.75, "longitude": 20.4 }
} ]

and the requirement: for all products having the tag "ice", increase the price by 10%, leaving all other data unchanged. I've prototyped a new XSLT instruction that allows this to be done as follows:

<saxon:deep-update
    root="json-doc('input.json')"
    select="?*[?tags?* = 'ice']"
    action="map:put(., 'price', ?price * 1.1)"/>

How does this work?

First the instruction evaluates the root expression, which in this case returns the map/array representation of the input JSON document. With this root item as context item, it then evaluates the select expression to obtain a sequence of contained maps or arrays to be updated: these can appear at any depth under the root item. With each of these selected maps or arrays as the context item, it then evaluates the action expression, and uses the returned value as a replacement for the selected map or array. This update then percolates back up to the root item, and the result of the instruction is a map or array that is the same as the original except for the replacement of the selected items.

The magic here is in the way that the update is percolated back up to the root. Because maps and arrays are immutable and have no persistent identity, the only way to do this is to keep track, during the downward selection, of the maps and arrays traversed en route from the root item to the items selected for modification, and then to modify these maps and arrays in reverse order on the way back up. Moreover, we need to keep track of the cases where multiple updates are made to the same containing map or array. All this magic, however, is largely hidden from the user. The only thing the user needs to be aware of is that the select expression is constrained to use a limited set of constructs when making downward selections.
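To make the percolation idea concrete, here is a minimal sketch in Python. It is a model of the technique (path copying over immutable-style data), not Saxon's implementation. The names replace_at and deep_update are hypothetical, paths use Python's 0-based indexes, and the sketch assumes that no selected item is nested inside another selected item.

```python
def replace_at(node, path, action):
    """Return a copy of `node` in which the item reached by `path` is
    replaced by action(item). Only containers on the path are copied."""
    if not path:
        return action(node)
    key, rest = path[0], path[1:]
    # Shallow-copy this container, then rebuild the one child on the path.
    updated = dict(node) if isinstance(node, dict) else list(node)
    updated[key] = replace_at(node[key], rest, action)
    return updated


def deep_update(root, paths, action):
    """Apply `action` at every path. Paths are assumed not to be nested
    inside one another, so applying them one at a time is safe."""
    result = root
    for path in paths:
        result = replace_at(result, path, action)
    return result


products = [
    {"id": 3, "name": "A blue mouse", "price": 25.50},
    {"id": 2, "name": "An ice sculpture", "price": 12.50, "tags": ["cold", "ice"]},
]

# Downward selection: remember the path to each selected map.
paths = [[i] for i, p in enumerate(products) if "ice" in p.get("tags", [])]

# Action: a new map with the price entry replaced (similar to map:put).
updated = deep_update(products, paths, lambda m: {**m, "price": m["price"] * 1.1})
```

In this model the original products list is left untouched, and the unselected first product is shared between the old and new lists rather than copied.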

The select expression select="?*[?tags?* = 'ice']" perhaps needs a little bit of explanation. The root of the JSON tree is an array of maps, and the initial ?* turns this into a sequence of maps. We then want to filter this sequence of maps to include only those where the value of the "tags" field is an array containing the string "ice" as one of its members. The easiest way to test this predicate is to convert the value from an array of strings to a sequence of strings (so ?tags?*) and then use the XPath existential "=" operator to compare with the string "ice".
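For readers less familiar with existential comparison, the predicate behaves like an "any" test. The following Python analogue is purely illustrative (has_tag is a hypothetical name). It also shows why products without a "tags" entry are simply not selected: the lookup yields an empty sequence, and comparing an empty sequence with anything is false.

```python
def has_tag(product, wanted):
    """Model of ?tags?* = wanted: true if any member of the tags array
    equals `wanted`."""
    tags = product.get("tags", [])  # a missing "tags" entry acts as an empty sequence
    return any(tag == wanted for tag in tags)


products = [
    {"id": 3, "name": "A blue mouse"},
    {"id": 2, "name": "An ice sculpture", "tags": ["cold", "ice"]},
]

selected = [p for p in products if has_tag(p, "ice")]
```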

The action expression map:put(., 'price', ?price * 1.1) takes as input the selected map, and replaces it with a map in which the price entry is replaced with a new entry having the key "price" and the associated value computed as the old price multiplied by 1.1.

Use case 2: hierarchic inversion

The second use case in the XML Prague 2016 paper was a hierarchic inversion (aka grouping) problem. Specifically: we'll look at a structural transformation changing a JSON structure with information about the students enrolled for each course to its inverse, a structure with information about the courses for which each student is enrolled.

Here is the input dataset:

[
  {
    "faculty": "humanities",
    "courses": [
      {
        "course": "English",
        "students": [
          {
            "first": "Mary",
            "last": "Smith",
            "email": "mary_smith@gmail.com"
          },
          {
            "first": "Ann",
            "last": "Jones",
            "email": "ann_jones@gmail.com"
          }
        ]
      },
      {
        "course": "History",
        "students": [
          {
            "first": "Ann",
            "last": "Jones",
            "email": "ann_jones@gmail.com"
          },
          {
            "first": "John",
            "last": "Taylor",
            "email": "john_taylor@gmail.com"
          }
        ]
      }
    ]
  },
  {
    "faculty": "science",
    "courses": [
      {
        "course": "Physics",
        "students": [
          {
            "first": "Anil",
            "last": "Singh",
            "email": "anil_singh@gmail.com"
          },
          {
            "first": "Amisha",
            "last": "Patel",
            "email": "amisha_patel@gmail.com"
          }
        ]
      },
      {
        "course": "Chemistry",
        "students": [
          {
            "first": "John",
            "last": "Taylor",
            "email": "john_taylor@gmail.com"
          },
          {
            "first": "Anil",
            "last": "Singh",
            "email": "anil_singh@gmail.com"
          }
        ]
      }
    ]
  }
]

The goal is to produce a list of students, sorted by last name then first name, each containing a list of courses taken by that student, like this:

[
  { "email": "anil_singh@gmail.com",
    "courses": ["Physics", "Chemistry" ]},
  { "email": "john_taylor@gmail.com",
    "courses": ["History", "Chemistry" ]},
  ...
]

The classic way of handling this is in two phases: first reduce the hierarchic input to a flat sequence in which all the required information is contained at one level, and then apply grouping to this flat sequence.

To achieve the flattening we introduce another new XSLT instruction:

<saxon:tabulate-maps
    root="json-doc('input.json')"
    select="?* ! map:find(., 'students')?*"/>

Again the root expression delivers a representation of the JSON document as an array of maps. The select expression first selects these maps ("?*"), then for each one it calls map:find() to get an array of maps each representing a student. The result of the instruction is a sequence of maps corresponding to these student maps in the input, where each output map contains not only the fields present in the input (first, last, email), but also fields inherited from parents and ancestors (faculty, course). For good measure it also contains a field _keys containing an array of keys representing the path from root to leaf, but we don't actually use that in this example.
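The flattening can be modelled in a few lines of Python. This is a sketch of the idea only, under assumptions that go beyond the description above: the function name tabulate is hypothetical, only scalar-valued entries of ancestor maps are treated as inherited, the target arrays are found by key name at any depth rather than by a general select expression, and _keys records array positions using XPath's 1-based numbering.

```python
def tabulate(node, target_key, inherited=None, keys=()):
    """Yield one flat dict per map found in an array stored under
    `target_key`, enriched with scalar fields from its ancestors."""
    inherited = inherited or {}
    if isinstance(node, list):
        for index, member in enumerate(node, start=1):
            yield from tabulate(member, target_key, inherited, keys + (index,))
    elif isinstance(node, dict):
        # Scalar entries of this map are passed down to everything beneath it.
        scalars = {k: v for k, v in node.items() if not isinstance(v, (dict, list))}
        context = {**inherited, **scalars}
        for key, value in node.items():
            if key == target_key:
                for index, leaf in enumerate(value, start=1):
                    yield {**context, **leaf, "_keys": list(keys + (key, index))}
            elif isinstance(value, (dict, list)):
                yield from tabulate(value, target_key, context, keys + (key,))


data = [
    {"faculty": "humanities", "courses": [
        {"course": "English", "students": [
            {"first": "Mary", "last": "Smith", "email": "mary_smith@gmail.com"},
            {"first": "Ann", "last": "Jones", "email": "ann_jones@gmail.com"}]},
        {"course": "History", "students": [
            {"first": "Ann", "last": "Jones", "email": "ann_jones@gmail.com"}]},
    ]},
]

rows = list(tabulate(data, "students"))
```

Each element of rows is a single-level map carrying faculty, course, first, last, email and _keys, which is the shape the grouping step needs.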

Once we have this flat structure, we can construct a new hierarchy using XSLT grouping:

<xsl:for-each-group select="$students" group-by="?email">
  <xsl:map>
    <xsl:map-entry key="'email'" select="?email"/>
    <xsl:map-entry key="'first'" select="?first"/>
    <xsl:map-entry key="'last'" select="?last"/>
    <xsl:map-entry key="'courses'">
      <saxon:array>
        <xsl:for-each select="current-group()">
          <saxon:array-member select="?course"/>
        </xsl:for-each>
      </saxon:array>
    </xsl:map-entry>
  </xsl:map>
</xsl:for-each-group>

This can then be serialized using the JSON output method to produce the required output.
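The second phase, grouping the flat records and rebuilding a hierarchy, can be sketched in Python as follows. This is an illustrative model rather than a translation of the stylesheet: the function name invert and the sample rows are made up for the example, and the sort by last name then first name is included because the stated goal calls for it.

```python
import json


def invert(rows):
    """Group flat student/course rows by email and collect the courses."""
    groups = {}
    for row in rows:
        student = groups.setdefault(row["email"], {
            "email": row["email"],
            "first": row["first"],
            "last": row["last"],
            "courses": [],
        })
        student["courses"].append(row["course"])
    # Order the students by last name, then first name.
    return sorted(groups.values(), key=lambda s: (s["last"], s["first"]))


rows = [
    {"course": "History", "first": "John", "last": "Taylor", "email": "john_taylor@gmail.com"},
    {"course": "Physics", "first": "Anil", "last": "Singh", "email": "anil_singh@gmail.com"},
    {"course": "Chemistry", "first": "John", "last": "Taylor", "email": "john_taylor@gmail.com"},
    {"course": "Chemistry", "first": "Anil", "last": "Singh", "email": "anil_singh@gmail.com"},
]

result = invert(rows)
output = json.dumps(result, indent=2)
```

Within each student the courses keep the order in which they appear in the flat sequence, which mirrors the way current-group() preserves population order.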

Note: the saxon:array and saxon:array-member instructions already exist in Saxon 9.8. They fill an obvious gap in the XSLT 3.0 facilities for handling arrays - a gap that exists largely because the XSL WG was unwilling to create a dependency on XPath 3.1.

Use case 3: conversion to HTML

This use case isn't in the XML Prague paper, but is included here for completeness.

The aim here is to construct an HTML page containing the information from a JSON document, without significant structural alteration. This is a classic use case for the recursive application of template rules, so the aim is to make it easy to traverse the JSON structure using templates with appropriate match patterns.

Unfortunately, although the XSLT 3.0 facilities allow patterns that match maps and arrays, they are cumbersome to use. Firstly, the syntax is awkward:

match=".[. instance of map(...)]"

We can solve this with a Saxon extension allowing the syntax

match="map()"

Secondly, the type of a map isn't enough to distinguish one map from another. To identify a map representing a student, for example, we aren't really interested in knowing that it is a map(xs:string, item()*). What we need to know is that it has fields (email, first, last). Fortunately another Saxon extension comes to our aid: [tuple types](https://blog.saxonica.com/mike/2016/09/tuple-types-and-type-aliases.html). With tuple types we can change the match pattern to

match="tuple(email, first, last)"

Even better, we can use type aliases:

<saxon:type-alias name="student" type="tuple(email, first, last)"/>
<xsl:template match="~student">...</xsl:template>

With these extensions we can now render the input JSON from use case 2 into HTML using the stylesheet:

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <saxon:type-alias name="faculty" type="tuple(faculty, courses)"/>
  <saxon:type-alias name="course" type="tuple(course, students)"/>
  <saxon:type-alias name="student" type="tuple(first, last, email)"/>

  <xsl:template match="~faculty">
    <h1>{?faculty} Faculty</h1>
    <xsl:apply-templates select="?courses?*"/>
  </xsl:template>

  <xsl:template match="~course">
    <h2>{?course} Course</h2>
    <p>List of students:</p>
    <table>
      <thead>
        <tr>
          <th>Name</th>
          <th>Email</th>
        </tr>
      </thead>
      <tbody>
        <xsl:apply-templates select="?students?*">
          <xsl:sort select="?last"/>
          <xsl:sort select="?first"/>
        </xsl:apply-templates>
      </tbody>
    </table>
  </xsl:template>

  <xsl:template match="~student">
    <tr>
      <td>{?first} {?last}</td>
      <td>{?email}</td>
    </tr>
  </xsl:template>

  <xsl:template name="xsl:initial-template">
    <xsl:apply-templates select="json-doc('courses.json')"/>
  </xsl:template>

</xsl:stylesheet>
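The dispatch performed by these template rules, choosing a rule by the fields a map contains and then recursing into its children, can be modelled in Python. This is a rough sketch and not a description of how Saxon matches tuple types. The RULES table and function names are hypothetical, a map is assumed to match when it contains at least the named fields, and the list branch merely stands in for whatever rule processes the members of the top-level array.

```python
from html import escape


def render(item):
    """Pick the first rule whose required fields are all present."""
    if isinstance(item, list):
        return "".join(render(member) for member in item)
    for fields, rule in RULES:
        if isinstance(item, dict) and fields.issubset(item.keys()):
            return rule(item)
    raise ValueError("no rule matches this item")


def faculty_rule(m):
    return f"<h1>{escape(m['faculty'])} Faculty</h1>" + render(m["courses"])


def course_rule(m):
    # Sort students by last name, then first name, before rendering rows.
    students = sorted(m["students"], key=lambda s: (s["last"], s["first"]))
    rows = "".join(render(s) for s in students)
    return f"<h2>{escape(m['course'])} Course</h2><table><tbody>{rows}</tbody></table>"


def student_rule(m):
    name = f"{escape(m['first'])} {escape(m['last'])}"
    return f"<tr><td>{name}</td><td>{escape(m['email'])}</td></tr>"


RULES = [
    ({"faculty", "courses"}, faculty_rule),
    ({"course", "students"}, course_rule),
    ({"first", "last", "email"}, student_rule),
]

data = [
    {"faculty": "humanities", "courses": [
        {"course": "English", "students": [
            {"first": "Mary", "last": "Smith", "email": "mary_smith@gmail.com"},
            {"first": "Ann", "last": "Jones", "email": "ann_jones@gmail.com"}]},
    ]},
]

page = render(data)
```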

Conclusions

With only the facilities of the published XSLT 3.0 recommendation, the easiest way to transform JSON is often to convert it first to XML node trees, and then use the traditional XSLT techniques to transform the XML, before converting it back to JSON.

With a few judiciously chosen extensions to the language, however, a wide range of JSON transformations can be achieved natively.