Abbey Workshop

XQuery Basics with Saxon

One of the easiest ways to get started with XQuery is with the Saxon XSLT processor written by Michael Kay. Saxon is written in Java and has been used for XSLT for years, and Michael has now added XQuery support to it. With Saxon, you can run queries on individual files or on collections of files without having to install a fancy XML database. This is very convenient if you are just starting out and want to learn the language.

To get started, download the latest version of Saxon from: http://saxon.sourceforge.net/. Get Saxon-B, the open source version. The other version, Saxon-SA, is a commercial product.

After downloading the file, you will need to unzip it and store the unzipped results in a directory of your choice.

Setting Up Run Scripts

Next, create a script or batch file so you can easily launch the Saxon XQuery engine. By default, the command line to run Saxon using Java is a bit long. For example, if Saxon is installed in the /saxon directory, you would need to issue the following command to launch it from the command line:

java -cp /saxon/saxon8.jar net.sf.saxon.Query myquery.xql

I don't know about you, but that is not something I would want to type every time. So if you are using a Unix based operating system using the bash shell, you could create a script like this (saxon.sh):

#!/bin/bash
java -cp /saxon/saxon8.jar net.sf.saxon.Query $1 $2 $3 $4

If you are using Windows, your batch file will look something like this (saxon.bat):

java -cp c:\saxon\saxon8.jar net.sf.saxon.Query %1 %2 %3 %4

With either script in place, running a query takes a single short command.
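The fixed $1 $2 $3 $4 list in the scripts above stops at four arguments and splits any argument that contains spaces. A slightly more defensive sketch follows, assuming bash. The function name run_saxon_query and the SAXON_JAR override variable are made up for this illustration and are not part of Saxon.

```shell
#!/bin/bash
# Hypothetical override: set SAXON_JAR in the environment if the jar
# lives somewhere other than /saxon/saxon8.jar.
SAXON_JAR="${SAXON_JAR:-/saxon/saxon8.jar}"

# Run the Saxon XQuery engine, passing every argument through unchanged.
run_saxon_query() {
  if [ "$#" -lt 1 ]; then
    echo "usage: saxon.sh [saxon options] query.xql" >&2
    return 2
  fi
  if [ ! -f "$SAXON_JAR" ]; then
    echo "Saxon jar not found: $SAXON_JAR" >&2
    return 1
  fi
  # "$@" keeps each argument intact, even if it contains spaces.
  java -cp "$SAXON_JAR" net.sf.saxon.Query "$@"
}

# In saxon.sh, finish the script with:  run_saxon_query "$@"
```

On Windows, the batch file equivalent of "$@" is %*, which passes along all arguments instead of only the first four.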

Querying a Document

With the software installed and a command line tool for launching the engine, we are ready to go. Note that, at the time of this writing, the XQuery specification has not yet been finalized, so some of the information on the Net may vary a bit in syntax. Saxon is one of the more standards-compliant XQuery engines out there. Just be aware that if you are using a native XML database or another XQuery tool, the syntax of your queries may differ slightly from what is shown here.

For a first example, a query of an individual document seems to be a good place to start. First, take a look at our sample XML document.

Listing for: student_directory.xml

<student_list>
  <student>
    <name>George Washington</name>
    <major>Politics</major>
    <phone>312-123-4567</phone>
    <email>gw@example.edu</email>
  </student>
  <student>
    <name>Janet Jones</name>
    <major>Undeclared</major>
    <phone>311-122-2233</phone>
    <email>janetj@example.edu</email>
  </student>
  <student>
    <name>Joe Taylor</name>
    <major>Engineering</major>
    <phone>211-111-2333</phone>
    <email>joe@example.edu</email>
  </student>
</student_list>

This file represents a directory of students at some mythical university. The only information kept in the file is name, major, phone number, and e-mail address. For a starter script, we will get a list of all the students' names in the student directory. Here is what the script looks like.

Listing for: query1.xql

(: This is a comment :)
<student_names>
{ doc('student_directory.xml')//student_list/student/name }
</student_names>

To run the query using the script created above, use the following command:

saxon.sh query1.xql

XQuery returns XML elements, so if several individual elements are returned, the results need to be wrapped in a root element to form a well-formed document. In this example, the <student_names> element wraps the results. The XQuery doc function specifies the document to be queried. The path that follows the doc function, //student_list/student/name, is an XPath expression specifying that the <name> element be returned from each <student> element. The entire expression is enclosed in curly braces {}. Inside an element constructor, braces mark an enclosed XQuery expression: the expression is evaluated and its result is inserted into the result document at that point.

Here are the results of the query.

Listing for: results1.xml

<?xml version="1.0" encoding="UTF-8"?>
<student_names>
   <name>George Washington</name>
   <name>Janet Jones</name>
   <name>Joe Taylor</name>
</student_names>

The document is searched and the names are returned as shown.
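Path expressions can also filter what they return. As a small sketch, the snippet below writes a second query that adds an XPath predicate in square brackets, so only students whose <major> is Engineering are selected. The file name query2.xql is made up for this illustration.

```shell
#!/bin/bash
# Write a variant of query1.xql that filters students with a predicate.
# The quoted 'EOF' delimiter keeps the shell from altering the query text.
cat > query2.xql <<'EOF'
(: Names of engineering students only :)
<student_names>
{ doc('student_directory.xml')/student_list/student[major = 'Engineering']/name }
</student_names>
EOF
```

Run it the same way as the first query: saxon.sh query2.xql. With the sample directory above, the result should contain only the <name> element for Joe Taylor.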

Querying a Collection

Querying a document is cool, but not all that amazing. That sort of thing could be done with a regular expression or XSLT. So why do I need XQuery? Good question. XQuery is designed to query not just one document, but a series of XML documents called a collection. Thus, XQuery can pull information across XML documents and combine them. Normally, XQuery would be used with a relational or XML database so the documents would be stored in a database file. Since this example uses a file system, how is a collection created?

The XQuery collection function leaves it up to each engine to decide what a collection actually is. In Saxon, a special XML file can be used as a collection. The following is an example of such a file.

Listing for: student_collection.xml

<collection>
  <doc href="student1.xml"/>
  <doc href="student2.xml"/>
  <doc href="student3.xml"/>
</collection>

A collection document is composed of a <collection> root element with a series of <doc> elements. Each <doc> element has an href attribute holding a URL that links the collection to one individual file. For this example, the student_directory.xml file has been broken into individual student files. For example, here is what the file for George Washington looks like.

Listing for: student1.xml

<student>
  <name>George Washington</name>
  <major>Politics</major>
  <phone>312-123-4567</phone>
  <email>gw@example.edu</email>
</student>

The <student> element now becomes the root element of each document, and each student is now in its own file. These files are combined into a single collection by the collection file described above. Now we can perform the query on the collection.
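As an aside, writing the collection file by hand gets tedious once there are more than a few student files. The sketch below generates it from the files in a directory, assuming bash and assuming the per-student files follow the student1.xml, student2.xml naming used here. The function name write_collection is made up for this illustration.

```shell
#!/bin/bash
# write_collection DIR: print a collection file that lists every
# student<number>.xml file found in DIR.
write_collection() {
  local dir="$1"
  echo '<collection>'
  # The [0-9] in the pattern skips student_directory.xml and
  # student_collection.xml, which are not per-student files.
  for f in "$dir"/student[0-9]*.xml; do
    [ -e "$f" ] || continue   # nothing matched: skip the literal pattern
    printf '  <doc href="%s"/>\n' "$(basename "$f")"
  done
  echo '</collection>'
}
```

For example, write_collection . > student_collection.xml rebuilds the collection file for the current directory. File names containing characters such as & or < would need XML escaping, which this sketch does not attempt. The query that reads the collection follows.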

Listing for: getMajors.xql

(: This is a comment :)
<major_list>
{
  for $doc in collection('student_collection.xml')
  return $doc/student/major
}
</major_list>

This query uses the collection function to search all the files in the collection. Instead of a single path expression, a FLWOR expression (pronounced "flower") is used to return the results. FLWOR stands for:

  • for
  • let
  • where
  • order by
  • return

If you have used SQL, many of these clauses should be familiar to you. The for clause iterates over all the documents in the collection. Then, in the return clause, an XPath expression returns each student's major. The results of this script are shown below.

Listing for: majors.xml

<?xml version="1.0" encoding="UTF-8"?>
<major_list>
   <major>Politics</major>
   <major>Undeclared</major>
   <major>Engineering</major>
</major_list>
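The getMajors.xql query uses only the for and return clauses. To give a feel for the others, here is a sketch that writes a second query using let, where, and order by as well. The file name getSortedMajors.xql and the <entry> wrapper element are made up for this illustration.

```shell
#!/bin/bash
# Write a FLWOR query that uses all five clauses.
# The quoted 'EOF' delimiter stops the shell from expanding $doc and $student.
cat > getSortedMajors.xql <<'EOF'
(: Students with a declared major, sorted by name :)
<major_list>
{
  for $doc in collection('student_collection.xml')
  let $student := $doc/student
  where $student/major != 'Undeclared'
  order by $student/name
  return <entry>{ $student/name, $student/major }</entry>
}
</major_list>
EOF
```

Run it with saxon.sh getSortedMajors.xql. With the three sample files, the where clause should drop Janet Jones, and the order by clause should list George Washington before Joe Taylor.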

Download zip of collection example: sample.zip

Well, that pretty much covers the basics of getting Saxon set up. From here you should be able to begin exploring XQuery.