April 11, 2008

XQuery: The Real X in AJAX

Like the real Napster in the movie The Italian Job (the remake), XQuery might have a bit of a chip on its shoulder about the X in AJAX.

Sure, it stands for XML, since the idea is that you return an XML fragment to the browser to update content in a div, fill in form fields or even create drop-down menu options on the fly.

But how do you create that XML?  Using static XML files works, but the whole idea is to dynamically respond to user actions and give them information without reloading the whole page.

And what's the best way to dynamically create XML?  XQuery of course!

To prove it, let's do a simple example to create a drop-down form field for Shakespeare's characters (using the Shakespeare XML we loaded in the tutorial).  This field will auto-complete using AJAX and XQuery and give the user the characters found in the XML that begin with the letters entered into the field.

To get a head start, we'll use the popular AJAX tool Scriptaculous, which takes care of all of the hard JavaScript stuff and lets us just work on creating the backend to deliver the content.

We'll also make use of MarkLogic Server's app server built-ins and its ability to run an HTTP server to make a complete application that presents the HTML, including Scriptaculous.  To do this, we'll start with the /modules directory we created and accessed with WebDAV in the tutorial (where the CQ application was placed).

Assuming you have that set up, here are the steps to get the client side set up:

  1. Create a js/ directory and install all of the .js scriptaculous libraries that came with the distribution (located here)
  2. Make a lookup.css in /modules with the sample styles from this page
  3. Create lookup.xqy under the /modules directory with the following HTML in it:

xdmp:set-response-content-type("text/html; charset=utf-8")
(: sets the mime type :)
(: notes inside the markup below are XML comments, because an XQuery comment
   placed in element content would be output as literal text :)
,
<html>
    <head>
        <title>Shakespeare Lookup</title>
        <!-- reference the stylesheet -->
        <link rel="stylesheet" type="text/css" href="lookup.css" media="screen"/>

        <!-- get the scriptaculous scripts loaded; the " " is to prevent them being optimized
        into an empty XML node like <script...  />, which some browsers don't like -->
        <script src="js/prototype.js">{" "}</script>
        <script src="js/scriptaculous.js">{" "}</script>
    </head>
<body>
    <h1>Shakespeare Character Lookup</h1>

    <div>
        <form>
        <!-- create the placeholder for the autocomplete field -->
        <input type="text" id="autocomplete" name="autocomplete_parameter"/>
        <input type="submit" value="select character"/>
        </form>
        <div id="autocomplete_choices" class="autocomplete"></div>

        <!-- run the scriptaculous autocomplete -->
        <script type="text/javascript">
            new Ajax.Autocompleter("autocomplete", "autocomplete_choices",
            "request.xqy", {{}});
        </script>
    </div>
</body>
</html>

If you are like me and learned HTML and JavaScript way, way back, it may take you a moment to realize that the <input> element named 'autocomplete' is NOT what actually shows up in the browser.  The script Ajax.Autocompleter replaces that element with the fully decked out, onClick-enabled <input> element that does all of the auto-lookup work, including making the calls to your request.xqy.  You can create all this yourself . . . but Scriptaculous does it for you, so go ahead and enjoy it!

This should all result in a simple web page with a text field on it that you can get to at http://localhost:8002/lookup.xqy (or wherever your MarkLogic Server is installed).  But it won't do anything until we create the backend XQuery.

In the Ajax.Autocompleter code, we gave request.xqy as our source for the lookups.  We need to create this in the /modules directory, and its contents can be something as simple as this:

(: get the value of the field - sent to us as a POST from the Scriptaculous autocompleter :)

let $query-base := substring-after(xdmp:get-request-body(),"autocomplete_parameter=")

(: add an '*' to it to create a wildcard search :)

let $query := fn:concat($query-base, "*")

return
    <ul>{

        (: use the MarkLogic built-in cts:element-value-match()* to search all of the values in the PERSONA element in the loaded XML plays :)

        for $item in cts:element-value-match(xs:QName("PERSONA"),$query)
        return

        (: return the <li> elements scriptaculous expects for its list :)

        <li>{fn:string($item)}</li>
        }
    </ul>

Yup - that's all there is to it:  9 lines of code to query some XML and return XML.

When all this is hooked up and running, you should have a mini-application that looks something like this:

[Screenshot: the Shakespeare character lookup field]


It's now up to you how you will use the power of XQuery to create dynamic content elements for AJAX.    Will you populate complex taxonomies and even bring back the content from leaf nodes?  Will you create search interfaces that give you the answers in the form?

How about an amazing interface for searching XML content?  Check out markmail.org, which is all XQuery and AJAX, for some inspiration.

So XQuery really must be the real X in AJAX, right?

Well it turns out there is a bit of room under the X in AJAX these days.  JSON is a popular alternative to XML (and XQuery's got that covered - check out this library Jason Hunter wrote to generate JSON from XQuery) and there are lots of ways to generate both XML and JSON.

But for those of us in the know, XQuery is the only way to go.  And with the growing number of XQuery powered content applications, no one can shut down the real X!

Matt

*NOTE: cts:element-value-match() is a MarkLogic built-in that requires an Element Range Index to be configured for the element in question.  This is pretty straightforward: select your database under the Databases tab in the MarkLogic admin interface (also covered in the tutorial). Under Element Range Index, select Add.  For the scalar type select string; the namespace can be blank; enter PERSONA for localname.  This will create an index for ordering that can also be used to perform the character lookup.

*ALSO NOTE:  there are plenty of other ways to get a list of PERSONA values based on a user's input, such as:

//PERSONA[cts:contains(., "p*")] (: still uses MarkLogic search built-ins :)
//PERSONA[fn:starts-with(fn:lower-case(string(.)), "p")] (: standard XQuery :)

But like scriptaculous, MarkLogic's search built-ins do the work for you (and also do it much more efficiently) so let's just enjoy using them too!
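
If you want to try the standard XQuery flavor of the lookup without a server handy, here is a minimal, self-contained sketch.  The inline $personae sample and the $prefix value are assumptions standing in for the loaded plays and the letters the user typed; it should run in any XQuery 1.0 processor, but it scans every value instead of using an index.

```xquery
xquery version "1.0";

(: sample data standing in for the PERSONA elements in the loaded plays (assumption) :)
let $personae := (
    <PERSONA>PROSPERO</PERSONA>,
    <PERSONA>Puck</PERSONA>,
    <PERSONA>OBERON</PERSONA>
)

(: the letters typed so far (assumption) :)
let $prefix := "p"

return
    <ul>{
        (: keep only the names that start with the prefix, ignoring case :)
        for $persona in $personae
        where fn:starts-with(fn:lower-case(fn:string($persona)), fn:lower-case($prefix))
        order by fn:lower-case(fn:string($persona))
        return <li>{fn:string($persona)}</li>
    }</ul>
```

With this sample data the list contains the PROSPERO and Puck entries and skips OBERON.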

December 21, 2007

Santa's Really Big XQuery List

Let's say you are the head of an international syndicate headquartered up North that gets real busy this time of year getting a gift for every kid in the world (!!).

And let's say you really need to be efficient about it - Wired estimates it would take $27 billion just for the U.S. alone!

Maybe you could use some XQuery to move the process along and make that really big delivery list for all the kids?

If you had a gygnormous database of kids like this:

<kid>
    <name>Jane</name>
    <naughty/>
</kid>
<kid>
    <name>Josh</name>
    <nice/>
</kid>
<kid>
    <name>Michael</name>
    <nice/>
</kid>
<kid>
    <name>Lila</name>
    <nice/>
</kid>

And an equally huge list of toys that looks like this:

<toys>
    <toy>Jakks EyeClops Bionic Eye</toy>
    <toy>IlluStory Make Your Own Story Kit</toy>
    <toy>Blokus Strategy Board Game</toy>
    <toy>LeapFrog ClickStart My First Computer</toy>
    ...
</toys>


(For this example I've used Mark Logic's capabilities to scrape content from the web and grabbed Amazon's top gifts for kids - we all know Santa has his own sources).

You could use a bit of XQuery to bring them together and make sure we get one toy per kid:

(: get a quick count of the toys for our random feature :)

let $toys := count(/toys/toy)

(: loop through only the kids that get toys.  So limit it to the ones with <nice> elements and that DON'T yet have a <toy>
This method of selecting entries comes in handy for large content sets as we'll see below :)

for $kid in /kid[./nice][not(./toy)]

(: pick a random position using MarkLogic's random function.  xdmp:random($max) can
return anything from 0 to $max inclusive, so shift the result into the range 1 to $toys :)

let $random := xdmp:random($toys - 1) + 1

(: get the toy at that position :)

let $toy := (/toys/toy)[$random]
return

    (: add the toy node to the kid :)

    xdmp:node-insert-child($kid, $toy)

Run this, and all our kids now have toys:

<kid><name>Jane</name><naughty/></kid>
<kid><name>Josh</name><nice/><toy>Hasbro Playskool Step Start Walk 'n Ride</toy></kid>
<kid><name>Michael</name><nice/><toy>Scrabble</toy></kid>
<kid><name>Lila</name><nice/><toy>Fisher-Price Little Superstar Sing-Along Stage</toy></kid>

. . . except for naughty Jane of course.

But those of you with an eye for big things might be thinking: this is well and good for 4 kids . . . and maybe 400, but what about a couple billion?

Well that first test is the key.  It lets you grab a set of records to process without having to iterate through records and do commits after x number (like you would with a relational system).

Instead, you can have this code work on a small batch at a time by adding a predicate to limit the number returned by the query:

for $kid in /kid[./nice][not(./toy)][1 to 1000]

Then all you need to do is run it over and over - each run always only picking up the kids that need toys.  When everyone has a toy, it's done.
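
The [1 to 1000] range predicate above leans on MarkLogic accepting a range inside a predicate.  As a rough sketch of the same batching idea in portable XQuery, assuming an in-memory $kids sample and a tiny $batch-size of 2 (both made up for illustration), fn:subsequence() does the slicing:

```xquery
xquery version "1.0";

(: sample data standing in for the kid documents (assumption) :)
let $kids := (
    <kid><name>Jane</name><naughty/></kid>,
    <kid><name>Josh</name><nice/></kid>,
    <kid><name>Michael</name><nice/></kid>,
    <kid><name>Lila</name><nice/></kid>
)

(: hypothetical batch size - the post uses 1000 :)
let $batch-size := 2

(: nice kids that don't yet have a toy :)
let $waiting := $kids[nice][fn:not(toy)]

(: take only the first batch of them :)
for $kid in fn:subsequence($waiting, 1, $batch-size)
return fn:string($kid/name)
```

Here the first batch is Josh and Michael; Lila would be picked up on the next run once the first two have their toys.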

MarkLogic Server provides some ways of automating this with a function called xdmp:spawn that lets you put code onto a task server.  So to get through the billions of kids, you would take the above code (with the predicate for 1000), save it as a module and then run that module over and over until there were no more kids who needed toys:

if (/kid[./nice][not(./toy)]) then
    xdmp:spawn("match-toy.xqy")
else "All Done - ready for Christmas Delivery!!"

Just like the big guy up North, XQuery keeps going until every kid has a toy.

Happy Holidays Everyone!

Matt




December 09, 2007

XQuery and Microsoft Office (2007) XML

Last week at the XML conference I gave a talk entitled First Encounters with Office Open XML. 

I learned a ton about this new format and had a lot of fun taking it apart and using XQuery to manipulate content and create new Office Open XML documents that Microsoft Word (2007) can open.

The basics are that the format is XML contained in a zip file.  So you can, with desktop zip tools, open up any Office 2007 file and work with the XML outside of the MS Office 2007 tools.

Combining this with a content server like MarkLogic Server, which has included built-ins to do the unzipping and zipping, you can load Office Open XML into an XML repository, query across multiple documents with XQuery and use XQuery to create new Office Open XML documents.

As I said in my talk, Office Open XML *is* XML and you can open it up and mess with it!

MarkLogic's Pete Aven is starting a series of posts called MarkLogic Server and Office XML.

The first post shows how to open up, query and repackage Office Open XML focusing on the Microsoft Word format.  It is here under Office Logic.

The second post explores the details around Excel and is called Excel-ing with XQuery.

My slides from the presentation are here:

They are also posted to the XML conference site: http://2007.xmlconference.org/public/schedule/detail/362.

In my presentation I had a lot of fun going between the Office Open XML format XML and the tools inside Microsoft Word to view and manage content.  And, as Pete's posts show, it's pretty easy.

One thing I did was to generate Office Open XML format from Shakespeare content I used in the original tutorial.

To do this, I used the same transformation I used in the XQuery Transformers tutorial.  This starts with a recursive function and a typeswitch for each of the elements:

define function recursion($x, $options){
    for $z in $x/node() return mapping($z, $options)
}

define function mapping($x, $options)
{
  typeswitch ($x)
      case element(PLAY) return play($x, $options) 
      case element(TITLE) return title($x, $options) 
      case element(ACT) return section($x, $options)
...
}
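
The define function form above is the older MarkLogic syntax.  As a minimal sketch of the same recursive typeswitch pattern in standard XQuery 1.0, assuming hypothetical local:recurse and local:dispatch functions and a made-up mapping of TITLE to <h1> and LINE to <p>:

```xquery
xquery version "1.0";

(: walk the children of a node and map each one :)
declare function local:recurse($x as node()) as item()*
{
    for $z in $x/node() return local:dispatch($z)
};

(: pick an output element based on the kind of node we are looking at :)
declare function local:dispatch($x as node()) as item()*
{
    typeswitch ($x)
        case text() return $x
        case element(TITLE) return <h1>{ local:recurse($x) }</h1>
        case element(LINE) return <p>{ local:recurse($x) }</p>
        (: anything else: drop the wrapper and keep walking :)
        default return local:recurse($x)
};

local:dispatch(
    <PLAY><TITLE>As You Like It</TITLE><LINE>All the world's a stage</LINE></PLAY>
)
```

The sample play comes back as an <h1> followed by a <p>, with the PLAY wrapper dropped by the default branch.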

But instead of returning an HTML element for each mapping, I created an Office Open element, in this example, the Title element.

First we wrap it in a <customXml> element to add the structure to the Word file.  Then we make a <w:p> element, which is the basic building block of a Word document.

Because this is a heading, we want some style - so depending on where we encounter this element, we create the style inside of the <w:pStyle> element using the standard Word styles "Heading1", "Heading2" etc.

Finally we create a 'run' to hold the text with a <w:r> and put the text inside of the <w:t> element:

define function title($x as element(), $params as node()) as element()
{
    <w:customXml w:element="TITLE">
    <w:p w:rsidR="00592BC3" w:rsidRDefault="009E30CC" w:rsidP="009E30CC">
          <w:pPr>
          <w:pStyle w:val="{
              if ($x/parent::PLAY) then
                   "Heading1"
              else if ($x/parent::ACT) then
                   "Heading2"
              else if ($x/parent::PERSONAE or $x/parent::SCENE) then
                    "Heading3"
              else
                    "Heading4"
           }"/>
           </w:pPr>
           <w:r>
               <w:t>{passthru($x, $params)}</w:t>
           </w:r>
    </w:p>
    </w:customXml>
}
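
Stripped down to just the paragraph, style and run, the same building blocks can be sketched in standard XQuery 1.0.  The local:paragraph helper and its two parameters are hypothetical names for illustration; the namespace is the WordprocessingML main namespace that the w: prefix is conventionally bound to.

```xquery
xquery version "1.0";

declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: hypothetical helper: one paragraph with a named Word style and a single run of text :)
declare function local:paragraph($style as xs:string, $text as xs:string) as element(w:p)
{
    <w:p>
        <w:pPr>
            <w:pStyle w:val="{$style}"/>
        </w:pPr>
        <w:r>
            <w:t>{$text}</w:t>
        </w:r>
    </w:p>
};

local:paragraph("Heading1", "As You Like It")
```

This only produces the <w:p> fragment; it still has to sit inside a document body and be zipped into a package before Word will open it.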

 

The result is some XML that we can zip up in an Office Open Package (see Pete's posts or the slides on how to do this) and open inside of Word:

[Screenshot: As You Like It opened in Word]

Pretty neat!

Office Open XML really opens things up in the new Office 2007 suite, letting you access and query the documents created in the suite, manipulate that content or create new Office content from other sources.

It's XML after all, so have fun messing with it!

Matt

November 09, 2007

Code with the XQuery Experts: 11/29, London U.K.

Right after the (American) Thanksgiving holiday and just before London Online, the world's best XQuery coders, Jason Hunter and Ryan Grimm, will be over in Royal London hosting an XQuery day.  The event details are:

Code with the XQuery Experts
Friday, November 30, 2007
8:30 am PT - 5:00 pm
Olympia Grand Hall
London, England

Sign up for it here.

Jason and Ryan have been using XQuery (and MarkLogic Server) from the very start and it should be an excellent event to see firsthand why XQuery is THE application language for content applications.

What's more, you can bring your own XQuery chops and win an 8GB iPhone for the Best XQuery App at the event.  There are some helpful tools in the Mark Logic code workshop to get you started and I expect full credit if something in my tutorial helps you win first place!

But you might want to 'enhance' this one:  I asked Jason and Ryan for something neat from the amazing XQuery powered email discovery application they've built called MarkMail and they sent me this very elegant FAQ generator.

The first cool thing is that it's a complete FAQ in a single complete XQuery - starting with the content and then the code to present it:

(: content as XHTML in a div - edited by MT to be a sample :)

let $content :=
<div id="content">
<a name="general"/>
<h1>GENERAL FAQ</h1>

<a name="quick"/>
<h2>Given 15 seconds, what should I know?</h2>
<ul><li>MarkMail lets you search 4,000,000+ emails across 500+ Apache mailing lists</li>
...</ul>
<a name="whatisit"/>
<h2>What is MarkMail?</h2>
<p>
MarkMail is a community-focused searchable message archive, accessible at <a
href="http://markmail.org">http://markmail.org</a>, developed and hosted by <a
href="http://www.marklogic.com">Mark Logic Corporation</a>.
...
</p>
...
<a name="techie"/>

<h1>TECHIE FAQ</h1>
<a name="whatshard"/>
<h2>What's hard about searching email?</h2>
<p>
Email doesn't work well in a relational model because there's too much free
text.  It doesn't work well in a search engine either because there's too much ad hoc structure and hierarchy ... We've found email works naturally as XML.
...
</p>
<a name="store"/>
<h2>How do you store the emails?</h2>
<p>
Each email is stored an XML document inside MarkLogic Server.
...
</p>
...
</div>

(: from this XHTML node we can generate the table of contents including a split between regular and techie FAQ
:)

let $toc :=
    <div class="toc">
        <h1>Table of Contents</h1>
        <ul>
        {
            for $head in $content/h2[. << $content/h1[. = "TECHIE FAQ"]]
            let $name := $head/preceding::*[1][name(.) = "a"]/@name
            return <li><a href="#{ $name }">{ string($head) }</a></li>
        }
        </ul>
        <h3>Techie FAQ</h3>
        <ul>
        {
            for $head in $content/h2[. >> $content/h1[. = "TECHIE FAQ"]]
            let $name := $head/preceding::*[1][name(.) = "a"]/@name
            return <li><a href="#{ $name }">{ string($head) }</a></li>
        }
        </ul>
    </div>

(: then we put it all together :)

let $body := (
    <div id="docs">
        { $toc }
        { $content }
    </div>,
    <div style="clear: both"/>
    )

return

(: and output it :)

$body

I like that XQuery gives you a complete tool kit for content:   even if you just have simple HTML you can do things like use the << and >> order comparison operators to pull out all of the <h2> elements that come before the Techie FAQ <h1> and grab the <a> element right before each <h2> using the preceding axis.
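
To see the order comparison trick on its own, here is a small sketch on a cut-down copy of the FAQ content.  The <entry> output element and the $split variable are made up for illustration, and it uses preceding-sibling rather than the preceding axis, which gives the same answer for this flat sample.

```xquery
xquery version "1.0";

(: a cut-down copy of the FAQ content :)
let $content :=
    <div>
        <h1>GENERAL FAQ</h1>
        <a name="whatisit"/>
        <h2>What is MarkMail?</h2>
        <h1>TECHIE FAQ</h1>
        <a name="store"/>
        <h2>How do you store the emails?</h2>
    </div>

(: the heading that splits the two halves of the FAQ :)
let $split := $content/h1[. = "TECHIE FAQ"]

for $head in $content/h2
(: << is true when $head comes before $split in document order :)
let $section := if ($head << $split) then "general" else "techie"
(: the anchor sitting right before the heading :)
let $name := $head/preceding-sibling::*[1][self::a]/@name
return <entry section="{$section}" href="#{$name}">{ fn:string($head) }</entry>
```

The first question comes back tagged general with href="#whatisit", the second tagged techie with href="#store".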

And you're creating the output as you go - with an XHTML FAQ generated in less than 30 lines.  To see the 'live' FAQ click here.

 

I hope you can make it out to the event at Olympia and can see firsthand the many cool things you can do with XQuery, the right tool for the content application job.

Matt

October 05, 2007

XQuery at Work

I got some good feedback on my post about enrichment (thanks!) and I thought I would expand a bit on the first step of getting the list of items to power the enrichment.

I like this example because it just seems to be something that comes up over and over again.  I've been working with web technologies for 10+ years now and this was one of the first things I learned how to do . . . and it's something I am still doing day to day.  It's a truism of working with the web: at some point you will need to reach out and grab something off another website.

XQuery is great at this because no matter what the context - from SOA to complex sounding (but pretty simple) federated search to just needing to get a list of weapons to enrich Shakespeare - the basics are the same: make an HTTP connection, send a request and parse the (usually XML) response.

And there is nothing better for parsing and processing XML than XQuery.

So in this example:

fn:string-join(
for $weapon in xdmp:tidy(xdmp:http-get( "http://en.wikipedia.org/wiki/List_of_medieval_weapons")[2])[2]//*:li/*:a
return
fn:string($weapon), '","')

the idea is to get the web page and make it into a sequence of strings.  In case you are wondering, every XQuery returns a sequence - even if it's a sequence of 1.  A sequence can have any number of items of any type.  Pretty useful as we will see.

The first step is to get the page with xdmp:http-get() - a MarkLogic XQuery extension.  This returns a sequence of two nodes.  The first is the response node with the header info.  The second is the actual page/image/whatever that you requested:

xdmp:http-get( "http://en.wikipedia.org/wiki/List_of_medieval_weapons")

returns

(<response xmlns="xdmp:http">
<code>200</code>
...
<headers>
<date>Wed, 03 Oct 2007 00:44:16 GMT</date>
<server>Apache</server>
. . .
</headers>
</response>
,
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
</head>
<body>
...
</body>
</html>)

So while it looks like XML, that second item is really text.  We need to turn it into XML with tidy:

xdmp:tidy(xdmp:http-get( "http://en.wikipedia.org/wiki/List_of_medieval_weapons")[2])

The [2] gives us the second node of the response sequence and tidy returns:

(<status xmlns="xdmp:tidy">
<message>Info: . . .
</message>
</status>
,
<html xml:lang="en" lang="en" dir="ltr" version="-//W3C//DTD XHTML 1.1//EN" xmlns="http://www.w3.org/1999/xhtml">
<head>
...
</head>
<body>
</body>
</html>
)

Yup, another sequence . . . but now that second node is XML as produced by tidy (with all of the errors etc. noted in the <status>).

From here on, we can use XQuery to process the XML into the sequence of strings we need to do matches inside the text of the Shakespeare plays.

First we need to get the list out of the page.  It turns out that within that Wikipedia page, all of the items are listed within <li> elements AND are within link anchors - <a>.

Our first step is to use XPath to get just the sequence of <a> elements - and to do this I'm using * as the namespace - it's likely the XHTML namespace, but this way I don't have to even check:

xdmp:tidy(xdmp:http-get( "http://en.wikipedia.org/wiki/List_of_medieval_weapons")[2])[2]//*:li/*:a

This gives us another sequence - this time of <a> elements:

(<a href="#Axes" xmlns="http://www.w3.org/1999/xhtml"><span class="tocnumber">1</span> <span class="toctext">Axes</span></a>
,<a href="#Daggers_and_knives" xmlns="http://www.w3.org/1999/xhtml"><span class="tocnumber">2</span> <span class="toctext">Daggers and knives</span></a>
,<a href="#Swords" xmlns="http://www.w3.org/1999/xhtml"><span class="tocnumber">3</span> <span class="toctext">Swords</span></a>
,...
)

We can now use the FLWOR structure of XQuery to process this sequence one <a> at a time.  FLWOR stands for For, Let, Where, Order by, Return.  For assigns each item of a sequence to a variable, let (not used in our example) can hold additional values related to that item, where allows you to filter, order by allows you to sort (the default is document order) and return is the output for each item in the sequence.
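
Here is a small sketch with all five clauses working together, run over a hand-typed sample of the strings (an assumption, so it works without fetching the page).  The where clause drops the numbered header entries and the order by sorts what is left:

```xquery
xquery version "1.0";

(: for: walk the sample strings one at a time :)
for $weapon in ("3 Swords", "Falchion", "Claymore", "1 Axes", "Cutlass")
(: let: hold a related value for each item :)
let $name := fn:lower-case($weapon)
(: where: filter out the entries that start with a digit :)
where fn:not(fn:matches($weapon, "^[0-9]"))
(: order by: sort alphabetically :)
order by $name
(: return: the output for each surviving item :)
return $name
```

This returns the sequence ("claymore", "cutlass", "falchion").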

In our example we assign each <a> element to a variable, then use the fn:string() function to get the string value.  Running this over the entire sequence of <a> elements creates a sequence of strings:

for $weapon in
xdmp:tidy(xdmp:http-get( "http://en.wikipedia.org/wiki/List_of_medieval_weapons")[2])[2]//*:li/*:a
return
fn:string($weapon)

This returns:

("1 Axes","2 Daggers and knives", "3 Swords", .... "Broadsword", "Claymore", "Cutlass", "Falchion", ...)

We're getting the header values, but that's OK - they just won't match anything.

And we now have a sequence of strings.

Except that I wanted to save this as a file so I could run it over and over again while I got the enrichment right.

So while I've written it out as a sequence of strings (and this is the accurate state within the server) if I output this, I actually get this:

1 Axes
2 Daggers and knives
3 Swords
...
Broadsword
Claymore
Cutlass
Falchion
...

This is also correct . . . when output, the strings are represented as, well, strings.  And certainly not as the  comma delimited list of quoted items that is really a string sequence constructor . . . and what I need.  So I need to make a single string that *is* comma delimited and has values in quotes so I can then stick that inside my code and it will become a sequence of strings.

To do this I just use my favorite XQuery function, fn:string-join().  This takes ANY sequence and makes it into a nice single string with whatever delimiter you select between each element:

fn:string-join(
for $weapon in
xdmp:tidy(xdmp:http-get( "http://en.wikipedia.org/wiki/List_of_medieval_weapons")[2])[2]//*:li/*:a
return
fn:string($weapon)
, '","')

The delimiter here is ",".  Like many content friendly languages, XQuery lets you use single and double quotes so you can return these special characters.

And unlike many other languages, fn:string-join correctly puts the delimiters *between* every item in the list . . . and doesn't also put one after the last item, which you then have to correct when you write this function yourself in Java, Perl . . . just about anything else.

The result is *almost* a single string that represents a sequence of strings that can be put into a file or another XQuery:

1 Axes","2 Daggers and knives","3 Swords","4 Blunt weapons" ...

I didn't do the extra step of adding the first '("' and the last '")' . . . we can just concat these on:

concat(
'("',
fn:string-join(
for $weapon
...
,'")'
)
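
Filled in with a short hand-typed list in place of the scraped page (the three names are just a sample), the whole wrap-and-join step can be sketched like this:

```xquery
xquery version "1.0";

(: sample strings standing in for the scraped list :)
let $weapons := ("Broadsword", "Claymore", "Cutlass")
return
    (: open with (", join the items with ",", close with ") :)
    fn:concat('("', fn:string-join($weapons, '","'), '")')
```

The result is the single string ("Broadsword","Claymore","Cutlass"), which can be pasted into another query as a sequence constructor.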

And now we have our string for enrichment:

for $doc in xdmp:directory("/content/bill/")
let $weapons := ("axe", "sword","dagger", "falchion" , "etc., etc.")
return
    ....

To read the rest of the story about what to do with a list of weapons and the works of Shakespeare, Click here.

For my part, I use this pattern and process all the time.  Just last week someone suggested that it would be really cool to use Flickr to augment information on cameras by showing actual photos taken with that camera.

What a good idea!  And, once you know all these tricks in XQuery, it's a one-liner, thanks to Flickr's nicely structured pages (where sample images are in a "DetailPic" class):

xdmp:tidy(xdmp:http-get("http://www.flickr.com/search/?q=photo&cm=nikon%2Fd200")[2])[2]//*:td[@class="DetailPic"]//*:img

I'm asking for my camera (Nikon D200) and a generic term 'photo' and I get these nice pictures:

[Images: three sample photos returned by the Flickr search]

All ready to be inserted into a rich, XQuery powered, content application.

Matt

August 22, 2007

XQuery and Lazy Enrichment: Keeping me Busy

I was searching for a blog topic and my son Josh suggested that I write a blog about how busy I've been in the last couple of weeks.

Excellent idea Josh!  Because it's XQuery and the cool things you can do with it that have been keeping me busy.

On top of helping people explore new ways to get the most out of their XML content with XQuery powered projects in Digital Asset Delivery (that take advantage of the benefits of the XML meta-data model I wrote about a while ago) and working on XML powered content production workflows (similar to the ground breaking XQuery powered SafariU from O'Reilly), I've also been exploring using XQuery to enhance XML with something called lazy enrichment (a term coined by Mark Logic CEO Dave Kellogg).

The idea goes something like this:  if you have some content (say the complete works of Shakespeare) and it is loaded into an XML content server (say MarkLogic Server as in this tutorial), then wouldn't it be interesting to cross cut the content with a topic like medieval weapons and be able to explore the Shakespeare texts in this new context?

This is text analytics: process text to match topics and categories of items and give that content new meaning.  There are great engines out there that do this.  They analyze text, sentence structure and patterns and can automatically create categories, lists of people, specific places and even market specific items like company names or medical terms.

But using XQuery and some XML you can do it yourself with lazy enrichment and end up with something much better than lists of items.

The first step is to get some information on your chosen topic.  While the engines that you pay lots of money for have detailed databases on specific topics, they all start with a taxonomy or controlled vocabulary of the topic.  But if you know what your topic is, you can make your own list (and chances are you already have one).

For our example, let's go get a list of medieval weapons from Wikipedia:

fn:string-join(
for $weapon in xdmp:tidy(xdmp:http-get( "http://en.wikipedia.org/wiki/List_of_medieval_weapons")[2])[2]//*:li/*:a
return
fn:string($weapon), '","')

This uses the MarkLogic Server HTTP built-ins to get the page and then does some processing to get just the list of weapons.  It's not called the best scraper on the web for nothing.  While we're at it, let's make a nice comma delimited list with quotes since we will want a text sequence for our enrichment process.

Now that we have our list of weapons, let's process the text:

for $doc in xdmp:directory("/content/bill/")
let $weapons := ("axe", "sword","dagger", "falchion" , "etc., etc.")
return
    xdmp:document-insert(xdmp:node-uri($doc),
        cts:highlight($doc,
            cts:or-query(
            for $test in $weapons
            return
            cts:word-query(fn:lower-case(fn:string($test)))
            )
            ,<weapon>{$cts:text}</weapon>
        )
    )

This takes every play I've loaded and, using MarkLogic Server's search features, finds the words in the play that match the list and creates new markup around that word to enrich the content with <weapon> elements.

This is lazy enrichment - taking our own knowledge of a topic, creating some rules around the matching (that can go well beyond simple search string matching) and then enriching the content in place.  We're not creating any separate lists or extracting this to a database - the content now has the topic embedded in it.
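
To get a feel for what the enrichment does to a single line, here is a rough stand-in written in plain XQuery 1.0.  It is NOT how cts:highlight works: the local:enrich function is a hypothetical helper that just splits text on spaces and compares each word to the list, so it knows nothing about search indexes or phrases, and any punctuation stuck to a word ends up inside the <weapon> element.

```xquery
xquery version "1.0";

(: the topic list - a short sample :)
declare variable $weapons as xs:string* := ("axe", "sword", "dagger", "falchion");

(: hypothetical helper: copy a node, wrapping any word found in $weapons :)
declare function local:enrich($n as node()) as node()*
{
    typeswitch ($n)
        case text() return
            for $token at $i in fn:tokenize(fn:string($n), " ")
            (: compare on letters only, ignoring case :)
            let $word := fn:lower-case(fn:replace($token, "[^A-Za-z]", ""))
            return (
                (: put back the space that tokenize removed :)
                if ($i gt 1) then text { " " } else (),
                if ($word = $weapons)
                then <weapon>{ $token }</weapon>
                else text { $token }
            )
        case element() return
            (: rebuild the element and enrich its children :)
            element { fn:node-name($n) } {
                $n/@*,
                for $child in $n/node() return local:enrich($child)
            }
        default return $n
};

local:enrich(<LINE>I have seen the day, with my good biting falchion</LINE>)
```

The line comes back unchanged except that the last word is now wrapped in a <weapon> element.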

What can we do with this?  Well how about some really cool queries:

The first one is to actually create those topic lists - they are after all very useful.  So what are all the weapons that were in the plays?

fn:string-join(cts:element-values(xs:QName("weapon")), ",")

Using MarkLogic Server's element-value indexes, this gives us a report on the weapons found.

But how about something a bit more interesting:  which characters talk about weapons (so we can stay away from them)?

for $speaker in distinct-values(//SPEECH[./LINE/weapon]/SPEAKER)
return
<violent>{fn:string($speaker)}</violent>

Or maybe we'd like to know more about a specific weapon - in the list created in the first query in my database something called a 'falchion' showed up.  What the heck is a 'falchion'?  Let's make a report:

for $weapon in //SPEECH/LINE/weapon[.="falchion"]
return
<weapon>
    <name>{fn:string($weapon)}</name>
    <play>{fn:string($weapon/ancestor::PLAY/TITLE)}</play>
    <character>{fn:string($weapon/ancestor::SPEECH/SPEAKER)}</character>
    <speech>
        {$weapon/ancestor::SPEECH/LINE}
    </speech>
</weapon>

This returns us a nice report:

<weapon>
    <name>falchion</name>
    <play>The Tragedy of King Lear</play>
    <character>KING LEAR</character>
    <speech><LINE>Did I not, fellow?</LINE>
                <LINE>I have seen the day, with my good biting          
                <weapon>falchion</weapon></LINE>
                <LINE>I would have made them skip: I am old now,</LINE>
                <LINE>And these same crosses spoil me. Who are you?</LINE>
                <LINE>Mine eyes are not o' the best: I'll tell you straight.</LINE>
    </speech>
</weapon>

Because the weapon is marked up in the actual content, I can leverage the structure, the content around it and XQuery to give me as complex a report and analysis as I can possibly want.
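
As a tiny self-contained sketch of leaning on that structure, here is the "who talks about weapons" query run against an inline sample instead of the database.  The $play fragment is made up for illustration: the Lear line is the one from the report, the KENT speech is a placeholder.

```xquery
xquery version "1.0";

(: inline sample standing in for an enriched play (assumption) :)
let $play :=
    <PLAY>
        <SPEECH>
            <SPEAKER>KING LEAR</SPEAKER>
            <LINE>I have seen the day, with my good biting <weapon>falchion</weapon></LINE>
        </SPEECH>
        <SPEECH>
            <SPEAKER>KENT</SPEAKER>
            <LINE>placeholder line with no weapons in it</LINE>
        </SPEECH>
    </PLAY>

(: only the speakers whose speeches contain a weapon element :)
for $speaker in fn:distinct-values($play//SPEECH[LINE/weapon]/SPEAKER)
return <violent>{ $speaker }</violent>
```

Only KING LEAR comes back, because his is the only speech containing a <weapon> element.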

But I still don't really know what a falchion is - besides the fact that it's 'biting' and that Lear was fierce with it when he was young.

So for some extra credit, let's reach out to another source and, in place, augment our reading of Shakespeare to give us some new understanding.  In our transformation of the plays for display (using the XQuery transformers), let's go ask Google what our weapon is:

define function weapon($x as element(), $params as node())
{
      <span>
             <span onclick="toggleDisplay(document.getElementById('pop-up'))">
             { passthru($x, $params) }</span>
             <span id="pop-up" style="position: absolute; display:none; border-style: outset; border-width: 2; background: #ffffff; font-family: Arial; font-size: 11px">
             <div style="background-color: #FFC770;">
              <h3>Google Search Results</h3></div>
             <div><ul class="basicMenu">
             {
             for $a in
             (xdmp:tidy(xdmp:http-get(fn:concat("http://www.google.com/images?q=", fn:string($x)))[2])[2]//*:div//*:td)[1 to 2]
             return
             <p>{$a}</p>
             }
             {for $a in
             (xdmp:tidy(xdmp:http-get(fn:concat("http://www.google.com/search?q=", fn:string($x) , "&amp;submit=Google+Search"))[2])[2]//*:div[.//*:table]/*:a)[1 to 3]
             return
             <p>{$a}</p>
             }            
             </ul></div></span>
      </span>
}

This gives us a nice reading of the plays that looks like this:

Falchion


And we can see that a Falchion is a mean-looking, huge sword.  Watch out for Lear!

Lazy enrichment turns out to be pretty powerful stuff!  You can annotate and augment texts with your own concepts and build rich displays to get new meaning out of even tried and true Shakespeare.

Thanks for the idea for the blog, Josh - it turns out that even a 'lazy' idea in XQuery is exciting stuff that certainly does keep you busy!

April 30, 2007

(XQuery) Transformers! More than meets the eye!

Hoping to capitalize on marketing for the upcoming Transformers movie, I thought I would take a look at what makes XQuery the ultimate XML transformer.  (What?  You haven't heard?  It's actually not bad looking.)

Just like the nutty car and truck toys, XML has always been made to be transformed into something else.  And while the end formats of HTML, mobile devices and print-ready PDFs aren't as much fun as 5 story tall robots, the idea is pretty much the same: your content is in a presentation-free format (just like Bumblebee's Camaro) and can instantaneously be transformed into something else (let's just say . . . a huge yellow robot).  Everything needed to do the transformation is actually in the content . . . you just need an engine and some rules to make it happen.

A couple of weeks back I posted about making PDFs from the Shakespeare XML in the tutorial that looked a bit like this:

for $speech in //SPEECH
return
    <formatting-for-speech>
        {fn:string($speech)}
    </formatting-for-speech>

While this worked, it's more of a report type of XQuery.  What we want to do is feed in some XML and get out the new format.

To do this, we'll use a design pattern that is very similar to what an XSLT engine does behind the scenes:  we'll recurse through the XML and then perform formatting as we hit certain elements.

The first part is to set up the recursion:

define function recursion($x, $options) {
    for $z in $x/node() return mapping($z, $options) 
}

Then we'll make its counterpart - the mapping:

define function mapping($x, $options)
{
  typeswitch ($x)
      case element(PLAY) return play($x, $options) 
      case element(TITLE) return title($x, $options) 
      case element(ACT) return section($x, $options) 
      case element(SPEECH) return p($x, $options) 
      case element(LINE) return p($x, $options) 
      case element(PERSONAE) return section($x,  $options) 
      case element(PERSONA) return p($x, $options) 
      case element(STAGEDIR) return i($x, $options) 
      case text() return output-text($x, $options)
      case element(FM) return ()
      default return p($x, $options)
}

While it's similar to an XSL transformation, the important thing is that we control the recursion.  If we wanted to add some (admittedly silly) logic to recursion() we can just go ahead and do it:

define function recursion($x, $options) {
    if (fn:contains($x, "blood")) then
        <p>warning: blood coming up: {for $z in $x/node() return mapping($z, $options)}</p>
    else
        for $z in $x/node() return mapping($z, $options)
}

We now have the list of parts (just like the Transformers' wheels, door handles, etc.) and we now need to make our 'template' functions to make them into something new.  Instead of a fancy robot, we'll just make some HTML:

define function p($x, $options) {
        <p>{recursion($x, $options)}</p>
}
define function i($x, $options) {
        <i>{recursion($x, $options)}</i>
}
define function output-text($x, $options) {
  if (fn:empty($x)) then () else text {$x}
}

These basic transformations work with the mapping --> if we have a <LINE> it is now a <p> because it went through the p() function.
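
To see the whole round trip in one place, here is a cut-down sketch of the same pattern run against a tiny inline fragment.  It is written with the standard XQuery 1.0 declare function syntax and the local: prefix rather than the define function form used in this post, and the names local:recurse and local:map are just stand-ins for recursion() and mapping().

```xquery
(: walk the children of a node and map each one :)
declare function local:recurse($x as node()) as node()* {
    for $z in $x/node() return local:map($z)
};

(: decide what each kind of node turns into :)
declare function local:map($x as node()) as node()* {
    typeswitch ($x)
        case element(LINE) return <p>{ local:recurse($x) }</p>
        case element(STAGEDIR) return <i>{ local:recurse($x) }</i>
        case text() return $x
        (: anything else: drop the wrapper but keep transforming its children :)
        default return local:recurse($x)
};

local:map(
    <SPEECH><LINE>I have seen the day <STAGEDIR>Aside</STAGEDIR></LINE></SPEECH>
)
```

With these rules the SPEECH wrapper falls through to the default case and disappears, the LINE becomes a <p> and the stage direction becomes an <i>, so the result should be <p>I have seen the day <i>Aside</i></p>.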

These are, after all, XQuery functions so we can also do some tests and complex formatting as well:

define function title($x, $options) {
    if ($x/parent::PLAY) then
        <h1>{recursion($x, $options)}</h1>
    else
        <h3>{recursion($x, $options)}</h3>
}
define function section($x, $options){
        <div style="margin-top: 25px;">{recursion($x, $options)}<hr/></div>
}

The super cool thing is that the $x node we are passing around carries its context.  Here we are doing a test of its parent element.  We could reach up to the root node and get the play title or anything else . . .  the possibilities are endless and since transformations always have a catch, this flexibility comes in very handy.  Also, I'm pretty sure this is how the Transformers can do such cool tricks :).
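
For instance, starting from nothing but a single LINE node, the ancestor axis gets back to the play it came from.  The inline fragment below is only a stand-in for a loaded play, and the <context> wrapper is a made-up name:

```xquery
let $play :=
    <PLAY>
        <TITLE>The Tragedy of King Lear</TITLE>
        <ACT>
            <TITLE>ACT V</TITLE>
            <SPEECH>
                <SPEAKER>KING LEAR</SPEAKER>
                <LINE>Did I not, fellow?</LINE>
            </SPEECH>
        </ACT>
    </PLAY>
(: pretend this node was handed to us deep inside the recursion :)
let $line := ($play//LINE)[1]
return
<context>
    <play>{fn:string($line/ancestor::PLAY/TITLE)}</play>
    <act>{fn:string($line/ancestor::ACT/TITLE)}</act>
    <speaker>{fn:string($line/parent::SPEECH/SPEAKER)}</speaker>
</context>
```

Nothing about the play, act or speaker was passed in separately; it all comes along with the node.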

Finally, we'll do the equivalent of a Transformer Powerlink (sure to be the climax of the movie).  This is where two Transformers bend and twist and make a new, bigger, more powerful robot.

To make our more powerful XML transformation, we'll use that $options node that we've been passing around:

define function play($x, $options) {
<html><head>
<title>{fn:string($options/desc)}: {fn:string($x/TITLE)}</title>
<meta name="keywords" content="{$options/meta-info}"/>
</head><body>
{recursion($x, $options)}
</body></html>
}

(: main function :)
let $meta-info :=
    let $personae :=  //PERSONA/text()
    return
        fn:string-join($personae, ",")
let $options := <options><desc>XQuery
                        Transformers!</desc>
                        <meta-info>{$meta-info}</meta-info></options>
for $play in (/PLAY)[1]
return
    mapping($play, $options)

We are pulling all the Shakespeare characters for extra large meta-data and passing along my new tag line: XQuery Transformers!  In the transformation for the PLAY element we stick this in the head of the document (a common PowerLink move) to make an HTML wrapper that packs some extra power.

Putting this all together in a single stored XQuery module will give you the building blocks for powerful, fully programmable transformations.

XQuery Transformers: more than meets the eye!

March 21, 2007

XQuery Loves Java! (we're not sure how Java feels)

I've always had a bit of a tough time with Java.

From the first time a 'little' Java app crippled my production environment at PC World (back in the days of the memory-leak-prone JVM) to the several minutes WebLogic used to take to just start (!!) to some of the amazing class and method spaghetti mess I've seen really smart people create . . . well, let's just say I've never been a Java fan.

This is in part why I like XQuery so much -> it's purpose-built for content and has just the right tools to get the job done.

But, outside of working with content, Java does do some really neat stuff . . . and does it well.

And, more importantly, it does a lot of stuff XQuery can't.

Let's say you have some XML that has pointers to images and need to create a distribution package with syndication XML *and* resized images.

XQuery has no problem transforming the XML.

And now, using a neat XQuery package called MLJAM developed by MarkLogic's Jason Hunter and Ryan Grimm, XQuery can also resize the images . . . by having Java do it:

import module namespace jam = "http://xqdev.com/jam" at "/mljam/jam.xqy"
import module namespace jamu = "http://xqdev.com/jam-utils" at "/mljam/jam-utils.xqy"

jam:start("[java server URL]/mljam/mljam", "", ""),

for $img in //mediaobject/imageobject
let $uri := fn:data($img/imagedata/@fileref)
let $caption := $img/caption
let $image := jamu:image-resize-percent(fn:doc($uri),50, "jpg")
return
<image>
     <path>{$uri}</path>
     <image-caption>{fn:string($caption)}</image-caption>
     <fifty-percent-binary>
         {xdmp:base64-encode(xs:string($image))}
     </fifty-percent-binary>
</image>
,
jam:end()

In this XQuery, we are taking DocBook mediaobject elements, pulling the binary out and using the MLJAM utilities module to make a new 50% version of the image.

Then we are making a new XML node, including the meta-data, and putting the encoded version of the image right into it. 

Neat!

To get this done, the MLJAM utility is, behind the scenes, making a connection to a Java servlet (the jam:start bit) and running some dynamically created Java using the BeanShell library.

For more details (including how to write your own Java in addition to the utilities discussed here) see the very good MLJAM tutorial.

Another task content application developers run into all the time is running an XSL:FO engine to make a PDF from XML.

The transformation of XML to XSL:FO is a perfect task for XQuery since we can natively query the XML and dynamically output the new XML.

However, all the best engines to create the actual PDF are Java based applications.  So you used to have to create an entire Java application just to fetch the newly created XSL:FO from XQuery and send it to the rendering engine.

But not any more. 

Using MLJAM, we can make a PDF of the Shakespeare plays we loaded in the Tutorial as simply as this:

import module namespace jam = "http://xqdev.com/jam" at "/mljam/jam.xqy"
import module namespace jamu = "http://xqdev.com/jam-utils" at "/mljam/jam-utils.xqy"

xdmp:set-response-content-type("application/pdf"),

let $fo :=
   for $play in //PLAY
   return
   <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
         <fo:simple-page-master master-name="standard" page-height="11in" page-width="8.5in"
            margin-top="1cm" margin-bottom="2cm" margin-left="2.5cm" margin-right="2.5cm">
            <fo:region-body margin-top="0.5cm" margin-bottom="0.5cm"/>
         </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="standard" initial-page-number="1">
         <fo:flow font-size="12pt" font-family="Times, New Roman" flow-name="xsl-region-body">
            <fo:block font-family="Times, New Roman">
               <fo:block text-align="center" font-weight="bold" font-size="26pt"
               color="#990033" margin-top="80px">
               {fn:string($play/TITLE)}
               </fo:block>
               <fo:block text-align="center" font-size="12pt" margin-top="400px">
               Sample document from XQuery loves Java!
               </fo:block>
               <fo:block break-before="page"/>
               <fo:block font-family="Arial" font-weight="bold" font-size="14pt">
                  {fn:string($play/TITLE)}
               </fo:block>
               {for $scene in $play//SCENE
               return
                  <fo:block text-align="left" padding-top="5mm">
                     <fo:inline font-weight="bold">
                        {fn:string($scene/parent::ACT/TITLE)}: {fn:string($scene/TITLE)}
                     </fo:inline>
                     {for $speech in $scene//SPEECH
                     return
                        <fo:block text-align="left" padding-left="5mm" padding-top="5mm">
                        {fn:string($speech/SPEAKER)}:
                           { for $line in $speech/LINE
                           return
                           <fo:block text-align="left" padding-top="2mm">
                              {fn:string($line)}
                           </fo:block>
                           }
                        </fo:block>
                      }
                  </fo:block>
               }
            </fo:block>            
         </fo:flow>
      </fo:page-sequence>
   </fo:root>
return
   (: parenthesized so that $fo stays in scope for all three calls :)
   (
   jam:start("http://localhost:8080/mljam/mljam", "", ""),
   jamu:fop($fo),
   jam:end()
   )

The most complex part here is the actual formatting . . . those three little lines at the end are what used to be an *entire* Java application!

Just like that, you've got a dynamically generated PDF in your browser and without even a tiny bit of messing with Java.

Putting the power of Java to work within XQuery content applications?  Brilliant! (as the Guinness guys would say).

Matt

P.S. Thanks to MarkLogic's Frank Rubino for helping me puzzle out how to inline a binary - for more info check out this article.

February 16, 2007

Who let the XQuery out?

One of the reasons I started this blog was to get some XQuery code samples out there so people can see how cool it is.

With XQuery becoming a standard and a growing list of XQuery based content applications launching (the latest, Bowker's Global Books in Print), I thought we'd take a look at the XQuery out there in the wild:

In support of the really really rapid world of AJAX development, Jason Hunter put together this neat XML -> JSON library.  While you can power AJAX directly with XML generated with XQuery, JSON lets you just plug the content right in:  http://developer.marklogic.com/svn/commons/trunk/json/.

And how about making a content app with XQuery for all that cool AJAX?  Here's a fully featured app from MarkLogic's own Danny Sokolsky based on the Shakespeare XML encoded by Jon Bosak.  It has the pillars of an XQuery app like traversing the content to create TOCs and transformations as well as search using MarkLogic's XQuery search extensions:  http://developer.marklogic.com/svn/bill/trunk/.

Since XQuery is built for XML, web services are a snap.  So how about an XQuery SOAP router: http://developer.marklogic.com/svn/soaprouter/trunk from Darin McBeath of Elsevier.

And what about managing all that XQuery code?  Also from Darin is the XQDoc project for creating automated documentation for XQuery and an XQDoc Web Service to keep your documentation in sync with your code.

And finally, since this is a blog, how about an XQuery blog engine?  This all XQuery app from Raffaele Sena shows that XQuery is much more than just a way to access content - it's a great way to make content applications: http://developer.marklogic.com/svn/xqlog/trunk/.

So who let the XQuery out?  Us content developers, that's who.  If you have a project you'd like to get out there, get on the XQuery developer list (by joining here) and let everyone know!

MT

December 22, 2006

$wish eq "Happy Holidays" and $holidays = ("Christmas", "Hanukkah", "Kwanza", "New Year")

One of the first things I ran into with XQuery was how to compare a single string value against a list of matching values.

In any language there is usually some trick involving arrays, lists and looping.

So imagine my surprise when I learned that, in XQuery, you can do this:

let $input := "christmas"
let $holidays := ("christmas", "hanukkah", "chanukah", "kwanza", "new year")
return
if ($input = $holidays) then
    <sentiment>We've got holiday!</sentiment>
else
    <sentiment>looks like coal</sentiment>

XQuery is made up of expressions, and the value of every expression is a sequence.  A sequence is zero or more items . . . and an item is either an atomic value (strings, numerics etc) or a node (elements, attributes etc).  See this very complete section of the XQuery language specification for the complete lowdown.

So, since everything is a sequence (even the sequence of 1 that holds "christmas"), the '=' operator, called the general comparison, compares the value on the left hand side against *every* value in the sequence on the right and is true if any one of them matches.

Comparing a value against a list is built into XQuery!
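
A few more cases show how far this goes.  This is a small sketch with made-up values; the expected result of each comparison is in the comment next to it:

```xquery
let $holidays := ("christmas", "hanukkah", "kwanza")
return (
    "christmas" = $holidays,              (: true: one item on the right matches :)
    "easter" = $holidays,                 (: false: nothing matches :)
    ("easter", "hanukkah") = $holidays,   (: true: sequences work on both sides :)
    "christmas" != $holidays,             (: true: "christmas" differs from "hanukkah" :)
    fn:not("easter" = $holidays)          (: true: the way to ask "is it not in the list?" :)
)
```

The '!=' line is the one to watch: because it is true as soon as any pair differs, it is almost never what you mean against a list, which is why the last line wraps '=' in fn:not() instead.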

You can still do equality with the 'eq' or value comparison but this is reserved for a single value like:

let $input := "christmas"
return
if ($input eq "christmas") then
    <sentiment>We've got stockings!</sentiment>
else
    <sentiment>looks like coal, try again</sentiment>

Useful, but in a multicultural, content-filled world, I think '=' is the better match.
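
One difference is worth knowing before picking between the two: 'eq' insists on at most one item on each side, and an empty sequence makes the whole comparison come back empty rather than false.  A small sketch with made-up values:

```xquery
let $input := "christmas"
let $missing := ()
return (
    $input eq "christmas",                 (: true :)
    fn:empty($missing eq "christmas"),     (: true: 'eq' on an empty sequence returns the empty sequence :)
    $missing = "christmas"                 (: false: '=' finds no matching pair :)
)
(: ("christmas", "hanukkah") eq "christmas" would raise a type error,
   because 'eq' does not accept a sequence of more than one item :)
```

So 'eq' is the stricter tool when you know you have exactly one value, and '=' is the forgiving one when you might have none or many.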

Happy Holidays!

Matt