June 23, 2008

Have it Your Way with Article 2.0 and XQuery

At the MarkLogic User Conference, Darin McBeath of Elsevier asked the question: what would you do if you were the publisher?  How would you present an article?  How would you render figures and references?  What would you do if you were in control?

To answer, Darin showed some very cool examples of cutting across articles to present figures in new ways, a really cool interactive reference information browsing interface and a new take on the word cloud within an article to analyze article content.

And it was all done with XQuery.  The source XML is stored in a MarkLogic Server instance and served up via XQuery powered web services.  Each demo was then executed with XQuery running on another instance of MarkLogic that got the content from the first and, using XQuery, presented the new versions of the article from the source XML.  Darin said that each demo was just a few lines of code and didn't take any more than a couple of days to complete.  Very nice!

But what's really cool is that Elsevier is giving everyone access to the same tools and a chance to make a new article presentation in the ... (drumroll please) ... Article 2.0 contest!

Check out this super cool idea:  you get access to source XML content and, using XML tools like XQuery, you create your own idea of how an academic research article should be displayed.   How would you present an article?  It's really up to you.  And with prize money at stake you can bet you won't be alone in showing how some new ideas can shake up scientific publishing.

Check out the full details here.  

It all starts in September so you have plenty of time to plan your XQuery masterpiece . . . and Discovering XQuery can help with some examples of how to transform and render XML, how to grab some content from the web and present it, how to enrich content to power cool displays and a tutorial on how to get started with my favorite XQuery engine, MarkLogic Server.

Good luck and happy coding,

Matt

May 28, 2008

Mark Logic User Conference 2008: Generate Some XQuery Buzz

The Mark Logic User Conference, the world's largest gathering of XQuery users, experts and fans, is only two weeks away!

The jam packed agenda of sessions on ground breaking and diverse XQuery applications is a feast for XQuery fans.  You'll get to see real-world XQuery applications in action from content syndication at Simon & Schuster to the Electronic Flight Bag at United Airlines to the Army's Knowledge Management Systems.  There are also best practice sessions and sessions on the latest XQuery tools from Mark Logic.

So sign up here and come on out to San Francisco to see how XQuery is generating some buzz across a wide range of industries and uses.

And really why would we expect anything less?  We've seen that XQuery (with some help from the easy to use MarkLogic Server) can quickly get you up and querying XML in minutes, scrape the web with ease, create new XML, enrich your content and power AJAX.   All from just plain old XML (which is the right model for content of course).

And as for generating buzz . . . well how about generating XQuery?  Like all good languages XQuery can be used in generative (or automatic) programming.  Thankfully with AJAX you don't need to generate Javascript so much any more (phew!) . . . but this approach still has plenty of uses.  For instance, I recently had to generate some XQuery for the very handy performancemeters to run on some sample content.  To generate the script, I used XQuery (of course):
<h:script xmlns:h="http://marklogic.com/xdmp/harness"> {
for $count in (1 to 100)  (: get 100 calls :)
let $selected := xdmp:random(2999) + 1 (: xdmp:random($max) returns 0 to $max inclusive, so this picks a position from 1 to 3000 :)
let $uri := fn:base-uri(doc()[$selected]) (: get the uri of the selected document :)
return
   <h:test>
     <h:name>sample test</h:name>
     <h:comment-expected-result>transform {$uri}</h:comment-expected-result>
     <h:set-up/>
     <h:query>
     (: generate the XQuery to run for the test :)
      import module namespace tx="http://www.marklogic.com/test/transformtest" at  
         "transform-test.xqy"
    
         tx:transform("{$uri}")
     </h:query>
     <h:tear-down/>
   </h:test>
}</h:script>
In this simple XQuery I'm collecting some random URIs and creating the calls to the transform function (which is a simple transformation à la XQuery Transformers), but you can see that it would be very handy to use the values in the XML to write custom XQuery for just a test . . . or even as part of your XQuery content application.

With all that XQuery can do it's no wonder it's good at generating buzz!

See you at the user conference,

Matt

April 11, 2008

XQuery: The Real X in AJAX

Like the real Napster in the movie The Italian Job (the remake), XQuery might have a bit of a chip on its shoulder about the X in AJAX.

Sure, it stands for XML, since the idea is that you return an XML fragment to the browser to update content in a div, fill in form fields or even create drop down menu options on the fly.

But how do you create that XML?  Using static XML files works, but the whole idea is to dynamically respond to user actions and give them information without reloading the whole page.

And what's the best way to dynamically create XML?  XQuery of course!

To prove it, let's do a simple example to create a drop-down form field for Shakespeare's characters (using the Shakespeare XML we loaded in the tutorial).  This field will auto-complete using AJAX and XQuery and give the user the characters found in the XML that begin with the letters entered into the field.

To get a head start, we'll use the popular AJAX tool Scriptaculous that takes care of all of the hard javascript stuff and lets us just work on creating the backend to deliver the content.

We'll also make use of MarkLogic Server's app server built-ins and its ability to run an HTTP server to make a complete application that presents the HTML, including Scriptaculous.  To do this, we'll start with the /modules directory we created and accessed with WebDav in the tutorial (where the CQ application was placed).

Assuming you have that set up, here are the steps to get the client side set up:

  1. Create a js/ directory and install all of the .js scriptaculous libraries that came with the distribution (located here)
  2. Make a lookup.css in /modules with the sample styles from this page
  3. Create lookup.xqy under the /modules directory with the following HTML in it:

xdmp:set-response-content-type("text/html; charset=utf-8")
(:  sets the mime type :)
,
<html>
    <head>
        <title>Shakespeare Lookup</title>
        <!-- reference the stylesheet (an XML comment, because an XQuery comment inside element content would be output as literal text) -->
        <link rel="stylesheet" type="text/css" href="lookup.css" media="screen"/>

        <!-- get the scriptaculous scripts loaded - the " " is to prevent them being optimized into an
        empty XML node like <script...  /> which some browsers don't like -->

        <script src="js/prototype.js">{" "}</script>
        <script src="js/scriptaculous.js">{" "}</script>

    </head>
<body>
    <h1>Shakespeare Character Lookup</h1>

    <div>
        <form>
        <!-- create the placeholder for the autocomplete field -->
        <input type="text" id="autocomplete" name="autocomplete_parameter"/>
        <input type="submit" value="select character"/>
        </form>
        <div id="autocomplete_choices" class="autocomplete"></div>

        <!-- run the scriptaculous autocomplete -->
        <script type="text/javascript">
            new Ajax.Autocompleter("autocomplete", "autocomplete_choices",                
            "request.xqy", {{}});
        </script>
    </div>
</body>
</html>

If you are like me and learned HTML and javascript way way back, it may take you a moment to realize that the <input> element named 'autocomplete' is NOT what actually shows up in the browser.  The script Ajax.Autocompleter replaces that element with the fully decked out, onClick enabled <input> element that does all of the auto-lookuping including making the calls to your request.xqy.  You can create all this yourself . . . but Scriptaculous does it for you, so go ahead and enjoy it!

This should all result in a simple web page with a text field on it that you can get to at http://localhost:8002/lookup.xqy (or wherever your MarkLogic Server is installed).  But it won't do anything until we create the backend XQuery.

In the Ajax.Autocompleter code, we gave request.xqy as our source for the lookups.  We need to create this in the /modules directory and its contents can be something as simple as this:

(: get the value of the field - sent to us as a POST from the Scriptaculous autocompleter :)

let $query-base := substring-after(xdmp:get-request-body(),"autocomplete_parameter=")

(: add an '*' to it to create a wildcard search :)

let $query := fn:concat($query-base, "*")

return
    <ul>{

        (: use the MarkLogic built-in cts:element-value-match()* to search all of the values in the PERSONA element in the loaded XML plays :)

        for $item in cts:element-value-match(xs:QName("PERSONA"),$query)
        return

        (: return the <li> elements scriptaculous expects for its list :)

        <li>{fn:string($item)}</li>
        }
    </ul>

Yup - that's all there is to it:  9 lines of code to query some XML and return XML.

When all this is hooked up and running, you should have a mini-application that looks something like this:

[Screenshot: Shakespeare lookup]


It's now up to you how you will use the power of XQuery to create dynamic content elements for AJAX.    Will you populate complex taxonomies and even bring back the content from leaf nodes?  Will you create search interfaces that give you the answers in the form?

How about an amazing interface for searching XML content?  Check out markmail.org, which is all XQuery and AJAX, for some inspiration.

So XQuery really must be the real X in AJAX, right?

Well it turns out there is a bit of room under the X in AJAX these days.  JSON is a popular alternative to XML (and XQuery's got that covered - check out this library Jason Hunter wrote to generate JSON from XQuery) and there are lots of ways to generate both XML and JSON.

But for those of us in the know, XQuery is the only way to go.  And with the growing number of XQuery powered content applications, no one can shut down the real X!

Matt

*NOTE: cts:element-value-match() is a MarkLogic built-in that requires an Element Range Index be configured for the element in question.  This is pretty straightforward: select your database under the Databases tab in the MarkLogic admin interface (also covered in the tutorial). Under Element Range Index, select Add.  For the scalar type select string, leave namespace blank and enter PERSONA for localname.  This will create an index for ordering that can also be used to perform the character lookup.

*ALSO NOTE:  there are plenty of other ways to get a list of PERSONA values based on the user's input, such as:

//PERSONA[cts:contains(., "p*")] (: still uses MarkLogic search built-ins :)
//PERSONA[fn:starts-with(fn:lower-case(string(.)), "p")] (: standard XQuery :)

But like scriptaculous, MarkLogic's search built-ins do the work for you (and also do it much more efficiently) so let's just enjoy using them too!
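
For anyone trying this without the range index, the standard XQuery alternative from the note above can be sketched as a complete query. This is only an illustration: the three PERSONA values and the $typed variable are made-up stand-ins for the loaded plays and the submitted field value, and it scans every element rather than using an index.

```xquery
(: hypothetical sample data standing in for the loaded Shakespeare XML :)
let $personae :=
    <PERSONAE>
        <PERSONA>Hamlet</PERSONA>
        <PERSONA>Portia</PERSONA>
        <PERSONA>Prospero</PERSONA>
    </PERSONAE>

(: hypothetical stand-in for the letters typed into the field :)
let $typed := "pr"

return
    <ul>{
        (: case-insensitive prefix match using only standard functions :)
        for $persona in $personae/PERSONA
        where fn:starts-with(fn:lower-case(fn:string($persona)), fn:lower-case($typed))
        order by fn:string($persona)
        return <li>{fn:string($persona)}</li>
    }</ul>
```

With this sample data the only match is Prospero, so the result is a one-item list in the same <ul>/<li> shape that request.xqy returns.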

February 17, 2008

XML is 10!

As this post from Eliot Kimber reminded me, it was 10 years ago (!!) that XML was officially born with the publication of the recommendation on February 10, 1998.

Unlike Eliot, who was in the middle of the standards process, I was very much a user of XML in 1997-1998.  I was working at PC World Online and we had just started to really think about how to model the articles for a multi-channel delivery process.  Getting them from Quark to the website was hard enough, but with the start of online syndication there were requests for simple HTML, ASCII text and for who knows what.

As we sat around in the fall of 1997 trying to come up with a plan, the idea of tags that we could control and name emerged as a model that would let us get to almost any other format.  Pretty soon we were learning all about SGML and the soon to be created XML.

Things moved fast back then, and by February of 1998 we were already right in the middle of development of our newly designed XML publishing system featuring an Oracle storage system with XML in a BLOB and key fields as columns (called partial decomposition), TCL script (!!) running on the first version of Vignette and some very basic XML tools that looked a bit like XSLT developed for us by Vignette.

Somehow we put the new system in place and ran our first issue on it in April of 1998 - that's 5 months from idea to production!  (If you want to know more about that project see Just One Question for Matt Turner and this paper I gave at XML 1998).

I think of this project as real proof that the principles of XML and its simplicity compared to SGML really did enable the technology to make that huge leap from a niche idea to mainstream content model.

For me the most exciting part of the story is just beginning.  As I often say, 'Oh how we wish we had XQuery back then' and it's true.  We were trying to program and transform XML and had to use so many layers of code (even TCL) and a horrible data model.

XQuery lets you do all that same work in one application layer directly against the native XML content.  It's no wonder that XML and content applications are seeing a huge resurgence now that XML (born in 1998) has its match in XQuery (born in 2007).

Happy Anniversary XML!

Matt

January 21, 2008

Really Loud XQuery

At the Mark Logic sales kickoff a couple of weeks ago I set up a late night demo jam so people could show off what they are doing with XQuery (and MarkLogic Server).

From dynamic time lines and mapping overlays for search and discovery applications, to new content presentations (not saying anything ... but I'd keep an eye on the ever innovative markmail), to the XQuery powered Facebook app called Kick-it, everyone had something super cool to show.  The crowd kept its part of the bargain with an appropriate level of rowdiness and cheering.

As the person with the bright idea to do this, I had to set the presentation order.  I'd done this once before and simply picked names out of a hat.  But this time I thought that, since this is all about what XQuery can do, we should pick the order with XQuery.

So while everyone was getting demos ready and the crowd was settling in, I hooked up the projector and tapped out this bit of XQuery in a CQ window.  It uses the MarkLogic Server built-in xdmp:random() in a very neat trick that my colleague James Clippinger came up with as I was attempting far less elegant ways of getting it done.  (His suggestion rising above the other 'input' I was getting from the crowd.)

<demo-order>{

(: create presenter list - this is just a subset :)
let $presenters := ("Paul", "Pete", "David", "Danny", "Jay")

(: sort the list using random.  Each item in the FLWOR expression gets
assigned a sort key and when random() is used in the order by, it assigns a
random number to each item and creates a random sorting of the sequence :)

let $sorted-presenters := 
    for $presenter in $presenters
    order by xdmp:random()
    return
        $presenter

(: we then take that new order and go through it really just to get the
number.  Using 'at' gets the order of the original sequence *before* the order
by clause so we have to create sorted sequence first then display that :)

for $presenter at $pos in $sorted-presenters
return

(: output the final list with the order number :)
<presenter>
    <order>{$pos}</order>
    <name>{$presenter}</name>
</presenter>

}</demo-order>

To kick off the event I ran this code and picked the unlucky (or maybe lucky?) presenter to go first.
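
The point in the comment above about 'at' deserves a quick illustration. Here is a minimal sketch in standard XQuery that swaps xdmp:random() for a plain alphabetical sort (my substitution, so the result is repeatable) and uses a shortened, illustrative presenter list:

```xquery
(: illustrative subset of the presenter list :)
let $presenters := ("Paul", "Pete", "David")

(: one pass: 'at' is bound before 'order by' runs,
   so $pos still holds each name's ORIGINAL position :)
let $one-pass :=
    for $presenter at $pos in $presenters
    order by $presenter
    return fn:concat($pos, ":", $presenter)

(: two passes: sort first, then number the already sorted sequence :)
let $sorted :=
    for $presenter in $presenters
    order by $presenter
    return $presenter
let $two-pass :=
    for $presenter at $pos in $sorted
    return fn:concat($pos, ":", $presenter)

return (fn:string-join($one-pass, " "), fn:string-join($two-pass, " "))
```

The first string comes back as "3:David 1:Paul 2:Pete" and the second as "1:David 2:Paul 3:Pete", which is why the demo query builds the sorted sequence before numbering it.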

It was tons of fun to see a bunch of long time XQuery folks get truly excited about what it can do.  And by excited, I mean really excited:  we had everyone vote on the demos with crowd noise (measured with a dB meter of course).  The winner's cheers were measured at 107dB!!

That puts a really good XQuery demo in between a Snowmobile and a Power Saw on this chart of decibel levels.

Not bad for a programming language!

January 02, 2008

2007: a Very Good XQuery Year

2007 was a great year for all things XQuery.  From the official acceptance of the standard in January, to new XQuery powered applications like MarkMail, to helping out King Lear and Santa, here's how it looked on the Discovering XQuery Blog:

 Here's to hoping 2008 is as good an XQuery year as 2007.

Happy New Year everyone!

Matt

December 21, 2007

Santa's Really Big XQuery List

Let's say you are the head of an international syndicate headquartered up North that gets real busy this time of year getting a gift for every kid in the world (!!).

And let's say you really need to be efficient about it - Wired estimates it would take $27 billion just for the U.S. alone!

Maybe you could use some XQuery to move the process along and make that really big delivery list for all the kids?

If you had a gygnormous database of kids like this:

<kid>
    <name>Jane</name>
    <naughty/>
</kid>
<kid>
    <name>Josh</name>
    <nice/>
</kid>
<kid>
    <name>Michael</name>
    <nice/>
</kid>
<kid>
    <name>Lila</name>
    <nice/>
</kid>

And an equally huge list of toys that looks like this:

<toys>
    <toy>Jakks EyeClops Bionic Eye</toy>
    <toy>IlluStory Make Your Own Story Kit</toy>
    <toy>Blokus Strategy Board Game</toy>
    <toy>LeapFrog ClickStart My First Computer</toy>
    ...
</toys>


(For this example I've used Mark Logic's capabilities to scrape content from the web and grabbed Amazon's top gifts for kids - we all know Santa has his own sources).

You could use a bit of XQuery to bring them together and make sure we get one toy per kid:

(: start by looping through only the kids that get toys.  So limit it to the ones with <nice> elements and that DON'T yet have a <toy>
This method of selecting entries comes in handy for large content sets as we'll see below :)

for $kid in /kid[./nice][not(./toy)]

(: get a quick count of the toys for our random feature :)

let $toys := count(/toys/toy)
return

    (: get a toy based on a random number using MarkLogic's random function.
       xdmp:random($max) returns 0 to $max inclusive, so shift it to 1 through $toys :)

    let $toy :=
        let $random := xdmp:random($toys - 1) + 1
        return
            (/toys/toy)[$random]

    return

        (: add the toy node to the kid :)

        xdmp:node-insert-child($kid, $toy)

Run this, and all our kids now have toys:

<kid><name>Jane</name><naughty/></kid>
<kid><name>Josh</name><nice/><toy>Hasbro Playskool Step Start Walk 'n Ride</toy></kid>
<kid><name>Michael</name><nice/><toy>Scrabble</toy></kid>
<kid><name>Lila</name><nice/><toy>Fisher-Price Little Superstar Sing-Along Stage</toy></kid>

. . . except for naughty Jane of course.

But those of you with an eye for big things might be thinking, this is well and good for 4 kids . . . and maybe 400, but what about a couple billion?

Well that first test is the key.  It lets you grab a set of records to process without having to iterate through records and do commits after x number (like you would with a relational system).

Instead, you can have this code work on a small batch at a time by adding a predicate to limit the number returned by the query:

for $kid in /kid[./nice][not(./toy)][1 to 1000]

Then all you need to do is run it over and over - each run always only picking up the kids that need toys.  When everyone has a toy, it's done.
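
One caveat if you try this outside MarkLogic: the [1 to 1000] range predicate is a MarkLogic convenience, and in standard XQuery a multi-item predicate like that raises an error. A rough portable sketch of the same batching idea uses fn:subsequence(). The inline data (including the extra kid Sam) and the batch size of 2 are made up for illustration:

```xquery
(: hypothetical inline data standing in for the database of kids :)
let $kids :=
    <kids>
        <kid><name>Jane</name><naughty/></kid>
        <kid><name>Josh</name><nice/></kid>
        <kid><name>Michael</name><nice/><toy>Scrabble</toy></kid>
        <kid><name>Lila</name><nice/></kid>
        <kid><name>Sam</name><nice/></kid>
    </kids>

(: a tiny batch size so the effect is visible; the post uses 1000 :)
let $batch-size := 2

(: the same test as above: nice kids that don't have a toy yet :)
let $pending := $kids/kid[nice][fn:not(toy)]

(: take only the first batch of pending kids :)
for $kid in fn:subsequence($pending, 1, $batch-size)
return fn:string($kid/name)
```

This returns Josh and Lila. Michael is skipped because he already has a toy, and Sam waits for the next run.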

MarkLogic Server provides some ways of automating this with a function called xdmp:spawn that lets you put code onto a task server.  So to get through the billions of kids, you would take the above code (with the predicate for 1000), save it as a module and then run that module over and over until there were no more kids who needed toys:

if (/kid[./nice][not(./toy)]) then
    xdmp:spawn("match-toy.xqy")
else "All Done - ready for Christmas Delivery!!"

Just like the big guy up North, XQuery keeps going until every kid has a toy.

Happy Holidays Everyone!

Matt




December 09, 2007

XQuery and Microsoft Office (2007) XML

Last week at the XML conference I gave a talk entitled First Encounters with Office Open XML. 

I learned a ton about this new format and had a lot of fun taking it apart and using XQuery to manipulate content and create new Office Open XML documents that Microsoft Word (2007) can open.

The basics are that the format is XML contained in a zip file.  So you can, with desktop zip tools, open up any Office 2007 file and work with the XML outside of the MS Office 2007 tools.

Combining this with a content server like MarkLogic Server, which has included built-ins to do the unzipping and zipping, you can load Office Open XML into an XML repository, query across multiple documents with XQuery and use XQuery to create new Office Open XML documents.

As I said in my talk, Office Open XML *is* XML and you can open it up and mess with it!

MarkLogic's Pete Aven is starting a series of posts called MarkLogic Server and Office XML.

The first post shows how to open up, query and repackage Office Open XML focusing on the Microsoft Word format.  It is here under Office Logic.

The second post explores the details around Excel and is called Excel-ing with XQuery.

My slides from the presentation are here:

They are also posted to the XML conference site: http://2007.xmlconference.org/public/schedule/detail/362.

In my presentation I had a lot of fun going between the Office Open XML format XML and the tools inside Microsoft Word to view and manage content.  And, as Pete's posts show, it's pretty easy.

One thing I did was to generate Office Open XML format from Shakespeare content I used in the original tutorial.

To do this, I used the same transformation I used in the XQuery Transformers tutorial.  This starts with a recursive function and a typeswitch for each of the elements:

define function recursion($x, $options){
    for $z in $x/node() return mapping($z, $options)
}

define function mapping($x, $options)
{
  typeswitch ($x)
      case element(PLAY) return play($x, $options) 
      case element(TITLE) return title($x, $options) 
      case element(ACT) return section($x, $options)
...
}

But instead of returning an HTML element for each mapping, I created an Office Open element, in this example, the Title element.

First we wrap it in a <customXml> element to add the structure to the Word file.  Then we make a <w:p> element which is the basic building block of a word document.

Because this is a heading, we want some style - so depending on where we encounter this element, we create the style inside of the <w:pStyle> element using the standard Word styles "Heading1", "Heading2" etc.

Finally we create a 'run' to hold the text with a <w:r> and put the text inside of the <w:t> element:

define function title($x as element(), $params as node()) as element()
{
    <w:customXml w:element="TITLE">
    <w:p w:rsidR="00592BC3" w:rsidRDefault="009E30CC" w:rsidP="009E30CC">
          <w:pPr>
          <w:pStyle w:val="{
              if ($x/parent::PLAY) then
                   "Heading1"
              else if ($x/parent::ACT) then
                   "Heading2"
              else if ($x/parent::PERSONAE or $x/parent::SCENE) then
                    "Heading3"
              else
                    "Heading4"
           }"/>
           </w:pPr>
           <w:r>
               <w:t>{passthru($x, $params)}</w:t>
           </w:r>
    </w:p>
    </w:customXml>
}
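
To see the shape of the output in isolation, here is a stripped-down sketch of just the paragraph-building part. It is written in XQuery 1.0 syntax (declare function, with a trailing semicolon) rather than the define function form used above, and local:heading and its two parameters are names I made up for the illustration. The namespace URI is the WordprocessingML main namespace.

```xquery
declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: hypothetical helper: build one heading paragraph for a given level :)
declare function local:heading($text as xs:string, $level as xs:integer) as element(w:p)
{
    <w:p>
        <w:pPr>
            <w:pStyle w:val="{fn:concat('Heading', $level)}"/>
        </w:pPr>
        <w:r>
            <w:t>{$text}</w:t>
        </w:r>
    </w:p>
};

local:heading("As You Like It", 1)
```

The call at the bottom produces a single <w:p> whose <w:pStyle> has w:val="Heading1" and whose run holds the title text. It leaves out the <w:customXml> wrapper and the rsid attributes, so treat it as a reading aid rather than a replacement for the function above.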

 

The result is some XML that we can zip up in an Office Open Package (see Pete's posts or the slides on how to do this) and open inside of Word:

[Screenshot: As You Like It open in Word]

Pretty neat!

Office Open XML really opens things up in the new Office 2007 suite, letting you access and query the documents created in the suite, manipulate that content or create new Office content from other sources.

It's XML after all, so have fun messing with it!

Matt

November 28, 2007

XML lists on MarkMail and XQuery at XML 2007

Just in time for the upcoming XML conference the MarkMail team (none other than Mark Logic's XQuery gurus Jason Hunter and Ryan Grimm) have added the xml-dev, xquery-talk and xsl-list mailing lists to the already impressive collection of developer mailing lists in MarkMail.

MarkMail is an XQuery powered next generation email search application that not only lets you easily find a specific message in a big stack of email (over 4,000,000 messages) but also presents analytics to let you see the patterns and trends across topics.

So far I've been looking at the big picture with the XML lists and it's interesting to see the activity around XQL in the late 90s when it was picking up steam . . .

[Chart: MarkMail results for XQL]
(click here to see it live)

the way Quilt (the real ancestor to XQuery) sort of picked up the ball at first but quickly faded . . .

[Chart: MarkMail results for Quilt]
(click here to see it live)

and how XQuery really took over as it became the standard (and got its own list so then things tapered off).

[Chart: MarkMail results for XQuery]
(click here to see it live)

I also like how Jonathan Robie is the number 1 poster for all of these searches - thank you Jonathan!!

But most of all I like that I was able to find all this great info with just a few searches and some drilling down into content.

MarkMail is a great example of a content application:  its data is email modeled as XML and its application layer is XQuery to take full advantage of the structure in the content and the powerful set of application building features XQuery provides.

And since it's powered entirely from the XML content, you can read every post in the same interface.  So for the Quilt search I was able to scan the messages and get a feel that most people were just referring to Quilt in passing when actually talking about other things (like XQuery):

[Screenshot: a message from the Quilt search in MarkMail]
(to see this click any message in this result)

So whether you are looking for trends, such as the decline of DSSSL, or your favorite entry in a perma-thread, I think you will find MarkMail a valuable resource for all things on the XML lists.

And the timing couldn't be better.  Next week at the XML conference Jason will be giving the closing keynote "You're Darn Right XML has a Future on the Web" and the power of applications like MarkMail certainly underscores just what XML and XQuery can do.

And I will also be speaking at the conference, subbing for Kelly Stirman in a session called "First Encounters with Open Office XML".  I'll be looking at this format, then doing some live demos using XQuery to query it, take it apart and put it back together.

Hope to see you there and enjoy the new XML savvy MarkMail.

Matt

November 12, 2007

Code with the XQuery Experts . . . in San Carlos, CA NOT London

Sorry to all the folks over in the U.K. but it turns out the Code with the XQuery Experts event is NOT in London, but in Mark Logic's offices in San Carlos, CA.

New event details are:

Code with the XQuery Experts
Friday, November 30, 2007
8:30 am PT - 5:00 pm PT
Mark Logic Offices
San Carlos, CA

Registration is here

What better way to work off the soporific Thanksgiving turkey than some vigorous XQuery coding!  Plus the Best XQuery App contest lets you bring your XQuery chops and maybe win an iPhone.

For those of you in and around London, Mark Logic is hosting a cocktail party Wednesday November 5th during London Online.  Feel free to come by.  You can hang out with some other Mark Logic XQuery gurus and get a free drink to lessen your disappointment.  Registration for this event is here.

Sorry for the confusion - it is, however, nice to have so much XQuery activity to be confused about.

Matt