Let's say you are the head of an international syndicate headquartered up North that gets real busy this time of year getting a gift for every kid in the world (!!).
And let's say you really need to be efficient about it - Wired estimates it would take $27 billion just for the U.S. alone!
Maybe you could use some XQuery to move the process along and make that really big delivery list for all the kids?
If you had a ginormous database of kids like this:
<kid>
<name>Jane</name>
<naughty/>
</kid>
<kid>
<name>Josh</name>
<nice/>
</kid>
<kid>
<name>Michael</name>
<nice/>
</kid>
<kid>
<name>Lila</name>
<nice/>
</kid>
And an equally huge list of toys that looks like this:
<toys>
<toy>Jakks EyeClops Bionic Eye</toy>
<toy>IlluStory Make Your Own Story Kit</toy>
<toy>Blokus Strategy Board Game</toy>
<toy>LeapFrog ClickStart My First Computer</toy>
...
</toys>
(For this example I've used Mark Logic's capabilities to scrape content from the web and grabbed Amazon's top gifts for kids - we all know Santa has his own sources).
You could use a bit of XQuery to bring them together and make sure we get one toy per kid:
(: Start by looping through only the kids that get toys: limit it to the ones
   with a <nice> element that DON'T yet have a <toy>.
   This method of selecting entries comes in handy for large content sets, as we'll see below. :)
for $kid in /kid[./nice][not(./toy)]
(: get a quick count of the toys for our random feature :)
let $toys := count(/toys/toy)
return
  (: get a toy based on a random number using MarkLogic's random function;
     xdmp:random($max) returns 0 through $max inclusive, so shift it into the range 1 to $toys :)
  let $toy :=
    let $random := xdmp:random($toys - 1) + 1
    return
      (/toys/toy)[$random]
  return
    (: add the toy node to the kid :)
    xdmp:node-insert-child($kid, $toy)
Run this, and all our kids now have toys:
<kid><name>Jane</name><naughty/></kid>
<kid><name>Josh</name><nice/><toy>Hasbro Playskool Step Start Walk 'n Ride</toy></kid>
<kid><name>Michael</name><nice/><toy>Scrabble</toy></kid>
<kid><name>Lila</name><nice/><toy>Fisher-Price Little Superstar Sing-Along Stage</toy></kid>
. . . except for naughty Jane of course.
But those of you with an eye for big things might be thinking: this is well and good for 4 kids ... and maybe 400, but what about a couple billion?
Well, that first test (the predicates on /kid) is the key. It lets you grab a set of records to process without having to iterate through records and commit after every x of them (like you would with a relational system).
Instead, you can have this code work on a small batch at a time by adding a predicate to limit the number returned by the query:
(: the parentheses make [1 to 1000] apply to the whole result rather than to each document :)
for $kid in (/kid[./nice][not(./toy)])[1 to 1000]
Then all you need to do is run it over and over. Each run picks up only the kids that still need toys, and when everyone has a toy, it's done.
MarkLogic Server provides a way of automating this with a function called xdmp:spawn, which lets you put code onto a task server. So to get through the billions of kids, you would take the above code (with the predicate for 1000), save it as a module, and then run that module over and over until there were no more kids who needed toys:
if (/kid[./nice][not(./toy)]) then
  xdmp:spawn("match-toy.xqy")
else
  "All Done - ready for Christmas Delivery!!"
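Putting the pieces together, match-toy.xqy might look something like the sketch below. Treat it as a sketch under a few assumptions, not the one true way: the batch size of 1000 is just the number from the predicate above, the 1.0-ml version declaration assumes a MarkLogic release that supports it, the toy list is assumed to be non-empty, and having the module re-spawn itself is only one way to keep the chain going.

```xquery
xquery version "1.0-ml";

(: match-toy.xqy: hand out toys to one batch of kids, then queue the next batch.
   Hypothetical sketch; the batch size and the self re-spawn are assumptions. :)

(: Parentheses make [1 to 1000] apply to the whole result, not to each document. :)
let $batch := (/kid[./nice][not(./toy)])[1 to 1000]
let $toys := count(/toys/toy)
return
  if (empty($batch)) then
    "All Done - ready for Christmas Delivery!!"
  else (
    for $kid in $batch
    (: xdmp:random($max) can return 0 through $max, so shift into 1 to $toys.
       Bind it with let first so one random index is picked per kid. :)
    let $random := xdmp:random($toys - 1) + 1
    let $toy := (/toys/toy)[$random]
    return xdmp:node-insert-child($kid, $toy),

    (: Queue another run; it selects only the kids that still lack a toy. :)
    xdmp:spawn("match-toy.xqy")
  )
```

Because every run re-selects only the nice kids without a toy, no run needs to remember where the previous one stopped. Keep in mind that a spawned task is queued on its own, separately from the transaction that spawned it, so a retried run may queue an extra task.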
Just like the big guy up North, XQuery keeps going until every kid has a toy.
Happy Holidays Everyone!
Matt
Nice post as always, Matt.
I think that using xdmp:spawn in your example warrants a brief mention of its non-transactional side effects - if only to save others the pain they caused me!
Correct me if I'm wrong, but xdmp:spawn is non-transactional, so even if the calling transaction does not commit (e.g. is retried due to a deadlock), the spawned transaction will continue to run. Not a problem if you're expecting it...
Posted by: Nicholas Edwards | January 08, 2008 at 02:06 PM