Last week at the XML conference I gave a talk entitled First Encounters with Office Open XML.
I learned a ton about this new format and had a lot of fun taking it apart and using XQuery to manipulate content and create new Office Open XML documents that Microsoft Word (2007) can open.
The basics: the format is a set of XML files packaged in a zip archive. With ordinary desktop zip tools you can open up any Office 2007 file and work with the XML outside of the MS Office 2007 tools.
Combine this with a content server like MarkLogic Server, which includes built-in functions for unzipping and zipping, and you can load Office Open XML into an XML repository, query across multiple documents with XQuery, and use XQuery to create new Office Open XML documents.
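As a rough sketch of the unzipping side, assume a .docx file has already been loaded into the database as a binary document. The URI /docs/example.docx is made up for illustration, and the code targets the 1.0-ml dialect. xdmp:zip-manifest and xdmp:zip-get are the MarkLogic built-ins as I understand them, so check the signatures against the documentation for your server version.

```xquery
xquery version "1.0-ml";

declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: Hypothetical URI of a Word 2007 file stored as a binary document. :)
let $docx := fn:doc("/docs/example.docx")/binary()

(: List the parts inside the zip package. :)
let $manifest := xdmp:zip-manifest($docx)

(: Pull out the main document part as XML. :)
let $main := xdmp:zip-get($docx, "word/document.xml")

return (
  $manifest,
  (: The text of every paragraph, one string per w:p. :)
  for $p in $main//w:p return fn:string($p)
)
```

Once the main part is out of the zip it is ordinary XML, so any XPath or XQuery expression works against it.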
As I said in my talk, Office Open XML *is* XML and you can open it up and mess with it!
MarkLogic's Pete Aven is starting a series of posts called MarkLogic Server and Office XML.
The first post shows how to open up, query and repackage Office Open XML focusing on the Microsoft Word format. It is here under Office Logic.
The second post explores the details around Excel and is called Excel-ing with XQuery.
My slides from the presentation are here:
They are also posted to the XML conference site: http://2007.xmlconference.org/public/schedule/detail/362.
In my presentation I had a lot of fun going between the Office Open XML format XML and the tools inside Microsoft Word to view and manage content. And, as Pete's posts show, it's pretty easy.
One thing I did was generate Office Open XML from the Shakespeare content I used in the original tutorial.
To do this, I used the same transformation I used in the XQuery Transformers tutorial. This starts with a recursive function and a typeswitch for each of the elements:
define function recursion($x, $options) {
  for $z in $x/node() return mapping($z, $options)
}

define function mapping($x, $options)
{
  typeswitch ($x)
    case element(PLAY) return play($x, $options)
    case element(TITLE) return title($x, $options)
    case element(ACT) return section($x, $options)
    ...
}
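For anyone who wants to run the dispatch pattern on its own, here is a minimal self-contained sketch. It uses standard XQuery 1.0 syntax (declare function with the local: prefix) rather than the older define function form shown above. The output element names (doc, heading, section) are placeholders made up for illustration, not anything from the tutorial.

```xquery
xquery version "1.0";

(: Walk the children of $x and map each one. :)
declare function local:recursion($x as node(), $options as element()?) as node()* {
  for $z in $x/node() return local:mapping($z, $options)
};

(: Dispatch on node type; unknown elements fall through to their children. :)
declare function local:mapping($x as node(), $options as element()?) as node()* {
  typeswitch ($x)
    case text() return $x
    case element(PLAY) return <doc>{local:recursion($x, $options)}</doc>
    case element(TITLE) return <heading>{local:recursion($x, $options)}</heading>
    case element(ACT) return <section>{local:recursion($x, $options)}</section>
    default return local:recursion($x, $options)
};

(: A tiny made-up input in the shape of the Shakespeare markup. :)
local:mapping(
  <PLAY><TITLE>Hamlet</TITLE><ACT><TITLE>ACT I</TITLE></ACT></PLAY>,
  ())
```

The default branch is what keeps the walk going: an element without its own case is skipped, but its children are still visited.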
But instead of returning an HTML element for each mapping, I created Office Open XML elements. Take the TITLE element as an example.
First we wrap it in a <w:customXml> element to add the structure to the Word file. Then we make a <w:p> element, which is the basic building block of a Word document.
Because this is a heading, we want some style. Depending on where we encounter the element, we set the style in the <w:pStyle> element using the standard Word styles "Heading1", "Heading2", etc.
Finally we create a 'run' to hold the text with a <w:r> element and put the text inside a <w:t> element:
define function title($x as element(), $params as node()) as element()
{
  <w:customXml w:element="TITLE">
    <w:p w:rsidR="00592BC3" w:rsidRDefault="009E30CC" w:rsidP="009E30CC">
      <w:pPr>
        <w:pStyle w:val="{
          if ($x/parent::PLAY) then
            "Heading1"
          else if ($x/parent::ACT) then
            "Heading2"
          else if ($x/parent::PERSONAE or $x/parent::SCENE) then
            "Heading3"
          else
            "Heading4"
        }"/>
      </w:pPr>
      <w:r>
        <w:t>{passthru($x, $params)}</w:t>
      </w:r>
    </w:p>
  </w:customXml>
}
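To try the title mapping outside the full transformation, something like the following standalone sketch should work in any XQuery 1.0 processor. Several things here are assumptions of mine: local:passthru is a hypothetical stand-in that just returns the element's text (the post does not show the real passthru), the style choice is pulled out into a helper called local:heading-style, and the w:rsid* attributes are left out because, as far as I know, they are optional revision identifiers.

```xquery
xquery version "1.0";

declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: Hypothetical stand-in for passthru: the string value of the element. :)
declare function local:passthru($x as element()) as xs:string {
  fn:string($x)
};

(: Pick a Word heading style from the element's parent, as in the post. :)
declare function local:heading-style($x as element()) as xs:string {
  if ($x/parent::PLAY) then "Heading1"
  else if ($x/parent::ACT) then "Heading2"
  else if ($x/parent::PERSONAE or $x/parent::SCENE) then "Heading3"
  else "Heading4"
};

(: Wrap the title text in customXml / paragraph / run / text. :)
declare function local:title($x as element()) as element() {
  <w:customXml w:element="TITLE">
    <w:p>
      <w:pPr><w:pStyle w:val="{local:heading-style($x)}"/></w:pPr>
      <w:r><w:t>{local:passthru($x)}</w:t></w:r>
    </w:p>
  </w:customXml>
};

(: Made-up input: one play title and one act title. :)
let $play := <PLAY><TITLE>Hamlet</TITLE><ACT><TITLE>ACT I</TITLE></ACT></PLAY>
for $t in $play//TITLE
return local:title($t)
```

With this input the play title comes back styled as Heading1 and the act title as Heading2, since the style depends only on the parent element.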
The result is XML that we can zip up in an Office Open XML package (see Pete's posts or the slides for how to do this) and open in Word.
Pretty neat!
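For the packaging step, here is a minimal sketch of what a bare-bones Word package could look like when built with MarkLogic's xdmp:zip-create. This is not taken from Pete's posts. The three parts are the smallest set I know of that Word will accept, and the single hard-coded paragraph stands in for the output of the transformation. Treat the xdmp:zip-create signature as an assumption and confirm it against the documentation for your server version.

```xquery
xquery version "1.0-ml";

declare namespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

(: Part 1: tells Word the content type of each part. :)
let $content-types :=
  <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Default Extension="rels"
             ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
    <Default Extension="xml" ContentType="application/xml"/>
    <Override PartName="/word/document.xml"
              ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
  </Types>

(: Part 2: points from the package root to the main document part. :)
let $rels :=
  <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1"
                  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
                  Target="word/document.xml"/>
  </Relationships>

(: Part 3: the WordprocessingML body; generated paragraphs would go here. :)
let $document :=
  <w:document>
    <w:body>
      <w:p><w:r><w:t>Hamlet</w:t></w:r></w:p>
    </w:body>
  </w:document>

(: The manifest names each part, in the same order as the nodes below. :)
let $manifest :=
  <parts xmlns="xdmp:zip">
    <part>[Content_Types].xml</part>
    <part>_rels/.rels</part>
    <part>word/document.xml</part>
  </parts>

return xdmp:zip-create($manifest, ($content-types, $rels, $document))
```

The result is a binary node. Saving it with a .docx extension, for example with xdmp:save or xdmp:document-insert, should give a file that Word can open.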
Office Open XML really opens things up in the new Office 2007 suite: you can access and query the documents created in the suite, manipulate that content, or create new Office content from other sources.
It's XML after all, so have fun messing with it!
Matt
Hi Matt,
A very interesting post.
Can you explain the attributes on this element?
They look like IDs or references of some kind? Do they need to be unique?
Next question ;-) - the element is just a wrapper for the element?
And finally - the passthru function - is that just a way to pass the contents of the sub-elements through the same recursion?
Thanks for any help you can give,
Jim
Posted by: Jim Stock | December 10, 2007 at 05:59 PM
Good semi-technical blog. I gotta learn more about messing with XMLs...
Posted by: MICR Toner Pro | April 11, 2008 at 05:39 PM
You have built a good website
Posted by: Vince | August 01, 2008 at 03:32 PM
Keep up the good work!
Posted by: Kathy | October 02, 2008 at 01:14 PM