(March 26 2006)
REXML comes from http://www.germane-software.com/software/rexml/ and fully supports XPath 1.0. It's not gemmified, so a standalone copy has to be downloaded and installed by hand; Ruby 1.8 and later also ship it in the standard library.
Take one sample XML document – here's mine, it's the Atom feed for this blog. Drop it into a text file called atom.xml for easy access, or choose your own – all Atom files should have basically the same layout. Here's the code to start with :-
require 'rexml/document'
include REXML
atom = Document.new(File.new("atom.xml"))
p XPath.first(atom, "//*")
This loads the REXML library (and includes the REXML module, so we don't have to say things like REXML::Document.new all the time). Then it builds a new REXML document from the contents of “atom.xml”. Finally, we print the first element that matches "//*", which is any element anywhere in the document :-
<feed version='0.3' xmlns:dc='http://purl.org/dc/elements/1.1/'
xmlns='http://purl.org/atom/ns#'> ... </>
Well, that's correct - there's only one element in the document at the top level, and that's “feed”. But what's inside it?
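If you don't have a feed file handy, the same first step can be tried against an inline string. This is only a rough sketch: the XML below is a made-up stand-in for atom.xml (much smaller, and without the default Atom namespace), and the variable names are mine.

```ruby
require 'rexml/document'

# Hypothetical, cut-down stand-in for atom.xml (no default namespace).
sample = <<~XML
  <feed version="0.3">
    <title>Example blog</title>
    <entry><title>First post</title></entry>
    <entry><title>Second post</title></entry>
  </feed>
XML

# Document.new accepts a String as well as an IO object.
doc = REXML::Document.new(sample)

# "//*" selects every element; XPath.first keeps the first one in document order.
first = REXML::XPath.first(doc, "//*")
puts first.name
puts first.attributes["version"]
```

Writing REXML::XPath in full here avoids the top-level include, which is handy when pasting into a bigger script.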
p XPath.match(atom, "//feed/*")
[<title mode='escaped'> ... </>, <link href='http://nb.inode.co.nz'
rel='alternate' type='text/html'/>, <modified> ... </>, <author> ... </>,
<entry> ... </>, <entry> ... </>, <entry> ... </>, <entry> ... </>,
<entry> ... </>, <entry> ... </>, <entry> ... </>, <entry> ... </>,
<entry> ... </>, <entry> ... </>]
That's better - within the “feed”, there's a title, link, modified, author, and a series of entries. Let's look into the entries …
p XPath.match(XPath.first(atom, "//feed/entry"), "*")
[<title mode='escaped'> ... </>, <author> ... </>, <link
href='http://nb.inode.co.nz/archives/2006-03-20T11_18_09.html'
rel='alternate' type='text/html'/>, <id> ... </>, <issued> ... </>,
<modified> ... </>, <created> ... </>, <dc:subject> ... </>, <content
mode='escaped' type='application/xhtml+xml' xml:space='preserve'
xml:lang='en'> ... </>]
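The trick in that last step is that an XPath query can start from any element, not just the document. Here is a small sketch of the idea, again using an invented mini-feed rather than the real one (the sample XML and the names entry and kids are assumptions of mine). It also shows the difference between XPath.match, which hands back an array, and XPath.each, which yields one node at a time.

```ruby
require 'rexml/document'

# Hypothetical mini-feed, not the real atom.xml.
xml = <<~XML
  <feed version="0.3" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <title>Example blog</title>
    <entry>
      <title>First post</title>
      <link href="http://example.org/1.html" rel="alternate"/>
      <dc:subject>ruby</dc:subject>
    </entry>
    <entry>
      <title>Second post</title>
    </entry>
  </feed>
XML
doc = REXML::Document.new(xml)

# Grab the first entry, then query relative to it.
entry = REXML::XPath.first(doc, "//feed/entry")

# "*" is evaluated from the entry element, so it selects its child elements only.
kids = REXML::XPath.match(entry, "*")
p kids.map(&:expanded_name)

# XPath.each walks the same nodes, yielding each one to the block.
REXML::XPath.each(entry, "*") { |e| puts e.expanded_name }
```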
So now we can see the structure within an entry at last. Let's try listing the titles of all the entries in the document :-
XPath.each(atom, "//feed/entry/title") {|e| puts e.text}
OSX Tiger fails poll()
Xerox DocuColor watermarking
Comments and Trackback
Textile?
The syntax battle ...
Remembering Categories
IRC and antivirus
Markdown and better CSS
Markdown added
Latine
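A natural next step is to pair each title with its link. The following is a sketch under the same assumptions as before: the feed is a made-up sample without the Atom default namespace, and posts is just a name I picked. It combines a document-wide query for the entries with relative queries inside each one, and reads the href through the element's attributes.

```ruby
require 'rexml/document'

# Hypothetical two-entry feed standing in for atom.xml.
xml = <<~XML
  <feed version="0.3">
    <entry>
      <title>First post</title>
      <link href="http://example.org/1.html" rel="alternate"/>
    </entry>
    <entry>
      <title>Second post</title>
      <link href="http://example.org/2.html" rel="alternate"/>
    </entry>
  </feed>
XML
doc = REXML::Document.new(xml)

# For every entry, look up its own title text and link href.
posts = REXML::XPath.match(doc, "//feed/entry").map do |entry|
  title = REXML::XPath.first(entry, "title").text
  href  = REXML::XPath.first(entry, "link").attributes["href"]
  [title, href]
end

posts.each { |title, href| puts "#{title} <#{href}>" }
```

This assumes every entry has both a title and a link; with a real feed it would be worth checking for nil before calling text or attributes.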