XPath is actually pretty useful once it stops being confusing

142 points by sugnid, over 11 years ago

25 comments

kjhughes, over 11 years ago

XPaths are extremely useful. I actually enjoy writing them, much like I enjoy writing regular expressions. In fact, I consider both to repay manifold the modest investment they require to learn well.

XPath : XML :: regex : text

gopalv, over 11 years ago

XPath is awesome, especially once you understand what an axis is. That is what I've found most people who have trouble with it don't understand: what exactly following-sibling or child means.

I spent about 2 months writing my own XPath evaluator once, and it gets so much easier (to implement, too) when you understand that this is just a tree traversal with an iterator following the axis.

Unfortunately, the axis syntax makes it very verbose to read.
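
A minimal sketch of the "axis as iterator" idea, assuming Python's standard `xml.etree.ElementTree` as the tree. The function names are made up for illustration, and the sketch walks elements only (real XPath axes also cover text nodes, which ElementTree stores as `.text` and `.tail`).

```python
import xml.etree.ElementTree as ET


def child_axis(node):
    """Yield the direct children of node, in document order."""
    yield from node


def descendant_axis(node):
    """Yield every descendant of node, depth first, in document order."""
    for child in node:
        yield child
        yield from descendant_axis(child)


def following_sibling_axis(node, parents):
    """Yield the siblings that come after node under the same parent."""
    parent = parents.get(node)
    if parent is None:
        return  # the root has no siblings
    seen = False
    for sibling in parent:
        if seen:
            yield sibling
        elif sibling is node:
            seen = True


root = ET.fromstring("<p>one<br/>two<br/>three<b>bold</b></p>")
# ElementTree keeps no parent pointers, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}
first_br = root[0]

print([n.tag for n in child_axis(root)])
print([n.tag for n in following_sibling_axis(first_br, parents)])
```

Each axis is just a generator, so an evaluator can chain a step's axis with its node test and predicates without building intermediate lists.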

neves, over 11 years ago

W3C is full of terrible standards: the verbose DOM, the obtuse XML Schema, the crippled CSS (you can't have a variable), and others. XPath isn't one of them. It is the best way to query XML documents in a forward-compatible way. Maybe someday we will be able to use XPath in a CSS file instead of their crazy selectors.

d23, over 11 years ago

> it's a whopping 11 lines of code.

If you think 11 lines of code is a lot, you're overly focused on concision at the expense of readability. I've never (read: never) worked on any Ruby code, yet I find the posted example more readable than the supposedly more valuable XPath.

At the very least, they're the same. If you're writing code in Ruby, 11 lines is nothing. If you're writing code in Ruby and XPath is used nowhere else in the project, that single line of super-compact XPath might as well be 1000 lines of Ruby -- it doesn't matter.

If you're trying to compact 11 lines of code, you're probably doing it wrong.

graue, over 11 years ago

After a great article on what looked to be a handy tool, this part disappointed me:

> for this particular task, XPath is actually considerably slower than the pure-Ruby implementation. Interestingly, that's not true if you take out the <br> part and only look for text at the beginning of paragraphs. My guess is that the following-sibling axis is the culprit, since it has to select all the following siblings of the br tags, and then filter them down to only the first sibling.

I was hoping selectors were lazy, in which case selecting all the following siblings but then immediately filtering that selection down to the first would be cheap. Lazy or not, can there really be no efficient way to do the equivalent of jQuery next()?
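
To make the laziness point concrete, here is a small sketch in Python with the standard `xml.etree.ElementTree`. It shows only what a lazy following-sibling axis would look like; it makes no claim about how Nokogiri or any other XPath engine evaluates the expression. The function name and the `visited` list are illustrative.

```python
import xml.etree.ElementTree as ET


def following_siblings(parent, node, visited):
    """Lazily yield the element siblings after node, recording each one produced."""
    children = iter(parent)
    # Skip ahead to node itself; this scan is still needed to find its position.
    for child in children:
        if child is node:
            break
    for sibling in children:
        visited.append(sibling.tag)
        yield sibling


p = ET.fromstring("<p><br/><i>a</i><b>b</b><u>c</u></p>")
br = p[0]

visited = []
# Taking only the first item stops the generator after one sibling,
# roughly what jQuery's next() does.
first = next(following_siblings(p, br, visited), None)
print(first.tag, visited)
```

With a generator, "select all following siblings, then keep the first" touches one sibling; an eager implementation that builds the full list first would touch all three.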

radicalbyte, over 11 years ago

If you have great tools which keep the feedback loop short, then XPath, like regex, SQL and CSS, is extremely powerful and productive.

Just make sure that you document your test cases (i.e. what you should match) or your colleagues will hate you.

moron4hire, over 11 years ago

Years ago, I wrote a tool for wrapping .NET XmlDocuments and making them far easier to work with via XPath: https://github.com/capnmidnight/xml-stuff

On their own, .NET's XML libraries are really only good for consuming XML documents, but even that is a rather painful experience, especially as it forces a namespace on all documents, complicating the XPath expressions necessary to query them. Actually authoring documents is a nightmare. My XmlEdit project makes it almost as simple as key-value-pair config files.

slig, over 11 years ago

> But it gets more interesting if the lyrics are stored as an HTML fragment.

Is there any reason to store the HTML version with <p>s and <br>s instead of plain text, converting it to HTML with simple rules à la Markdown? (single line break = <br>, double line break = <p>)
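
A rough sketch of that conversion in Python, assuming the lyrics are stored as plain text with blank lines between stanzas. The function name `lyrics_to_html` is made up for illustration and is not from the article.

```python
import html
import re


def lyrics_to_html(text):
    """Convert plain-text lyrics to HTML: blank line = new <p>, single newline = <br>."""
    stanzas = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for stanza in stanzas:
        # Escape each line so characters like & and < stay literal in the output.
        lines = [html.escape(line.strip(), quote=False) for line in stanza.splitlines()]
        paragraphs.append("<p>" + "<br>\n".join(lines) + "</p>")
    return "\n".join(paragraphs)


print(lyrics_to_html("first line\nsecond line\n\nnew stanza"))
```

Stored this way, per-line processing happens on plain strings before any markup exists, so no DOM query is needed.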

sixbrx, over 11 years ago

I also like XPath for some purposes, but I think it really suffers from (IIRC) having been designed before XML namespaces, which it only integrates very awkwardly IMO and which ruin the simplicity of XPath. Or maybe XML namespaces spoil everything they affect to some degree :)
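
A small illustration of the awkwardness, using Python's standard `xml.etree.ElementTree` (which implements only a subset of XPath). The Atom feed is just sample data, and the prefix `a` is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><title>First</title></entry>"
    "<entry><title>Second</title></entry>"
    "</feed>"
)

# The unprefixed path matches nothing: every element lives in the Atom namespace.
print(doc.findall("entry"))

# Binding a prefix to the namespace URI makes the same path work,
# at the cost of prefixing every step.
ns = {"a": "http://www.w3.org/2005/Atom"}
titles = [t.text for t in doc.findall("a:entry/a:title", ns)]
print(titles)

# Python 3.8+ also accepts a namespace wildcard in each step.
print(len(doc.findall("{*}entry")))
```

The document itself never uses a prefix, yet the query has to introduce one, which is the kind of friction the comment describes.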

hfsktr, over 11 years ago

"This is a perfectly reasonable solution, but it's a whopping 11 lines of code. Further, it feels like we're using the wrong tool for the job: why are we using Ruby iterators and conditionals to get at DOM nodes?"

Is it really that bad to have 11 lines in Ruby?

Initially I didn't get the "wrong tool" part, but after reading it all, that did make more sense. I haven't used XPath more than a few times, and those uses were pretty simple, so I can't complain. Just something I'll have to keep in mind.

narrator, over 11 years ago

One problem with XPath is that it can be a lot slower than native or JIT'd code, depending on the implementation. Interestingly enough, you can do XPath-like things in Scala with native code using pattern matching: http://ofps.oreilly.com/titles/9780596155957/HerdingXMLInScalaDSLs.html

tomasien, over 11 years ago

Here's how I use XPath - constant Googling! This is a good explanation, maybe I'll actually learn it now.

ianbicking, over 11 years ago

If you are curious about XPath and CSS you might want to play with http://css2xpath.appspot.com

gavinpc, over 11 years ago

I'm happy to see many other XPath fans here.

But as for the OP, this seems like a case of worrying about the code instead of the data structure. This would be easier to address before the lines are transformed into HTML, which I assume is not how they are stored.

habosa, over 11 years ago

XML gets a bad rap because of how much prettier JSON is, but there are a lot of cool tools associated with it. XPath is pretty awesome. I had to write an XPath parser/executor once (for a class) and it made me appreciate the value and simplicity.

Then there was XSLT, which was a pretty sweet way to turn a data format into a variety of "print" or "display" formats. It has definitely been replaced by bigger and better things, but it's a pretty awesome technology that does one thing really well.

GeorgeMac, over 11 years ago

I have made a little XPath primer, which is very much a work in progress. Check it out on my GitHub: https://github.com/GeorgeMac/xpath-primer

There are a few issues I am having with my Markdown editor, in comparison to GitHub's Markdown support.

d0m, over 11 years ago

I prefer to use HTML parsers for such problems, such as beautifulsoup in Python. I used XPath in the past, but the resulting code wasn't that much shorter than a more verbose version based on beautifulsoup. And, for someone looking at the code, the beautiful version makes so much more sense. XPath also feels like a big regex expression that magically works.

I'm not saying it's not useful. Actually, I believe that if you only have one use case, then using XPath might be overkill because of all the added complexity of maintaining a new library/technology/ideology. But if it's the sort of domain where XPath would be useful more than once, then sure, use it.

callmeed, over 11 years ago

I just started getting into XPaths pretty hardcore with my trivia generator for http://playhattrick.com ... I use it for identifying tables of data to scrape. It's not as fun as regex IMO, but it is powerful.

Pro Tip: the Chrome inspector lets you right-click on an element and get its XPath.

Pro Warning: sometimes the XPath generated by Chrome doesn't work when scraping with Nokogiri. I'm not sure why yet; I've just learned not to rely on it.

jefflinwood, over 11 years ago

Interesting (and this could just be a made-up problem to illustrate the blog post), but wouldn't it have been much easier to just store the lyrics in another format (not HTML)?

For example, you could use TEI XML (http://www.tei-c.org/index.xml), and then use stanzas and lines. Then when you go to render your lyrics, you can capitalize the first letters in your presentation code.
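
A hedged sketch of what that presentation code might look like in Python. It assumes a stripped-down, namespace-free document that borrows TEI's `lg` (line group) and `l` (line) element names; a real TEI file carries a namespace and much more structure. `render_lyrics` and `capitalize_first` are illustrative names.

```python
import html
import xml.etree.ElementTree as ET


def capitalize_first(line):
    """Upper-case only the first character, leaving the rest untouched."""
    return line[:1].upper() + line[1:]


def render_lyrics(xml_source):
    """Render each <lg> stanza as a <p>, with its <l> lines joined by <br>."""
    root = ET.fromstring(xml_source)
    paragraphs = []
    for stanza in root.iter("lg"):
        lines = [
            html.escape(capitalize_first((line.text or "").strip()), quote=False)
            for line in stanza.findall("l")
        ]
        paragraphs.append("<p>" + "<br>".join(lines) + "</p>")
    return "\n".join(paragraphs)


source = "<div><lg><l>hello world</l><l>again</l></lg><lg><l>bye</l></lg></div>"
print(render_lyrics(source))
```

Because every line is already its own element, capitalization becomes a string operation at render time rather than a query over mixed text and `br` nodes.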

m_st, over 11 years ago

XPath is alright, but as sixbrx noted, it suffers from problems with namespaces. I keep using this XSLT transformation to remove the namespace info. Then it works just fine: http://stackoverflow.com/a/413088/34022

gpsarakis, over 11 years ago

CSS selectors are much easier to remember than XPath. Python's BeautifulSoup allows you to select elements with selectors and is very convenient. XPath is a bit more verbose and most people already are familiar with CSS syntax.

masklinn, over 11 years ago

> the / in an XPath expression plays the same role as the > in a CSS selector:

The `/` in an XPath expression is probably a better match for the space in CSS selectors.
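
For comparison, a quick check with Python's standard `xml.etree.ElementTree`, which supports a subset of XPath: there, a single `/` is a child step and `//` is a descendant step. The sample markup is made up.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<div><p>direct</p><section><p>nested</p></section></div>")

# "./p": a single slash is a child step, comparable to CSS "div > p".
children = [p.text for p in doc.findall("./p")]

# ".//p": a double slash is a descendant step, comparable to CSS "div p".
descendants = [p.text for p in doc.findall(".//p")]

print(children)
print(descendants)
```

Which CSS combinator is the closer analogue therefore depends on whether the expression uses `/` or `//`.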

goflyapig, over 11 years ago

This is a great explanation and quick tutorial on XPath, but, like regex, I don't think I'd ever use it in production code unless I absolutely had to.

I'm sure I'd have fun coming up with an XPath solution, but for me, the ultimate goal is maintainability. If I wasn't 90% sure that the next person to look at that code already knew XPath, then I'd go with the Ruby solution.

Dealing with 11 lines of code in a language you know is better than dealing with 1 line of code in a language you don't (which ends up forcing you to read 1000 lines of documentation and examples to understand it).

optymizer, over 11 years ago

Writing an XPath 1.0 parser in C was fun. Maybe one day I'll use it to (partially) replace MongoDB's JSON query language.

johnward, over 11 years ago

I use XPath way more than I care to at my job, but it gets the job done.