What Happened to XPath?

135 points by AhtiK, over 4 years ago

25 comments

orf, over 4 years ago

XPath post 1.0 got ridiculous, like many things do. What started as a simple, elegant language morphed into one with an HTTP client, filesystem methods, JSON support, functions, loops, extensions and the ability to read environment variables.

I wrote a post about it a while back [1] (I regret some of the wording used there) and maintain a tool [2] that can exploit XPath injection issues. I'd recommend sticking with 1 or maybe 2, and pretending 3.x doesn't exist.

1. https://tomforb.es/xcat-1.0-released-or-xpath-injection-issues-are-severely-underrated/
2. https://github.com/orf/xcat
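For readers unfamiliar with the injection issue mentioned above, here is a minimal sketch of how it typically arises when an expression is built by string concatenation. The `//user[...]` document shape and all function names are hypothetical, chosen only for illustration; nothing here is taken from the linked post or tool.

```python
def naive_login_xpath(username, password):
    # UNSAFE: user input is pasted straight into the expression text.
    return "//user[name='" + username + "' and password='" + password + "']"


def quote_xpath_literal(value):
    # XPath 1.0 string literals have no escape sequences, so choose a quote
    # character the value does not contain, or fall back to concat().
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join("'" + p + "'" for p in parts) + ")"


def safer_login_xpath(username, password):
    # The input now stays inside a string literal instead of becoming syntax.
    return ("//user[name=" + quote_xpath_literal(username)
            + " and password=" + quote_xpath_literal(password) + "]")


attack = "' or '1'='1"
print(naive_login_xpath("alice", attack))
print(safer_login_xpath("alice", attack))
```

In the naive version the attacker's quotes close the literal early and append `or '1'='1'`, which makes the predicate true for every user. Where an engine supports bound variables, those are generally a better option than hand-rolled quoting.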

irjustin, over 4 years ago

Anyone who does scraping or automated browser work eventually comes across XPath.

In some ways, XPath is like regex. It's got insane power, but comes with a relatively steep learning curve. Remember reading regex for the first time? What? But unlike regex, the number of people using it is small in comparison.

I avoided XPath until I couldn't anymore. I could do a lot with CSS selectors, but eventually the DOM traversal became difficult to reason about with just CSS.

After taking the dive, it's so powerful. Read a single XPath and, like regex, you can fully understand what the thing is going after and how it will get there.

There are functions in XPath 2.0 that I would love to have, but Nokogiri for Rails is stuck in the 1.0 world with no plan to go to 2.0. Sad, but I'll live.
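As a rough illustration of the kind of traversal that gets awkward with CSS alone, the sketch below selects a table row by the text of one of its cells and then steps from a cell back up to its parent. It uses Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath 1.0; the markup is a made-up, well-formed stand-in for a scraped page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; real pages usually need an HTML-tolerant parser first.
PAGE = """<table>
  <tr><td>Name</td><td>Widget</td></tr>
  <tr><td>Price</td><td>9.99</td></tr>
  <tr><td>Stock</td><td>12</td></tr>
</table>"""

root = ET.fromstring(PAGE)

# Choose the row by the text of one of its cells, then take its second cell.
price = root.findtext(".//tr[td='Price']/td[2]")

# From a matched cell, step back up to the enclosing row with "..".
stock_row = root.find(".//td[.='Stock']/..")
stock_cells = [td.text for td in stock_row]

print(price, stock_cells)
```

Axes such as `ancestor::` and functions such as `contains()` are outside the ElementTree subset, so a full XPath 1.0 engine would be needed for those.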

Crazyontap, over 4 years ago

I just recently realized how powerful XPath is for web scraping. I'd been using CSS selectors for my occasional scraping needs and never bothered to learn XPath until one day, on a whim, I decided to learn at least the basics.

Man, I can now write scrapers in 2 minutes that used to take me quite some time, thanks to the power of XPath. Things like ancestors, contains, the ability to chain, etc. are so, so powerful. I used to write so many hacks just to do the same with CSS before.

benibela, over 4 years ago

The biggest problem with the new XPath versions is that the W3C made the standards, but almost no one implemented them, so you cannot actually use them.

I was doing web scraping and needed regular expressions to get the text, so I implemented XPath 2. I am currently updating it to XPath 3.1: http://www.videlibri.de/xidel.html

thom, over 4 years ago

XPath and XSLT were where I first started to really understand functional programming (despite doing Haskell at university). The first time was working on a tech stack that was basically Microsoft SHAPE queries transformed into HTML. The second was multiple projects customising Google custom search engine results. It was weird realising that these very limited primitives were actually infinitely powerful if you were willing to warp your brain the right way.

That said, I scrape a fair few webpages now and have never once revisited XPath. I suppose people have mostly written off anything that feels too much like XML as enterprisey and deprecated.

ping_pong, over 4 years ago

XPath and XML in general are a great example of "Death by Committee". They tried too hard to be too smart and to solve everything, and overcomplicated it to death. This is why people largely abandoned it. The same thing is happening to C++: they are steering themselves by committee into a dead end.

projektfu, over 4 years ago

With increasing power comes the likelihood that people accidentally implement behavior that is nonpolynomial. It looks good in testing, but then with real live data it starts taking seconds to render or re-render. There are probably examples of this already in CSS, but it seems more likely with arbitrarily backtracking XPath expressions.

anonymousblip, over 4 years ago

I love the XPath model of declaratively querying and transforming data, which has been highly influential (see JQ, JSONPath, GROQ, etc.). Ultimately, it was too closely tied to XML, which was overdesigned, complex, and sucked into the committee hell that brought us more overdesigned technologies like SOAP and XML Schema.

mongol, over 4 years ago

XPath 1.0 is maybe the single most useful output from the XML universe. Did something like it exist before?

icedchai, over 4 years ago

XPath 1.0 was released in the late 90’s. I remember using it in some server-side XML processing code (Java 1.2?) It did the job where the alternative was writing a ton of procedural code to get at a specific node, etc.

lkuty, over 4 years ago

XPath 3 and XQuery 3 are powerful and great technologies to query XML if you need that stuff. The problem is that most implementations cover only XPath 1.0, I guess because it is too difficult (i.e. time-consuming and involved) to produce a 2.x or 3.x implementation, let alone one with full W3C XML Schema support. There is also BaseX, a nice native XML database that implements XQuery 3.x. I really dig XML and its technologies. I wish XQuery 3.x was available everywhere.

jarym, over 4 years ago

Shameless plug of DefiantJS [1], which gives a lovely, fast XPath query capability to JSON data.

1. https://defiantjs.com

dehrmann, over 4 years ago

One of the huge gaps in JSON tooling is that there isn't a standard XPath equivalent (there's JSON Pointer, but it's nowhere close to XPath, and JSON Path, which isn't standardized) and no XSLT equivalent.

For as painful as XSLT was, at least it was a standard thing that existed.
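To make the JSON Pointer comparison concrete, here is a simplified sketch of a resolver following the RFC 6901 rules as I understand them (`/`-separated tokens, with `~1` and `~0` escapes). The function name and sample document are made up for illustration, and array-index validation is looser than the RFC requires.

```python
def resolve_pointer(doc, pointer):
    # The empty pointer refers to the whole document.
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = doc
    for raw in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 specifies.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            if not token.isdigit():
                raise ValueError("array index expected, got " + repr(token))
            node = node[int(token)]
        else:
            node = node[token]
    return node


doc = {"store": {"book": [{"title": "A", "price": 5},
                          {"title": "B", "price": 15}]}}

# A pointer names exactly one location in the document.
title = resolve_pointer(doc, "/store/book/1/title")

# Anything predicate-like ("books over 10") falls back to ordinary code.
expensive = [b["title"] for b in doc["store"]["book"] if b["price"] > 10]

print(title, expensive)
```

A pointer has no predicates, wildcards, or axes, which is the gap the comment describes.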

johnward, over 4 years ago

I still do a bunch of XML/XSLT work. I use XPath 1.0 basically every day. It's also awesome for web scraping. Overall, it's a great tool that doesn't get a ton of exposure.

mapgrep, over 4 years ago

Is there something I can read to get up to speed on xpath? Any recommendations for online or printed resources? (Particularly from folks who use it regularly!)

varispeed, over 4 years ago

I remember spending a good two weeks writing an XPath parser in C, and then the client changed their system responses to JSON. That was my last experience with XPath.

chriswarbo, over 4 years ago

XPath is great, and works equally well in lumbering, ceremony-heavy Enterprise Java environments and in quick bash one-liners.

I use it in a bunch of scraping scripts for Web sites which don't provide RSS feeds. It's really nice for quickly 'exploring' a document to find the needed data; it's simple to update when sites change their layout; and it can be read in from a config file, argument, env var, etc. to keep things generic and flexible.
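The "read it in from an argument or env var" idea can be sketched as below. This is an assumption-laden illustration, not the commenter's scripts: the `ITEM_XPATH` variable name, the `extract_links` helper, and the sample markup are all hypothetical, and the standard-library `xml.etree.ElementTree` used here supports only a subset of XPath and needs well-formed input.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical default; override it per site without touching the code.
DEFAULT_ITEM_XPATH = ".//article/h2/a"

SAMPLE = """<html><body>
<article><h2><a href="/a">First</a></h2></article>
<div class="post"><a href="/b">Second</a></div>
</body></html>"""


def extract_links(markup, item_xpath=None):
    # Precedence: explicit argument, then the ITEM_XPATH env var, then default.
    xpath = item_xpath or os.environ.get("ITEM_XPATH", DEFAULT_ITEM_XPATH)
    root = ET.fromstring(markup)
    return [(a.text, a.get("href")) for a in root.findall(xpath)]


# Same code, different layout: only the expression changes.
old_layout = extract_links(SAMPLE, ".//article/h2/a")
new_layout = extract_links(SAMPLE, ".//div[@class='post']/a")
print(old_layout, new_layout)
```

When a site changes its layout, only the expression in the config needs updating, which is the flexibility described above.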

forgotmypw17, over 4 years ago

XPath is hard to replace when writing Selenium WebDriver scripts. Thank you for existing, XPath.

mimixco, over 4 years ago

I thought XPath was pretty terrific for the day. It let you transform XML into a user interface in an entirely declarative way -- not just the appearance of items like CSS but the actual content could be inspected and altered. I built some cool things in XPath before frameworks like Angular took over.

techsin101, over 4 years ago

CSS selectors aren't an alternative to XPath; the alternative would be to write it out yourself in JS, a sort of entire tree-parsing algorithm. There are times when this is the only option when scraping.

chrshawkes, over 4 years ago

What is the alternative for accurate scraping?

dzonga, over 4 years ago

If you do any type of web scraping, XPath is the way to go. Thanks to my former co-worker Justin for showing me that.

dsq, over 4 years ago

I used xpath last week for something

tinus_hn, over 4 years ago

This is that weird language you use to make WebDAV servers look okay in a browser, right?

katzgrau, over 4 years ago

It's hard not to read this as satire, because XPath is so inelegant. Not that CSS selectors are a model of elegance, but they get the job done (most of the time) and are easy enough for rookie devs and designers to pick up.
