For a while now, I've aliased a version of wget as 'wcat' (alias wcat="wget -qO- -U NoSuchBrowser/1.0")
to dump pages directly to my terminal, so I can quickly search through them and pipe them into less, sed, and all sorts of other tools. Integrating text extraction into that would be pretty useful.
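
As a rough idea of what that integration might look like, here is a minimal sketch using only wget and sed. The names html2text_crude and wtext are made up for illustration, and the tag stripping is deliberately naive: it works line by line, so it misses tags that span several lines and leaves the contents of script and style blocks in place. A real text extractor would do much better.

```shell
# Hypothetical helper: crude HTML-to-text filter reading stdin.
# Removes anything that looks like <...> on a single line,
# then drops lines that are empty or whitespace-only.
html2text_crude() {
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}

# Hypothetical wrapper: same wget options as the wcat alias,
# with the output piped through the filter above.
wtext() {
    wget -qO- -U NoSuchBrowser/1.0 "$1" | html2text_crude
}
```

These are written as functions rather than aliases so they can take an argument and also work inside scripts. Usage would be along the lines of wtext "$url" | less, where $url is whatever page you want to read.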