"How are you retrieving the web page?" would be my first question. I usually do `lynx -dump [<a href="http://optional.com/]some.html`" rel="nofollow">http://optional.com/]some.html`</a> to get plain text. I know this works for links also.