Stop Words as Social Signals (a result that goes against IR common knowledge)

85 points by hamilton about 15 years ago

8 comments

klochner about 15 years ago
One of the two linked papers (AnnualReview.pdf) discusses formal settings, including those with disparities in power:

'Brown & Levinson’s (1987) politeness theory takes into account an individual’s efforts to preserve the “face(s)” of others with whom one communicates. For example, they propose impersonalizing the speaker and hearer by avoiding the pronouns I and you, using past tense to create distance and time, diminishing the force of speech by using hedge words such as perhaps, using slang to convey ingroup membership, and using inclusive forms (we and let’s) to include speaker and hearer.'
conanite about 15 years ago
Amazing that [to, to, on, my, to, on, be, on, to, to ... etc] can reveal so much. One of the linked articles ( http://homepage.psy.utexas.edu/homepage/faculty/pennebaker/reprints/Chung&JWP.pdf ) describes how the relative frequency of the use of "I" in communication is a powerful indicator of status (less "I" => higher status).
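
For illustration only, the relative-frequency measure mentioned above could be sketched roughly as follows. This is not the method used in the linked papers: the pronoun list, the tokenizer, and the function name `first_person_rate` are all assumptions made for this example.

```python
import re

# Hypothetical word list for this sketch: first-person singular pronouns.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def first_person_rate(text: str) -> float:
    """Return the fraction of word tokens that are first-person singular pronouns."""
    # Crude tokenizer (an assumption): lowercase words, optionally with one apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    # Compare only the part before any apostrophe, so "I'm" counts as "i".
    hits = sum(1 for t in tokens if t.split("'")[0] in FIRST_PERSON_SINGULAR)
    return hits / len(tokens)
```

Comparing this rate between two correspondents would only make sense if the stop words are kept in the text, which is the point of the thread.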
tlb about 15 years ago
He seems to be claiming that in the phrase "I don't feel comfortable signing up for a $7,000 extra flight without talking to you guys", it's the word "to" that signals the manager/subordinate relationship. It seems to me that "... without talking ... you" is the best clue.

In English, subjunctive case usually requires a "to", and people asking permission use the subjunctive a lot. Is that the signal he's finding? Is that the best NLP can do?
sesquabout 15 years ago
I was always uncomfortable with stopwords. Ignoring some data smells like overfitting, or perhaps a naïve algorithm (as in, it has no concept of structure – which is often the case, for computational reasons).<p>That said, the last time I dabbled in IR, my algorithm choked on stopwords. I should revisit it.
Vivtek about 15 years ago
Good *Lord* that is counterintuitive!
rozim about 15 years ago
See also SpotSigs ("...stopwords may however be very good indicators of the actual interesting parts of a web page...")

http://infoblog.stanford.edu/2008/08/spotsigs-are-stopwords-finally-good-for.html
lallysingh about 15 years ago
Outside the old post-2001 TIA/NSA efforts, what can this be used for?

Maybe for determining social hierarchies for mind share in product recommendations or advertising?

Yeah, I can't think of one noncreepy way to use this.
akshaybhat about 15 years ago
I think the following phrase is unnecessary: "a result that goes against IR common knowledge"

IR aims to retrieve information, not sentiments, from the text. This kind of work is related to sentiment analysis, where the use of stop words as a signal is common. Thus the whole argument that some holy criterion in IR has been shown to be false is clearly incorrect. Even the original authors don't make such assertions.