TechEcho

Our machine learning and NLP journey

114 points by prabhatjha almost 7 years ago

5 comments

Radim almost 7 years ago
A big factor in producing a good analysis is the feedback modality: chat transcripts are different from emails, which are different from web forms or operator notes.

We've had several "customer feedback / intent / support case analysis" projects in the past. Some were for large customers with millions of individual records (Autodesk), where there's the additional challenge of discovery: "What should the categories be in the first place? What's in the data?"

What we learned is that a model trained on one type of feedback will not necessarily perform well on others, because the relevant signals manifest differently across modalities: feedback length / writing style / typos, lexical richness / repetition / boilerplate, OCR noise / how long the long tail is… Your model may learn to pick up on cues that are orthogonal to the sentiment or categorization problem.

This is especially true for black-box models (deep learning), where introspection is limited: Did the model learn to rely on syntax? Specific words or character n-grams? Exclamation marks? Something else? Does an Indian-looking name imply negative sentiment?

Slapping a generic ML technique (Stanford NLP, Naive Bayes, bi-LSTM, whatever) onto a bunch of tokens is a reasonable first step; that's the low-hanging fruit. The tricky part is defining the problem space and the QA process correctly, and managing the devil that comes with the details.
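To make the introspection point concrete, here is a minimal sketch of how one might check which tokens a simple classifier leans on. It is not the commenter's method: the toy chat samples, the function names (`tokenize`, `train_nb`, `token_log_odds`), and the smoothing value are all assumptions made for illustration. It fits Naive Bayes-style token counts and ranks tokens by their smoothed log-odds toward the negative class.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; "!" becomes its own token
    # so that punctuation cues show up in the vocabulary.
    return text.lower().replace("!", " ! ").split()

def train_nb(samples):
    # samples: list of (text, label) pairs, label is "pos" or "neg".
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in samples:
        counts[label].update(tokenize(text))
    return counts

def token_log_odds(counts, alpha=1.0):
    # log P(token | neg) - log P(token | pos), with add-alpha smoothing.
    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {c: sum(counts[c].values()) + alpha * len(vocab) for c in counts}
    return {
        tok: math.log((counts["neg"][tok] + alpha) / totals["neg"])
        - math.log((counts["pos"][tok] + alpha) / totals["pos"])
        for tok in vocab
    }

# Hypothetical chat-style feedback in which unhappy users happen to
# use exclamation marks; the data is invented for this sketch.
chat_samples = [
    ("this is broken !", "neg"),
    ("nothing works ! !", "neg"),
    ("still waiting !", "neg"),
    ("thanks that fixed it", "pos"),
    ("works fine now", "pos"),
    ("great support thanks", "pos"),
]

odds = token_log_odds(train_nb(chat_samples))
top_negative_cues = sorted(odds, key=odds.get, reverse=True)[:3]
print(top_negative_cues)  # the first entry is "!"
```

On this toy data the strongest "negative" cue is the exclamation mark rather than any content word, which is the kind of modality-specific signal that may not carry over to emails or web forms. With a black-box model, an equivalent check is much harder to run.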
diegoserranoa almost 7 years ago
I always see all these articles, services and products offering NLP for English. I wonder how this works with other languages that have a different structure, e.g. Japanese, Arabic, etc. It would also be interesting to see how these algorithms behave when considering cultural aspects: one word or expression may have a different meaning in different places. How would the system handle something like "Your service is the sh*t!"? Is that positive? Negative? There's probably info on this subject all over the internet already, haha. Very interesting though...
alexbeloi almost 7 years ago
Have you considered using this for analyzing feedback for politicians? They have similar pain points in understanding which feedback from constituents reflects a general problem vs. an isolated concern. Maybe through Twitter data (as a PoC) and then actual emails from constituents.
gmonfort77 almost 7 years ago
Interesting article. How do you guys cope with badly written feedback or feedback that just doesn't make sense? I guess this type of feedback could "pollute" your algorithms if you constantly use unverified feedback as training data?
sharkenstein almost 7 years ago
When you talk about discerning between algorithms from Google, Stanford, etc., what are the criteria for doing that? Do they change based on the domain? If you are just trying to classify feedback, how much does the domain affect the algorithm?