Tech Echo

A tech news platform built on Next.js, offering global tech news and discussion.

© 2025 Tech Echo. All rights reserved.

A PhD student's perspective on research in NLP in the era of LLMs

126 points by morgangiraud about 2 years ago

7 comments

rhdunn about 2 years ago
I've been interested in NLP for tagging stories based on topics and themes (detectives, werewolves, murder mystery, etc.), so I need accurate disambiguation of parts of speech and ways of detecting the use of metaphors, similes, etc. to describe those. I also want to be able to assess how much of the text is about a given topic: if I'm interested in reading a detective story from, e.g., the Project Gutenberg collection, I don't want it to pick up a story where a detective is only mentioned in one paragraph.

I've looked at several existing NLP frameworks (OpenNLP, Stanford NLP) and none of them are accurate enough: they fail on things like adjectives and archaic English second-person pronouns. This makes them practically unusable for proper sense disambiguation, lemma- and part-of-speech-based rules, etc.

The OpenNLP tokenizer is also terrible at tokenizing title abbreviations ("Dr", etc.) and things like the use of "--" to delimit text, which is frequently found in various Project Gutenberg texts. You can train the OpenNLP tokenizer, but it only works on what it has seen, so you need to give it every variation of "(Mr|Mrs|Miss|Ms|Rev|Dr|...). [A-Z]" for it to tokenize those titles; the same goes for other tokens.
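The tokenization problem described above can also be attacked with a hand-written rule instead of a trained model. The following is a minimal Python sketch using the standard `re` module, not OpenNLP's API; `TITLES`, `TOKEN_RE`, and `tokenize` are hypothetical names, and the title rule simply mirrors the `(Mr|Mrs|Miss|Ms|Rev|Dr|...). [A-Z]` pattern from the comment.

```python
import re

# Hypothetical list of title abbreviations; extend as needed.
TITLES = ("Mr", "Mrs", "Miss", "Ms", "Rev", "Dr")

# A title plus its period is one token, but only when followed by
# whitespace and a capital letter (mirrors "(Mr|Mrs|...). [A-Z]").
TITLE = r"\b(?:" + "|".join(TITLES) + r")\.(?=\s+[A-Z])"

# Alternatives are tried left to right at each position:
#   1. title abbreviation with period, e.g. "Dr."
#   2. a "--" dash used as a text delimiter
#   3. a word, allowing internal apostrophes ("don't")
#   4. any other single non-space character (punctuation)
TOKEN_RE = re.compile("|".join([TITLE, r"--", r"\w+(?:'\w+)*", r"[^\w\s]"]))


def tokenize(text: str) -> list[str]:
    """Split text into tokens; whitespace matches no alternative and is skipped."""
    return TOKEN_RE.findall(text)


print(tokenize("Dr. Watson arrived--late, as usual."))
# → ['Dr.', 'Watson', 'arrived', '--', 'late', ',', 'as', 'usual', '.']
```

The trade-off is that every abbreviation still has to be listed in `TITLES`, but once listed it is handled in every context without needing a training example for each variation. A sentence-final "Dr." (no capitalized word after it) falls through to the word rule and is split into "Dr" and ".".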
dontupvoteme about 2 years ago
Are papers becoming blogs?
dcl about 2 years ago
"A PhD Student's Perspective"... with ~20 authors.
totorovirus about 2 years ago
Well, I see topics like NLP in ethics, healthcare, etc., which I think is a sign they don't have much left to do here.
teleforce about 2 years ago
According to the article, research on language models was originally kick-started by Claude Shannon's early work on a Markov chain model of English words.

If you are in the field of Information and Communication Technology (ICT), there is hardly any area whose fundamentals do not bear Shannon's hand.

Leonard Kleinrock once remarked that he had to focus on the exotic field of queueing theory, which later led to packet switching and then the Internet, because most of the fundamental problems in electrical and computer engineering (the older name for ICT) had already been solved by Shannon.
etamponi about 2 years ago
Isn't the main problem with NLP research now that you need a ton of money to run your experiments? How can an "average" PhD researcher hope to validate their hypotheses if each test costs several thousand dollars?
al__be__rt about 2 years ago
This paper doesn't appear to have been published in any reputable journal, so take its authenticity at face value...