Electra: Pre-Training Text Encoders as Discriminators Rather Than Generators (2020)

65 points | by luu | 10 months ago

4 comments

visarga | 10 months ago
LOL, I was reading the abstract and remembering there used to be a paper like that. Then I looked at the title and saw it was from 2020. For a moment I thought someone had plagiarised the original paper.

Unfortunately BERT models are dead. Even the cross between BERT and GPT, the T5 architecture (encoder-decoder), is rarely used.

The issue with BERT is that you need to modify the network to adapt it to any task by creating a prediction head, while decoder models (GPT style) do every task with tokens and never need to modify the network. Their advantage is that they have a single format for everything. BERT's advantage is the bidirectional attention, but apparently large decoders don't have an issue with unidirectionality.
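[Editor's note: a minimal sketch of the contrast described above, using the Hugging Face transformers library. The checkpoint names and the sentiment task are illustrative choices, not taken from the thread.]

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForCausalLM,
)

# BERT-style encoder: adapting to a task means bolting on a prediction head.
# Here a freshly initialized 2-way classification head sits on top of the
# encoder and must be fine-tuned before it is useful.
enc_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
inputs = enc_tok("The movie was great", return_tensors="pt")
logits = encoder(**inputs).logits          # (1, 2) output from the added head

# GPT-style decoder: the same task is expressed entirely in tokens.
# No architectural change; the task is phrased as a prompt and the answer
# is read off the generated continuation.
dec_tok = AutoTokenizer.from_pretrained("gpt2")
decoder = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = "Review: The movie was great\nSentiment (positive or negative):"
ids = dec_tok(prompt, return_tensors="pt").input_ids
out = decoder.generate(ids, max_new_tokens=3, do_sample=False)
print(dec_tok.decode(out[0][ids.shape[1]:]))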
cs702 | 10 months ago
Good work by well-known reputable authors.

The gains in training efficiency and compute cost versus widely used text-encoding models like RoBERTa and XLNet are significant.

Thank you for sharing this on HN!
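[Editor's note: for context on where ELECTRA's efficiency gain comes from, a toy sketch of its replaced-token-detection objective, written by the editor and not taken from the paper's code. The tiny embedding-plus-linear modules stand in for real transformer encoders; sizes and the 50x loss weight follow the paper's described setup.]

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden, seq_len, batch = 100, 32, 16, 4
MASK_ID = 0

# Tiny stand-ins for the small generator (an MLM) and the discriminator (ELECTRA).
gen_embed = nn.Embedding(vocab_size, hidden)
gen_head = nn.Linear(hidden, vocab_size)      # predicts tokens at masked positions
disc_embed = nn.Embedding(vocab_size, hidden)
disc_head = nn.Linear(hidden, 1)              # per-token "was it replaced?" logit

tokens = torch.randint(1, vocab_size, (batch, seq_len))
mask = torch.rand(batch, seq_len) < 0.15      # mask ~15% of positions

# 1) Generator fills in the masked positions (standard MLM loss).
gen_in = tokens.masked_fill(mask, MASK_ID)
gen_logits = gen_head(gen_embed(gen_in))
mlm_loss = F.cross_entropy(gen_logits[mask], tokens[mask])

# 2) Sample the generator's guesses and splice them into the input.
with torch.no_grad():
    sampled = torch.distributions.Categorical(logits=gen_logits[mask]).sample()
corrupted = tokens.clone()
corrupted[mask] = sampled
is_replaced = (corrupted != tokens).float()   # labels for the discriminator

# 3) The discriminator scores EVERY token as original vs. replaced, so it gets
#    a learning signal from all positions, not just the 15% that were masked.
disc_logits = disc_head(disc_embed(corrupted)).squeeze(-1)
rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

loss = mlm_loss + 50.0 * rtd_loss             # RTD term is weighted heavily
print(float(loss))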
adw | 10 months ago
(2020)
trhway | 10 months ago
Reminds me of a somewhat parallel idea from classic expert systems: human experts shine at discrimination, and that is one of the most efficient methods of eliciting knowledge from them.