Will Norris, who works on OSS at Twitter, posted this [0]: "watch this space <a href="https://github.com/twitter/the-algorithm" rel="nofollow">https://github.com/twitter/the-algorithm</a>"<p>[0]: <a href="https://twitter.com/willnorris/status/1518694675909013504" rel="nofollow">https://twitter.com/willnorris/status/1518694675909013504</a>
Many people have commented that it is empty. However, what they do not realize is that there has never actually been an algorithm, and that is why it is empty.
I don't understand the concept of open-sourcing "the algorithm".<p>First of all, "the algorithm" is probably hundreds of thousands of lines of code, including all the tedious boilerplate like cache policies and multi-AZ logic.<p>And second of all, doesn't the algorithm include machine learning components, which are trained on terabytes of data? That data will likely be impossible to open source. And open sourcing the neural nets without the training data is mostly meaningless from a transparency perspective?
I've worked on very large-scale recommendation systems at a FAANG. If Twitter's system is anything like ours, the concept of publishing or open sourcing "the algorithm" doesn't make sense.<p>Even if we were to open source all associated code and publish all related documents, it would be very difficult to make sense of the entire system. That is precisely why companies such as Twitter A/B test the hell out of everything. What most people think of as "the algorithm" is a complex system that receives many inputs (maybe hundreds) and has dependencies on many other internal Twitter services. Tweets likely pass through multiple filtering steps as well as scoring before you ever see them. Each of these steps is highly contextual, depending on location, past tweets, verification status, etc. You can attempt to predict the effect of a certain change, but you never know the actual outcome until you test it.<p>I think what will ultimately happen is that _some_ details will be published. Elon will parade that around as a victory for free speech, since Twitter is now more "open". In reality, nothing of value will be gained, because "the algorithm" isn't a simple function.
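A minimal sketch of the shape this comment describes: candidate tweets pass through several filters, then get scored in the viewer's context. Everything here is hypothetical: the `Tweet`/`Viewer` fields, the two filters, and the 1.2/1.1 boosts are made up for illustration and say nothing about Twitter's actual signals or weights.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Tweet:
    tweet_id: int
    author_id: int
    author_verified: bool
    region: str
    flagged_spam: bool
    predicted_engagement: float  # hypothetical stand-in for an ML model's output, 0..1


@dataclass
class Viewer:
    region: str
    blocked_authors: Set[int] = field(default_factory=set)


# A filter decides, given the viewer's context, whether a tweet survives.
Filter = Callable[[Tweet, Viewer], bool]


def not_spam(tweet: Tweet, viewer: Viewer) -> bool:
    return not tweet.flagged_spam


def not_blocked(tweet: Tweet, viewer: Viewer) -> bool:
    return tweet.author_id not in viewer.blocked_authors


def score(tweet: Tweet, viewer: Viewer) -> float:
    # Hypothetical contextual boosts; the real signals and weights are unknown.
    s = tweet.predicted_engagement
    if tweet.region == viewer.region:
        s *= 1.2
    if tweet.author_verified:
        s *= 1.1
    return s


def rank_timeline(candidates: List[Tweet], viewer: Viewer,
                  filters: List[Filter], k: int) -> List[int]:
    # Stage 1: every filter must pass for this particular viewer.
    survivors = [t for t in candidates if all(f(t, viewer) for f in filters)]
    # Stage 2: score each survivor in context and keep the top k.
    survivors.sort(key=lambda t: score(t, viewer), reverse=True)
    return [t.tweet_id for t in survivors[:k]]


candidates = [
    Tweet(1, 10, False, "EU", False, 0.5),
    Tweet(2, 11, True, "US", False, 0.5),
    Tweet(3, 12, False, "EU", True, 0.9),
    Tweet(4, 13, False, "US", False, 0.8),
]
filters = [not_spam, not_blocked]
eu_viewer = Viewer("EU", {13})
us_viewer = Viewer("US")
print(rank_timeline(candidates, eu_viewer, filters, k=10))  # [1, 2]
print(rank_timeline(candidates, us_viewer, filters, k=10))  # [4, 2, 1]
```

The same four candidates yield different timelines for the two viewers, which is the commenter's point: even with the code in hand, the output depends on context you don't see.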
Is this supposed to be a joke? It's clearly an empty repo.<p>Either this is a mistake, or this is a really, really misguided attempt at a joke from Twitter.
Imagine having something like this for Google's and YouTube's algorithms; the $100bn+ SEO industry would go bankrupt, or at least pivot to some sort of advising, but there wouldn't be the mayhem that we have today.
I tried to make a pull request already, haha.<p>error forking repo: HTTP 403: The repository exists, but it contains no Git content. Empty repositories cannot be forked. (<a href="https://api.github.com/repos/twitter/the-algorithm/forks" rel="nofollow">https://api.github.com/repos/twitter/the-algorithm/forks</a>)<p>My thoughts:<p>- Explicit rules for temporary and permanent bans<p>- Edit button<p>- More fun and thoughtful conversations like HN<p>- Fewer thought-bubble, Brooklyn-based reporters, less VC and side-grind-hustle snake oil, maybe more comedians and memes?
Assuming Twitter is serious about publishing their feed algorithm [1], it's possible they're merely anticipating the EU's upcoming Digital Services Act which was finalized over the weekend. Among other things, the Act will compel large platforms to "make the working of their recommender algorithms (used for sorting content on the News Feed or suggesting TV shows on Netflix) transparent to users." [2]<p>Twitter's EU user base is probably [3] above the 45 million threshold that triggers the strictest transparency requirements under the Act. So perhaps they figure if they're going to be forced to disclose anyway, they might as well do it proactively.<p>[1] If it's even coherent to talk about their feed ranking system as a single algorithm — see the other comments in this thread.<p>[2] <a href="https://www.theverge.com/2022/4/23/23036976/eu-digital-services-act-finalized-algorithms-targeted-advertising" rel="nofollow">https://www.theverge.com/2022/4/23/23036976/eu-digital-servi...</a><p>[3] <a href="https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/" rel="nofollow">https://www.statista.com/statistics/242606/number-of-active-...</a>
It seems weird to make this a public repo before there's any content. It's also a bit of an unusual name. I can't tell if this is internal trolling or the future.
Surely you guys don’t think that Twitter’s sorting algorithm is already factored out into its own repo. Of course it’s empty.<p>That doesn’t mean it’s a joke. I see it as a show of goodwill: there are a handful of people inside Twitter who are excited about transparency and about a revenue model that isn’t entirely based on ads, and who are excited to get to work on this right away.
Whatever shows up in this repo, I hope people realize that, depending on what data you put into an algorithm, you can get whatever output you want, and Twitter is never going to (nor can or should it) publish everyone's personal information and interactions on the site.<p>So I'm not sure what the ultimate point of this exercise is, other than producing faux-transparency.
Not "the algorithm", but you can check if Twitter is silently suppressing your account here: <a href="https://taishin-miyamoto.com/ShadowBan/" rel="nofollow">https://taishin-miyamoto.com/ShadowBan/</a>
There are elements of their algo that I think should be openly defined, and perhaps there should be some regulatory body that reports to Congress and has full access. However, obfuscation is often necessary to counter bad actors.
Governments at the federal, state and local levels all rely on Twitter to conduct official, taxpayer-funded work. Taxpayer-funded work should not happen on proprietary systems that operate with zero oversight or public transparency.<p>Elon polled Twitter users about this and the response was overwhelmingly in favor of open source and transparency. Everyone on Twitter got a vote.<p>If you oppose transparency, as many now do, you lose your credibility. So it’s another one of Elon’s people hacks, and look at all the morons falling for it.
Kind of unrealistic, but I hope Twitter now open-sources not only the algorithm but also the Rails monolith itself. It would be kind of interesting to see how everything is done.
I'm very technical and I think it would still be valuable to have a list of all the things that weigh into the timeline view, even without the models or underlying data.<p>Like, there's no public admission right now of whether "shadow banning" or "ghost banning" is even officially a thing!<p>Some transparency seems unquestionably more powerful than none, and we can work from there.
At the time of posting, Will Norris (the open source lead at Twitter, and presumably an admin of their GitHub account) has posted this. The tweet has 44 retweets, 193 likes and 17 quote tweets, while the GitHub repo has 1.6k stars.<p>That seems... bizarre to me?
I agree that there is no such thing as "the algorithm." It is Twitter in its entirety. And with that I have a wild question. Can Musk make Twitter fully open-source on GitHub?
Anyone who actually uses Twitter already knows the algorithm:<p>* Chronological - reverse sort by date<p>* Home - for all of the followed topics, recommended topics, retweets and tweets from the past day, estimate the level of engagement, keep the highest-scoring items and reverse sort by date. The engagement estimate is likely a fairly basic ML model.<p>It will be uncontroversial, technically unsophisticated and of no practical use to anyone: users, developers or researchers.<p>This is not going to be PageRank, where some genuine new insight was discovered.
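Taking this comment's description at face value, the two timelines could be sketched as below. This illustrates the commenter's guess, not Twitter's code: `Item`, `home`, the 24-hour window, and the engagement numbers are hypothetical, with the `scores` dict standing in for the "fairly basic ML model".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Item:
    item_id: str
    created_at: datetime
    source: str  # e.g. "followed_topic", "recommended_topic", "retweet", "tweet"


def chronological(items: List[Item]) -> List[Item]:
    # "Reverse sort by date": newest first.
    return sorted(items, key=lambda i: i.created_at, reverse=True)


def home(items: List[Item], now: datetime,
         estimate_engagement: Callable[[Item], float], keep: int) -> List[Item]:
    # Only consider items from the past day.
    recent = [i for i in items if now - i.created_at <= timedelta(days=1)]
    # Keep the `keep` items with the highest estimated engagement...
    top = sorted(recent, key=estimate_engagement, reverse=True)[:keep]
    # ...then present them newest first.
    return chronological(top)


now = datetime(2022, 4, 26, 12, 0)
items = [
    Item("a", now - timedelta(hours=1), "tweet"),
    Item("b", now - timedelta(hours=2), "retweet"),
    Item("c", now - timedelta(hours=3), "recommended_topic"),
    Item("d", now - timedelta(hours=30), "followed_topic"),
]
# Hypothetical engagement estimates standing in for the ML model.
scores = {"a": 0.2, "b": 0.9, "c": 0.7, "d": 0.99}

print([i.item_id for i in chronological(items)])  # ['a', 'b', 'c', 'd']
print([i.item_id for i in home(items, now, lambda i: scores[i.item_id], keep=2)])  # ['b', 'c']
```

Item "d" has the highest score but falls outside the one-day window, and "a" is the newest but is cut for low engagement; everything interesting lives inside `estimate_engagement`, which is exactly the part the comment waves away.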
"The algorithm" could mean a lot of things. Whatever it means, it probably spans hundreds or even thousands of services. That doesn't mean it cannot be made open-source.<p>I imagine they'd start with documentation and white papers that communicate "here's how we intend for it to work".<p>It's seriously unlikely that anyone at Twitter knows how any non-trivial algorithm in the company actually works. To figure THAT out, they could do a company-wide documentation and instrumentation push, like the one they probably had to do for GDPR anyway, which is painful, boring and going to take a very long time.<p>Failing that, they could just say: 'The algorithm as it stands is no longer fit for purpose, given that one of its core requirements is now that it be transparent, publishable and presumably legible. We need to make a new one. We'll publish the core algorithm. We probably won't deploy it in that exact state, since it's going to span multiple services and so on, and you obviously don't get the data we used to train the models, but we will work backwards from it, and here's an open mechanism to measure how true-to-form the deployed version actually is.'
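One hypothetical way to make "how true-to-form it actually is" measurable: for the same timeline request, compare the top-k produced by the published reference ranker with the top-k production actually served. The overlap@k metric below is an assumption chosen for simplicity, not something Twitter has proposed; the rankings are made up.

```python
from typing import Sequence


def overlap_at_k(reference: Sequence[str], production: Sequence[str], k: int) -> float:
    """Share of the reference ranker's top-k that production also put in its top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    ref_top = set(reference[:k])
    prod_top = set(production[:k])
    if not ref_top:
        return 0.0
    return len(ref_top & prod_top) / len(ref_top)


# Hypothetical rankings for one timeline request.
published = ["t1", "t2", "t3", "t4"]
served = ["t2", "t1", "t9", "t3"]
print(overlap_at_k(published, served, k=3))  # 0.6666666666666666
print(overlap_at_k(published, served, k=4))  # 0.75
```

Averaged over many requests, a value near 1.0 would suggest the published ranker is close to what runs in production. It ignores ordering within the top-k; a rank-correlation measure would be a stricter alternative.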
I’ve spent the better part of a decade writing open source projects for few to see. An empty repo gets hundreds of stars immediately. It’s all a popularity contest.
Apples are red. The sky is blue. Twitter shadowbans and tinkers with who sees who. I wonder what the old guard will do with the codebase over the next few months.
It's probably just a ripoff of PageRank, with a separate spam-filtering and banning system, plus an army of contractors manually fixing it up.<p>If Twitter is a game, sinking $43bn into it is kinda like winning or losing the grand final boss level (unclear which).<p>I wish Elon would get back to facilitating the building of useful things. We still don't have a great clean energy generation story.
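For reference, this is roughly what textbook PageRank looks like when run over a follow graph, using power iteration with the commonly cited 0.85 damping factor. Whether Twitter's ranking resembles it is the commenter's speculation; the graph and account names here are made up.

```python
from typing import Dict, List


def pagerank(follows: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 50) -> Dict[str, float]:
    """Textbook power-iteration PageRank; an edge u -> v means 'u follows v'."""
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by accounts that follow nobody is spread evenly over all nodes.
        dangling = sum(rank[u] for u in nodes if not follows.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Every other account splits its rank among the accounts it follows.
        for u, targets in follows.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        rank = new
    return rank


follows = {
    "alice": ["celeb"],
    "bob": ["celeb", "alice"],
    "carol": ["celeb"],
    "celeb": [],
}
ranks = pagerank(follows)
print(max(ranks, key=ranks.get))  # celeb
```

This covers only the graph-ranking part; the spam filtering, banning and manual fix-ups the comment mentions would sit outside it.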
Musk has repeatedly talked about "open sourcing" Twitter's algorithm. Given that Musk is (understandably) super impatient, this repo may be his first move. I expect this to start with a bunch of READMEs and other high-level docs, and evolve into details and eventually code.