TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

How Spotify ran a large Google Dataflow job for Wrapped 2019

269 points by jhatax over 5 years ago

16 comments

spyke112 over 5 years ago
This they can do, but you can't change your display name unless you hook up with Facebook. [0]

[0] https://community.spotify.com/t5/Live-Ideas/Account-Change-Username/idi-p/703799
gwittel over 5 years ago
Interesting. I wish it had more details as far as inputs/outputs, data sizes in different phases.

One thing that I wonder about is how much work could they do to collect this data on a forward moving basis. Often I see huge lookback jobs that answer predictable/static questions -- prime candidates for aggregation during ingest.
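
As a rough illustration of the "aggregation during ingest" idea raised in the comment above, the sketch below keeps running per-user play counts as events arrive, so a year-end question becomes a lookup rather than a lookback scan. The event shape (`user_id`, `track_id`) and the names `ingest` and `top_tracks` are hypothetical; nothing here reflects Spotify's actual pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical running aggregate: user_id -> Counter mapping track_id -> play count.
play_counts = defaultdict(Counter)


def ingest(event):
    """Update the running per-user counts as each play event arrives."""
    play_counts[event["user_id"]][event["track_id"]] += 1


def top_tracks(user_id, n=5):
    """Answer the year-end question from the aggregate, without rescanning raw events."""
    return [track for track, _count in play_counts[user_id].most_common(n)]


# Made-up events standing in for a stream of plays.
sample_events = [
    {"user_id": "u1", "track_id": "a"},
    {"user_id": "u1", "track_id": "b"},
    {"user_id": "u1", "track_id": "a"},
    {"user_id": "u2", "track_id": "c"},
]
for event in sample_events:
    ingest(event)

print(top_tracks("u1", n=2))  # ['a', 'b']
```

The trade-off is that this only works for questions known in advance; a new question about past data still needs the raw events.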
rsmets over 5 years ago
I thought this was such a marvel! However, my excitement was tempered when I realized the Best of the Decade playlist was not created from my music listening habits alone.

Seems as though users were pinned to some general playlist that had characteristics similar to their listening habits? Still, hats off from an engineering perspective. I too wish more technical detail had been provided.

The year recap playlists, though, are a fun personal snapshot in time.
dna_polymerase over 5 years ago
Basically the perfect use case for cloud computing. Tons of compute for a short time. In this case there can’t possibly be people arguing for their own datacenter over cloud.
data4lyfe over 5 years ago
One massive SQL query across a billion plus users.
matlin over 5 years ago
I'm curious how much data this involves per user. This is clearly a massive undertaking when you're talking about ~250 million users, but I bet it would be easy to provide the same info if all the data was local on a device and each user ran their own query. This assumes that the space required to store all of your listening history fits on device, which I think is a safe bet.
deepsun over 5 years ago
I'd recommend they check out ClickHouse for exactly this purpose. It works well for Cloudflare, Yandex, and Sentry.

Another idea is to run probabilistic queries instead of exact ones, which could bring costs down much further.
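
To make "probabilistic queries" concrete, here is a minimal sketch of one such technique, a k-minimum-values (KMV) sketch that estimates a distinct count from a bounded amount of state. It is only an example of the general idea, not necessarily what the commenter had in mind or what any of the named systems implement; the function names and the choice of `k` are assumptions.

```python
import hashlib
import heapq


def _hash01(item):
    """Map a string to a pseudo-uniform float in [0, 1] using SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def approx_distinct(items, k=1024):
    """Estimate the number of distinct items while storing at most k hash values."""
    heap = []      # max-heap (via negation) holding the k smallest hashes seen
    kept = set()   # the same hashes, for duplicate detection
    for item in items:
        h = _hash01(item)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: the count is exact
    # Standard KMV estimator: (k - 1) divided by the k-th smallest hash value.
    return int((k - 1) / -heap[0])


# 40,000 made-up events from 20,000 distinct hypothetical user IDs.
events = (f"user{i % 20000}" for i in range(40000))
print(approx_distinct(events))  # close to 20000; the exact value depends on the hashes
```

With `k = 1024` the typical relative error is around 3%, in exchange for memory that does not grow with the number of users.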
dang over 5 years ago
There's more info at https://techcrunch.com/2020/02/18/how-spotify-ran-the-largest-google-dataflow-job-ever-for-wrapped-2019/.

(via https://news.ycombinator.com/item?id=22359528)
justlexi93 over 5 years ago
In early December, Spotify launched its annual personalized Wrapped playlist with its users’ most-streamed sounds of 2019. That has become a bit of a tradition and isn’t necessarily anything new, but for 2019, it also gave users a look back at how they used Spotify over the last decade. Because this was quite a large job, Spotify gave us a bit of a look under the covers of how it generated these lists for its ever-growing number of free and paid subscribers.
drdoooom over 5 years ago
Was a neat little feature, too bad the share functionality didn't actually work.
dvtrn over 5 years ago
I thought we had a thing about preserving post titles from the source?
fmjrey over 5 years ago
This may be a more appropriate source, from the source:

https://labs.spotify.com/2019/11/12/spotifys-event-delivery-life-in-the-cloud/
downerending over 5 years ago
Impressive, but I'd be more impressed if they fixed their random shuffle.
stilisstuk over 5 years ago
No TechCrunch... You can't have my cookies.
fs111 over 5 years ago
Why is this link redirecting through some ad network?
swagonomixxx over 5 years ago
This is interesting, but what I actually find even more interesting is Spotify continuing its usage of Google Cloud products even after being acquired by Microsoft. Can anyone shed some light on why this is the case? Has that acquisition not been a "traditional" MS acquisition?