Ask HN: How Does Similarweb Work?

5 points by gomes33, over 2 years ago

I'm a bit puzzled about how it's possible to have data like what https://www.similarweb.com/corp/ourdata/ describes.

They say that their data comes from "Partnerships":

"Rich data pre-analyzed by global partners like DPSs, ISPs, measurement companies and corporate intelligence firms"

What is a "corporate intelligence firm"? What is a "DPS"? How can ISP's provide the data?

And from "Website & Apps Owners":

Data directly measured through first party analytics (e.g. Google Analytics) of millions of websites and apps.

But how? If they don't have access to a GA account, how do they know? Is it if the website uses some third-party asset (js, css, etc.)?

2 comments

mtmail, over 2 years ago
It's in their interest to be vague but at the same time claim lots of sources.

Some United States ISPs sell anonymous browsing data. The data is still grouped by home connection. With SSL these days it only contains domain names, but in the past it contained the full URL. One was able to correlate if somebody searched for a product on shop A but then finished checkout on shop B. 15 years ago I dealt with such data, kind of scary. So you can correlate that people who visit one domain regularly also visit certain others.

DPS sounds like data processing, so an intermediary that resells data or summaries. For example, they might have data from browser toolbars, widgets on multiple websites, or anybody else who sells user data.

Google Analytics: when you crawl pages you can extract the GA id, and some companies use the same id on multiple domains. Thus you can infer they have the same owner. The same goes for any other type of id or API key one might use on a website, e.g. a Google Maps API key.

Add some data on domain-to-IP address to see if two websites are hosted on the same server.

> How can ISP's provide the data?

In the US it's part of the terms of service

https://www.netzero.net/start/landing.do?page=www/legal/yourprivacyrights

"we have collected the following categories of personal information from its consumers within the last twelve (12) months:"

- "Age (40 years or older), marital status (title), sex (including gender, gender identity, gender expression, pregnancy or childbirth and related medical conditions)."

- "Browsing history, search history, information on a consumer's interaction with a website, application, or advertisement."

- "Profile reflecting a person's preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes."

"We share your personal information with the following categories of third parties:

  Service providers. Advertisers. Affiliates. Partners."

I'd argue that is all information an ISP has no business with.
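
To illustrate the GA id correlation described in the comment above, here is a minimal Python sketch. It is not Similarweb's actual method, just one way the idea could work. The function names, the sample domains, and the HTML snippets are all hypothetical, and the regex assumes IDs look like `UA-<digits>-<digits>` (Universal Analytics) or `G-<alphanumerics>` (GA4), which will not catch every way a site embeds its ID.

```python
import re
from collections import defaultdict

# Assumed ID shapes: "UA-<account>-<property>" (Universal Analytics) and
# "G-<alphanumerics>" (GA4). Real pages may embed IDs in other ways.
GA_ID_PATTERN = re.compile(r"\b(UA-\d{4,10}-\d{1,4}|G-[A-Z0-9]{8,12})\b")


def extract_ga_ids(html: str) -> set[str]:
    """Return every string in the page that looks like a GA id."""
    return set(GA_ID_PATTERN.findall(html))


def group_domains_by_ga_id(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each GA id to the domains it appears on, keeping only shared ids."""
    domains_by_id = defaultdict(set)
    for domain, html in pages.items():
        for ga_id in extract_ga_ids(html):
            domains_by_id[ga_id].add(domain)
    # An id seen on a single domain tells us nothing about shared ownership.
    return {i: d for i, d in domains_by_id.items() if len(d) > 1}


# Hypothetical crawl results: domain -> fetched HTML.
crawled_pages = {
    "shop-a.example": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "shop-b.example": "<script>gtag('config', 'UA-1234567-1');</script>",
    "blog.example": "<script>gtag('config', 'G-ABCDE12345');</script>",
}

shared = group_domains_by_ga_id(crawled_pages)
print(shared)
```

The same grouping would apply to any other embedded identifier (for example a Google Maps API key) by swapping the pattern. A shared id only suggests common ownership and does not prove it.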
altdataseller, over 2 years ago

The vast majority of their data comes from browser extensions that track the URLs users visit. The other stuff like ISPs, DSPs, etc. is there to make you think they have some sophisticated model with a diverse number of sources (don't fall for it, they don't).

If Google ever prevents browser extensions from tracking your every visit, their business is in extreme trouble. Full stop.