TLDR: Campbell's methodology is flawed and does not consider edge cases (one of which, equating apex-only and www-prefixed domains, I consider reckless), and the author didn't understand how Majestic collects and processes its data.<p>Longer version: This isn't comprehensive, but I can think of two main reasons why:<p>- The Majestic Million lists only the registrable part (with some exceptions), and this sometimes leads to central CDN domains being listed. For example, the Majestic Million lists wixsite.com (for those who are unaware, it is a CDN domain used by Wix.com with separate subdomains), but if you visit wixsite.com itself you won't get anything. Same with Azure: subdomains of azureedge.net and azurewebsites.net do exist (for example <a href="https://peering.azurewebsites.net/" rel="nofollow">https://peering.azurewebsites.net/</a>) but azureedge.net and azurewebsites.net themselves don't exist. Without similar filtering, using the Cisco list (<a href="https://s3-us-west-1.amazonaws.com/umbrella-static/index.html" rel="nofollow">https://s3-us-west-1.amazonaws.com/umbrella-static/index.htm...</a>) would quickly lead you to this precise problem (mainly because the number one entry is "com", but phew, at least <a href="http://ai./" rel="nofollow">http://ai./</a> does exist!)<p>- Also, shame on the author for treating www-prefixed and apex-only domains as one and the same. For some websites, they aren't. Take this example: jma.go.jp (Japan Meteorological Agency), which doesn't respond (actually NODATA) on <a href="http://jma.go.jp/" rel="nofollow">http://jma.go.jp/</a> but is fine on <a href="https://www.jma.go.jp/" rel="nofollow">https://www.jma.go.jp/</a>. Similarly, beian.gov.cn (Chinese ICP Licence Administrator) doesn't respond at all but <i>www</i>.beian.gov.cn does. And for ncbi.nlm.nih.gov (National Center for Biotechnology Information)?
I can't blame Majestic: <a href="https://www.ncbi.nlm.nih.gov/" rel="nofollow">https://www.ncbi.nlm.nih.gov/</a> and <a href="https://ncbi.nlm.nih.gov/" rel="nofollow">https://ncbi.nlm.nih.gov/</a> don't redirect to a canonical domain, and unless you've compared the pages each one serves there's no way you would know that they are the same website!<p>Edit: I've downloaded the CSV to check my claims, and it shows:<p><pre><code> wixsite.com 0
beian.gov.cn 0
</code></pre>
Please, for the love of sanity, consider what the inclusion criteria of the Majestic Million (and similar lists) actually are. I can't believe I'm saying this, but can we crowd-source "Falsehoods programmers believe about domains"?<p>Also, an addendum on crawling, though I consider this one "probably forgivable":<p>- Some websites are only available in certain countries (for example, internal Russian websites don't respond at all outside Russia). This can skew the numbers a little bit.
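<p>To make the www-versus-apex point concrete, here is a minimal sketch of how a crawler could probe both variants of a listed name separately instead of assuming they are interchangeable. The function names (candidate_hosts, dns_probe, check_domain) are my own and hypothetical, and the default probe only checks whether the name resolves in DNS, which is a weaker test than actually fetching the page; swap in whatever check the crawl really uses.

```python
import socket
from typing import Callable, Dict, List


def candidate_hosts(domain: str) -> List[str]:
    """Return the listed name plus its www/apex counterpart, listed name first."""
    name = domain.strip().rstrip(".").lower()
    if name.startswith("www."):
        return [name, name[len("www."):]]
    return [name, "www." + name]


def dns_probe(host: str) -> bool:
    """Hypothetical default probe: True if the name resolves to any address."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def check_domain(
    domain: str, probe: Callable[[str], bool] = dns_probe
) -> Dict[str, object]:
    """Probe each variant separately and keep the per-host results."""
    results = {host: probe(host) for host in candidate_hosts(domain)}
    return {
        "domain": domain,
        "results": results,
        # The site counts as reachable if either variant answers.
        "reachable": any(results.values()),
    }
```

Calling check_domain("jma.go.jp") keeps the apex and www results apart, so a dead apex doesn't get reported as a dead website. It does nothing for the CDN-parent case (wixsite.com, azureedge.net), which needs filtering against the list's inclusion criteria rather than a second probe.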