I built Sitetruth.com to try to solve that problem. I'm going to shut it down soon.<p>The goal of SiteTruth was to find the real-world business behind a web site and look up information about that business, such as how long it has been operating and its annual revenue. That has become harder and harder over the last decade.<p>First, it's now acceptable to run an online business with no real-world address and no visible legal existence. This is illegal in the European Union, and illegal in California if the business accepts payments, but enforcement is nonexistent.<p>Second, more sites have become inaccessible to server-side scraping. I have a system which looks for a human-readable business address on a site. It looks in the obvious places (front page, "about", "legal", "terms", "contact", etc.) and quits after trying the 20 most likely pages. It uses an honest user-agent string ("Sitetruth.com site verification system", registered with the now-meaningless "bots vs browsers" list) and obeys robots.txt. A sizeable fraction of the time, it can't read the site at all.<p>Third, the data sources for company information have become less accessible. There used to be two reliable data sources: Hoover's, and Dun and Bradstreet. They merged. For a while, Dun and Bradstreet became rather corrupt. They licensed a company in Santa Monica, CA to use their name, and handed the small-business side of their business over to it. That unit's marketing approach was "Nice credit rating. Be a shame if something happened to it." After much litigation, DnB HQ bought the Santa Monica company, but the reputational damage was done, and DnB is no longer the gold standard for company information. There are lower-tier data sources (look up "US Business List"), but their data quality is poor. Anything based on user recommendations, like Yelp, gets spammed, so that's out. Yahoo Directory, which was reasonably spam-free, is gone.<p>Fourth, the SSL cert industry became corrupt.
OV (organization validation) standards were never very high, and EV (extended validation) standards started slipping. Then there was the Cloudflare problem. Cloudflare is a certificate authority, and it issues certs to itself for domains that run through Cloudflare. So looking up a site's cert just gets you Cloudflare's info, not the site owner's.<p>Fifth, Google is making it harder and harder to have Chrome plug-ins that critique its ads. I recently dropped Chrome support, and at this point I have only a Firefox add-on.<p>So, after fifteen years, SiteTruth is coming to an end.