Hey HN! I’m John from xhr.dev (<a href="https://xhr.dev" rel="nofollow">https://xhr.dev</a>). At xhr.dev, I’m building tools
for reverse engineering websites. Our initial product is a forward proxy that
avoids bot detection and takes one line of code to integrate.<p>For four years I’ve been building integrations for websites that lack official
APIs, first with Puppeteer automation and later through XHR requests.<p>Many people are familiar with web scraping, but fewer know about scraping via
XHR requests. They’re my preferred way to scrape because they let you build
reliable, performant integrations into sites that either lack official APIs or
restrict their use. I’ve found that building “unofficial” integrations with the
XHR method is far more reliable than traditional web scraping. Here’s why:<p><pre><code> • Modern Websites: Many are built with frontend frameworks that load data asynchronously from backend APIs.
• Undocumented APIs: These backend APIs are as robust as official APIs, but they aren’t documented publicly (even though the company knows them well
internally).
• Reliable Integration: You can hook into these backend APIs to create dependable integrations.
• Performance: XHR integrations call the backend directly instead of rendering full pages, so they are much faster and more reliable than browser automation tools like Selenium, Puppeteer, or Playwright.
</code></pre>
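In practice the XHR method usually starts in the browser: find the JSON request in the DevTools Network tab, copy it, and replay it from code. Here is a minimal Python sketch (standard library only) that turns a Chrome “Copy as cURL (bash)” string into a urllib Request. The helper names request_from_curl and fetch_json and the example endpoint are hypothetical, and the parser only understands the handful of curl flags Chrome typically emits.

```python
import json
import shlex
import urllib.request

# Hypothetical helper for illustration: handles only the curl flags that
# Chrome's "Copy as cURL (bash)" commonly produces. It does not understand
# bash $'...' quoting, which Chrome uses for bodies with special characters.
_VALUE_FLAGS = {"-H", "--header", "-X", "--request", "-b", "--cookie",
                "-d", "--data", "--data-raw", "--data-binary"}
_IGNORED_FLAGS = {"--compressed"}


def request_from_curl(command: str) -> urllib.request.Request:
    """Build a urllib Request from a copied curl command."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    url, method, data = None, None, None
    headers = {}
    i = 1
    while i < len(tokens):
        flag = tokens[i]
        if flag in _IGNORED_FLAGS:
            i += 1
            continue
        if flag in _VALUE_FLAGS:
            if i + 1 >= len(tokens):
                raise ValueError(f"missing value after {flag}")
            value = tokens[i + 1]
            i += 2
            if flag in ("-H", "--header"):
                name, _, header_value = value.partition(":")
                # urllib does not decompress gzip/br bodies, so drop
                # Accept-Encoding and let the server send plain bytes.
                if name.strip().lower() != "accept-encoding":
                    headers[name.strip()] = header_value.strip()
            elif flag in ("-X", "--request"):
                method = value
            elif flag in ("-b", "--cookie"):
                headers["Cookie"] = value
            else:
                # One of the --data variants; urllib defaults to POST when
                # a body is set and no explicit method was given.
                data = value.encode("utf-8")
            continue
        if flag.startswith("-"):
            raise ValueError(f"unsupported curl flag: {flag}")
        url = flag
        i += 1

    if url is None:
        raise ValueError("no URL found in curl command")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

Usage, with a hypothetical endpoint: data = fetch_json(request_from_curl(copied_command)). On sites behind anti-bot software, replaying a request like this from a non-browser client can still be answered with a 403.<p>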
While building integrations with the XHR method, I ran into a significant
challenge: anti-bot software like Cloudflare can easily detect that your
requests aren’t coming from a browser and block them. These tools are very
good at fingerprinting requests.<p>A common first attempt at the XHR method is to copy a request from the Chrome
DevTools Network tab as a cURL command, only to get a surprising 403 error.
That happens because Cloudflare is good at spotting non-browser clients; one
giveaway is that curl’s TLS handshake doesn’t look like Chrome’s, even when
every header matches.<p>At my previous company, I built all integrations using the XHR method but found
that more sophisticated websites were protected by anti-bot software like
Cloudflare. I tried existing solutions, including numerous supposed
Cloudflare bypasses on GitHub and paid services like ZenRows, ScrapingBee,
Oxylabs, and Bright Data, but found that they either didn’t work or required
unnecessarily complex integrations. For example, request headers weren’t
passed through transparently to the target server, producing incorrect
responses, or cookies set in response headers weren’t sent back to the
client, and there were many more painful, unreliable edge cases like these.<p>This led me to develop xhr.dev’s initial product: a magic proxy that offers
anti-bot avoidance with a one-line code integration.<p>How This Can Help You:<p><pre><code> • Reliable Scraping via XHR: If you’re scraping via XHR, this tool lets you hook into a platform’s backend while making it very difficult for the backend to detect that you’re a scraper rather than a real person.
• Unblock When Blocked: If a site is already blocking your requests, sending them through the proxy unblocks them.
• CAPTCHA Auto-Solving: If you’re new to scraping and run into anti-bot measures like CAPTCHAs, it can solve them for you automatically.
</code></pre>
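For a sense of what a one-line proxy integration looks like on the client side, here is a minimal Python sketch using only the standard library. PROXY_URL is a made-up placeholder, not xhr.dev’s real endpoint or credential format, and build_proxied_opener is a hypothetical name. The line that matters is the ProxyHandler; the cookie jar keeps cookies from Set-Cookie response headers for later requests.

```python
import http.cookiejar
import urllib.request

# Hypothetical placeholder: use the proxy URL your provider gives you.
# This is NOT xhr.dev's actual endpoint or credential format.
PROXY_URL = "http://user:password@proxy.example.com:8080"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS traffic via a forward proxy."""
    # The "one line": route both schemes through the proxy. urllib turns
    # user:password in the proxy URL into a Basic Proxy-Authorization header.
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Store cookies from Set-Cookie response headers and send them back on
    # later requests to the same site, as a browser would.
    cookies = urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar())
    return urllib.request.build_opener(proxy, cookies)
```

Usage: opener = build_proxied_opener(PROXY_URL), then opener.open(url, timeout=10) for each request. For HTTPS URLs, urllib tunnels through the proxy with CONNECT; a proxy that needs to inspect or rewrite the request usually terminates TLS itself and asks you to trust its CA certificate, and which model a service uses is up to the provider.<p>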
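The pass-through problems described earlier (request headers altered on the way to the target, response cookies not reaching the client) come down to which headers a forward proxy may touch. HTTP splits headers into hop-by-hop fields, which belong to a single connection, and end-to-end fields, which should be forwarded unchanged. Here is a minimal Python sketch of that filter; end_to_end_headers is a hypothetical name, and this is not a description of how xhr.dev works internally.

```python
from typing import List, Tuple

# Hop-by-hop headers from RFC 2616 section 13.5.1, plus the non-standard
# Proxy-Connection. RFC 9110 section 7.6.1 adds that any field named in the
# Connection header is also hop-by-hop for that message.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "proxy-connection", "te", "trailer", "transfer-encoding", "upgrade",
}


def end_to_end_headers(headers: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the headers a forward proxy should pass through unchanged.

    Works in both directions: request headers going to the target server
    and response headers going back to the client. Headers are (name, value)
    pairs rather than a dict, so repeated fields such as several Set-Cookie
    lines are all kept, in their original order.
    """
    # Field names listed in Connection (e.g. "Connection: keep-alive, X-Foo")
    # apply only to this hop as well.
    listed = {
        token.strip().lower()
        for name, value in headers
        if name.lower() == "connection"
        for token in value.split(",")
        if token.strip()
    }
    drop = HOP_BY_HOP | listed
    return [(name, value) for name, value in headers if name.lower() not in drop]
```

Proxy-Authorization is meant for the proxy itself and Transfer-Encoding describes framing on one connection (the proxy frames the body again on its own), while Cookie and Set-Cookie are end-to-end and pass straight through.<p>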
What I’m Looking For:<p><pre><code> • Customers with Web Scraping Use Cases: People who can give real product feedback.
• Feedback on Product Shape: Comments on the product’s current form and functionality.
• Insights on Scraping Challenges: Other problems you face when scraping, and which solutions you would pay for.
</code></pre>
You can view our historical performance on our status page (<a href="https://status.xhr.dev" rel="nofollow">https://status.xhr.dev</a>).<p>I hope you give it a try!<p>ty v much, john<p>(also: this is my 2nd Show HN post. The first time around I reposted it, oops!
This one is left completely organic.)