Ask HN: Is there any decent API to download a paper given its name?

18 points by hexomancer almost 3 years ago

I am developing a PDF viewer designed for reading research papers [1]. One very useful feature that I would like to add is the ability to directly download and open a paper just by clicking on its name in the PDF file. I have implemented a version of this using PyPaperBot [2], which is not bad, but it is not as fast as I would like it to be (it uses Sci-Hub, which doesn't have the best servers).

By parsing the HTML from Google Scholar, I wrote a Python script that does this. It was very fast and worked perfectly; however, after using it for a couple of minutes (maybe about 10-15 requests), I could no longer query Google Scholar using Python requests (the returned HTML is a CAPTCHA request). It appears that Google disallows any programmatic use of Google Scholar, even though this was not spammy at all (the user has to manually click on a paper to send a request to Google Scholar).

Anyway, I was wondering if there is any decent and free API to get the URL of a paper given its name. I have found a couple of paid ones, but they are way too expensive.

[1] https://sioyek.info

[2] https://github.com/ahrm/sioyek-python-extensions#-paper_downloader

4 comments

wilsonnb3 almost 3 years ago

> it uses scihub which doesn't have the best servers

You could always download the Sci-Hub backup torrents from libgen and host them yourself somewhere. It's probably like 100TB of data by now though, so this isn't really a cheap approach.


boredemployee almost 3 years ago

Nice! I would easily use your tool for the next 2 years (my research deadline). I have a question tho: how would you get around the fact that some articles can only be downloaded from a university VPN or with a special login?


ALittleLight almost 3 years ago

One thing you could try is downloading the papers before the user requests them so that they will be ready instantly for a user's request.

Example: User opens paper on page 1. Page 1 has citations to 3 other papers. Your tool instantly begins downloading the other 3 papers. User goes on to page 2, which cites 1 other paper. You begin downloading the new paper. User clicks on the citation and your tool now has the linked paper already downloaded and ready to open.

Might be a bit of an imposition on Sci-Hub though.
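A rough sketch of the prefetching idea, in case it helps. Everything here is made up for illustration: `PaperPrefetcher`, `on_page_shown`, `get`, and the `fetch` callable (paper title in, PDF bytes out) are hypothetical names, not part of sioyek or PyPaperBot. It assumes the viewer can list the titles cited on the visible page and that these methods are only called from the UI thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterable


class PaperPrefetcher:
    """Starts background downloads for cited papers as pages are shown."""

    def __init__(self, fetch: Callable[[str], bytes], max_workers: int = 2):
        # `fetch` is a hypothetical downloader: paper title -> PDF bytes.
        self._fetch = fetch
        # A small pool keeps the load on the upstream server modest.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: Dict[str, Future] = {}

    def on_page_shown(self, cited_titles: Iterable[str]) -> None:
        """Queue a download for every cited title not already requested."""
        for title in cited_titles:
            if title not in self._pending:
                self._pending[title] = self._pool.submit(self._fetch, title)

    def get(self, title: str) -> bytes:
        """Return the paper, waiting only if its download has not finished."""
        self.on_page_shown([title])  # covers titles that were never prefetched
        return self._pending[title].result()

    def close(self) -> None:
        """Wait for outstanding downloads and release the worker threads."""
        self._pool.shutdown(wait=True)
```

Each title is requested at most once, so flipping back and forth between pages does not trigger repeat downloads; capping `max_workers` is one way to keep the extra traffic polite.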


ttpphd almost 3 years ago

Not sure off the top of my head but it's possible that crossref DOI lookup has an API you can use for this.

https://openapc.github.io/general/openapc/2018/01/29/doi-reverse-lookup/
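For what it's worth, a minimal sketch of a title-to-DOI lookup against the public Crossref REST API (the `/works` endpoint with the `query.bibliographic` parameter). The helper names `build_query_url`, `first_doi`, and `lookup_doi` are made up for this example. Note the assumption that the top-ranked hit is the right paper, which is not guaranteed, and that a DOI resolves through https://doi.org/ to the publisher's page, not necessarily to a free PDF.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(title: str, rows: int = 1) -> str:
    """Build a Crossref /works search URL for a free-text title query."""
    params = {"query.bibliographic": title, "rows": rows}
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode(params)


def first_doi(response: dict) -> Optional[str]:
    """Pull the DOI of the top-ranked item out of a decoded JSON response."""
    items = response.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None


def lookup_doi(title: str, timeout: float = 10.0) -> Optional[str]:
    """Query Crossref over the network; returns None when nothing matches."""
    with urllib.request.urlopen(build_query_url(title), timeout=timeout) as resp:
        return first_doi(json.load(resp))
```

Calling `lookup_doi("some paper title")` performs one HTTP request. In a real tool it would be worth comparing the returned item's `title` field against the query before trusting the match.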