As long as these PDFs are exposed publicly (and linked to, which a tweet with or without #pdftribute will take care of), most of them will get indexed by Google Scholar, which does a decent job of extracting metadata with heuristics.

Of course, it would be much better if people started embedding machine-readable metadata in the PDFs themselves (entirely possible; see for example http://code.google.com/p/pdfmeat/, and the rough sketch at the end of this comment), and if there were an agreed-upon bibliographic microformat that could be embedded in web pages listing articles.

We also eventually need an open alternative to Google Scholar. GS is great, and I use it every day (I love that it can output BibTeX, for example), but it has no API (and never will, because of its deals with publishers), actively resists automated access, and is a black box in terms of how its data is gathered. Think of an "Open Scholar" being to Google Scholar what OSM is to GMaps: OSM might not look as pretty or be as consistent in the beginning, but it enables a whole range of applications that GMaps doesn't. (And GMaps at least has a fairly good API, even if it charges for heavy use; GS has nothing.)

(These are just some thoughts I've had while experimenting with an open scholarly workflow, trying to share as much of the "byproduct" of the research as possible, including rich notes and summaries, my own bibliography with links to OA publications where they exist, etc.: http://reganmian.net/wiki/researchr:start)

Another thing I've found while working on my project, where I try to expose OA links to as many publications as possible and regularly rescan to check that they are still available (and still OA), is how quickly documents disappear... Hosting on personal pages is convenient, but fragile. Ideally, people would upload papers to university repositories, subject repositories like arXiv.org, etc.
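
On the metadata point above, here is a minimal sketch of what embedding could look like, assuming a library along the lines of pypdf (the file names, authors, and field values are all placeholders, not a prescribed schema):

    # Minimal sketch: stamp bibliographic metadata into a PDF's Info dictionary.
    # Assumes the pypdf library; file names and field values are placeholders.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader("my-paper.pdf")
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    # Standard Info-dictionary keys; a crawler could read these directly
    # instead of guessing title and authors from the first page.
    writer.add_metadata({
        "/Title": "An Open Scholarly Workflow",
        "/Author": "Jane Doe; John Doe",
        "/Subject": "Preprint",
        "/Keywords": "open access, metadata, bibliography",
    })

    with open("my-paper-with-metadata.pdf", "wb") as f:
        writer.write(f)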
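
And for the rescanning I mention in the last paragraph, the core of it is just a periodic link check, something like the following (the URL is a placeholder, and a real scan would want rate limiting and a proper User-Agent):

    # Minimal sketch: recheck whether OA links still resolve to a PDF.
    # URLs are placeholders; a real scan should throttle its requests.
    import requests

    def still_available(url, timeout=10):
        try:
            r = requests.head(url, allow_redirects=True, timeout=timeout)
            if r.status_code >= 400:
                # Some servers reject HEAD, so fall back to a streaming GET.
                r = requests.get(url, stream=True, timeout=timeout)
            content_type = r.headers.get("Content-Type", "").lower()
            return r.status_code == 200 and "pdf" in content_type
        except requests.RequestException:
            return False

    for url in ["http://example.edu/~someone/preprint.pdf"]:
        print(url, "still up" if still_available(url) else "gone (or no longer a PDF)")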