
Scientific Data Repositories

120 points, by the-mitr, almost 4 years ago

6 comments

physicsguy, almost 4 years ago
I found Zenodo to be the best out there by far in terms of ease of use and a decent amount of default storage (50 GB). We have to upload data by law in the UK due to funding requirements from the research councils, and the universities' own offerings are normally pitiful.
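Uploads to Zenodo can also be scripted through its documented REST deposit API. A minimal sketch, assuming a personal access token from your Zenodo account settings (the token string below is a placeholder and the helper name is mine):

```python
import json
import urllib.request

ZENODO_API = "https://zenodo.org/api"

def new_deposition_request(token: str) -> urllib.request.Request:
    """Build the POST request that creates an empty deposition on Zenodo."""
    return urllib.request.Request(
        f"{ZENODO_API}/deposit/depositions",
        data=json.dumps({}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

# Sending this with urllib.request.urlopen(req) returns JSON containing the
# deposition id and a bucket URL to which data files can then be PUT.
req = new_deposition_request("YOUR-ACCESS-TOKEN")
print(req.method, req.full_url)
```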
michaelhoffman, almost 4 years ago
We recently wrote a commentary describing how to share biological data, and some of the specialist and generalist repositories where one might do so: https://febs.onlinelibrary.wiley.com/doi/10.1002/1873-3468.14067
bluenose69, almost 4 years ago
In addition to sites such as those listed, universities often provide institution-level data hosting. So long as the university doesn't go under, they ought to be stable. An advantage is that there are local people who can help the researchers with the process, e.g. in setting up useful metadata and so forth.

I worry a bit about people just dumping data into large repositories, without thinking much about the format or the later uses, but only focussing on a checklist that needs to be ticked off to get that precious bean (publication) for the bean-counters (deans).
clickok, almost 4 years ago
None of these solutions are ideal, although Zenodo's better than most. As far as I can tell, they're all targeted more towards the final, authoritative release, so it seems you're still out of luck during the paper *writing* process. What if I'm just trying to share a dataset/pre-trained model with remote collaborators?

I ran into this when doing some OCR experiments[1], finding acquiring data and pre-trained models to be the most time-consuming part of the enterprise. This ended up adding enough additional hassle that I didn't manage to get anything really interesting going, although figuring out how to containerize other peoples' code was educational. Personally, I think I'll be relying on some combination of institutional repositories + torrents/IPFS for any large datasets/models I end up releasing in the future.

-----

1. https://github.com/rldotai/ocr-experiments
mirker, almost 4 years ago
In my opinion, data should be mirrored on a torrent in addition to institutional servers (which can also provide checksums). Torrents offload the bandwidth problems from the institute to users, and stay up if people use the data.

But that probably won't happen, because torrents are a dirty word due to illegal activity, and they also give up control of the data.
medstrom, almost 4 years ago
TLDR: re3data and FAIRsharing appear to be registries of "where can you find this repository", in case they change URI, I guess. Not so much for finding specific datasets, just hosters?

I noticed many of these repos are javascript-walled. Is there any kind of standard API through which you can search for repos and fetch datasets?