
Scientific Data Repositories

120 points, by the-mitr, almost 4 years ago

6 comments

physicsguy, almost 4 years ago
I found Zenodo to be the best out there by far in terms of ease of use and a decent amount of default storage (50 GB). We have to upload data by law in the UK due to funding requirements from the research councils, and the universities' own offerings are normally pitiful.
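Uploads to Zenodo can also be scripted through its documented REST deposit API. A minimal sketch, assuming a personal access token from your Zenodo account settings (the token string below is a placeholder and the helper name is mine):

```python
import json
import urllib.request

ZENODO_API = "https://zenodo.org/api"

def new_deposition_request(token: str) -> urllib.request.Request:
    """Build the POST request that creates an empty deposition on Zenodo."""
    return urllib.request.Request(
        f"{ZENODO_API}/deposit/depositions",
        data=json.dumps({}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

# Sending this with urllib.request.urlopen(req) returns JSON containing the
# deposition id and a bucket URL to which data files can then be PUT.
req = new_deposition_request("YOUR-ACCESS-TOKEN")
print(req.method, req.full_url)
```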
michaelhoffman, almost 4 years ago
We recently wrote a commentary describing how to share biological data, and some of the specialist and generalist repositories where one might do so: https://febs.onlinelibrary.wiley.com/doi/10.1002/1873-3468.14067
bluenose69, almost 4 years ago
In addition to sites such as those listed, universities often provide institution-level data hosting. So long as the university doesn't go under, they ought to be stable. An advantage is that there are local people who can help the researchers with the process, e.g. in setting up useful metadata and so forth.

I worry a bit about people just dumping data into large repositories, without thinking much about the format or the later uses, but only focussing on a checklist that needs to be ticked off to get that precious bean (publication) for the bean-counters (deans).
clickok, almost 4 years ago
None of these solutions are ideal, although Zenodo's better than most. As far as I can tell, they're all targeted more towards the final, authoritative release, so it seems you're still out of luck during the paper *writing* process. What if I'm just trying to share a dataset/pre-trained model with remote collaborators?

I ran into this when doing some OCR experiments[1], finding acquiring data and pre-trained models to be the most time-consuming part of the enterprise. This ended up adding enough additional hassle that I didn't manage to get anything really interesting going, although figuring out how to containerize other peoples' code was educational. Personally, I think I'll be relying on some combination of institutional repositories + torrents/IPFS for any large datasets/models I end up releasing in the future.

-----

1. https://github.com/rldotai/ocr-experiments
mirker, almost 4 years ago
In my opinion, data should be mirrored on a torrent in addition to institutional servers (which can also provide checksums). Torrents offload the bandwidth problems from the institute to users, and stay up if people use the data.

But that probably won't happen, because torrents are a dirty word due to illegal activity, and they also give up control of the data.
medstrom, almost 4 years ago
TLDR: re3data and FAIRsharing appear to be registries of "where can you find this repository", in case they change URI, I guess. Not so much for finding specific datasets, just hosters?

I noticed many of these repos are javascript-walled. Is there any kind of standard API through which you can search for repos and fetch datasets?