The magic of small databases

192 points by topcat31 over 2 years ago

25 comments

btown over 2 years ago

Buried in here is a fascinating musing on "Market-making Small Databases": "Imagine a Substack for databases - an easy tool for creating, maintaining and publishing databases with the ability to restrict parts or all of it behind a pay wall. Pair it with the ability to send email updates to your audience about changes and additions..." It's worth reading in full in the original article.

One of my favorite small databases is https://hiregoats.com/, a simple site showing goat herds for rent (for clearing brush in a sustainable way, etc.), monetized with a $35 listing fee and nothing else. There's no e-commerce, no attempt to insert the site into the transaction or funds flow, no bells and whistles. Certainly this doesn't scale to other niches where suppliers are less incentivized to pay a listing fee, but I'd love to see this kind of thing become more common and incentivize people to curate.

dmje over 2 years ago

I run a little agency in the UK that works with museums to help them with digital. A large part of this is getting collections online.

Some years ago we commissioned a developer to make CultureObject[0], a free and open-source WordPress plugin that makes it easier to ingest collections data for display on the web. At heart it's a glorified data importer, and many people just use the CSV mode to sync and import collections data. It requires some dev effort: we've built an add-on which makes this easier, but there's no denying that search, faceting and display need knowledge of WordPress development.

Three years ago we launched The Museum Platform[1], which is more of a SaaS model: we remove the need for dev skills and ask clients to just send us a CSV and any related media, and we do the hard work. It's WordPress again, but a modified version where we also facilitate storytelling and narrative around the ingested collections.

The interesting thing about this journey is that the requirement to "get a collection online" seems easy, at least in theory. But in reality it gets hard quite quickly once search and filtering are needed, and harder still as scale comes into it. 1,000 records is fine; 100,000 is quite a bit harder.

There are also many subtleties, particularly with museum collections. The "location" of a record could be where it was collected, where it is now, or where it's on display. Relational data is hard, as are taxonomies and authority terms. It's hard to generalise and it's hard to scale.

[0] https://cultureobject.co.uk/
[1] https://themuseumplatform.com/
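
A minimal Python sketch of the "send us a CSV" ingestion step plus a faceted count, using SQLite from the standard library. The export format and its column names (`object_id`, `object_type`, `collected_at`, `current_location`, `on_display`) are hypothetical, and this is not how CultureObject or The Museum Platform work internally. It splits the ambiguous "location" into three explicit columns and indexes each facet column.

```python
import csv
import io
import sqlite3

# Hypothetical collections export. Column names are illustrative, not a real museum schema.
# The ambiguous "location" is split into three explicit fields.
CSV_DATA = """object_id,title,object_type,collected_at,current_location,on_display
1,Bronze axe head,Tool,Wiltshire,Store B,
2,Embroidered sampler,Textile,Norwich,Gallery 3,yes
3,Flint scraper,Tool,Norfolk,Gallery 1,yes
"""

FACET_COLUMNS = ("object_type", "collected_at", "current_location")


def load_collection(conn: sqlite3.Connection, text: str) -> int:
    """Create the records table, insert every CSV row, and return the row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS records (
               object_id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               object_type TEXT,
               collected_at TEXT,
               current_location TEXT,
               on_display INTEGER NOT NULL DEFAULT 0
           )"""
    )
    rows = [
        (
            int(r["object_id"]),
            r["title"],
            r["object_type"] or None,  # empty cells become NULL
            r["collected_at"] or None,
            r["current_location"] or None,
            1 if r["on_display"].strip().lower() == "yes" else 0,
        )
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)", rows)
    # One index per facet column keeps WHERE / GROUP BY on facets cheap as the collection grows.
    for column in FACET_COLUMNS:
        conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{column} ON records({column})")
    return len(rows)


def facet_counts(conn: sqlite3.Connection, column: str) -> list[tuple]:
    """Return (value, count) pairs for one facet column, most common first."""
    if column not in FACET_COLUMNS:  # column names cannot be bound as SQL parameters
        raise ValueError(f"not a facet column: {column}")
    return conn.execute(
        f"SELECT {column}, COUNT(*) FROM records GROUP BY {column} "
        f"ORDER BY COUNT(*) DESC, {column}"
    ).fetchall()
```

With the sample data, `facet_counts(conn, "object_type")` returns `[('Tool', 2), ('Textile', 1)]`. Whether per-column indexes are enough at 100,000 records depends on the actual queries; treat them as a starting point rather than a tuning recommendation.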

breck over 2 years ago

I'm going to plug our related project: TreeBase. It's the public-domain software that powers PLDB.com (a Programming Language DataBase).

It's very simple. If your small database was about cars, your structure might look something like this:

```
database/
  grammar/
    engine.grammar
    interior.grammar
  things/
    model3.car
    camry.car
```

The `grammar` files are written in a Tree Language called Grammar. Those are your schema files. You basically create a new syntax-free, plain-text "language" for storing your data, in this case one "car" file per model of car.

It was a pipe dream of mine until the M1s came out. Those changed everything, because then it became fast enough to actually do it.

We have a new release coming out soon with a new query language that will change everything. Here is the source code: https://github.com/breck7/jtree/tree/main/treeBase
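
A rough Python sketch of the one-file-per-record idea, assuming a much simpler format than TreeBase actually uses: each record file holds flat `key value` lines, and a hypothetical `CAR_SCHEMA` dict stands in for the Grammar files. It does not use or reproduce the jtree API or the Grammar language; it only shows how a schema-checked folder of plain-text files can behave like a small database.

```python
import tempfile
from pathlib import Path

# Hypothetical schema: field name -> converter. A stand-in for a grammar file,
# NOT TreeBase's Grammar language; it only supports flat "key value" lines.
CAR_SCHEMA = {"make": str, "model": str, "year": int, "seats": int}


def parse_record(text: str, schema: dict) -> dict:
    """Parse 'key value' lines into a dict, rejecting unknown fields and bad values."""
    record = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        if key not in schema:
            raise ValueError(f"line {lineno}: unknown field {key!r}")
        record[key] = schema[key](value.strip())  # e.g. int("abc") raises ValueError
    return record


def load_database(folder: Path, extension: str, schema: dict) -> dict:
    """Load every '<id>.<extension>' file; the filename stem becomes the record id."""
    return {
        path.stem: parse_record(path.read_text(encoding="utf-8"), schema)
        for path in sorted(folder.glob(f"*.{extension}"))
    }


def load_demo() -> dict:
    """Recreate the things/ folder from the example layout in a temp dir and load it."""
    with tempfile.TemporaryDirectory() as tmp:
        things = Path(tmp) / "things"
        things.mkdir()
        (things / "model3.car").write_text("make Tesla\nmodel Model 3\nseats 5\n", encoding="utf-8")
        (things / "camry.car").write_text("make Toyota\nmodel Camry\nseats 5\n", encoding="utf-8")
        return load_database(things, "car", CAR_SCHEMA)
```

Because every record is a small text file, the whole "database" can be diffed, reviewed and versioned with ordinary tools; the schema check is what keeps the files consistent.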

LunarAurora over 2 years ago

There are categories of "no-code" online services that could work, more or less, as small databases. Some are already cited in the article:

- DB platforms (best for more advanced DBs): Airtable, getgrist.com
- Wikis + DB platforms (best for building a site around the DB): notion.so, coda.io
- Airtable/GSheet publishing (best for simple/custom UI): glideapps.com, siteoly.com
- Bookmarks/collections (best for links/references): Zotero (online groups), are.na
- List sharing (best for open collaboration?): listium.com, (ranker.com?)
- BI platforms (best for advanced filters/charts): polymersearch.com, Google Data Studio
- Dataset hosting (best for downloading?): data.world, kaggle.com

All of these allow publishing, and some allow collaboration.

xnx over 2 years ago

"Publishing documents to the web is a well-served use case but publishing small indexes, databases and collections to the web is still an incredibly frustrating and under-served use case. Here I outline why I think it matters and a variety of approaches to solving it."

Amen. I'm surprised the post doesn't mention sqlite3 WASM/JS (https://sqlite.org/wasm/doc/trunk/about.md). That, paired with an easy-to-use faceting library, would go a long way.
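
What a faceting layer computes can be sketched in a few lines of Python over in-memory records. This is illustrative only: it does not use the sqlite3 WASM/JS API, and the artist records are hypothetical. Each facet's counts apply every other active filter but not the facet's own, a common approach (often called disjunctive faceting) so a sidebar still shows the alternatives a user could switch to.

```python
from collections import Counter

# Hypothetical sample records.
RECORDS = [
    {"title": "Artist A", "borough": "Brooklyn", "medium": "ceramics"},
    {"title": "Artist B", "borough": "Brooklyn", "medium": "painting"},
    {"title": "Artist C", "borough": "Queens", "medium": "ceramics"},
    {"title": "Artist D", "borough": "Brooklyn", "medium": "ceramics"},
]


def matches(record: dict, filters: dict) -> bool:
    """True if the record has the required value for every filtered field."""
    return all(record.get(field) == value for field, value in filters.items())


def facet(records: list[dict], filters: dict, facet_fields: list[str]):
    """Return (filtered records, {field: Counter}) for a faceted-search sidebar.

    Each field's counts ignore that field's own filter, so already-selected
    facets still list their other values.
    """
    results = [r for r in records if matches(r, filters)]
    counts = {}
    for field in facet_fields:
        others = {f: v for f, v in filters.items() if f != field}
        counts[field] = Counter(
            r[field] for r in records if field in r and matches(r, others)
        )
    return results, counts
```

Filtering on `{"borough": "Brooklyn"}` returns artists A, B and D, while the borough facet still counts Queens (1) alongside Brooklyn (3). A browser-side version would run the equivalent `GROUP BY` queries against the SQLite database instead.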

ZephyrBlu over 2 years ago

I love this. I've been thinking about something similar lately. There are so few good indexes and search engines for niche collections of data.

Imagine if there were a niche search engine for everything, and each one were customized for its niche.

I think the main problems here are:

- Data format and ingestion
- Domain-specific indexing/relevance

Most data is super messy and isn't accessible through nice APIs, which presents a problem. You might need custom ingestion for each niche, and it's pretty likely you'll need some rules to standardize data from multiple sources, neither of which seems easy to generalize and automate because they're very domain-specific.

The other part of this is indexing/relevance, so the search feels good to use. Some fields are obviously going to be more important than others, and people are going to want to use search for things that are hard to predict ahead of time.

To use the author's example of artists in Brooklyn, people might want to search for artists near them. Now you have to gather location data, format it, ingest it, index it and add it to the search UI.

The fact that adding another field to index on cuts vertically through every one of those layers adds a lot of overhead.

All of this stuff in isolation is not difficult, but when you put it together it becomes quite a lot of work that generally isn't easily scalable.
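
The relevance half of this can be sketched as per-field weighting: a query term found in a more important field counts for more. The weights and artist records below are hypothetical, and real search engines use more sophisticated scoring (BM25, for example); the sketch only shows that choosing the weights is the domain-specific part.

```python
import re

# Hypothetical per-field weights for a niche collection of local artists.
# Choosing these numbers is the domain-specific part; they are not from any real system.
FIELD_WEIGHTS = {"name": 3.0, "medium": 2.0, "bio": 1.0}

# Hypothetical sample records.
ARTISTS = [
    {"name": "Ana Ceramics Studio", "medium": "pottery", "bio": "Works in ceramics and glass."},
    {"name": "Ben Ortiz", "medium": "ceramics", "bio": "Painter turned potter."},
    {"name": "Cleo Park", "medium": "painting", "bio": "Murals in Brooklyn."},
]


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(record: dict, query: str, weights: dict = FIELD_WEIGHTS) -> float:
    """Sum weight x (number of query-term occurrences) over each weighted field."""
    terms = set(tokenize(query))
    return sum(
        weight * sum(1 for token in tokenize(record.get(field, "")) if token in terms)
        for field, weight in weights.items()
    )


def search(records: list[dict], query: str, weights: dict = FIELD_WEIGHTS) -> list[dict]:
    """Return records with a positive score, best first (ties keep input order)."""
    scored = [(score(r, query, weights), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for s, r in scored if s > 0]
```

For the query `"ceramics"`, the name match outweighs the medium match, so "Ana Ceramics Studio" (score 4.0) ranks above "Ben Ortiz" (2.0), and "Cleo Park" is dropped. Adding a new searchable field means touching the data, the weights and the UI, which is the overhead described above.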

itsmemattchung over 2 years ago

Reminds me of Amazon EBS and a white paper describing the philosophy of deploying millions of tiny databases: https://assets.amazon.science/c4/11/de2606884b63bf4d95190a3c2390/millions-of-tiny-databases.pdf

zokier over 2 years ago

Personally I find the whole dBase etc. family of non-SQL, kinda-graphical database systems an interesting historical branch of software that feels like it has mostly died out these days. Access probably did quite a lot of damage here, killing off competitors before mostly succumbing itself.

simongray over 2 years ago

This post is an exercise in describing the motivation and features of the Semantic Web seemingly without realising the tech stack already exists.

moehm over 2 years ago

For what it's worth, here is my "small database" attempt, a structured list of worthwhile Wikipedia articles to read: https://www.mostdiscussed.com

overgard over 2 years ago

People would love this for sports. There's so much interesting data locked up in proprietary databases.

roncesvalles over 2 years ago

I'm aware that this may sound dismissive, but the solution the author of the OP is looking for is the World Wide Web itself.

The "small database" in question is, well, an HTML page. It can be shared and passed around by selecting the portions of it that you need and pressing Ctrl+C/Ctrl+V. Search is handled by the browser with Ctrl+F. Collaboration can take many forms: wikis, comments, forums, live editing. Links between databases are just URLs. The database the OP is looking for is a page of text (for unstructured data) or a somewhat structured format like CSV, JSON, or YAML.

Now, yes, there are certain participants on the WWW who make poor web design choices that break this agreed-upon functionality, e.g. unnecessary pagination or accordions breaking Ctrl+F, not offering data for download, not having useful URL paths, etc.
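
Taken literally, that suggestion means publishing a small CSV as one static HTML page: a table that browser find (Ctrl+F) can search, plus a link to the raw file so the data stays downloadable. A minimal sketch using only the Python standard library; the sample rows and the file name `herds.csv` are hypothetical.

```python
import csv
import html
import io


def csv_to_html_table(csv_text: str, caption: str) -> str:
    """Render CSV text as a plain HTML table; every cell is HTML-escaped."""
    header, *body = list(csv.reader(io.StringIO(csv_text)))
    lines = ["<table>", f"<caption>{html.escape(caption)}</caption>"]
    lines.append("<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in header) + "</tr>")
    for row in body:
        lines.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def publish_page(csv_text: str, title: str, csv_href: str) -> str:
    """A single static page: the table for reading and Ctrl+F, plus a link to the raw CSV."""
    return "\n".join([
        "<!doctype html>",
        f"<title>{html.escape(title)}</title>",
        csv_to_html_table(csv_text, title),
        f'<p><a href="{html.escape(csv_href)}" download>Download CSV</a></p>',
    ])


if __name__ == "__main__":
    sample = "name,city\nA&B Goats,Leeds\nBrush Co,York\n"  # hypothetical rows
    print(publish_page(sample, "Goat herds", "herds.csv"))
```

No pagination or collapsed sections means nothing hides rows from the browser's own search, which is the point the comment makes.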

Zababa over 2 years ago

I like the idea, but I think one issue is that the database is the easy part. If I look again at the list of requirements, most are not about the database but about how to get data from external sources into the database, how to edit the database, and how to publish it. To me this sounds like an interface problem. But since the whole point is small, specialized collections, the interfaces have to be specialized too. That means there's no single tool that can offer a solution. Maybe it's an issue of definition: I call something like MySQL or SQLite or even a CSV file a database, while for the author it's the finished product, the database about <stuff> and the tools that are adapted to <stuff>.

Substack is an interesting example. It's great for written content with a few images, which mostly looks the same everywhere. But it lacks the kind of rich customisation features that I think a database would need, because that stuff is hard to do.

If I had to propose a solution, it would be this: if you want to build a small database, do it. Experimentation in cyberspace is very cheap. These days there are lots of resources online for everything. It can be intimidating and can lead to analysis paralysis; I'm supposed to be a professional developer and I still struggle with that. But one thing that has helped me a lot recently is to try stuff, see if it works, and if it fails, ask questions (to either real people or ChatGPT/Copilot; Copilot is especially valuable for getting into a "just keep writing, editing comes later" mood). It's not always fun, in fact it can be quite frustrating, but that's how things are.

In the end, this is about decentralisation, and you can't have proper decentralisation if you don't also decentralise the skills and the know-how. For example, there has been a lot of talk about Mastodon as a decentralised alternative to Twitter. And it is one. But if you simply go from being a user on Twitter to being a user on Mastodon, you don't regain much control. On the other hand, if you try running a small instance, even just a local one to see how it works, or add a few features to your preferred client (that can be code, but it could also be helping with translation, or maybe a color scheme; you wouldn't believe how many color schemes are barely usable when you're colorblind), then you start being in control.

dgudkov over 2 years ago

Small databases aren't popular because Excel spreadsheets already occupy that niche. A small database doesn't have to be normalized. Because it's small, it can be denormalized into a flat table that can be conveniently handled in Excel.
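
The denormalising step is mechanical: join the normalised tables once and export a flat CSV that opens directly in Excel. A minimal sketch with SQLite from the Python standard library; the `authors`/`books` tables are an illustrative example, not something from the thread.

```python
import csv
import io
import sqlite3


def flatten_to_csv(conn: sqlite3.Connection) -> str:
    """Join the normalised tables into one flat, spreadsheet-friendly CSV string."""
    cursor = conn.execute(
        """SELECT b.title AS title, a.name AS author, a.country AS country, b.year AS year
           FROM books AS b JOIN authors AS a ON a.id = b.author_id
           ORDER BY b.year"""
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])  # header from the aliases
    writer.writerows(cursor)
    return out.getvalue()


def flatten_demo() -> str:
    """Build a tiny normalised example in memory and flatten it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
           CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT,
                               author_id INTEGER REFERENCES authors(id), year INTEGER);
           INSERT INTO authors VALUES (1, 'Ursula K. Le Guin', 'US'), (2, 'Stanisław Lem', 'PL');
           INSERT INTO books VALUES (1, 'Solaris', 2, 1961), (2, 'The Dispossessed', 1, 1974),
                                    (3, 'The Cyberiad', 2, 1965);"""
    )
    return flatten_to_csv(conn)
```

The flat output repeats each author's name and country on every book row; at small sizes that redundancy is the price of a single sortable, filterable sheet.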

topcat31 over 2 years ago

Hey, OP here. Just wanted to say thanks for all the comments (goats and all). There's lots I still need to learn about (actual) databases as a hobby developer...

In the meantime I've made a big update to the Airtable with links to tools, examples and further reading: https://airtable.com/shrYY94GrqVB4HUsi/tblHPrdomiPbLpod6/viwxizssDJMsGqhg9?backgroundColor=green&blocks=hide

marniewebb over 2 years ago

H2O (https://h2o.law.harvard.edu/) is a now-defunct collaborative syllabus project from Harvard that gets at a lot of this, I think. It's basically a list maker with a lot of additional capabilities. While it's made for small lists of things, it's easy to imagine it being a piece of the solution.

082349872349872 over 2 years ago

A search for "filemaker" reveals that Claris is still in business; I'd hope they'd have something that might address this need?

hardwaresofton over 2 years ago

Weirdly enough, I haven't seen much mention of CMSes. They, plus or minus spreadsheet-like tools, are almost surely the way to handle this kind of use case. What's missing is the added search and UI capabilities.

I think about SaaS ideas a lot, and this is actually quite a common one (though I'm generally thinking of a specific niche): enabling people to craft and expose datasets would surely make a great startup.

jerryu over 2 years ago

Having a small database is especially useful when collaborating on data strategy. I have seen database diagrams with thousands of tables, and it is hard to make sense of them using ERD tools.

Even with the advanced views offered by tools like ERDLab.io, it is a pain in the ass to collaborate on large schemas at various stages of development.

cavisne over 2 years ago

I feel like this is getting really close. GPT is great at writing SQL queries from text and at turning a blob of semi-structured data into an SQL schema.

We just need to somehow tie it together so anyone can explain their use case and show an example of the data in plain English, then lock in a schema and feed everything in.
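
The "lock in a schema and feed everything in" step can be sketched without a model at all. This minimal Python version swaps in simple deterministic type rules where the comment imagines GPT proposing the schema: it infers SQLite column types from semi-structured JSON records, creates the table, and loads the rows. The sample records and the table name `herds` are hypothetical, and a real pipeline would want someone to review the inferred schema before committing to it.

```python
import json
import sqlite3


def infer_sqlite_type(values: list) -> str:
    """Pick the narrowest SQLite type that fits every non-null value."""
    present = [v for v in values if v is not None]
    if not present:
        return "TEXT"
    if all(isinstance(v, int) for v in present):  # bool is an int subclass, stored as 0/1
        return "INTEGER"
    if all(isinstance(v, (int, float)) for v in present):
        return "REAL"
    return "TEXT"  # strings, and lists/dicts serialised as JSON text


def infer_schema(records: list[dict]) -> dict[str, str]:
    """Map every key seen in any record (first-seen order) to an inferred column type."""
    keys = dict.fromkeys(k for r in records for k in r)
    return {k: infer_sqlite_type([r.get(k) for r in records]) for k in keys}


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_and_load(conn: sqlite3.Connection, table: str, records: list[dict]) -> dict[str, str]:
    """Lock in the inferred schema, insert every record, and return the schema."""
    schema = infer_schema(records)
    columns = ", ".join(f"{quote_ident(k)} {t}" for k, t in schema.items())
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in schema)
    rows = [
        tuple(
            json.dumps(r[k]) if isinstance(r.get(k), (list, dict)) else r.get(k)
            for k in schema  # missing keys become NULL
        )
        for r in records
    ]
    conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows)
    return schema
```

Given one record with `"rate": 35` and another with `"rate": 42.5`, the column is inferred as `REAL`; a key that appears only in some records (such as a `tags` list) becomes a nullable `TEXT` column holding JSON.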

aabbcc1241 over 2 years ago

For collections of links with short descriptions of projects/services, there are many "awesome" lists on GitHub.

For more complex data to be shared, maybe it could be CSV/MD/MDX shared over git as well? It would have a stable URL and be searchable from GitHub, search engines, and third-party indices.

vaporup over 2 years ago

https://zed.brimdata.io

maphew over 2 years ago

Makes me think of what something like Datasette fused with Fossil SCM could accomplish.

Trayja-Peter over 2 years ago

"I want to empower more individuals to publish, maintain and collaborate on small indexes. To build a million tiny libraries, community databases, weird collections and indie indexes."

Funnily enough, a friend and I have been building https://Trayja.com, a tool that does this exact thing, with a focus on the "community" aspect. There's a huge amount of wisdom in communities, and its value could be multiplied if it were aggregated in a structured, indexable, searchable way. This article articulated so much of what I've been trying to explain about my project.

LAC-Tech over 2 years ago

With how fast computers are now, small databases can work well for small businesses too.