There have been a few times I wanted the ability to select some text out of a Markdown doc. For example, a GitHub CI check to ensure that PRs / issues / etc are properly formatted.<p>This can be done to some extent with regex, but those expressions are brittle and hard to read or edit later. mdq uses a familiar pipe syntax to navigate the Markdown in a structured way.<p>It's in 0.x because I don't want to fully commit to the syntax being stable, in case real-world testing shows that the syntax needs tweaking. But I think the project is in a pretty good spot overall, and would be interested in feedback!
> GitHub PRs are Markdown documents, and some organizations have specific templates with checklists for all reviewers to complete. Enforcing these often requires ugly regexes that are a pain to write and worse to debug<p>This is because GitHub is not building the features we need; instead, they are putting their energy towards the AI land grab. Bitbucket, by contrast, has a feature where you can block PRs using a checkbox list outside of the description box. There are better ways to solve this first example from the OP's README. Cool project. I mainly write MDX these days, so it would be cool to see support for that dialect.
Ironically, one of the reasons Markdown (and other text-based file formats) became popular is that you could use regular find/grep to analyze them, and version control to manage them.
Kind of aligned with this is MarkdownDB, providing an SQLite backend to your Markdown files [0]. Cool to see this, I feel the structure of .md files is not always equally respected or regarded as a data serialisation target.<p>[0]: <a href="https://markdowndb.com/" rel="nofollow">https://markdowndb.com/</a>
I think you'd benefit from having some more real-world-ish examples in the README. I say that as someone who doesn't intuit what I'd want to use this for.
Please don’t reimplement JQ. That problem is already solved. Instead, just provide a tool that can convert your target syntax into JSON, then it can be piped to JQ for querying.
Cool, thanks for sharing! I'll have to check this out. I've wanted something similar.<p>After trying a bunch of the usual ones, the only "notes system" I've stuck with is just a directory of Markdown files that's automatically committed to git on any change using watchexec.<p>I've wanted to add a little smarts to it so I could use it to track tasks (e.g. sort, prune completed, forward incomplete tasks over to the next day's journal, collect tasks from "projects", etc.), so I started writing some Rust code using markdown-rs. Then, to round-trip Markdown with changes, only the JavaScript version of the library currently supports serializing GitHub Flavored Markdown. So I actually dumped the Markdown AST to JSON from Rust and picked it up in JS to serialize it for a proof of concept. That's about as far as I've gotten so far. But while markdown-rs saves position information, it doesn't save source token information (like, * and - are both list items), so you can't reliably round-trip.<p>FWIW, the other thing I was hoping to do was treat Markdown documents as trees (based on headings) and use an XPath kind of language to pull out sections. Anyway, will check out your code, thanks for posting.
Interesting; one thing you may have learned researching existing tools and libraries: many of them serialize Markdown to HTML before running structured extraction/manipulation, even for stuff like converting to PDF.<p>The core assumption here is that Markdown was/is designed to be serializable to HTML. This is why a Markdown document/AST is mostly <i>not</i> a tree structure for tree-ish elements such as sub-sections. Instead, it is flat: an array of elements in order of appearance in the document. Apparently this most closely matches the structure of HTML, at both the block and inline levels. Only lists and blockquotes (afair) support nesting.<p>Ex: h1 -> paragraph -> h2 -> paragraph is not nested; it is an array of four ordered elements.<p>Anyway, you might throw a task at Cursor or Copilot to see how an equivalent implementation using HTML fares against your test suite; you may be able to develop more quickly.
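To make the flat-vs-tree point concrete, here is a minimal Python sketch of folding a flat block list into sections nested by heading level. The `(kind, text)` tuples, the `Section` class, and the `nest` function are all made up for illustration; they are not the output format of mdq, markdown-rs, or any real parser.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    level: int   # 0 for the synthetic root, 1-6 for h1-h6
    title: str
    body: list = field(default_factory=list)      # non-heading blocks, in order
    children: list = field(default_factory=list)  # nested Section objects


def nest(blocks):
    """Fold a flat list of (kind, text) blocks into a tree keyed on heading level."""
    root = Section(0, "")
    stack = [root]  # path from the root to the section currently being filled
    for kind, text in blocks:
        if kind.startswith("h") and kind[1:].isdigit():
            level = int(kind[1:])
            # Close every open section that is at the same depth or deeper.
            while stack[-1].level >= level:
                stack.pop()
            section = Section(level, text)
            stack[-1].children.append(section)
            stack.append(section)
        else:
            # Any other block belongs to the innermost open section.
            stack[-1].body.append(text)
    return root


# Hypothetical flat input mirroring the h1 -> paragraph -> h2 -> paragraph example.
flat = [("h1", "Intro"), ("p", "first"), ("h2", "Details"), ("p", "second")]
tree = nest(flat)
```

With that input, `tree.children[0]` is the "Intro" section, and "Details" sits in its `children` rather than beside it. The nesting is derived entirely from heading levels, which is the part the flat array does not give you for free.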
Thanks for sharing! No immediate use-case <i>for me</i> right now, but good to know something like this exists.<p>I wanted to point out little nitpicks for the documented shell invocations:<p><pre><code> cat example.md | mdq '# usage'
</code></pre>
This can be changed into a stdin file redirect to avoid invoking an extra `cat` process (see Useless use of cat [1]):<p><pre><code> mdq '# usage' < example.md
</code></pre>
In a similar fashion, you can avoid an extra `echo` process here:<p><pre><code> echo "$ISSUE_TEXT" | mdq -q '- [x] I have searched for existing issues'
</code></pre>
by changing to this:<p><pre><code> mdq -q '- [x] I have searched for existing issues' <<< "$ISSUE_TEXT"
</code></pre>
[1]: <a href="https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat" rel="nofollow">https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat</a>
I worked on a project converting Word docs to Markdown so they could more easily be ingested into an LLM. One issue was that context windows used to be very short, so we would basically split on `\n#` to get sections, but this turns into a whole thing where you have to make guesses about which header level is appropriate to split at, and then you turn each section into a separate chunk in FAISS. Anyway, we ended up using HTML instead of MD because there's so much tooling for traversing HTML and not MD. This would have been helpful for that.
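For anyone hitting the same chunking problem, a rough Python sketch of splitting at a chosen header level might look like the following. It is not what the project above actually did. `split_sections` and `max_level` are names invented here, and it only understands ATX (`#`) headings and triple-backtick fences, so setext headings and `~~~` fences are ignored.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown, max_level=2):
    """Split Markdown into (title, text) chunks at headings of level <= max_level."""
    chunks = []
    title, lines = None, []
    in_fence = False

    def flush():
        # Emit the current chunk unless it is an empty, untitled preamble.
        if title is not None or any(ln.strip() for ln in lines):
            chunks.append((title, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # a '#' inside a fenced block is not a heading
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            # Deeper headings and ordinary lines stay inside the current chunk.
            lines.append(line)
    flush()
    return chunks


doc = "# A\nintro\n## B\ntext\n```\n# not a heading\n```\n### C\ndeep\n# D\nend"
sections = split_sections(doc, max_level=2)
```

Here `sections` holds three chunks titled "A", "B", and "D". The `### C` heading and the `#` line inside the fence both stay in the "B" chunk, which is exactly the case a plain split on `\n#` gets wrong.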
I've always wanted a "literate programming" / jupyter-style notebook based on markdown. Maybe this could help make something like that possible.
congrats on your tool, will check it out.
I have a side question on Markdown: Cursor messes up Markdown generation quite often for me. I think its responses are always in Markdown, with sections for code, and asking it to generate Markdown breaks it. So the question: any ideas on how to have Cursor generate Markdown?