Hey HN! We’re Jai and Sanket, co-founders of DeepSource (YC W20). We're open-sourcing Globstar (<a href="https://github.com/DeepSourceCorp/globstar">https://github.com/DeepSourceCorp/globstar</a>), a static analysis toolkit that lets you easily write and run custom code quality and security checkers in YAML [1] or Go [2].

After 5+ years of building AST-based static analyzers that process millions of lines of code daily at DeepSource, we kept hearing a common request from customers: "How do we write custom checks specific to our codebase?" AppSec and DevOps teams have a lot of learned anti-patterns and security rules they want to enforce across their orgs, and being able to do that without being a static analysis expert came up as an important need.

We initially built an internal framework using tree-sitter [3] for our proprietary infrastructure-as-code analyzers, which enabled us to rapidly create new checkers. We realized that making the framework open-source could solve this problem for everyone.

Our key insight was that writing checkers isn't the hard part anymore. Modern AI assistants like ChatGPT and Claude are excellent at generating tree-sitter queries with very high accuracy. We realized that tree-sitter's gnarly s-expression syntax isn’t a problem anymore (since the AI will be doing all the generation anyway), and we can instead focus on building a fast, flexible, and reliable checker runtime around it.

So instead of creating yet another DSL, we use tree-sitter's native query syntax. Yes, the expressions look more complex than simplified DSLs, but they give you direct access to your code's actual AST structure – which means your rules work exactly as you'd expect them to. When you need to debug a rule, you're working with the actual structure of your code, not an abstraction that might hide important details.

We've also designed Globstar to have a gradual learning curve: the YAML interface works well for simple checkers, and the Go interface can handle complex scenarios when you need features like cross-file analysis, scope resolution, data flow analysis, and context awareness. The Go API gives you direct access to tree-sitter bindings, so you can write arbitrarily complex checkers on day one.

Key features:

- Written in Go with native tree-sitter bindings, distributed as a single binary
- MIT-licensed
- Write all your checkers in a “.globstar” folder in your repo, in YAML or Go, and just run “globstar check” without any build steps
- Multi-language support through tree-sitter (20+ languages today)

We have a long way to go and a very exciting roadmap for Globstar, and we’d love to hear your feedback!

[1] <a href="https://globstar.dev/guides/writing-yaml-checker" rel="nofollow">https://globstar.dev/guides/writing-yaml-checker</a>
[2] <a href="https://globstar.dev/guides/writing-go-checker" rel="nofollow">https://globstar.dev/guides/writing-go-checker</a>
[3] <a href="https://tree-sitter.github.io/tree-sitter/" rel="nofollow">https://tree-sitter.github.io/tree-sitter/</a>
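For readers who have not written an AST-based check before, here is a minimal sketch of the general idea described in the post: parse the code, walk the tree, and report nodes that match a pattern. It deliberately does not use Globstar or tree-sitter. It uses Go's standard `go/parser` and `go/ast` packages instead so that it runs with no dependencies, and the `Finding` type and `checkCalls` function are hypothetical names made up for this illustration, not part of any Globstar API.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Finding is a hypothetical result type for this sketch (not a Globstar type).
type Finding struct {
	Line    int
	Message string
}

// checkCalls parses Go source text and reports every call of the form pkg.fn(...).
// It matches on identifier names only; it does not resolve imports or aliases.
func checkCalls(src, pkg, fn string) ([]Finding, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}

	var findings []Finding
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // not a call, keep walking
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true // a call, but not of the form x.y(...)
		}
		recv, ok := sel.X.(*ast.Ident)
		if ok && recv.Name == pkg && sel.Sel.Name == fn {
			findings = append(findings, Finding{
				Line:    fset.Position(call.Pos()).Line,
				Message: fmt.Sprintf("avoid %s.%s", pkg, fn),
			})
		}
		return true
	})
	return findings, nil
}

func main() {
	src := `package demo

import "fmt"

func run() {
	fmt.Println("debug")
}
`
	findings, err := checkCalls(src, "fmt", "Println")
	if err != nil {
		panic(err)
	}
	for _, f := range findings {
		fmt.Printf("line %d: %s\n", f.Line, f.Message)
	}
}
```

A tree-sitter query expresses the same "find this node shape" step declaratively across many languages, which is the part the post says Globstar builds its runtime around.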

8 comments

markrian 3 months ago

Interesting! Do you have a page which compares globstar against other similar tools, like Semgrep, ast-grep, Comby, etc?

For instance, something like <a href="https://ast-grep.github.io/advanced/tool-comparison.html#comparison-with-other-frameworks" rel="nofollow">https://ast-grep.github.io/advanced/tool-comparison.html#com...</a>.

xxpor 3 months ago

Another rule engine checker that doesn't support the language that needs this type of thing the most: C.

In this case, it's inexplicable to me since tree-sitter supports C fine.

micksmix 3 months ago

One of the main benefits of Semgrep is its unified DSL that works across all supported languages. In contrast, using the Go module "smacker/go-tree-sitter" can expose you to differences in s-expression outputs due to variations and changes in independent grammars.

I've seen grammars that are part of "smacker/go-tree-sitter" change their syntax between versions, which can lead to broken S-expressions. Semgrep solves that with their DSL, because it's also an abstraction away from those kinds of grammar changes.

I'm a bit concerned that tree-sitter s-expressions can become "write-only" and rely on the reader to also understand the grammar for which they've been generated.

For example, here's a semgrep rule for detecting a Jinja2 environment with autoescaping disabled:

<pre><code>
rules:
  - id: incorrect-autoescape-disabled
    patterns:
      - pattern: jinja2.Environment(... , autoescape=$VAL, ...)
      - pattern-not: jinja2.Environment(... , autoescape=True, ...)
      - pattern-not: jinja2.Environment(... , autoescape=jinja2.select_autoescape(...), ...)
      - focus-metavariable: $VAL
</code></pre>

Now, compare it to the corresponding tree-sitter S-expression (generated by o3-mini-high):

<pre><code>
(
  call
    function: (attribute
      object: (identifier) @module (#eq? @module "jinja2")
      attribute: (identifier) @func (#eq? @func "Environment"))
    arguments: (argument_list
      (_)*
      (keyword_argument
        name: (identifier) @key (#eq? @key "autoescape")
        value: (_) @val (#not-match @val "^True$") (#not-match @val "^jinja2\\.select_autoescape\\("))
      (_)*)
) @incorrect_autoescape
</code></pre>

People can disagree, but I'm not sure that tree-sitter S-expressions are an upgrade over a DSL. I'm hoping I'm proven wrong ;-)

pdimitar 3 months ago

Wow this looks great. I will be giving it a go VerySoon™!

Looking forward to writing some enhanced linters.

etyp 3 months ago

I really love that static analyzers are pushing in this direction! I loved writing Clippy lints and I think applying that "it's just code" with custom checks is a powerful idea. I worked on a static analysis product and the rules for that were horrible; I don't blame the customers for not really wanting to write them.

Is there a general way to apply/remove/act on taint in Go checkers? I may not be digging deeply enough, but it seems like the example just uses some `unsafeVars` map that is made with a magic `isUserInputSource` method. It's hard for me to immediately tell what the capabilities there are; I bet I'm missing a bit.
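To make the question concrete, here is a rough sketch of what a map-based approach to applying, propagating, and removing taint can look like. It is not Globstar's implementation and not the example from its docs: only the names `unsafeVars` and `isUserInputSource` are borrowed from the comment above, and their bodies here are guesses. The source `readInput`, the sink `execQuery`, and the function `findTaintedSinks` are hypothetical. It uses Go's standard `go/ast` rather than tree-sitter so it runs without dependencies.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// isUserInputSource treats a call to the hypothetical function readInput()
// as a taint source. The body is an assumption made for this sketch.
func isUserInputSource(expr ast.Expr) bool {
	call, ok := expr.(*ast.CallExpr)
	if !ok {
		return false
	}
	id, ok := call.Fun.(*ast.Ident)
	return ok && id.Name == "readInput"
}

// findTaintedSinks returns the lines where a tainted variable is passed
// directly to the hypothetical sink execQuery(...).
// Limitations: names only (no scopes), straight-line code only (no branches),
// and only plain assignments are tracked (no var declarations).
func findTaintedSinks(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}

	unsafeVars := map[string]bool{}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		switch node := n.(type) {
		case *ast.AssignStmt:
			if len(node.Lhs) != len(node.Rhs) {
				return true // skip multi-value assignments such as a, b := f()
			}
			for i, lhs := range node.Lhs {
				name, ok := lhs.(*ast.Ident)
				if !ok {
					continue
				}
				rhs := node.Rhs[i]
				tainted := isUserInputSource(rhs) // apply taint
				if id, ok := rhs.(*ast.Ident); ok && unsafeVars[id.Name] {
					tainted = true // propagate taint through a copy
				}
				if tainted {
					unsafeVars[name.Name] = true
				} else {
					delete(unsafeVars, name.Name) // remove taint on clean reassignment
				}
			}
		case *ast.CallExpr:
			fn, ok := node.Fun.(*ast.Ident)
			if !ok || fn.Name != "execQuery" {
				return true
			}
			for _, arg := range node.Args {
				if id, ok := arg.(*ast.Ident); ok && unsafeVars[id.Name] {
					lines = append(lines, fset.Position(node.Pos()).Line)
					break
				}
			}
		}
		return true
	})
	return lines, nil
}

func main() {
	src := `package demo

func handler() {
	a := readInput()
	b := a
	execQuery(b)
	b = sanitize(b)
	execQuery(b)
}
`
	lines, err := findTaintedSinks(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("tainted sink calls on lines:", lines)
}
```

In this sketch any reassignment from something that is neither a source nor a tainted variable clears the taint, so a call like `sanitize(b)` acts as a sanitizer implicitly. A real checker would need scope resolution and control-flow awareness, which is presumably where the "magic" the comment asks about would have to live.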

micksmix 3 months ago

This is a really interesting project!

I'd love to hear how this project differs from Bearer, which is also written in Go and based on tree-sitter: <a href="https://github.com/Bearer/bearer">https://github.com/Bearer/bearer</a>

Regardless, considering there is a large existing open-source collection of Semgrep rules, is there a way they can be adapted or transpiled to tree-sitter S-expressions so that they may be reused with Globstar?

henning 3 months ago

Is there a way to add a comment to disable the check rule similar to what you can do in ESLint to ignore a rule?

codepathfinder 3 months ago

Nothing comes closer to CodeQL!

If anyone is interested, please check out codepathfinder.dev, a truly open-source CodeQL alternative.

Feedback is appreciated!

Show HN: Globstar – Open-source static analysis toolkit
