Semgrep: Semantic grep for code

415 points by ievans, about 4 years ago

31 comments

hyper_reality, about 4 years ago
This is an excellent tool to have as a security consultant, and it just keeps getting better and better. When approaching a large codebase, it enables you to write custom rules that match on certain antipatterns you've spotted that may be unique to the codebase. That's the real value of the tool, but the repository of per-language rules is also convenient for quickly finding low-hanging fruit (like every use of a potentially injectable function such as exec, system, etc. in PHP).

For example, a webapp may have been designed such that authorisation needs to be explicitly added with a line or two to each controller. A semgrep rule can be written to match all the controllers which are missing this line. Then these controllers can be manually reviewed to assess whether unauthorised access should be allowed. Depending on what you are trying to match, this is something that may be very complex or even impossible to implement accurately in plain grep. Some languages like Ruby have powerful static analysis tools (Brakeman) that can also do this, but the benefit of Semgrep is the flexibility across multiple languages and how readable the rulesets are. [1]

[1] https://blog.includesecurity.com/2021/01/custom-static-analysis-rules-showdown-brakeman-vs-semgrep/
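An added example of the "controllers missing the authorisation line" rule described above (written for this page, not taken from the comment or from Semgrep's own rule registry). It assumes a Django-style convention where every view takes `request` as its first parameter, that the project's authorisation helper is called `check_permission(request, ...)`, and that intentionally public views carry a hypothetical `@public_view` decorator; every hit is a candidate for manual review, not a confirmed bug.

```yaml
rules:
  - id: view-missing-authorization-check
    languages: [python]
    severity: WARNING
    message: >-
      View '$VIEW' never calls check_permission(); confirm that
      unauthenticated access is intended before dismissing this finding.
    metadata:
      category: security
    patterns:
      # Start from every top-level function that takes the request object
      # as its first parameter (the assumed controller convention).
      - pattern: |
          def $VIEW(request, ...):
            ...
      # Subtract the ones whose body calls the authorisation helper anywhere.
      # $VIEW is bound to the same name in both patterns, so the subtraction
      # is per function, not global. Matching on the AST means the call is
      # found whatever its line breaks or keyword-vs-positional arguments.
      - pattern-not: |
          def $VIEW(request, ...):
            ...
            check_permission(request, ...)
            ...
      # Also subtract views explicitly marked public (assumed decorator),
      # so the rule stays quiet on documented, intentional exceptions.
      - pattern-not: |
          @public_view
          def $VIEW(request, ...):
            ...
    # Reported:  def orders(request): return render(request, "orders.html")
    # Not reported: def orders(request):
    #                   check_permission(request, "orders.view")
    #                   return render(request, "orders.html")
```

Run it as `semgrep --config view-missing-authz.yaml src/` (file name assumed); adding `--error` makes the exit status non-zero on findings, which is what a CI gate or pre-commit hook needs.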
thesuperbigfrog, about 4 years ago
The name "Semantic Grep" does not give a good idea of what this tool is and what it does.

The web page states: "Static analysis at ludicrous speed. Find bugs and enforce code standards"

"grep" is short for "global regular expression print". It finds matches for the given regular expression and prints them.

"Semantic Grep" is a static analyzer with configurable rules, style checks, etc. It does much more than search and print.

Perhaps a better name is needed?

Edit: How about "omnilint" or "omnicritic", since semgrep is more of a "lint" (https://en.wikipedia.org/wiki/Lint_(software)) or "critic" (https://en.wikipedia.org/wiki/Perl::Critic) type of tool that handles multiple languages?

Edit2: "Static analysis at ludicrous speed" ==> "turbolint"? ("ludicrous speed" reminds me of the hilarious Space Balls scene :) "turbolint, GO!")
westurner, about 4 years ago
Is there a more complete example of how to call semgrep from pre-commit (which gets called before every git commit) in order to prevent e.g. Python print calls (print(), print \n(), etc.) from being checked in?

https://semgrep.dev/docs/extensions/ describes how to do pre-commit.

Nvm, here's semgrep's own .pre-commit-config.yml for semgrep itself: https://github.com/returntocorp/semgrep/blob/develop/.pre-commit-config.yaml
SavantIdiot, about 4 years ago
Since the capability has never existed, I don't think in terms of being able to semgrep. If that makes any sense. My brain is not wired this way, yet.

Like, if you've never tasted lychee, it would never occur to you how to cook with it.

I'm going to need to see some useful, real-world examples to jumpstart my brain to think this way.
joshuamorton, about 4 years ago
There's lots of confusion about what semgrep does here, which is kind of unfortunate. I haven't touched it much, but I have built a very similar tool (I'm one of the contributors to refex[1], which is a very similar project).

The starting point of semantic grep is very useful. When you have a big codebase, you often want to detect antipatterns, or not even antipatterns, but just uses of a thing, say you're renaming a method and want to track down the callers.

Being able to act on the AST, instead of hoping you searched up all of the variants of whitespace and line breaks and, depending on the specific example, different uses of argument passing, is really useful.

But often when you're semantically grepping, your goal is to replace something with something else (this is what refex was initially built for: to aid in large scale changes in python, as a sort of equivalent to the C++ tools that Google uses).

But then you want to shift left even further: once you have a pattern that you want to replace once, you can just enforce that a linter yell at you when anyone does it again. So it's very natural to develop a linter-style thing on top of one of these[2].

This is, as I understand it, sort of the same thing that happens in C++: clang-tidy and clang-format are written on top of AST libraries that can be used for ad-hoc analysis and transformations, but you can also just plug them into a linter.

The thing is, for most organizations, enforcing code style and best practices is more valuable than applying a refactoring to 10M lines of code, because most organizations don't have 10M lines of code to refactor. That doesn't mean that these tools aren't also useful for ad-hoc transforms and exploratory analysis. They absolutely are!

[1]: https://github.com/ssbr/refex
[2]: https://github.com/ssbr/refex/tree/main/refex/fix
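The "find the callers, rewrite them once, then keep the rule as a linter" workflow described above, as an added example in Semgrep's rule format (this is not refex code and not a rule from the Semgrep project). The method name `fetch_all` and its replacement `fetch_page` on an arbitrary client object are assumed for the example.

```yaml
rules:
  - id: deprecated-client-fetch-all
    languages: [python]
    severity: ERROR
    message: >-
      $OBJ.fetch_all() is deprecated (assumed rename for this example);
      call $OBJ.fetch_page() instead, or run semgrep with --autofix.
    # Step 1, the "semantic grep": every call of the old method, on any
    # receiver. Because matching happens on the AST, callers split across
    # lines, with extra whitespace, or passing the same arguments by
    # keyword instead of position all hit this one pattern. $...ARGS
    # captures the full argument list so it can be carried over.
    pattern: $OBJ.fetch_all($...ARGS)
    # Step 2, the one-off large-scale change:
    #   semgrep --config deprecated-fetch-all.yaml --autofix src/
    # rewrites each match in place; without --autofix it only reports.
    fix: $OBJ.fetch_page($...ARGS)
    # Step 3, shift left: leave the same file in CI or pre-commit with
    # `semgrep --config ... --error` so that any new fetch_all() caller
    # fails the build instead of quietly reintroducing the old name.
```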
enriquto, about 4 years ago
> You need to enable JavaScript to run this app.

Wait, is this a web app? I was expecting a command line tool to navigate my code locally.
unwind, about 4 years ago
When tools like this use terms like "legacy languages", and don't show that C is supported unless you click "More Languages", it makes me feel old. :)

Still, it seems rather cool. I like the idea of being able to search code at a higher level than just raw source text.
kesterallen, about 4 years ago
Typo in the "Trying Semgrep" screenshot ("ruleste"): https://semgrep.dev/static/media/Step1.df848497.png
jhgb, about 4 years ago
Isn't "grep for code" called just "grep"?
leafmeal, about 4 years ago
What does this give you over writing a flake8 plugin (for Python at least)?

I've found the flake8 API and documentation lacking, so perhaps just a cleaner interface?
rmetzler, about 4 years ago
Looks like a useful tool for me and I would like to try it.

Go down, see "brew install semgrep" and try to copy-paste it. And it's an image :(
hn_throwaway_99, about 4 years ago
I currently use a highly opinionated ESLint config (based on the airbnb one) together with strict checking in my TypeScript config, and it is configured to run on every commit with husky git hooks. The example given on the Semgrep homepage is an exact match to one that exists in my ESLint config (eslint's no-console rule).

How does Semgrep compare to ESLint + a strict tsconfig?
shuringai, about 4 years ago
This is a much better alternative to codeQL used by google and does not use a shameless registration-only model! Thanks for sharing
vlovich123, about 4 years ago
I want the ease of use of their AST specification with the power of clang's refactor tool. Has anyone attempted to do that?
pabs3, about 4 years ago
Does it come with a standard set of rules that finds bad code without any false positives out of the box? Or is it more of a tool for people doing code security audits & pentesting who know what they are looking for and want to read the surrounding code?
layer8, about 4 years ago
No Windows support yet: https://github.com/returntocorp/semgrep/issues/1330
CGamesPlay, about 4 years ago
How much does the CI service cost? I can't seem to find any information about it on the website without creating an account.
nojvek, about 4 years ago
The underlying package tree-sitter that semgrep uses is pretty amazing too. It's an incremental parser for many different languages written in C.

It blows my mind how fast it is compared to many tools in the JS ecosystem. Tree-sitter was parsing millions of files in half a minute. JS, TS, Ruby, YAML, HTML, CSS. It's quite magical. Such great engineering.
vindarel, about 4 years ago
Interesting. Looks similar to Comby: https://comby.dev/ "a tool for searching and changing code structure". Comby is more on rewriting; it has less integration for a CI (though you can do it), and it is less geared towards reporting.
wdb, about 4 years ago
Apparently this is invalid TypeScript (cannot parse it, it says):

```typescript
try {
  const parsedURL = new URL(url)
  requestPath = parsedURL.pathname
} catch (error: unknown) {
  // NOOP
}
```

It's complaining about the `: unknown` bit, which one of the newer typescript-eslint rules enforces.
realquadrant, about 4 years ago
Hi, this is very cool. I have been building up a suite of tools to roll out across major open source projects to improve security. I like what I have seen so far, this is a great use case. Whom can I connect with to learn more? And similarity/diff with sourcegraph, also like a lot.
silasb, about 4 years ago
Just the tool that I was looking for. We are looking to do Service linting in our organization as a method of making sure our services don't drift too far apart.

Anyone else know of a Service linting tool? OPA/conftest come close but lack syntax parsers for Ruby/Javascript.
more_corn, about 4 years ago
I used to use SAST-SCAN but that seems to be abandonware. I like that this exists. Everyone should go from nothing to something in the SAST space. A free/freemium tool/service for that is pretty great. The first couple of runs have found useful results.
afro88, about 4 years ago
No Swift support yet. What would be involved in adding it?
minusf, about 4 years ago
Probably doing something wrong, but running the ci ruleset on a tiny django hobby project made all cores spin at 100% after 33% of the progress bar and made the OS almost unresponsive. Ctrl-C after 5 minutes and I still had to pkill every semgrep process... never seen the M1 airbook overheat this much before.
sriram_malhar, about 4 years ago
Nice looking tool.

Is there a way to search for functions in C (other than printf!) whose return value is ignored at the call site?
pantuza, about 4 years ago
Those guardrail rules from semgrep are really outstanding. Useful to enforce code. Thanks for sharing the tool.
globular-toast, about 4 years ago
Whenever I see "at ludicrous speed" or something to that effect, I now assume it's slow.
Annatar, about 4 years ago
I click on the link above and I get a seemingly blank page, all because the website uses some JavaScript garbage and violates W3C standards. That's the ridiculous, disgusting state of the information technology industry in the 21st century. I rue the day I decided to do this professionally, and I am deeply ashamed and despondent.
hardon4semgrep, about 4 years ago
How does this compare to the tools available at large companies like Google and Facebook?
solipsism, about 4 years ago
What's the status of C++ support?