Rule structure syntax
Getting started with rule writing? Try the Semgrep Tutorial.
This document describes the YAML rule syntax of Semgrep.
Schema
Required
All required fields must be present at the top level of a rule immediately under the rules key.
| Field | Type | Description |
|---|---|---|
| id | string | Unique, descriptive identifier, for example: no-unused-variable |
| message | string | Message that includes why Semgrep matched this pattern and how to remediate it. See also Rule messages. |
| severity | string | Severity can be LOW, MEDIUM, HIGH, or CRITICAL. It indicates the criticality of issues detected by a rule. Note: Semgrep Supply Chain uses CVE assignments for severity, while the rule author sets severity for Code and Secrets. The older levels ERROR, WARNING, and INFO match HIGH, MEDIUM, and LOW. Severity values remain backwards compatible. |
| languages | array | See language extensions and tags. |
| pattern* | string | Find code matching this expression |
| patterns* | array | Logical AND of multiple patterns |
| pattern-either* | array | Logical OR of multiple patterns |
| pattern-regex* | string | Find code matching this PCRE2-compatible pattern in multiline mode |
Fields marked with an asterisk (*) are alternatives. Only one of the following keys is required: pattern, patterns, pattern-either, pattern-regex.
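As a quick way to see how the required fields fit together, here is a minimal sketch of a rule checker in Python. It is not part of Semgrep: the function name `check_rule` is hypothetical, the rule is assumed to be already loaded into a dictionary, and reading "only one is required" as "exactly one must be present" is an assumption of this sketch.

```python
# Required top-level fields and the alternative pattern keys from the table above.
REQUIRED_FIELDS = ("id", "message", "severity", "languages")
PATTERN_KEYS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def check_rule(rule):
    """Return a list of problems found in one rule dictionary (hypothetical helper)."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in rule
    ]
    present = [key for key in PATTERN_KEYS if key in rule]
    # Assumption: exactly one of the pattern keys should appear at the top level.
    if len(present) != 1:
        problems.append(
            f"expected exactly one of {', '.join(PATTERN_KEYS)}; found {len(present)}"
        )
    return problems


good_rule = {
    "id": "md5-usage",
    "message": "Found md5 usage",
    "severity": "HIGH",
    "languages": ["python"],
    "pattern": "hashlib.md5(...)",
}
bad_rule = {"id": "no-pattern", "message": "Nothing to match", "languages": ["python"]}

print(check_rule(good_rule))  # no problems reported
print(check_rule(bad_rule))   # missing severity, and no pattern key
```

A check like this only catches missing keys; Semgrep itself remains the authority on whether a rule is valid.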
Language extensions and languages key values
The following table includes languages supported by Semgrep, accepted file extensions for test files that accompany the rules, and valid values that Semgrep rules require in the languages key.
| Language | Extensions | languages key values |
|---|---|---|
| Apex (only in Semgrep Pro Engine) | .cls | apex |
| Bash | .bash, .sh | bash, sh |
| C | .c, .h | c |
| Cairo | .cairo | cairo |
| Circom | .circom | circom |
| Clojure | .clj, .cljs, .cljc, .edn | clojure |
| C++ | .cc, .cpp, .cxx, .c++, .pcc, .tpp, .C, .h, .hh, .hpp, .hxx, .inl, .ipp | cpp, c++ |
| C# | .cs | csharp, c# |
| Dart | .dart | dart |
| Dockerfile | .dockerfile, .Dockerfile, dockerfile, Dockerfile | dockerfile, docker |
| Elixir (only in Semgrep Pro Engine) | .ex, .exs | ex, elixir |
| Generic | .generic | generic |
| Go | .go | go, golang |
| Gosu (only in Semgrep Pro Engine) | .gs | gosu |
| Hack | .hack, .hck, .hh | hack |
| HTML | .htm, .html | html |
| Java | .java | java |
| JavaScript | .js, .jsx, .cjs, .mjs | js, javascript |
| JSON | .json, .ipynb | json |
| Jsonnet | .jsonnet, .libsonnet | jsonnet |
| JSX | .js, .jsx | js, javascript |
| Julia | .jl | julia |
| Kotlin | .kt, .kts, .ktm | kt, kotlin |
| Lisp | .lisp, .cl, .el | lisp |
| Lua | .lua | lua |
| Move on SUI | .move | move_on_sui |
| Move on Aptos | .move | move_on_aptos |
| OCaml | .ml, .mli | ocaml |
| PHP | .php, .tpl, .phtml | php |
| Prometheus Query Language | .promql | promql |
| Protocol Buffers | .proto | proto, protobuf, proto3 |
| Python | .py, .pyi | python, python2, python3, py |
| QL | .ql, .qll | ql |
| R | .r, .R | r |
| Ruby | .rb | ruby |
| Rust | .rs | rust |
| Scala | .scala | scala |
| Scheme | .scm, .ss | scheme |
| Solidity | .sol | solidity, sol |
| Swift | .swift | swift |
| Terraform | .tf, .hcl, .tfvars | tf, hcl, terraform |
| TypeScript | .ts, .tsx | ts, typescript |
| Vue | .vue | vue |
| XML | .xml, .plist | xml |
| YAML | .yml, .yaml | yaml |
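The table can be read as a lookup from a test file's extension to a value for the languages key. The sketch below shows that idea for a small subset of rows. The dictionary and the function name `language_for_test_file` are hypothetical, and extensions that appear in more than one row (such as .h or .js) are deliberately left out because the table alone does not say which language to prefer.

```python
from pathlib import Path

# Subset of the table above: extension -> first listed `languages` key value.
# Ambiguous extensions (for example .h, .hh, .js, .move) are intentionally omitted.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".pyi": "python",
    ".go": "go",
    ".rb": "ruby",
    ".rs": "rust",
    ".java": "java",
    ".ts": "ts",
    ".tsx": "ts",
    ".tf": "tf",
    ".yml": "yaml",
    ".yaml": "yaml",
}


def language_for_test_file(path):
    """Return a `languages` value for a test file, or None if the extension is unknown."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix)


print(language_for_test_file("rules/md5-usage.py"))
print(language_for_test_file("notes.txt"))
```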
To see the maturity level of each supported language, see the following references:
Optional
| Field | Type | Description |
|---|---|---|
| options | object | Options object to turn on or turn off matching features |
| fix | object | Simple search-and-replace capability |
| metadata | object | Arbitrary user-provided data; attach data to rules without affecting Semgrep behavior |
| min-version | string | Minimum Semgrep version compatible with the rule |
| max-version | string | Maximum Semgrep version compatible with the rule |
| paths | object | Paths to include or exclude when running the rule |
The following field is optional, but if used, it must be nested underneath a patterns or pattern-either field.
| Field | Type | Description |
|---|---|---|
| pattern-inside | string | Keep findings that lie inside this pattern |
The following fields are optional, but if used, they must be nested underneath a patterns field.
| Field | Type | Description |
|---|---|---|
| metavariable-regex | map | Search metavariables for Python re compatible expressions; regex matching is left anchored |
| metavariable-pattern | map | Match metavariables with a pattern formula |
| metavariable-comparison | map | Compare metavariables against basic Python expressions |
| metavariable-name | map | Match metavariables against constraints on what they name |
| pattern-not | string | Logical NOT - remove findings matching this expression |
| pattern-not-inside | string | Keep findings that do not lie inside this pattern |
| pattern-not-regex | string | Filter results using a PCRE2-compatible pattern in multiline mode |
Operators
pattern
The pattern operator looks for code matching its expression. This can be basic expressions like $X == $X or unwanted function calls like hashlib.md5(...).
rules:
  - id: md5-usage
    languages:
      - python
    message: Found md5 usage
    pattern: hashlib.md5(...)
    severity: HIGH
The preceding pattern matches the following:
import hashlib
# ruleid: md5-usage
digest = hashlib.md5(b"test")
# ok: md5-usage
digest = hashlib.sha256(b"test")
patterns
The patterns operator performs a logical AND operation on one or more child patterns. This is useful for chaining multiple patterns together where all patterns must be true.
rules:
  - id: unverified-db-query
    patterns:
      - pattern: db_query(...)
      - pattern-not: db_query(..., verify=True, ...)
    message: Found unverified db query
    severity: HIGH
    languages:
      - python
The preceding pattern matches the following:
# ruleid: unverified-db-query
db_query("SELECT * FROM ...")
# ok: unverified-db-query
db_query("SELECT * FROM ...", verify=True, env="prod")
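To make the AND of a positive pattern and a pattern-not concrete, the sketch below approximates the unverified-db-query rule with Python's standard ast module. This is only an illustration of the logic, not how Semgrep matches code, and the function name `unverified_db_query_lines` is hypothetical.

```python
import ast


def unverified_db_query_lines(source):
    """Return line numbers of db_query(...) calls that lack verify=True."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        # Positive condition: a call to a function named db_query.
        is_db_query = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "db_query"
        )
        if not is_db_query:
            continue
        # Negative condition: drop calls that pass the keyword verify=True.
        verified = any(
            kw.arg == "verify"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not verified:
            lines.append(node.lineno)
    return sorted(lines)


sample = (
    'db_query("SELECT * FROM ...")\n'
    'db_query("SELECT * FROM ...", verify=True, env="prod")\n'
)
print(unverified_db_query_lines(sample))  # only the first call is reported
```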
patterns operator evaluation strategy
The order in which the child patterns are declared in a patterns operator does not affect the final result. A patterns operator is always evaluated in the same way:
1. Semgrep evaluates all positive patterns, including pattern-insides, patterns, pattern-regexes, and pattern-eithers. Each range matched by one of these patterns is intersected with the ranges matched by the other operators. The result is a set of positive ranges. The positive ranges carry metavariable bindings. For example, in one range, $X can be bound to the function call foo(), and in another range $X can be bound to the expression a + b.
2. Semgrep evaluates all negative patterns, including pattern-not-insides, pattern-nots, and pattern-not-regexes. This provides a set of negative ranges, which are used to filter the positive ranges. The result is a subset of the positive ranges computed in the previous step.
3. Semgrep evaluates all conditionals, including metavariable-regexes, metavariable-patterns, and metavariable-comparisons. These conditional operators can only examine the metavariables that were bound in the positive ranges in step 1 and that have been filtered through the negative patterns in step 2. Note that metavariables bound by negative patterns are not available here.
4. Semgrep applies all focus-metavariables by computing the intersection of each positive range with the range of the metavariable on which you want to focus. Again, the only metavariables available to focus on are those bound by positive patterns.
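The four steps above can be pictured with a toy model. The sketch below is not Semgrep's implementation: it assumes matches are plain character ranges, that a single list of pattern-inside ranges is enough, and that pattern-not removes only ranges that coincide exactly. The names `Match`, `lies_inside`, and `evaluate_patterns` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """A matched character range plus the metavariables bound inside it."""
    start: int
    end: int
    # Metavariable name -> (bound text, start offset, end offset).
    bindings: dict = field(default_factory=dict)


def lies_inside(inner, outer):
    return outer.start <= inner.start and inner.end <= outer.end


def evaluate_patterns(positives, insides, nots, not_insides, conditions, focus=None):
    # Step 1: positive ranges, restricted to those inside a pattern-inside range.
    ranges = [m for m in positives
              if not insides or any(lies_inside(m, outer) for outer in insides)]
    # Step 2: negative ranges filter the positive ranges.
    ranges = [m for m in ranges
              if not any((m.start, m.end) == (n.start, n.end) for n in nots)
              and not any(lies_inside(m, outer) for outer in not_insides)]
    # Step 3: conditionals only see bindings from the surviving positive ranges.
    ranges = [m for m in ranges if all(check(m.bindings) for check in conditions)]
    # Step 4: focusing narrows each range to the metavariable's own range.
    if focus is not None:
        ranges = [Match(m.bindings[focus][1], m.bindings[focus][2], m.bindings)
                  for m in ranges if focus in m.bindings]
    return [(m.start, m.end) for m in ranges]


positives = [
    Match(10, 20, {"$X": ("foo()", 12, 17)}),
    Match(30, 40, {"$X": ("a + b", 32, 37)}),
    Match(50, 60, {"$X": ("foo()", 52, 57)}),
]
result = evaluate_patterns(
    positives,
    insides=[Match(0, 45)],
    nots=[],
    not_insides=[Match(28, 42)],
    conditions=[lambda bindings: bindings["$X"][0].startswith("foo")],
    focus="$X",
)
print(result)  # [(12, 17)]
```

In this run, the third match falls outside the pattern-inside range, the second is removed by the pattern-not-inside range, and the first survives the conditional and is narrowed to the range of $X. Because each step is a filter over the same set of ranges, reordering the inputs does not change the outcome, which mirrors the order independence described above.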
pattern-either
The pattern-either operator performs a logical OR operation on one or more child patterns. This is useful for chaining multiple patterns together where any may be true.
rules:
  - id: insecure-crypto-usage
    pattern-either:
      - pattern: hashlib.sha1(...)
      - pattern: hashlib.md5(...)
    message: Found insecure crypto usage
    languages:
      - python
    severity: HIGH
The preceding pattern matches the following:
import hashlib
# ruleid: insecure-crypto-usage
digest = hashlib.md5(b"test")
# ruleid: insecure-crypto-usage
digest = hashlib.sha1(b"test")
# ok: insecure-crypto-usage
digest = hashlib.sha256(b"test")
This rule checks for the use of Python standard library functions hashlib.md5 or hashlib.sha1. Depending on their usage, these hashing functions are considered insecure.
pattern-regex
The pattern-regex operator searches files for substrings matching the given Perl-Compatible Regular Expressions (PCRE) pattern. PCRE is a full-featured regular expression (regex) library that is widely compatible with Perl, as well as with the respective regex libraries of Python, JavaScript, Go, Ruby, and Java. This is useful for migrating existing regular expression code search capability to Semgrep. Patterns are compiled in multiline mode. For example, ^ and $ match at the beginning and end of lines, respectively, in addition to the beginning and end of input.
PCRE2 supports some Unicode character properties, but not some Perl properties. For example, \p{Egyptian_Hieroglyphs} is supported, but \p{InMusicalSymbols} isn't.
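The effect of multiline mode on ^ and $ can be tried out with Python's re module. Python's regex engine is not PCRE2, so this is only an analogy: the re.MULTILINE flag is assumed here to behave like the multiline mode described above for these two anchors.

```python
import re

source = "import os\nx = 1  # import later\nimport sys\n"
pattern = r"^import \w+$"

# With multiline mode, ^ and $ also match at the start and end of each line.
per_line = re.findall(pattern, source, flags=re.MULTILINE)

# Without it, ^ matches only at the very beginning of the input.
whole_input = re.findall(pattern, source)

print(per_line)     # ['import os', 'import sys']
print(whole_input)  # []
```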
Example: pattern-regex combined with other pattern operators
rules:
  - id: boto-client-ip
    patterns:
      - pattern-inside: boto3.client(host="...")
      - pattern-regex: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
    message: boto client using IP address
    languages:
      - python
    severity: HIGH
The preceding pattern matches the following:
import boto3
# ruleid: boto-client-ip
client = boto3.client(host="192.168.1.200")
# ok: boto-client-ip
client = boto3.client(host="dev.internal.example.com")
Example: pattern-regex used as a standalone, top-level operator
rules:
  - id: legacy-eval-search
    pattern-regex: eval\(
    message: Insecure code execution
    languages:
      - javascript
    severity: HIGH
The preceding pattern matches the following:
// ruleid: legacy-eval-search
eval('var a = 5')
Single (') and double (") quotes behave differently in YAML syntax. Single quotes are typically preferred when using backslashes (\) with pattern-regex.
Note that you may bind a section of a regular expression to a metavariable by using named capturing groups. In this case, the name of the capturing group must be a valid metavariable name.
rules:
  - id: my_pattern_id-copy
    patterns:
      - pattern-regex: a(?P<FIRST>.*)b(?P<SECOND>.*)
    message: Semgrep found a match, with $FIRST and $SECOND
    languages:
      - regex
    severity: MEDIUM
The preceding pattern matches the following:
acbd
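To see what the named capturing groups bind for this input, the same regular expression can be run through Python's re module, which accepts the same (?P<NAME>...) syntax. This is a stand-in for illustration only: Semgrep uses PCRE2, and the message substitution below is done by hand rather than by Semgrep.

```python
import re

match = re.search(r"a(?P<FIRST>.*)b(?P<SECOND>.*)", "acbd")

# Each named group plays the role of a metavariable: FIRST -> $FIRST, and so on.
bindings = match.groupdict()

message = "Semgrep found a match, with $FIRST and $SECOND"
for name, value in bindings.items():
    message = message.replace(f"${name}", value)

print(bindings)  # {'FIRST': 'c', 'SECOND': 'd'}
print(message)   # Semgrep found a match, with c and d
```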