Writing Custom Semgrep Rules: Security Patterns for Your Codebase

Learn how to create custom Semgrep rules to detect security vulnerabilities unique to your application, enforce coding standards, and catch business logic flaws that generic tools miss.

Why Custom Rules Matter

While Semgrep's built-in rulesets detect common vulnerabilities (SQL injection, XSS, hardcoded secrets), they can't understand your application's specific security requirements:

  • Business logic: Payment processing flows, access control patterns
  • Framework-specific: Custom ORM methods, internal APIs
  • Compliance: PCI-DSS data handling, HIPAA PHI protection
  • Organizational standards: Approved libraries, deprecated functions

💡 Pro Tip:

Start by converting recurring code review comments into automated rules. This saves reviewer time and ensures consistency.

Semgrep Rule Anatomy

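In the default search mode, a Semgrep rule needs five fields: a unique id, a pattern (pattern, patterns, pattern-either, or pattern-regex), a message, a severity, and a languages list. The metadata block is optional but useful for triage: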

Basic Rule Structure
rules:
  - id: unique-rule-name
    pattern: |
      # Code pattern to match
    message: Human-readable description of the issue
    severity: ERROR  # or WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"

Pattern Syntax Basics

Ellipsis Operator (...)

Matches zero or more statements, arguments, or expressions:

Ellipsis Examples
# Matches any call to foo with "password" as the first argument
foo("password", ...)

# Matches this query prefix concatenated with any expression
db.query("SELECT * FROM users WHERE id = " + ...)

# Matches object with specific key (any value)
{ apiKey: ..., ... }

Metavariables ($VAR)

Capture and reuse parts of matched code:

Metavariable Examples
# Capture function name and argument
$FUNC($ARG)

# Ensure same variable used twice
if ($X === null) { return $X; }

# Capture entire expression
const result = $EXPR;
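
To make the "same variable used twice" case concrete, here is a small JavaScript target file. The function names are made up for illustration; the comments say which `if` statements the pattern `if ($X === null) { return $X; }` would and would not match, because `$X` has to bind to the same expression in both places.

```javascript
// Hypothetical target code for the pattern: if ($X === null) { return $X; }

function returnsSameVariable(user) {
  // Matches: $X binds to `user` in both the condition and the return.
  if (user === null) { return user; }
  return user.name;
}

function returnsDifferentVariable(user, fallback) {
  // Does not match: the condition tests `user` but the body returns `fallback`.
  if (user === null) { return fallback; }
  return user.name;
}
```

Only the first function repeats the same expression, so only it would be reported.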

Example 1: Hardcoded AWS Credentials

Detect AWS access keys in source code across multiple assignment patterns:

.semgrep/aws-credentials.yaml
rules:
  - id: hardcoded-aws-credentials
    patterns:
      - pattern-either:
          # Direct assignment
          - pattern: |
              const $VAR = $KEY
          # Object property
          - pattern: |
              { accessKeyId: $KEY, ... }
          # Assigning to an environment variable in code (still bad!)
          - pattern: |
              process.env.AWS_ACCESS_KEY_ID = $KEY
      # $KEY binds the whole string literal, so the regex includes the quotes
      - metavariable-regex:
          metavariable: $KEY
          regex: ^["']AKIA[0-9A-Z]{16}["']$
    message: |
      Hardcoded AWS credentials detected. Use AWS SDK credential chain:
      - IAM roles for EC2/Lambda
      - Environment variables (not in code!)
      - AWS SSM Parameter Store
      - AWS Secrets Manager
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: "CWE-798"
      owasp: "A07:2021 - Identification and Authentication Failures"
      confidence: HIGH
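
Before relying on the rule, it can help to check the key-ID shape it looks for outside Semgrep. The sketch below applies the core of the regex, `AKIA[0-9A-Z]{16}` with anchors, to a few bare strings in plain JavaScript. The helper name `looksLikeAccessKeyId` and the sample values are illustrative assumptions, not part of the rule.

```javascript
// Same character class as the rule's regex, applied to the bare key string.
const ACCESS_KEY_ID = /^AKIA[0-9A-Z]{16}$/;

// Hypothetical helper: true when the value has the access key ID shape.
function looksLikeAccessKeyId(value) {
  return ACCESS_KEY_ID.test(value);
}

// Illustrative sample values.
const samples = [
  "AKIAIOSFODNN7EXAMPLE",  // AKIA prefix plus 16 uppercase letters or digits
  "AKIAshort",             // too short
  "akiaiosfodnn7example",  // lowercase is rejected
  "XAKIAIOSFODNN7EXAMPLE", // the ^ anchor rejects a leading character
];

for (const sample of samples) {
  console.log(sample, looksLikeAccessKeyId(sample));
}
```

Only the first sample passes. The anchors matter: without them, any longer string that merely contains a key-shaped substring would also match.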

Testing the Rule

Create test cases to validate your rule catches vulnerabilities:

.semgrep/aws-credentials.test.js
// ruleid: hardcoded-aws-credentials
const accessKey = "AKIAIOSFODNN7EXAMPLE";

// ruleid: hardcoded-aws-credentials
const config = {
  accessKeyId: "AKIAIOSFODNN7EXAMPLE",
  secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
};

// Read from the environment, so this should not be flagged
// ok: hardcoded-aws-credentials
const envAccessKey = process.env.AWS_ACCESS_KEY_ID;

Run tests with: semgrep --test .semgrep/

Example 2: Unsafe Database Queries

Detect SQL injection vulnerabilities in your custom ORM:

.semgrep/sql-injection.yaml
rules:
  - id: sql-injection-string-concat
    patterns:
      - pattern-either:
          # Template literals with variables
          - pattern: |
              db.query(`... ${$VAR} ...`)
          # String concatenation
          - pattern: |
              db.query("..." + $VAR + "...")
          # .format() helper (not built into JavaScript strings; keep only if your codebase defines one)
          - pattern: |
              db.query("...".format($VAR))
      # Exclude safe parameterized queries
      - pattern-not: |
          db.query("...", [$VAR, ...])
      - pattern-not: |
          db.query("...", { $KEY: $VAR, ... })
    message: |
      Potential SQL injection via string concatenation. Use parameterized queries:
      
      Bad:  db.query(`SELECT * FROM users WHERE id = ${userId}`)
      Good: db.query('SELECT * FROM users WHERE id = ?', [userId])
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"
      confidence: HIGH
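
A test file for this rule could look like the sketch below. The `db` object is a hypothetical stub that only records what it is asked to run, so the file executes without a database. The `ruleid:` and `ok:` annotations mark the lines the rule is intended to flag and to leave alone.

```javascript
// Hypothetical stand-in for the ORM: records the SQL text and parameters.
const db = {
  calls: [],
  query(sql, params = []) {
    this.calls.push({ sql, params });
    return { sql, params };
  },
};

const userInput = "x' OR '1'='1";

// Concatenation puts the attacker's text inside the SQL statement itself.
// ruleid: sql-injection-string-concat
const unsafe = db.query("SELECT * FROM users WHERE name = '" + userInput + "'");

// A placeholder keeps the SQL text fixed and passes the value separately.
// ok: sql-injection-string-concat
const safe = db.query("SELECT * FROM users WHERE name = ?", [userInput]);

console.log(unsafe.sql);
console.log(safe.sql, safe.params);
```

The recorded calls show the difference the rule is guarding: in the first call the payload becomes part of the query text, while in the second it stays in the parameter list.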

Example 3: Business Logic - Unauthorized Price Modification

Enforce that prices can only be set by specific authorized functions:

.semgrep/price-modification.yaml
rules:
  - id: unauthorized-price-modification
    patterns:
      # Match price assignments
      - pattern-either:
          - pattern: $OBJ.price = $VALUE
          - pattern: $OBJ["price"] = $VALUE
          - pattern: |
              { price: $VALUE, ... }
      # Only allow in specific functions
      - pattern-not-inside: |
          function calculatePrice(...) { ... }
      - pattern-not-inside: |
          function applyDiscount(...) { ... }
      - pattern-not-inside: |
          class PricingService { 
            setPrice(...) { ... }
          }
    message: |
      Direct price modification detected outside authorized functions.
      Use PricingService.setPrice() or calculatePrice() instead.
      This ensures proper validation, audit logging, and approval workflows.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory: business-logic
      confidence: MEDIUM
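
The sketch below shows the kind of code this rule is meant to separate. Both functions are hypothetical: `calculatePrice` is one of the allowed function names in the rule, while `applyCoupon` is not, so the assignment inside it is the one intended to be reported.

```javascript
// Hypothetical target file for the unauthorized-price-modification rule.

function calculatePrice(item, taxRate) {
  // Inside an allowed function, so pattern-not-inside filters this out.
  // ok: unauthorized-price-modification
  item.price = item.basePrice * (1 + taxRate);
  return item;
}

function applyCoupon(item) {
  // Not one of the allowed functions, so this assignment is flagged.
  // ruleid: unauthorized-price-modification
  item.price = 0;
  return item;
}
```

Note that the allow-list works on function names only. Renaming `applyCoupon` to `applyDiscount` would silence the finding without adding any validation, which is one reason the rule is marked MEDIUM confidence.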

Example 4: Compliance - HIPAA PHI Detection

Ensure Protected Health Information is properly encrypted:

.semgrep/hipaa-phi.yaml
rules:
  - id: unencrypted-phi-storage
    patterns:
      # Match PHI fields passed to db.save()
      - pattern: |
          db.save({ $FIELD: $VAL, ... })
      - metavariable-regex:
          metavariable: $FIELD
          regex: ^(ssn|medicalRecordNumber|healthInsuranceId)$
      # Skip calls where that same field is wrapped in an encryption function.
      # encrypt(...) is used rather than encrypt($VAL) because $VAL is already
      # bound to the whole field value by the pattern above.
      - pattern-not: |
          db.save({ $FIELD: encrypt(...), ... })
      - pattern-not: |
          db.save({ $FIELD: encryptPHI(...), ... })
    message: |
      Protected Health Information (PHI) must be encrypted before storage.
      Use encryptPHI() function which implements AES-256-GCM encryption.
      
      Example: db.save({ ssn: encryptPHI(patientSSN) })
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory: compliance
      compliance: HIPAA
      cwe: "CWE-311"
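
The rule's message points developers to an `encryptPHI()` function. The article does not show that function, so the following is only a minimal sketch of what an AES-256-GCM helper could look like with Node's built-in crypto module. The key handling, the `decryptPHI` counterpart, and the storage layout (nonce, tag, ciphertext, base64-encoded) are all assumptions made for illustration.

```javascript
const nodeCrypto = require("node:crypto");

// Assumption: in production the 32-byte key comes from a KMS or secrets
// manager. A random key is generated here only so the sketch runs.
const key = nodeCrypto.randomBytes(32);

function encryptPHI(plaintext) {
  const iv = nodeCrypto.randomBytes(12); // fresh 96-bit nonce per value
  const cipher = nodeCrypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  // Store nonce, tag, and ciphertext together so the value can be decrypted later.
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptPHI(encoded) {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = nodeCrypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the data was tampered with
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Hypothetical record in the shape the rule expects to see passed to db.save().
const record = { ssn: encryptPHI("123-45-6789") };
```

Because each call uses a new nonce, encrypting the same value twice produces different output, which also means encrypted fields cannot be compared or indexed directly.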

Advanced Pattern Matching

Taint Analysis

Track data flow from source (user input) to sink (sensitive operation):

Taint Tracking Example
rules:
  - id: xss-taint-tracking
    mode: taint
    pattern-sources:
      - pattern: req.query.$PARAM
      - pattern: req.body.$PARAM
      - pattern: req.params.$PARAM
    pattern-sinks:
      - pattern: res.send($DATA)
      - pattern: $ELEM.innerHTML = $DATA
    pattern-sanitizers:
      - pattern: sanitizeHTML($DATA)
      - pattern: escape($DATA)
    message: Unsanitized user input flows to HTML output (XSS risk)
    severity: ERROR
    languages: [javascript]
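
To see what the taint rule is tracking, consider the two handlers sketched below. `sanitizeHTML` here is a hypothetical escaping helper that happens to share the name used in pattern-sanitizers, and `fakeRes` is a stub standing in for an Express response so the handlers can be called directly. The annotations mark the intended findings.

```javascript
// Hypothetical sanitizer matching the name used in pattern-sanitizers.
function sanitizeHTML(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function greetUnsafe(req, res) {
  const name = req.query.name; // source: user-controlled input
  // ruleid: xss-taint-tracking
  res.send("<h1>Hello " + name + "</h1>"); // sink reached without sanitizing
}

function greetSafe(req, res) {
  const name = sanitizeHTML(req.query.name); // sanitizer breaks the taint flow
  // ok: xss-taint-tracking
  res.send("<h1>Hello " + name + "</h1>");
}

// Minimal stand-in for an Express response object.
function fakeRes() {
  return { body: null, send(html) { this.body = html; } };
}
```

The two `res.send` calls are textually identical. A plain pattern rule could not tell them apart; the taint rule can, because it follows where `name` came from.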

Focus Metavariables

Narrow matches to specific parts of code:

Focus Example
rules:
  - id: weak-password-hashing
    patterns:
      - pattern: |
          $LIB.hash($PASSWORD, { algorithm: $ALG })
      # $ALG binds the whole string literal, so the regex includes the quotes
      - metavariable-regex:
          metavariable: $ALG
          regex: ^["'](md5|sha1)["']$
      - focus-metavariable: $ALG
    message: Weak hashing algorithm. Use bcrypt or Argon2
    severity: ERROR
    languages: [javascript]

Organizing Rules

Directory Structure

.semgrep/
├── rules/
│   ├── security/
│   │   ├── injection.yaml
│   │   ├── crypto.yaml
│   │   └── auth.yaml
│   ├── compliance/
│   │   ├── pci-dss.yaml
│   │   └── hipaa.yaml
│   └── business-logic/
│       ├── pricing.yaml
│       └── permissions.yaml
├── tests/
│   ├── injection.test.js
│   └── crypto.test.js
└── .semgrepignore

.semgrepignore

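Exclude files from scanning. Semgrep typically reads .semgrepignore from the directory the scan is run from (usually the repository root), so check that the file sits where your Semgrep version will pick it up: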

# Dependencies
node_modules/
vendor/

# Generated code
dist/
build/
*.min.js

# Test fixtures (intentionally vulnerable)
test/fixtures/
*.test.js

Best Practices

📝 Clear Messages

Include why it's a problem, how to fix it, and an example. Treat it like a code review comment.

🧪 Test Thoroughly

Create test files with both vulnerable (ruleid:) and safe (ok:) examples. Run semgrep --test before committing.

🎯 Start Specific

Begin with narrow patterns, then expand. Over-broad rules generate noise and get disabled.

🔄 Iterate on Feedback

Monitor false positives in CI. Refine patterns based on real-world usage.

📚 Document Metadata

Include CWE codes, OWASP mappings, and confidence levels for triage priority.

🔐 Version Control

Store rules in Git. Review changes like any code. Tag releases for rollback.

Running Custom Rules

Local Development

# Scan with custom rules directory
semgrep scan --config=.semgrep/rules .

# Test rules
semgrep --test .semgrep/

# Validate rule syntax
semgrep --validate --config=.semgrep/rules/

CI/CD Integration

Add to your GitHub Actions workflow:

- name: Run Custom Semgrep Rules
  run: |
    semgrep scan \
      --config=.semgrep/rules \
      --sarif \
      --output=custom-rules.sarif \
      .
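
Once the SARIF file exists, a small script can summarize it, for example to spot which custom rules fire most often. The sketch below counts results per rule using the `runs[].results[].ruleId` fields defined by the SARIF format. The function name and the sample object are illustrative; the sample is hand-written, not real scan output.

```javascript
// Count SARIF results per rule. `sarif` is the parsed JSON document.
function countFindingsByRule(sarif) {
  const counts = {};
  for (const run of sarif.runs ?? []) {
    for (const result of run.results ?? []) {
      const ruleId = result.ruleId ?? "(unknown rule)";
      counts[ruleId] = (counts[ruleId] ?? 0) + 1;
    }
  }
  return counts;
}

// Hand-written sample in SARIF's runs[].results[] shape.
const sample = {
  version: "2.1.0",
  runs: [
    {
      results: [
        { ruleId: "sql-injection-string-concat" },
        { ruleId: "missing-rate-limit" },
        { ruleId: "sql-injection-string-concat" },
      ],
    },
  ],
};

console.log(countFindingsByRule(sample));
```

In CI, the sample could be replaced with `JSON.parse(fs.readFileSync("custom-rules.sarif", "utf8"))` to read the file written by the step above.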

Common Pitfalls

1. Over-Matching

❌ Bad:
pattern: eval($X)

Matches every eval call, including those in test files and in vetted safe wrappers.

✅ Good:
patterns:
  - pattern: eval($X)
  - pattern-not-inside: |
      function safeEval(...) { ... }
  - pattern-not-inside: |
      describe("...", ...)

2. Vague Messages

❌ Bad:
message: "Potential security issue"

✅ Good:
message: |
  SQL injection via string concatenation.
  Use parameterized queries instead:
  db.query('SELECT * FROM users WHERE id = ?', [userId])

3. Ignoring False Positives

Don't just add nosemgrep comments (// nosemgrep in JavaScript). Instead, refine the rule with pattern-not, or adjust pattern-inside, to exclude legitimate use cases.

Real-World Example: API Rate Limiting

Ensure all API endpoints implement rate limiting:

.semgrep/rate-limiting.yaml
rules:
  - id: missing-rate-limit
    patterns:
      # Match Express.js route definitions (path, optional middleware, handler)
      - pattern-either:
          - pattern: |
              app.$METHOD($PATH, ..., $HANDLER)
          - pattern: |
              router.$METHOD($PATH, ..., $HANDLER)
      - metavariable-regex:
          metavariable: $METHOD
          regex: ^(get|post|put|patch|delete)$
      # Exclude routes with rate limiting middleware anywhere in the chain
      - pattern-not: |
          app.$METHOD($PATH, ..., rateLimit(...), ...)
      - pattern-not: |
          router.$METHOD($PATH, ..., rateLimit(...), ...)
      # Exclude health check endpoints on app or router
      - pattern-not: |
          $APP.$METHOD("/health", ...)
      - pattern-not: |
          $APP.$METHOD("/healthz", ...)
    message: |
      API endpoint missing rate limiting middleware.
      Add rateLimit to prevent abuse:
      
      router.post('/api/users', rateLimit({ max: 100 }), createUser);
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory: api-security
      owasp: "A04:2021 - Insecure Design"
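
A test file for this rule might look like the sketch below. To keep it runnable without Express installed, `app`, `rateLimit`, and the handlers are hypothetical stubs; only the shape of the route calls matters to the rule. The annotations mark the intended findings.

```javascript
// Hypothetical stubs so the file runs without Express installed.
const routes = [];
const app = {
  get(path, ...handlers) { routes.push({ method: "get", path, handlers }); },
  post(path, ...handlers) { routes.push({ method: "post", path, handlers }); },
};
const rateLimit = (options) => function limiter(req, res, next) { next(); };
const listUsers = (req, res) => res.send([]);
const createUser = (req, res) => res.send({ ok: true });
const health = (req, res) => res.send("ok");

// No rate limiting middleware between the path and the handler.
// ruleid: missing-rate-limit
app.get("/api/users", listUsers);

// rateLimit(...) appears in the middleware chain.
// ok: missing-rate-limit
app.post("/api/users", rateLimit({ max: 100 }), createUser);

// Health checks are excluded by path.
// ok: missing-rate-limit
app.get("/health", health);
```

The rule only recognizes a call spelled `rateLimit(...)` inline in the route definition. Limiters applied globally with `app.use` or imported under another name would still be reported, so expect to tune the exclusions for your codebase.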

Contributing to Semgrep Registry

Share your rules with the community:

  1. Fork semgrep-rules on GitHub
  2. Add your rule to appropriate directory (e.g., javascript/express/)
  3. Create comprehensive test cases
  4. Run semgrep --test --strict to validate
  5. Submit PR with clear description and rationale

Next Steps

You're now equipped to write custom Semgrep rules!
