Writing Custom Semgrep Rules

Why Custom Rules Matter

While Semgrep's built-in rulesets detect common vulnerabilities (SQL injection, XSS, hardcoded secrets), they can't understand your application's specific security requirements:

Business logic: Payment processing flows, access control patterns
Framework-specific: Custom ORM methods, internal APIs
Compliance: PCI-DSS data handling, HIPAA PHI protection
Organizational standards: Approved libraries, deprecated functions

💡 Pro Tip:

Start by converting recurring code review comments into automated rules. This saves reviewer time and ensures consistency.

Semgrep Rule Anatomy

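A Semgrep rule has five required fields (id, a pattern operator such as pattern or patterns, message, severity, and languages) plus an optional metadata block: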

Basic Rule Structure

rules:
  - id: unique-rule-name
    pattern: |
      # Code pattern to match
    message: Human-readable description of the issue
    severity: ERROR  # or WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"

Pattern Syntax Basics

Ellipsis Operator (...)

Matches zero or more statements, arguments, or expressions:

Ellipsis Examples

# Matches any call to foo with "password" as the first argument
foo("password", ...)

# Matches this query string concatenated with any expression
db.query("SELECT * FROM users WHERE id = " + ...)

# Matches object with specific key (any value)
{ apiKey: ..., ... }

Metavariables ($VAR)

Capture and reuse parts of matched code:

Metavariable Examples

# Capture function name and argument
$FUNC($ARG)

# Ensure same variable used twice
if ($X === null) { return $X; }

# Capture entire expression
const result = $EXPR;
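
To make the "same variable used twice" case concrete, here is a small sketch of code the second pattern would and would not match. The function names are invented for illustration.

```javascript
// Matched by `if ($X === null) { return $X; }`: $X binds to `user` both times.
function passThroughNull(user) {
  if (user === null) { return user; }
  return user.name;
}

// Not matched: the condition tests `user` but the body returns `fallback`,
// so no single binding of $X fits both positions.
function withFallback(user, fallback) {
  if (user === null) { return fallback; }
  return user.name;
}
```

Reusing a metavariable is what turns a loose structural match into a check that two positions hold the same code.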

Example 1: Hardcoded AWS Credentials

Detect AWS access keys in source code across multiple assignment patterns:

.semgrep/aws-credentials.yaml

rules:
  - id: hardcoded-aws-credentials
    patterns:
      - pattern-either:
          # Direct assignment
          - pattern: |
              const $KEY = "$VALUE"
          # Object property
          - pattern: |
              { accessKeyId: "$VALUE", ... }
          # Environment variable (still bad!)
          - pattern: |
              process.env.AWS_ACCESS_KEY_ID = "$VALUE"
      # "$VALUE" binds to the string's contents; keep only AWS-style key IDs
      - metavariable-regex:
          metavariable: $VALUE
          regex: ^AKIA[0-9A-Z]{16}$
    message: |
      Hardcoded AWS credentials detected. Use AWS SDK credential chain:
      - IAM roles for EC2/Lambda
      - Environment variables (not in code!)
      - AWS SSM Parameter Store
      - AWS Secrets Manager
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: "CWE-798"
      owasp: "A07:2021 - Identification and Authentication Failures"
      confidence: HIGH
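
Before relying on the rule, it can help to check which strings its regex actually accepts. The sketch below tries the same expression with JavaScript's RegExp. Semgrep uses its own regex engine, but for a simple character-class pattern like this one the results should agree. The helper name looksLikeAwsAccessKeyId is made up for this example.

```javascript
// Same expression as the rule's metavariable-regex.
const AWS_ACCESS_KEY_ID = /^AKIA[0-9A-Z]{16}$/;

// Hypothetical helper, for illustration only.
function looksLikeAwsAccessKeyId(value) {
  return AWS_ACCESS_KEY_ID.test(value);
}

const samples = [
  "AKIAIOSFODNN7EXAMPLE", // the example key used in this section: AKIA + 16 characters
  "AKIAIOSFODNN7",        // too short
  "akiaiosfodnn7example", // lowercase is rejected
  "ASIAIOSFODNN7EXAMPLE", // different prefix (temporary credentials use ASIA)
];

const results = samples.map(looksLikeAwsAccessKeyId);
console.log(results); // [ true, false, false, false ]
```

Note that the last sample is not caught. If temporary credentials matter to you, the regex would need to accept the ASIA prefix as well.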

Testing the Rule

Create test cases to validate your rule catches vulnerabilities:

.semgrep/aws-credentials.test.js

// ruleid: hardcoded-aws-credentials
const accessKey = "AKIAIOSFODNN7EXAMPLE";

// ruleid: hardcoded-aws-credentials
const config = {
  accessKeyId: "AKIAIOSFODNN7EXAMPLE",
  secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
};

// The value comes from the environment, not from source code
// ok: hardcoded-aws-credentials
const envAccessKey = process.env.AWS_ACCESS_KEY_ID;

Run tests with: semgrep --test .semgrep/

Example 2: Unsafe Database Queries

Detect SQL injection vulnerabilities in your custom ORM:

.semgrep/sql-injection.yaml

rules:
  - id: sql-injection-string-concat
    patterns:
      - pattern-either:
          # Template literals with variables
          - pattern: |
              db.query(`... ${$VAR} ...`)
          # String concatenation
          - pattern: |
              db.query("..." + $VAR + "...")
          # .format()-style helper (not native to JavaScript; only relevant if your codebase defines one)
          - pattern: |
              db.query("...".format($VAR))
      # Exclude safe parameterized queries
      - pattern-not: |
          db.query("...", [$VAR, ...])
      - pattern-not: |
          db.query("...", { $KEY: $VAR, ... })
    message: |
      Potential SQL injection via string concatenation. Use parameterized queries:
      
      Bad:  db.query(`SELECT * FROM users WHERE id = ${userId}`)
      Good: db.query('SELECT * FROM users WHERE id = ?', [userId])
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"
      confidence: HIGH
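
A test file for this rule could look like the sketch below. The db object is a stand-in written for this example (not a real ORM) so that the file runs on its own, and the ruleid/ok annotations state the intended behavior of the rule.

```javascript
// Minimal stand-in for the custom ORM so this file runs on its own.
// `db` and its `query` signature are assumptions, not a real library.
const db = {
  calls: [],
  query(sql, params = []) {
    this.calls.push({ sql, params });
    return [];
  },
};

function findUserUnsafe(userId) {
  // ruleid: sql-injection-string-concat
  return db.query(`SELECT * FROM users WHERE id = ${userId}`);
}

function findByNameUnsafe(name) {
  // ruleid: sql-injection-string-concat
  return db.query("SELECT * FROM users WHERE name = '" + name + "'");
}

function findUserSafe(userId) {
  // ok: sql-injection-string-concat
  return db.query("SELECT * FROM users WHERE id = ?", [userId]);
}

// The same hostile input ends up in very different places.
findUserUnsafe("1 OR 1=1");
findUserSafe("1 OR 1=1");
console.log(db.calls[0].sql);    // payload is part of the SQL text
console.log(db.calls[1].params); // payload stays in the parameter list
```

Running the file shows why the rule matters: in the unsafe call the input becomes part of the statement, while in the parameterized call the statement text never changes.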

Example 3: Business Logic - Unauthorized Price Modification

Enforce that prices can only be set by specific authorized functions:

.semgrep/price-modification.yaml

rules:
  - id: unauthorized-price-modification
    patterns:
      # Match price assignments
      - pattern-either:
          - pattern: $OBJ.price = $VALUE
          - pattern: $OBJ["price"] = $VALUE
          - pattern: |
              { price: $VALUE, ... }
      # Only allow in specific functions
      - pattern-not-inside: |
          function calculatePrice(...) { ... }
      - pattern-not-inside: |
          function applyDiscount(...) { ... }
      - pattern-not-inside: |
          class PricingService {
            ...
            setPrice(...) { ... }
            ...
          }
    message: |
      Direct price modification detected outside authorized functions.
      Use PricingService.setPrice() or calculatePrice() instead.
      This ensures proper validation, audit logging, and approval workflows.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory: business-logic
      confidence: MEDIUM
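
As a sketch of how this rule could be exercised, the file below pairs an assignment inside an allowed function with one outside it. The order-handling code, including applyCoupon, is hypothetical and exists only to give the rule something to match.

```javascript
// Hypothetical order-handling code used only to exercise the rule.
function calculatePrice(item, quantity) {
  const line = { sku: item.sku };
  // ok: unauthorized-price-modification
  line.price = item.unitPrice * quantity;
  return line;
}

function applyCoupon(line) {
  // ruleid: unauthorized-price-modification
  line.price = 0;
  return line;
}

const line = calculatePrice({ sku: "A-1", unitPrice: 5 }, 3);
console.log(line.price); // 15
```

The two assignments are syntactically identical; only the enclosing function decides whether the rule reports them, which is exactly what pattern-not-inside expresses.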

Example 4: Compliance - HIPAA PHI Detection

Ensure Protected Health Information is properly encrypted:

.semgrep/hipaa-phi.yaml

rules:
  - id: unencrypted-phi-storage
    patterns:
      # Match saves that include a PHI field
      - pattern: |
          db.save({ $FIELD: $VAL, ... })
      - metavariable-regex:
          metavariable: $FIELD
          regex: ^(ssn|medicalRecordNumber|healthInsuranceId)$
      # Skip saves where that same field is wrapped in an encryption call
      - pattern-not: |
          db.save({ $FIELD: encrypt(...), ... })
      - pattern-not: |
          db.save({ $FIELD: encryptPHI(...), ... })
    message: |
      Protected Health Information (PHI) must be encrypted before storage.
      Use encryptPHI() function which implements AES-256-GCM encryption.
      
      Example: db.save({ ssn: encryptPHI(patientSSN) })
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory: compliance
      compliance: HIPAA
      cwe: "CWE-311"

Advanced Pattern Matching

Taint Analysis

Track data flow from source (user input) to sink (sensitive operation):

Taint Tracking Example

rules:
  - id: xss-taint-tracking
    mode: taint
    pattern-sources:
      - pattern: req.query.$PARAM
      - pattern: req.body.$PARAM
      - pattern: req.params.$PARAM
    pattern-sinks:
      - pattern: res.send($DATA)
      - pattern: $ELEM.innerHTML = $DATA
    pattern-sanitizers:
      - pattern: sanitizeHTML($DATA)
      - pattern: escape($DATA)
    message: Unsanitized user input flows to HTML output (XSS risk)
    severity: ERROR
    languages: [javascript]

Focus Metavariables

Narrow matches to specific parts of code:

Focus Example

rules:
  - id: weak-password-hashing
    patterns:
      - pattern: |
          $LIB.hash($PASSWORD, { algorithm: "$ALG", ... })
      # "$ALG" binds to the string's contents, so the anchored regex sees md5, not "md5"
      - metavariable-regex:
          metavariable: $ALG
          regex: ^(md5|sha1)$
      - focus-metavariable: $ALG
    message: Weak hashing algorithm. Use bcrypt or Argon2
    severity: ERROR
    languages: [javascript]

Organizing Rules

Directory Structure

.semgrep/
├── rules/
│   ├── security/
│   │   ├── injection.yaml
│   │   ├── crypto.yaml
│   │   └── auth.yaml
│   ├── compliance/
│   │   ├── pci-dss.yaml
│   │   └── hipaa.yaml
│   └── business-logic/
│       ├── pricing.yaml
│       └── permissions.yaml
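└── tests/
    ├── injection.test.js
    └── crypto.test.js

.semgrepignore

Exclude files from scanning. Semgrep reads .semgrepignore from the directory you scan from (normally the repository root), so keep the file there rather than inside .semgrep/: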

# Dependencies
node_modules/
vendor/

# Generated code
dist/
build/
*.min.js

# Test fixtures (intentionally vulnerable)
test/fixtures/
*.test.js

Best Practices

📝 Clear Messages

Include why it's a problem, how to fix it, and an example. Treat it like a code review comment.

🧪 Test Thoroughly

Create test files with both vulnerable (ruleid:) and safe (ok:) examples. Run semgrep --test before committing.

🎯 Start Specific

Begin with narrow patterns, then expand. Over-broad rules generate noise and get disabled.

🔄 Iterate on Feedback

Monitor false positives in CI. Refine patterns based on real-world usage.

📚 Document Metadata

Include CWE codes, OWASP mappings, and confidence levels for triage priority.

🔐 Version Control

Store rules in Git. Review changes like any code. Tag releases for rollback.

Running Custom Rules

Local Development

# Scan with custom rules directory
semgrep scan --config=.semgrep/rules .

# Test rules
semgrep --test .semgrep/

# Validate rule syntax
semgrep --validate --config=.semgrep/rules/

CI/CD Integration

Add to your GitHub Actions workflow:

- name: Run Custom Semgrep Rules
  run: |
    semgrep scan \
      --config=.semgrep/rules \
      --sarif \
      --output=custom-rules.sarif \
      .

Common Pitfalls

1. Over-Matching

❌ Bad:

pattern: eval($X)

Matches every eval call, including ones in test files and vetted wrappers. (Semgrep matches parsed code, so the word "eval" inside a comment or string is not flagged.)

✅ Good:

patterns:
  - pattern: eval($X)
  - pattern-not-inside: |
      function safeEval(...) { ... }
  - pattern-not-inside: |
      describe("...", ...)

2. Vague Messages

❌ Bad:

message: "Potential security issue"

✅ Good:

message: |
  SQL injection via string concatenation.
  Use parameterized queries instead:
  db.query('SELECT * FROM users WHERE id = ?', [userId])

3. Ignoring False Positives

Don't just suppress findings with nosemgrep comments. Instead, refine the rule with pattern-not, or adjust pattern-inside to exclude legitimate use cases.

Real-World Example: API Rate Limiting

Ensure all API endpoints implement rate limiting:

.semgrep/rate-limiting.yaml

rules:
  - id: missing-rate-limit
    patterns:
      # Match Express.js route definitions
      - pattern-either:
          - pattern: |
              app.$METHOD($PATH, ..., $HANDLER)
          - pattern: |
              router.$METHOD($PATH, ..., $HANDLER)
      - metavariable-regex:
          metavariable: $METHOD
          regex: ^(get|post|put|patch|delete)$
      # Exclude routes with rate limiting middleware anywhere after the path
      - pattern-not: |
          app.$METHOD($PATH, ..., rateLimit(...), ...)
      - pattern-not: |
          router.$METHOD($PATH, ..., rateLimit(...), ...)
      # Exclude health check endpoints
      - pattern-not: |
          $APP.$METHOD("/health", ...)
      - pattern-not: |
          $APP.$METHOD("/healthz", ...)
    message: |
      API endpoint missing rate limiting middleware.
      Add the rateLimit middleware to prevent abuse:
      
      router.post('/api/users', rateLimit({ max: 100 }), createUser);
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory: api-security
      owasp: "A04:2021 - Insecure Design"
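
A test file for this rule could be sketched as follows. To keep it runnable without Express, router is a small recording stub and rateLimit is a hypothetical middleware factory that only tags the function it returns; neither is a real library API.

```javascript
// Stand-ins so the file runs without Express. `rateLimit` here is a
// hypothetical middleware factory; it only tags the function it returns.
const routes = [];
const router = {
  get: (path, ...handlers) => routes.push({ method: "get", path, handlers }),
  post: (path, ...handlers) => routes.push({ method: "post", path, handlers }),
};
const rateLimit = (options) =>
  Object.assign((req, res, next) => next(), { options });
const listUsers = (req, res) => res.json([]);
const createUser = (req, res) => res.status(201).end();

// ruleid: missing-rate-limit
router.get("/api/users", listUsers);

// ok: missing-rate-limit
router.post("/api/users", rateLimit({ max: 100 }), createUser);

console.log(routes.map((r) => `${r.method} ${r.path}: ${r.handlers.length} handler(s)`));
```

It is worth adding a case with unrelated middleware (for example an authentication check) before the handler, since that is where a rule like this most easily under- or over-reports.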

Contributing to Semgrep Registry

Share your rules with the community:

1. Fork semgrep-rules on GitHub
2. Add your rule to the appropriate directory (e.g., javascript/express/)
3. Create comprehensive test cases
4. Run semgrep --test --strict to validate
5. Submit a PR with a clear description and rationale

Next Steps

You're now equipped to write custom Semgrep rules! Here's what to do next:

Integrate rules into CI/CD with GitHub Actions
Try our interactive demo to see rules in action
Read the official Semgrep docs
Contact us for custom rule development services
