Why Custom Rules Matter
While Semgrep's built-in rulesets detect common vulnerabilities (SQL injection, XSS, hardcoded secrets), they can't understand your application's specific security requirements:
- Business logic: Payment processing flows, access control patterns
- Framework-specific: Custom ORM methods, internal APIs
- Compliance: PCI-DSS data handling, HIPAA PHI protection
- Organizational standards: Approved libraries, deprecated functions
Start by converting recurring code review comments into automated rules. This saves reviewer time and ensures consistency.
Semgrep Rule Anatomy
A Semgrep rule has five required fields (id, a pattern, message, severity, and languages) plus optional metadata:
rules:
  - id: unique-rule-name
    pattern: |
      # Code pattern to match
    message: Human-readable description of the issue
    severity: ERROR # or WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"
Pattern Syntax Basics
Ellipsis Operator (...)
Matches zero or more statements, arguments, or expressions:
# Matches any call to foo with "password" as first arg
foo("password", ...)
# Matches any SQL query concatenation
db.query("SELECT * FROM users WHERE id = " + ...)
# Matches object with specific key (any value)
{ apiKey: ..., ... }
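To make the first pattern concrete, here is a small JavaScript sketch of calls that foo("password", ...) would and would not match. The foo function is a hypothetical stand-in, defined only so the snippet runs on its own.

```javascript
// Hypothetical stand-in so the snippet runs standalone.
function foo(...args) {
  return args.length;
}

// Matched by foo("password", ...): the ellipsis absorbs zero or more trailing arguments.
const a = foo("password");
const b = foo("password", "admin");
const c = foo("password", "admin", { retries: 3 });

// Not matched: "password" is not the first argument.
const d = foo("admin", "password");
```

The ellipsis only relaxes what follows "password"; the position of the literal itself stays fixed.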
Metavariables ($VAR)
Capture and reuse parts of matched code:
# Capture function name and argument
$FUNC($ARG)
# Ensure same variable used twice
if ($X === null) { return $X; }
# Capture entire expression
const result = $EXPR;
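The "same variable used twice" pattern is easiest to see against real code. In this sketch (function names are hypothetical), only the first function should match if ($X === null) { return $X; }, because $X must bind to the same expression in both places.

```javascript
// Should match: $X binds to `user` in the condition and in the return.
function sameVariable(user) {
  if (user === null) { return user; }
  return "present";
}

// Should not match: the condition tests `user` but the return uses `fallback`.
function differentVariables(user, fallback) {
  if (user === null) { return fallback; }
  return "present";
}
```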
Example 1: Hardcoded AWS Credentials
Detect AWS access keys in source code across multiple assignment patterns:
rules:
  - id: hardcoded-aws-credentials
    patterns:
      - pattern-either:
          # Direct assignment
          - pattern: |
              const $VAR = "$KEY"
          # Object property
          - pattern: |
              { accessKeyId: "$KEY", ... }
          # Environment variable (still bad!)
          - pattern: |
              process.env.AWS_ACCESS_KEY_ID = "$KEY"
      # "$KEY" captures the string's contents; keep only access-key-shaped values
      - metavariable-regex:
          metavariable: $KEY
          regex: ^AKIA[0-9A-Z]{16}$
    message: |
      Hardcoded AWS credentials detected. Use AWS SDK credential chain:
      - IAM roles for EC2/Lambda
      - Environment variables (not in code!)
      - AWS SSM Parameter Store
      - AWS Secrets Manager
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: "CWE-798"
      owasp: "A07:2021 - Identification and Authentication Failures"
      confidence: HIGH
Testing the Rule
Create test cases to validate your rule catches vulnerabilities:
// ruleid: hardcoded-aws-credentials
const accessKey = "AKIAIOSFODNN7EXAMPLE";
// ruleid: hardcoded-aws-credentials
const config = {
  accessKeyId: "AKIAIOSFODNN7EXAMPLE",
  secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
};
// ok: hardcoded-aws-credentials (from environment)
const envAccessKey = process.env.AWS_ACCESS_KEY_ID;
Run tests with: semgrep --test .semgrep/
Example 2: Unsafe Database Queries
Detect SQL injection vulnerabilities in your custom ORM:
rules:
  - id: sql-injection-string-concat
    patterns:
      - pattern-either:
          # Template literals with variables
          - pattern: |
              db.query(`... ${$VAR} ...`)
          # String concatenation
          - pattern: |
              db.query("..." + $VAR + "...")
          # .format() helper (not built into JavaScript; keep only if your codebase defines one)
          - pattern: |
              db.query("...".format($VAR))
      # Exclude safe parameterized queries
      - pattern-not: |
          db.query("...", [$VAR, ...])
      - pattern-not: |
          db.query("...", { $KEY: $VAR, ... })
    message: |
      Potential SQL injection via string concatenation. Use parameterized queries:
      Bad: db.query(`SELECT * FROM users WHERE id = ${userId}`)
      Good: db.query('SELECT * FROM users WHERE id = ?', [userId])
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: "CWE-89"
      owasp: "A03:2021 - Injection"
      confidence: HIGH
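As with Example 1, this rule deserves a test file. The sketch below shows one possible fixture; the db object is a hypothetical stub standing in for your real database client, and the ruleid:/ok: annotations mark the results the rule is intended to produce rather than verified output.

```javascript
// Hypothetical stub standing in for the real database client.
const db = {
  query(sql, params = []) {
    return { sql, params };
  },
};

function findUserViaTemplate(userId) {
  // ruleid: sql-injection-string-concat
  return db.query(`SELECT * FROM users WHERE id = ${userId}`);
}

function findUserViaConcat(userId) {
  // ruleid: sql-injection-string-concat
  return db.query("SELECT * FROM users WHERE id = " + userId + " LIMIT 1");
}

function findUserParameterized(userId) {
  // ok: sql-injection-string-concat
  return db.query("SELECT * FROM users WHERE id = ?", [userId]);
}
```

Note that the concatenation case ends with a trailing string, since the rule's concatenation pattern expects a string on both sides of the variable.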
Example 3: Business Logic - Unauthorized Price Modification
Enforce that prices can only be set by specific authorized functions:
rules:
  - id: unauthorized-price-modification
    patterns:
      # Match price assignments
      - pattern-either:
          - pattern: $OBJ.price = $VALUE
          - pattern: $OBJ["price"] = $VALUE
          - pattern: |
              { price: $VALUE, ... }
      # Only allow in specific functions
      - pattern-not-inside: |
          function calculatePrice(...) { ... }
      - pattern-not-inside: |
          function applyDiscount(...) { ... }
      - pattern-not-inside: |
          class PricingService {
            setPrice(...) { ... }
          }
    message: |
      Direct price modification detected outside authorized functions.
      Use PricingService.setPrice() or calculatePrice() instead.
      This ensures proper validation, audit logging, and approval workflows.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory: business-logic
      confidence: MEDIUM
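A minimal fixture for this rule might look like the following. The overridePrice helper and the basePrice field are hypothetical names chosen for illustration, and the annotations show the intended outcome: an assignment inside calculatePrice is allowed, the same assignment elsewhere is flagged.

```javascript
// Authorized: the assignment happens inside calculatePrice.
function calculatePrice(item, taxRate) {
  // ok: unauthorized-price-modification
  item.price = item.basePrice * (1 + taxRate);
  return item;
}

// Unauthorized: a hypothetical helper writes the price directly.
function overridePrice(item, newPrice) {
  // ruleid: unauthorized-price-modification
  item.price = newPrice;
  return item;
}
```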
Example 4: Compliance - HIPAA PHI Detection
Ensure Protected Health Information is properly encrypted:
rules:
  - id: unencrypted-phi-storage
    patterns:
      # Match PHI fields passed to db.save()
      - pattern: |
          db.save({ $FIELD: $VAL, ... })
      - metavariable-regex:
          metavariable: $FIELD
          regex: ^(ssn|medicalRecordNumber|healthInsuranceId)$
      # Skip the field when its own value is already wrapped in an encryption call
      - pattern-not: |
          db.save({ $FIELD: encrypt(...), ... })
      - pattern-not: |
          db.save({ $FIELD: encryptPHI(...), ... })
    message: |
      Protected Health Information (PHI) must be encrypted before storage.
      Use encryptPHI() function which implements AES-256-GCM encryption.
      Example: db.save({ ssn: encryptPHI(patientSSN) })
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory: compliance
      compliance: HIPAA
      cwe: "CWE-311"
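A test fixture for this rule could be sketched as follows. Both db and encryptPHI are hypothetical stubs: the placeholder below only tags the value and performs no real encryption. The annotations mark the intended results.

```javascript
// Hypothetical stubs so the fixture runs standalone.
const db = {
  save(record) {
    return record;
  },
};

// Placeholder only: NOT real encryption, it just tags the value.
function encryptPHI(value) {
  return "enc:" + String(value).split("").reverse().join("");
}

function registerPatientUnsafe(patientSSN) {
  // ruleid: unencrypted-phi-storage
  return db.save({ ssn: patientSSN, name: "Jane Doe" });
}

function registerPatientSafe(patientSSN) {
  // ok: unencrypted-phi-storage
  return db.save({ ssn: encryptPHI(patientSSN), name: "Jane Doe" });
}
```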
Advanced Pattern Matching
Taint Analysis
Track data flow from source (user input) to sink (sensitive operation):
rules:
  - id: xss-taint-tracking
    mode: taint
    pattern-sources:
      - pattern: req.query.$PARAM
      - pattern: req.body.$PARAM
      - pattern: req.params.$PARAM
    pattern-sinks:
      - pattern: res.send($DATA)
      - pattern: $ELEM.innerHTML = $DATA
    pattern-sanitizers:
      - pattern: sanitizeHTML($DATA)
      - pattern: escape($DATA)
    message: Unsanitized user input flows to HTML output (XSS risk)
    severity: ERROR
    languages: [javascript]
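To see what source-to-sink flow means in practice, consider this sketch of two Express-style handlers. The sanitizeHTML implementation here is a hypothetical stand-in (a simple entity escaper), and the annotations mark the intended results: the first handler passes req.query.name to res.send() untouched, the second routes it through the sanitizer first.

```javascript
// Hypothetical sanitizer: escapes the characters that matter in HTML text.
function sanitizeHTML(input) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(input).replace(/[&<>"']/g, (ch) => entities[ch]);
}

function unsafeHandler(req, res) {
  const name = req.query.name; // source: user-controlled input
  // ruleid: xss-taint-tracking
  res.send("<h1>Hello " + name + "</h1>"); // sink: tainted data reaches the response
}

function safeHandler(req, res) {
  const name = sanitizeHTML(req.query.name); // sanitizer clears the taint
  // ok: xss-taint-tracking
  res.send("<h1>Hello " + name + "</h1>");
}
```

Unlike a plain pattern, the taint rule follows the value through the intermediate name variable and the string concatenation.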
Focus Metavariables
Report the finding on one metavariable's location instead of on the whole match:
rules:
  - id: weak-password-hashing
    patterns:
      - pattern: |
          $LIB.hash($PASSWORD, { algorithm: "$ALG" })
      # "$ALG" captures the string's contents, so the regex sees md5 rather than "md5"
      - metavariable-regex:
          metavariable: $ALG
          regex: ^(md5|sha1)$
      - focus-metavariable: $ALG
    message: Weak hashing algorithm. Use bcrypt or Argon2
    severity: ERROR
    languages: [javascript]
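A fixture for this rule might look like the sketch below, where hasher is a hypothetical helper rather than a real library. With focus-metavariable, the finding in the first function would be expected to point at the algorithm value, not at the entire hash() call, which keeps the highlighted range small in editors and CI annotations.

```javascript
// Hypothetical hashing helper; a real project would call its crypto library here.
const hasher = {
  hash(password, options) {
    return `${options.algorithm}:${password.length}`;
  },
};

function storeWeak(password) {
  // ruleid: weak-password-hashing
  return hasher.hash(password, { algorithm: "md5" });
}

function storeNotFlagged(password) {
  // The rule's regex only covers md5 and sha1, so this call is not reported.
  // ok: weak-password-hashing
  return hasher.hash(password, { algorithm: "sha512" });
}
```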
Organizing Rules
Directory Structure
.semgrep/
├── rules/
│   ├── security/
│   │   ├── injection.yaml
│   │   ├── crypto.yaml
│   │   └── auth.yaml
│   ├── compliance/
│   │   ├── pci-dss.yaml
│   │   └── hipaa.yaml
│   └── business-logic/
│       ├── pricing.yaml
│       └── permissions.yaml
├── tests/
│   ├── injection.test.js
│   └── crypto.test.js
└── .semgrepignore
.semgrepignore
Exclude files from scanning:
# Dependencies
node_modules/
vendor/
# Generated code
dist/
build/
*.min.js
# Test fixtures (intentionally vulnerable)
test/fixtures/
*.test.js
Best Practices
📝 Clear Messages
Include why it's a problem, how to fix it, and an example. Treat it like a code review comment.
🧪 Test Thoroughly
Create test files with both vulnerable (ruleid:) and safe (ok:) examples. Run semgrep --test before committing.
🎯 Start Specific
Begin with narrow patterns, then expand. Over-broad rules generate noise and get disabled.
🔄 Iterate on Feedback
Monitor false positives in CI. Refine patterns based on real-world usage.
📚 Document Metadata
Include CWE codes, OWASP mappings, and confidence levels for triage priority.
🔐 Version Control
Store rules in Git. Review changes like any code. Tag releases for rollback.
Running Custom Rules
Local Development
# Scan with custom rules directory
semgrep scan --config=.semgrep/rules .
# Test rules
semgrep --test .semgrep/
# Validate rule syntax
semgrep --validate --config .semgrep/rules/
CI/CD Integration
Add to your GitHub Actions workflow:
- name: Run Custom Semgrep Rules
  run: |
    semgrep scan \
      --config=.semgrep/rules \
      --sarif \
      --output=custom-rules.sarif \
      .
Common Pitfalls
1. Over-Matching
Too broad:
pattern: eval($X)
Matches every eval() call, including those in test files and inside vetted wrappers.
Better:
patterns:
  - pattern: eval($X)
  - pattern-not-inside: |
      function safeEval(...) { ... }
  - pattern-not-inside: |
      describe("...", ...)
2. Vague Messages
Vague:
message: "Potential security issue"
Actionable:
message: |
  SQL injection via string concatenation.
  Use parameterized queries instead:
  db.query('SELECT * FROM users WHERE id = ?', [userId])
3. Ignoring False Positives
Don't just add nosemgrep comments (// nosemgrep in JavaScript). Instead, refine the rule with pattern-not, or tighten pattern-inside, to exclude legitimate use cases.
Real-World Example: API Rate Limiting
Ensure all API endpoints implement rate limiting:
rules:
  - id: missing-rate-limit
    patterns:
      # Match Express.js route definitions
      - pattern-either:
          - pattern: |
              app.$METHOD($PATH, $HANDLER)
          - pattern: |
              router.$METHOD($PATH, $HANDLER)
      - metavariable-regex:
          metavariable: $METHOD
          regex: ^(get|post|put|patch|delete)$
      # Exclude routes with rate limiting middleware
      - pattern-not: |
          app.$METHOD($PATH, rateLimit(...), $HANDLER)
      - pattern-not: |
          router.$METHOD($PATH, rateLimit(...), $HANDLER)
      # Exclude health check endpoints
      - pattern-not: |
          app.$METHOD("/health", ...)
      - pattern-not: |
          app.$METHOD("/healthz", ...)
    message: |
      API endpoint missing rate limiting middleware.
      Add rateLimit to prevent abuse:
      router.post('/api/users', rateLimit({ max: 100 }), createUser);
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      category: security
      subcategory: api-security
      owasp: "A04:2021 - Insecure Design"
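A fixture for this rule could be sketched as follows. The app, rateLimit, and handler definitions are hypothetical stand-ins for Express and a rate-limiting middleware, included only so the file runs on its own; the annotations mark the intended results.

```javascript
// Hypothetical stand-ins for Express and a rate-limiting middleware.
const routes = [];
const app = {
  get: (path, ...handlers) => routes.push({ method: "get", path, handlers }),
  post: (path, ...handlers) => routes.push({ method: "post", path, handlers }),
};
const rateLimit = (options) => (req, res, next) => next();
const createUser = (req, res) => res.end();
const health = (req, res) => res.end();

// ruleid: missing-rate-limit
app.post("/api/users", createUser);

// ok: missing-rate-limit
app.post("/api/users", rateLimit({ max: 100 }), createUser);

// ok: missing-rate-limit
app.get("/health", health);
```

Because the rule's positive patterns match only the two-argument form (path, handler), any route that places a middleware before its handler falls outside the match.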
Contributing to Semgrep Registry
Share your rules with the community:
- Fork semgrep-rules on GitHub
- Add your rule to the appropriate directory (e.g., javascript/express/)
- Create comprehensive test cases
- Run semgrep --test --strict to validate
- Submit a PR with a clear description and rationale
Next Steps
You're now equipped to write custom Semgrep rules! Here's what to do next:
- Integrate rules into CI/CD with GitHub Actions
- Try our interactive demo to see rules in action
- Read the official Semgrep docs
- Contact us for custom rule development services
ElevatedIQ offers rule development workshops, false positive tuning, and ongoing rule maintenance. Get in touch to accelerate your security automation.