Enrichment Pipeline
How the enrichment pipeline works: domain, IP, threat intel, WHOIS, GeoIP, BGP.
Overview
The enrichment pipeline adds infrastructure context from the Knowledge Graph (billions of nodes, tens of billions of edges) to your security events.
Architecture
The enrichment pipeline is a streaming search command (whisperlookup) that processes events inline in the SPL pipeline. Each event passes through five stages, listed under Enrichment pipeline below.
The pipeline is built from four modules:
| Module | Role |
|---|---|
| whisper_enrichment.py | Orchestrates the pipeline: type detection, cache check, API call, field mapping |
| whisper_enrichment_queries.py | Builds parameterized Cypher queries for domain and IP enrichment |
| whisper_enrichment_parsers.py | Parses API responses into flat dictionaries |
| whisper_field_mapper.py | Maps parsed results to whisper_-prefixed and CIM-aliased fields |
All API calls go through WhisperAPIClient, which handles rate limiting, retries, and connection pooling. Results are cached in the whisper_enrichment_cache KV Store collection via whisper_cache.py.
Enrichment pipeline
- Type detection: determines whether the indicator is an IP (IPv4 regex) or a domain
- Cache check: looks in the whisper_enrichment_cache KV Store for a cached result
- API enrichment: queries the Knowledge Graph with parameterized Cypher
- Field mapping: maps graph results to CIM-compliant field names with a whisper_ prefix
- Event output: appends the enrichment fields to the original event
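The five stages can be sketched in Python as follows. This is a minimal illustration, not the code in whisper_enrichment.py. The function names, the plain dict standing in for the whisper_enrichment_cache KV Store, the `fetch` callable standing in for WhisperAPIClient, and the exact IPv4 regex are all assumptions. Field mapping is reduced to adding the whisper_ prefix.

```python
import re
from typing import Callable, Dict, Tuple

# Assumed IPv4 pattern for stage 1: four dot-separated octets in 0-255.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

# Hypothetical cache shape: (indicator, indicator_type) -> parsed result.
Cache = Dict[Tuple[str, str], Dict[str, object]]


def detect_type(indicator: str) -> str:
    """Stage 1: classify the indicator as 'ip' or 'domain'."""
    return "ip" if IPV4_RE.fullmatch(indicator) else "domain"


def enrich_event(
    event: Dict[str, object],
    field: str,
    cache: Cache,
    fetch: Callable[[str, str], Dict[str, object]],
) -> Dict[str, object]:
    """Run one event through the five stages and return an enriched copy."""
    indicator = event.get(field)
    if not isinstance(indicator, str) or not indicator:
        return dict(event)  # nothing to enrich
    itype = detect_type(indicator)            # 1. type detection
    key = (indicator, itype)
    result = cache.get(key)                   # 2. cache check
    if result is None:
        result = fetch(indicator, itype)      # 3. API enrichment (cache miss only)
        cache[key] = result
    # 4. field mapping (simplified: whisper_ prefix only, no CIM aliases)
    mapped = {f"whisper_{name}": value for name, value in result.items()}
    return {**event, **mapped}                # 5. event output
```

Because results are cached under both the indicator and its type, repeated indicators in the same search reach `fetch` only once.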
Domain enrichment
Domain enrichment runs in two stages to stay within the API's 2-hop depth limit:
- Resolve: maps the hostname to its IP addresses (1 hop)
- Infrastructure: looks up BGP context for the first resolved IP via the ANNOUNCED_BY path (the same path IP enrichment uses)
Stage 1: HOSTNAME → RESOLVES_TO → IPV4
Stage 2: IPV4 → ANNOUNCED_BY → PREFIX ← ROUTES ← ASN
(then: ASN → HAS_NAME, ASN → HAS_COUNTRY as separate 1-hop queries)
The IP node's inline threat properties (threatScore, isThreat, isTor, etc.) are returned directly from Stage 2. A separate explain() call is only made when the inline data is absent.
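The two stages map naturally onto two parameterized queries. The sketch below assumes a few things that are not documented here: nodes are matched on a `name` property, the column aliases in RETURN are illustrative, and the function names are hypothetical. The separate 1-hop ASN name and country lookups are omitted.

```python
from typing import Dict, Tuple

Query = Tuple[str, Dict[str, str]]  # (Cypher text, parameters)


def resolve_query(hostname: str) -> Query:
    """Stage 1 (1 hop): HOSTNAME -> RESOLVES_TO -> IPV4."""
    cypher = (
        "MATCH (h:HOSTNAME {name: $hostname})-[:RESOLVES_TO]->(ip:IPV4) "
        "RETURN ip.name AS ip"
    )
    return cypher, {"hostname": hostname}


def infrastructure_query(ip: str) -> Query:
    """Stage 2 (2 hops): IPV4 -> ANNOUNCED_BY -> PREFIX <- ROUTES <- ASN.

    The IP node's inline threat properties come back in the same row,
    so no explain() call is needed when they are populated.
    """
    cypher = (
        "MATCH (ip:IPV4 {name: $ip})-[:ANNOUNCED_BY]->(p:PREFIX)<-[:ROUTES]-(a:ASN) "
        "RETURN p.name AS prefix, a.name AS asn, "
        "ip.threatScore AS threat_score, ip.isThreat AS is_threat, ip.isTor AS is_tor"
    )
    return cypher, {"ip": ip}
```

Stage 2 would be built from the first `ip` value returned by stage 1. The indicator is passed only as a bound parameter and never interpolated into the Cypher text, which keeps untrusted event data out of the query string.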
Example:
index=dns sourcetype=dns
| whisperlookup field=query type=domain
| table query whisper_ip whisper_prefix whisper_asn whisper_asn_name whisper_country
Output fields: whisper_ip, whisper_prefix, whisper_asn, whisper_asn_name, whisper_country, whisper_cohost_count, plus inline threat fields when available (see Threat intelligence enrichment)
IP enrichment
Resolves an IP to its network context using the ANNOUNCED_BY path:
IPV4 → ANNOUNCED_BY → PREFIX ← ROUTES ← ASN → HAS_NAME → ASN_NAME
ASN → HAS_COUNTRY → COUNTRY
Inline threat properties on the IPV4 node are returned in the same query, eliminating a separate explain() call when the data is present.
Example:
index=firewall sourcetype=pan:traffic
| whisperlookup field=dest_ip type=ip
| table dest_ip whisper_prefix whisper_asn whisper_asn_name whisper_country
Output fields: whisper_prefix, whisper_asn, whisper_asn_name, whisper_country, whisper_reverse_dns_count, whisper_cohost_count, inline threat fields (see below), and ASN threat reputation fields when available
Note: Private IP addresses (RFC 1918) are automatically skipped.
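A minimal sketch of an RFC 1918 check in Python, using only the standard `ipaddress` module. The function name is hypothetical. The sketch tests the three RFC 1918 blocks explicitly rather than relying on `is_private`, because `is_private` also returns True for loopback, link-local, and other special-purpose ranges, which would skip more than this note describes.

```python
import ipaddress

# The three RFC 1918 private IPv4 blocks.
RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_rfc1918(value: str) -> bool:
    """Return True if value is an IPv4 address inside an RFC 1918 block."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return False  # not an IP address (for example, a domain)
    # An IPv6 address is never "in" an IPv4 network, so this is False for IPv6.
    return any(addr in net for net in RFC1918_NETWORKS)
```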
Threat intelligence enrichment
Threat data is returned in two ways:
- Inline: threatScore, isThreat, isTor, isC2, and related boolean flags are properties on IPV4 nodes in the graph. The infrastructure queries (domain and IP) return them directly, so no extra API call is needed.
- explain(): a richer assessment including explanation text, contributing factors, per-feed sources, and first/last seen dates. The explain() call is skipped when inline data is already present (threat_score is non-null).
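The skip rule is a single null check. In the sketch below, the `explain` argument is a hypothetical zero-argument callable standing in for the API request, which in the add-on goes through WhisperAPIClient. The key name `threat_score` follows the rule stated above. Everything else is an assumption.

```python
from typing import Callable, Dict


def merge_threat_fields(
    inline: Dict[str, object],
    explain: Callable[[], Dict[str, object]],
) -> Dict[str, object]:
    """Return threat fields, calling explain() only if the inline score is missing."""
    fields = dict(inline)
    if fields.get("threat_score") is None:
        # No inline data on the IPV4 node: fall back to the richer explain() call.
        fields.update(explain())
    return fields
```

The check uses `is None` rather than truthiness, so an inline score of 0 still counts as present and does not trigger an extra API call.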
index=proxy sourcetype=squid
| whisperlookup field=dest_host include_threat_intel=true include_feeds=true
| where whisper_threat_score > 30
| table dest_host whisper_threat_level whisper_threat_score whisper_feed_names whisper_threat_explanation
Output fields from inline data:
| Field | Description |
|---|---|
| whisper_threat_score | Numeric threat score (0-100+, unbounded float) |
| whisper_threat_level | NONE/INFO/LOW/MEDIUM/HIGH/CRITICAL (derived from the score when the API returns null) |
| whisper_is_threat | Boolean: indicator is a known threat |
| whisper_is_tor | Boolean: Tor exit node |
| whisper_is_c2 | Boolean: command-and-control server |
| whisper_is_malware | Boolean: malware distribution |
| whisper_is_phishing | Boolean: phishing host |
| whisper_is_spam | Boolean: spam source |
| whisper_is_bruteforce | Boolean: brute-force source |
| whisper_is_scanner | Boolean: network scanner |
| whisper_is_blacklist | Boolean: on a public blacklist |
| whisper_is_proxy | Boolean: open proxy |
| whisper_is_vpn | Boolean: known VPN exit |
| whisper_is_anonymizer | Boolean: anonymization service |
| whisper_is_whitelist | Boolean: explicitly whitelisted |
| whisper_threat_sources_count | Number of threat intelligence sources listing this indicator |
| whisper_threat_first_seen | Earliest date this indicator appeared in any feed |
| whisper_threat_last_seen | Most recent date this indicator appeared in any feed |
ASN threat reputation fields (returned with IP and domain enrichment):
| Field | Description |
|---|---|
| whisper_asn_threat_level | ASN overall threat level: NONE/LOW/MEDIUM/HIGH/CRITICAL |
| whisper_asn_threat_score | ASN composite threat score (numeric) |
| whisper_asn_max_threat_score | Highest single-prefix threat score within the ASN |
| whisper_asn_avg_threat_score | Average threat score across the ASN's prefixes |
| whisper_asn_has_threatening_prefixes | Boolean: ASN contains at least one high-risk prefix |
Null-safe fields: ASN threat fields appear in enrichment output only when the API returns a non-null value. Use isnotnull(whisper_asn_threat_level) in SPL to keep only events for which the API provided ASN reputation data.
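The null-safe behavior amounts to emitting a field only when its value is not null. A minimal Python sketch, assuming the parsed API response is a flat dict keyed by the field names without the whisper_ prefix. The function and constant names are hypothetical.

```python
from typing import Dict

# ASN reputation keys from the table above (without the whisper_ prefix).
ASN_THREAT_KEYS = (
    "asn_threat_level",
    "asn_threat_score",
    "asn_max_threat_score",
    "asn_avg_threat_score",
    "asn_has_threatening_prefixes",
)


def map_asn_threat(parsed: Dict[str, object]) -> Dict[str, object]:
    """Emit whisper_asn_* fields only for keys the API returned as non-null."""
    return {
        f"whisper_{key}": parsed[key]
        for key in ASN_THREAT_KEYS
        if parsed.get(key) is not None
    }
```

Testing `is not None` instead of truthiness keeps legitimate values such as a score of 0.0 or a False flag, which is why `isnotnull()` is the right SPL filter.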
Additional fields from explain() (when called):
| Field | Description |
|---|---|
| whisper_threat_explanation | Human-readable threat summary |
| whisper_threat_factors | Contributing factors (multivalue) |
| whisper_threat_sources | Per-feed source data (list of dicts) |
| whisper_threat_feed_ids | Feed identifiers for ES threat_key |
| whisper_threat_breakdown | Component scores from the explain API |
| whisper_threat_available | Whether threat data is available |
| whisper_threat_cached | Whether the explain response was cached |
Score range: whisper_threat_score is an unbounded float (typically 0-100+), not a 0-1 fraction. Thresholds: >= 50 indicates high confidence, and >= 10 indicates moderate confidence.
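The documented thresholds can be applied as in the following sketch. The function name and the "low" label for scores below 10 are assumptions. This is not the add-on's mapping to NONE/INFO/LOW/MEDIUM/HIGH/CRITICAL, whose cut-offs are not given here.

```python
from typing import Optional


def score_confidence(score: Optional[float]) -> Optional[str]:
    """Bucket an unbounded whisper_threat_score using the documented thresholds."""
    if score is None:
        return None
    if score >= 50:
        return "high"
    if score >= 10:
        return "moderate"
    return "low"  # assumption: anything below 10 is treated as low confidence
```

Because the score is not a 0-1 fraction, a value such as 0.9 falls in the lowest band. It does not mean "90%".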
WHOIS enrichment
Domain enrichment automatically includes WHOIS data when available:
HOSTNAME → HAS_REGISTRAR → REGISTRAR
HOSTNAME → REGISTERED_BY → ORGANIZATION
HOSTNAME → HAS_EMAIL → EMAIL
HOSTNAME → HAS_PHONE → PHONE
HOSTNAME → PREV_REGISTRAR → REGISTRAR (previous)
Example:
index=dns sourcetype=dns
| whisperlookup field=query type=domain
| table query whisper_registrar whisper_registrant_org whisper_registrant_email whisper_organization
Output fields:
| Field | Description |
|---|---|
| whisper_registrar | Domain registrar name |
| whisper_registrant_org | Registrant organization |
| whisper_registrant_email | Registrant contact email |
| whisper_registrant_phone | Registrant phone number |
| whisper_registration_date | Domain registration date |
| whisper_expiration_date | Domain expiration date |
| whisper_prev_registrar | Previous registrar (for registrar change detection) |
| whisper_organization | Registrant organization via the REGISTERED_BY edge |
Sparse WHOIS data: WHOIS data varies by domain, and not every field is populated for every domain. The queries use OPTIONAL MATCH, so a field is absent (not empty) when its data is unavailable.
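A Python sketch of a WHOIS query that uses one OPTIONAL MATCH per edge, so a missing registrar or email yields a null column instead of dropping the whole row. The `name` property, the returned column names, and the function name are assumptions.

```python
from typing import Dict, Tuple

# (relationship, target label, returned column) for each WHOIS edge listed above.
WHOIS_EDGES = (
    ("HAS_REGISTRAR", "REGISTRAR", "registrar"),
    ("REGISTERED_BY", "ORGANIZATION", "organization"),
    ("HAS_EMAIL", "EMAIL", "registrant_email"),
    ("HAS_PHONE", "PHONE", "registrant_phone"),
    ("PREV_REGISTRAR", "REGISTRAR", "prev_registrar"),
)


def whois_query(hostname: str) -> Tuple[str, Dict[str, str]]:
    """Build one parameterized query with an OPTIONAL MATCH per WHOIS edge."""
    clauses = ["MATCH (h:HOSTNAME {name: $hostname})"]
    columns = []
    for i, (rel, label, column) in enumerate(WHOIS_EDGES):
        # OPTIONAL MATCH yields null instead of dropping the row when the edge is missing.
        clauses.append(f"OPTIONAL MATCH (h)-[:{rel}]->(w{i}:{label})")
        columns.append(f"w{i}.name AS {column}")
    clauses.append("RETURN " + ", ".join(columns))
    return "\n".join(clauses), {"hostname": hostname}
```

Null columns can then be dropped before field mapping, so the fields are absent rather than empty. Chained OPTIONAL MATCH clauses multiply rows when a hostname has several emails or phones, so a production query would more likely collect() each relationship.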
GeoIP city-level enrichment
IP enrichment automatically includes city-level geolocation:
IPV4 → LOCATED_IN → CITY
CITY nodes contain latitude, longitude, and country code embedded in the name.
Example:
index=firewall sourcetype=pan:traffic
| whisperlookup field=dest_ip type=ip
| table dest_ip whisper_geo_city whisper_geo_country whisper_geo_latitude whisper_geo_longitude
Output fields:
| Field | Description |
|---|---|
| whisper_geo_city | City name (e.g., "Mountain View") |
| whisper_geo_country | Country code extracted from the city name (e.g., "US") |
| whisper_geo_latitude | City latitude (decimal degrees) |
| whisper_geo_longitude | City longitude (decimal degrees) |
Anycast IPs: Anycast IPs (e.g., 1.1.1.1) may not have a single LOCATED_IN edge. GeoIP fields will be absent for such IPs.
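Because CITY nodes embed the country code and coordinates in the node name, the GeoIP fields have to be parsed out of that string. The exact layout of the name is not documented here. The sketch below assumes a hypothetical "<city>, <CC>, <lat>, <lon>" format purely for illustration, and the function name is also hypothetical. Adjust the split to the real format.

```python
from typing import Dict, Optional


def parse_city_name(name: str) -> Optional[Dict[str, object]]:
    """Split a CITY node name into whisper_geo_* fields.

    ASSUMPTION: the name looks like "<city>, <CC>, <lat>, <lon>".
    Splitting from the right keeps commas inside the city name intact.
    """
    parts = [p.strip() for p in name.rsplit(",", 3)]
    if len(parts) != 4:
        return None  # unexpected layout: leave the GeoIP fields absent
    city, country, lat, lon = parts
    try:
        latitude, longitude = float(lat), float(lon)
    except ValueError:
        return None
    return {
        "whisper_geo_city": city,
        "whisper_geo_country": country,
        "whisper_geo_latitude": latitude,
        "whisper_geo_longitude": longitude,
    }
```

Returning None on an unparseable name, or when there is no name at all (as with anycast IPs), keeps the fields absent rather than filling them with placeholders.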
HOSTNAME threat properties
Domain enrichment queries threat properties directly from the HOSTNAME node, independent of any IPV4-derived threat data.
Example:
index=dns sourcetype=dns
| whisperlookup field=query type=domain include_threat_intel=true
| where whisper_hostname_threat_level="HIGH" OR whisper_hostname_threat_level="CRITICAL"
| table query whisper_hostname_threat_score whisper_hostname_threat_level
Output fields (all prefixed with hostname_ to distinguish them from IP-level threat data):
| Field | Description |
|---|---|
| whisper_hostname_threat_score | HOSTNAME node threat score |
| whisper_hostname_threat_level | HOSTNAME threat level (NONE/LOW/MEDIUM/HIGH/CRITICAL) |
| whisper_hostname_is_spam | HOSTNAME is a spam source |
| whisper_hostname_is_proxy | HOSTNAME is a proxy |
| whisper_hostname_is_vpn | HOSTNAME is a VPN exit |
| (etc.) | All is_* booleans available with hostname_ prefix |
Prefix threat assessment
IP enrichment includes threat data from ANNOUNCED_PREFIX and REGISTERED_PREFIX nodes:
IPV4 → ANNOUNCED_BY → ANNOUNCED_PREFIX (BGP routing)
IPV4 → BELONGS_TO → REGISTERED_PREFIX (RIR allocation)
Output fields:
| Field | Description |
|---|---|
| whisper_announced_prefix | BGP announced prefix name |
| whisper_ap_threat_score | Announced prefix threat score |
| whisper_ap_threat_level | Announced prefix threat level |
| whisper_ap_is_threat | Boolean: announced prefix is a threat |
| whisper_registered_prefix | RIR registered prefix name |
| whisper_rp_threat_score | Registered prefix threat score |
| whisper_rp_threat_level | Registered prefix threat level |
| whisper_rp_is_threat | Boolean: registered prefix is a threat |
BGP hijack detection
IP enrichment automatically compares the announcing ASN (BGP) with the registered ASN (RIR) to detect potential route hijacking.
Example:
index=firewall sourcetype=pan:traffic
| whisperlookup field=dest_ip type=ip
| where whisper_bgp_hijack_detected="true"
| table dest_ip whisper_bgp_announcing_asn whisper_bgp_registered_asn
Output fields:
| Field | Description |
|---|---|
| whisper_bgp_hijack_detected | Boolean: announcing ASN differs from registered ASN |
| whisper_bgp_announcing_asn | ASN currently announcing the prefix via BGP |
| whisper_bgp_registered_asn | ASN registered with the RIR as the prefix owner |
| whisper_bgp_announced_prefix | The announced prefix |
| whisper_bgp_registered_prefix | The registered prefix |
Warning: BGP hijack detection carries the highest risk score (70 points) in the risk scoring system. A detected hijack means the IP's traffic may be routed through an unauthorized network.
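The hijack check itself is an equality test between two ASNs. A minimal Python sketch follows. It assumes ASNs arrive as strings that may or may not carry an "AS" prefix, which is an assumption about the format. The function names are hypothetical, and the prefix fields are omitted for brevity.

```python
from typing import Dict, Optional


def _normalize_asn(asn: str) -> str:
    """Strip an optional "AS" prefix so "AS13335" and "13335" compare equal."""
    asn = asn.strip().upper()
    return asn[2:] if asn.startswith("AS") else asn


def bgp_hijack_fields(
    announcing_asn: Optional[str], registered_asn: Optional[str]
) -> Dict[str, str]:
    """Flag a possible hijack when the BGP announcer differs from the RIR owner."""
    if not announcing_asn or not registered_asn:
        return {}  # assumption: no verdict unless both ASNs are known
    mismatch = _normalize_asn(announcing_asn) != _normalize_asn(registered_asn)
    return {
        "whisper_bgp_hijack_detected": "true" if mismatch else "false",
        "whisper_bgp_announcing_asn": announcing_asn,
        "whisper_bgp_registered_asn": registered_asn,
    }
```

The sketch emits the strings "true" and "false" to match the whisper_bgp_hijack_detected="true" comparison in the SPL example. A mismatch is a signal, not proof: an upstream provider legitimately announcing a customer's space also produces one.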
Web link graph enrichment
Domain enrichment includes web link data from the LINKS_TO relationship (billions of edges):
HOSTNAME → LINKS_TO → HOSTNAME (outbound)
HOSTNAME ← LINKS_TO ← HOSTNAME (inbound)
Example:
index=dns sourcetype=dns
| whisperlookup field=query type=domain
| where whisper_link_count > 0
| table query whisper_link_count whisper_outbound_links whisper_inbound_links
Output fields:
| Field | Description |
|---|---|
| whisper_linked_domains | Deduplicated list of all linked domains |
| whisper_link_count | Total unique linked domains |
| whisper_suspicious_link_count | Number of links to or from suspicious or threat-listed domains |
| whisper_outbound_links | Domains this domain links to (up to 25) |
| whisper_inbound_links | Domains that link to this domain (up to 25) |
Tip: Domains with many inbound links from legitimate sites are more likely to be trustworthy. Domains with no inbound links, or with inbound links only from suspicious sites, are penalized in the risk score.
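The link fields can be derived from the two LINKS_TO directions as in the following Python sketch. It assumes the caller already has the outbound and inbound hostname lists and a set of suspicious or threat-listed hostnames. It also counts suspicious links per unique domain. Both the inputs and that counting rule are assumptions, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Set

MAX_LISTED = 25  # documented cap for the outbound/inbound lists


def summarize_links(
    outbound: Iterable[str], inbound: Iterable[str], suspicious: Set[str]
) -> Dict[str, object]:
    """Build whisper_*link* fields from LINKS_TO neighbours in both directions."""
    out_list = list(dict.fromkeys(outbound))  # dedupe while keeping order
    in_list = list(dict.fromkeys(inbound))
    linked = list(dict.fromkeys(out_list + in_list))
    return {
        "whisper_linked_domains": linked,
        "whisper_link_count": len(linked),  # counted before the 25-item cap
        "whisper_suspicious_link_count": sum(1 for d in linked if d in suspicious),
        "whisper_outbound_links": out_list[:MAX_LISTED],
        "whisper_inbound_links": in_list[:MAX_LISTED],
    }
```

The count is taken before truncation, so whisper_link_count can exceed 25 even though each displayed list is capped.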
CNAME chain enrichment
Follows CNAME alias chains up to 5 hops:
HOSTNAME -[:ALIAS_OF*1..5]-> HOSTNAME
Example:
index=dns sourcetype=dns
| whisperlookup field=query include_cname=true
| where whisper_cname_depth > 0
| table query whisper_cname_chain whisper_cname_target whisper_cname_depth
Output fields: whisper_cname_chain, whisper_cname_depth, whisper_cname_target
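One way to turn the variable-length ALIAS_OF match into the three output fields is sketched below in Python. The Cypher string, the " -> " chain separator, and the convention of returning a depth of 0 when there is no alias are all assumptions. So are the names.

```python
from typing import Dict, List

# Hypothetical query: the longest alias path (1 to 5 hops) as a list of names.
CNAME_QUERY = (
    "MATCH p = (h:HOSTNAME {name: $hostname})-[:ALIAS_OF*1..5]->(t:HOSTNAME) "
    "RETURN [n IN nodes(p) | n.name] AS chain, length(p) AS depth "
    "ORDER BY depth DESC LIMIT 1"
)

MAX_HOPS = 5


def cname_fields(chain: List[str]) -> Dict[str, object]:
    """Turn a [query, alias1, ..., target] name list into whisper_cname_* fields."""
    hops = chain[: MAX_HOPS + 1]  # the queried name plus at most 5 aliases
    depth = max(len(hops) - 1, 0)
    if depth == 0:
        return {"whisper_cname_depth": 0}
    return {
        "whisper_cname_chain": " -> ".join(hops),
        "whisper_cname_target": hops[-1],
        "whisper_cname_depth": depth,
    }
```

Depth counts ALIAS_OF hops, so `where whisper_cname_depth > 0` keeps only names that are aliases of something else.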
Nameserver enrichment
Pulls nameservers for a domain:
HOSTNAME <-[:NAMESERVER_FOR]- HOSTNAME
Example:
index=dns sourcetype=dns
| whisperlookup field=query include_nameserver=true
| table query whisper_nameservers
Output fields: whisper_nameservers (comma-separated list)
CIM field mapping
Enrichment fields are aliased to CIM-compliant names:
| Whisper Field | CIM Field | CIM Data Model |
|---|---|---|
| whisper_ip | dest_ip | Network Resolution |
| whisper_country | dest_country | Network Resolution |
| whisper_asn | dest_asn | Network Resolution |
| whisper_threat_score | threat_score | Threat Intelligence |
| whisper_threat_level | threat_level | Threat Intelligence |
| whisper_risk_score | risk_score | Threat Intelligence |
| whisper_risk_level | risk_level | Threat Intelligence |
| whisper_is_threat | is_threat | Threat Intelligence |
| whisper_is_c2 | is_c2 | Threat Intelligence |
| whisper_is_tor | is_tor | Threat Intelligence |
| whisper_is_malware | is_malware | Threat Intelligence |
| whisper_is_phishing | is_phishing | Threat Intelligence |
| whisper_is_anonymizer | is_anonymizer | Threat Intelligence |
| whisper_is_spam | is_spam | Threat Intelligence |
| whisper_is_bruteforce | is_bruteforce | Threat Intelligence |
| whisper_is_scanner | is_scanner | Threat Intelligence |
| whisper_is_blacklist | is_blacklist | Threat Intelligence |
| whisper_is_proxy | is_proxy | Threat Intelligence |
| whisper_is_vpn | is_vpn | Threat Intelligence |
| whisper_is_whitelist | is_whitelist | Threat Intelligence |
The sourcetype whisper:enrichment is tagged with CIM tags: network, resolution, dns.
Also set automatically:
- vendor=Whisper Security
- vendor_product=Whisper Knowledge Graph
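The aliasing step can be pictured as copying each whisper_ value to its CIM name and stamping the vendor fields. The Python sketch below includes only a subset of the table. It assumes aliasing copies rather than renames, and that it never overwrites a CIM field already in the event, such as the dest_ip an IP lookup started from. Both are assumptions, and the names are hypothetical.

```python
from typing import Dict

# Subset of the whisper_* -> CIM aliases from the table above.
CIM_ALIASES = {
    "whisper_ip": "dest_ip",
    "whisper_country": "dest_country",
    "whisper_asn": "dest_asn",
    "whisper_threat_score": "threat_score",
    "whisper_threat_level": "threat_level",
    "whisper_is_threat": "is_threat",
}

VENDOR_FIELDS = {"vendor": "Whisper Security", "vendor_product": "Whisper Knowledge Graph"}


def add_cim_aliases(event: Dict[str, object]) -> Dict[str, object]:
    """Copy whisper_* values to their CIM names and stamp the vendor fields."""
    out = dict(event)
    for src, dst in CIM_ALIASES.items():
        if src in event and dst not in event:  # assumption: never overwrite
            out[dst] = event[src]
    out.update(VENDOR_FIELDS)
    return out
```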
Caching
All enrichment results are cached in the whisper_enrichment_cache KV Store collection.
| Setting | Default | Description |
|---|---|---|
| Cache TTL | 3600 seconds (1 hour) | How long cached results are valid |
| Cache collection | whisper_enrichment_cache | KV Store collection name |
The cache is keyed by indicator + indicator_type. The Whisper - Evict Expired Cache Entries saved search runs hourly (when enabled) to clean up expired entries.
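The TTL and key semantics can be illustrated with an in-memory Python stand-in for the whisper_enrichment_cache collection. This is a sketch, not whisper_cache.py. The class and method names are hypothetical, and the injectable clock exists only to make expiry testable.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (indicator, indicator_type)


class TTLCache:
    """In-memory stand-in for the KV Store cache, keyed by indicator + type."""

    def __init__(self, ttl: float = 3600.0, clock: Callable[[], float] = time.time):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[Key, Tuple[float, dict]] = {}

    def get(self, indicator: str, indicator_type: str) -> Optional[dict]:
        entry = self._entries.get((indicator, indicator_type))
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            return None  # expired: treated as a miss, removed by evict_expired()
        return value

    def put(self, indicator: str, indicator_type: str, value: dict) -> None:
        self._entries[(indicator, indicator_type)] = (self.clock(), value)

    def evict_expired(self) -> int:
        """Drop expired entries (the saved search's job) and return how many."""
        now = self.clock()
        expired = [k for k, (ts, _) in self._entries.items() if now - ts >= self.ttl]
        for k in expired:
            del self._entries[k]
        return len(expired)
```

Expired entries are ignored on read but only deleted by a separate sweep, which mirrors the split between lookups and the scheduled eviction search. Treating an entry as expired at exactly ttl seconds is a choice of this sketch.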
To manually flush the cache:
| whisperflush collection=cache
Pre-computed watchlist enrichment
If you have indicators that need instant enrichment (say, for alerts), you can pre-compute results on a schedule:
- Create a CSV or KV Store collection with indicators to watch
- Configure the Whisper Watchlist Enrichment modular input
- Set the enrichment interval (minimum 300 seconds)
- Pre-computed results are stored in the whisper_precomputed_enrichment collection
The whisperlookup command checks the pre-computed collection before making API calls.
Performance
Throughput
| Scenario | Throughput | Notes |
|---|---|---|
| Cache hit | 5,000+ events/sec | KV Store lookup only, no API call |
| Cache miss (IP) | 10-30 events/sec | Single API call per unique IP |
| Cache miss (domain) | 8-25 events/sec | Two-stage query (resolve + infrastructure) |
| Mixed (80% cache hit) | 500-2,000 events/sec | Typical production workload |
Optimization tips
- Specify indicator type: Use type=ip or type=domain instead of type=auto to skip type detection.
- Disable unused enrichment: Set include_threat_intel=false, include_cname=false, or include_nameserver=false to skip API calls you do not need.
- Filter before enriching: Apply where or search filters before whisperlookup to reduce the number of indicators.
- Use pre-built macros: Common investigation patterns are already optimized in the 8 pre-built macros.
- Monitor cache hit rates: Caching reduces API calls by 5-10x for repeated indicators. Run | inputlookup whisper_enrichment_cache | stats count to monitor cache size.
- Pre-compute for alerting: Use the watchlist input for indicators that need instant lookup without API latency.
See the Performance and Sizing reference for detailed benchmarks and sizing recommendations.