Enrichment Pipeline

How the enrichment pipeline works: domain, IP, threat intel, WHOIS, GeoIP, BGP.

Updated April 2026 · Splunk Integration

Overview

The enrichment pipeline adds infrastructure context from the Knowledge Graph (billions of nodes, tens of billions of edges) to your security events.

Architecture

The enrichment pipeline is a streaming search command (whisperlookup) that processes events inline in the SPL pipeline. Each event passes through the five stages listed under Enrichment pipeline.

The pipeline is built from four modules:

Module	Role
`whisper_enrichment.py`	Orchestrates the pipeline: type detection, cache check, API call, field mapping
`whisper_enrichment_queries.py`	Builds parameterized Cypher queries for domain and IP enrichment
`whisper_enrichment_parsers.py`	Parses API responses into flat dictionaries
`whisper_field_mapper.py`	Maps parsed results to `whisper_` prefixed and CIM-aliased fields

All API calls go through WhisperAPIClient, which handles rate limiting, retries, and connection pooling. Results are cached in the whisper_enrichment_cache KV Store collection via whisper_cache.py.

Enrichment pipeline

Type detection -- determines whether the indicator is an IP address (IPv4 regex) or a domain
Cache check -- looks in the whisper_enrichment_cache KV Store for a cached result
API enrichment -- queries the Knowledge Graph with parameterized Cypher
Field mapping -- maps graph results to CIM-compliant field names with a whisper_ prefix
Event output -- appends the enrichment fields to the original event
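
The five stages can be sketched in Python as follows. This is an illustration under stated assumptions, not the contents of whisper_enrichment.py: the function names, the exact IPv4 regex, the plain dict standing in for the KV Store, and the `fetch` callable standing in for the API client are all hypothetical.

```python
import re
from typing import Callable, Dict, MutableMapping, Tuple

# Assumed IPv4 pattern; the regex used by the real command may differ.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(rf"^{_OCTET}(?:\.{_OCTET}){{3}}$")


def detect_type(indicator: str) -> str:
    """Stage 1: classify the indicator as 'ip' or 'domain'."""
    return "ip" if IPV4_RE.match(indicator.strip()) else "domain"


def enrich_event(
    event: Dict[str, object],
    field: str,
    cache: MutableMapping[Tuple[str, str], Dict[str, object]],
    fetch: Callable[[str, str], Dict[str, object]],
) -> Dict[str, object]:
    """Run one event through the five stages (hypothetical helper)."""
    indicator = event.get(field)
    if not isinstance(indicator, str) or not indicator:
        return event  # nothing to enrich
    kind = detect_type(indicator)
    key = (indicator, kind)
    result = cache.get(key)  # Stage 2: cache check
    if result is None:
        result = fetch(indicator, kind)  # Stage 3: API enrichment (stubbed)
        cache[key] = result
    # Stage 4: field mapping, dropping null values
    mapped = {
        f"whisper_{name}": value
        for name, value in result.items()
        if value is not None
    }
    # Stage 5: append the enrichment fields to the original event
    return {**event, **mapped}
```

Passing the cache and the API call in as arguments keeps the sketch testable without a Splunk instance; a second event with the same indicator is served from the cache and never reaches `fetch`.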

Domain enrichment

Domain enrichment runs in two stages to stay within the API's 2-hop depth limit:

Resolve — maps the hostname to its IP addresses (1 hop)
Infrastructure — looks up BGP context for the first resolved IP via the ANNOUNCED_BY path (matching the IP enrichment path)

Stage 1: HOSTNAME → RESOLVES_TO → IPV4
Stage 2: IPV4 → ANNOUNCED_BY → PREFIX ← ROUTES ← ASN
         (then: ASN → HAS_NAME, ASN → HAS_COUNTRY as separate 1-hop queries)

The IP node's inline threat properties (threatScore, isThreat, isTor, etc.) are returned directly from Stage 2. A separate explain() call is only made when the inline data is absent.
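
A rough sketch of how the two stages might be expressed as parameterized Cypher is shown below. The node labels and relationship names come from the paths above; the `name` property, the RETURN aliases, the hostname lowercasing, and both function names are assumptions made for illustration and are not taken from whisper_enrichment_queries.py.

```python
from typing import Dict, Tuple

Query = Tuple[str, Dict[str, str]]


def resolve_query(hostname: str) -> Query:
    """Stage 1: hostname to IPv4 addresses (1 hop)."""
    cypher = (
        "MATCH (h:HOSTNAME {name: $hostname})-[:RESOLVES_TO]->(ip:IPV4) "
        "RETURN ip.name AS ip"
    )
    # The value travels as a parameter and is never spliced into the query text.
    return cypher, {"hostname": hostname.lower()}


def infrastructure_query(ip: str) -> Query:
    """Stage 2: IPv4 to announcing prefix and ASN (2 hops)."""
    cypher = (
        "MATCH (ip:IPV4 {name: $ip})-[:ANNOUNCED_BY]->(p:PREFIX)"
        "<-[:ROUTES]-(a:ASN) "
        "RETURN p.name AS prefix, a.name AS asn, "
        "ip.threatScore AS threat_score"
    )
    return cypher, {"ip": ip}
```

Keeping the indicator in the parameter dictionary rather than in the query string is what makes the query parameterized: the query text is identical for every indicator, and untrusted event data cannot alter its structure.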

Example:

index=dns sourcetype=dns
| whisperlookup field=query type=domain
| table query whisper_ip whisper_prefix whisper_asn whisper_asn_name whisper_country

Output fields: whisper_ip, whisper_prefix, whisper_asn, whisper_asn_name, whisper_country, whisper_cohost_count, plus inline threat fields when available (see Threat intelligence enrichment)

IP enrichment

Resolves an IP to its network context using the ANNOUNCED_BY path:

IPV4 → ANNOUNCED_BY → PREFIX ← ROUTES ← ASN → HAS_NAME → ASN_NAME
                                          ASN → HAS_COUNTRY → COUNTRY

Inline threat properties on the IPV4 node are returned in the same query, eliminating a separate explain() call when the data is present.

Example:

index=firewall sourcetype=pan:traffic
| whisperlookup field=dest_ip type=ip
| table dest_ip whisper_prefix whisper_asn whisper_asn_name whisper_country

Output fields: whisper_prefix, whisper_asn, whisper_asn_name, whisper_country, whisper_reverse_dns_count, whisper_cohost_count, inline threat fields (see below), and ASN threat reputation fields when available

Note: Private IP addresses (RFC 1918) are automatically skipped.
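
A check of this kind can be written with Python's standard ipaddress module. The sketch below tests only the three RFC 1918 ranges; the function name is hypothetical, and the actual command may skip additional ranges (loopback, link-local) that this example does not cover.

```python
import ipaddress

# The three private ranges defined by RFC 1918.
RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_rfc1918(value: str) -> bool:
    """Return True when value is an IPv4 address inside an RFC 1918 range."""
    try:
        addr = ipaddress.IPv4Address(value)
    except ipaddress.AddressValueError:
        return False  # not a valid IPv4 address, so not a private IP
    return any(addr in net for net in RFC1918_NETWORKS)
```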

Threat intelligence enrichment

Threat data is returned in two ways:

Inline — threatScore, isThreat, isTor, isC2, and related boolean flags are properties on IPV4 nodes in the graph. The infrastructure queries (domain and IP) return these directly. No extra API call is needed.
explain() — a richer assessment including explanation text, contributing factors, per-feed sources, and first/last seen dates. The explain() call is skipped when inline data is already present (threat_score is non-null).
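
The skip rule can be sketched as below. The function name and the dictionary shapes are assumptions; the only behavior taken from the text above is that explain() is not called when the inline threat_score is non-null.

```python
from typing import Callable, Dict


def resolve_threat(
    inline: Dict[str, object],
    explain: Callable[[], Dict[str, object]],
) -> Dict[str, object]:
    """Return inline threat data, calling explain() only when it is missing."""
    if inline.get("threat_score") is not None:
        return dict(inline)  # inline data present: no extra API call
    merged = dict(inline)
    # Fill gaps from the richer assessment, ignoring null values.
    merged.update({k: v for k, v in explain().items() if v is not None})
    return merged
```

Note that a score of 0 is still non-null, so in this sketch it also suppresses the explain() call.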

index=proxy sourcetype=squid
| whisperlookup field=dest_host include_threat_intel=true include_feeds=true
| where whisper_threat_score > 30
| table dest_host whisper_threat_level whisper_threat_score whisper_feed_names whisper_threat_explanation

Output fields from inline data:

Field	Description
`whisper_threat_score`	Numeric threat score (0-100+, unbounded float)
`whisper_threat_level`	NONE/INFO/LOW/MEDIUM/HIGH/CRITICAL (derived from score when API returns null)
`whisper_is_threat`	Boolean: indicator is known threat
`whisper_is_tor`	Boolean: Tor exit node
`whisper_is_c2`	Boolean: command-and-control server
`whisper_is_malware`	Boolean: malware distribution
`whisper_is_phishing`	Boolean: phishing host
`whisper_is_spam`	Boolean: spam source
`whisper_is_bruteforce`	Boolean: brute-force source
`whisper_is_scanner`	Boolean: network scanner
`whisper_is_blacklist`	Boolean: on public blacklist
`whisper_is_proxy`	Boolean: open proxy
`whisper_is_vpn`	Boolean: known VPN exit
`whisper_is_anonymizer`	Boolean: anonymization service
`whisper_is_whitelist`	Boolean: explicitly whitelisted
`whisper_threat_sources_count`	Number of threat intelligence sources listing this indicator
`whisper_threat_first_seen`	Earliest date this indicator appeared in any feed
`whisper_threat_last_seen`	Most recent date this indicator appeared in any feed

ASN threat reputation fields (returned with IP and domain enrichment):

Field	Description
`whisper_asn_threat_level`	ASN overall threat level: NONE/LOW/MEDIUM/HIGH/CRITICAL
`whisper_asn_threat_score`	ASN composite threat score (numeric)
`whisper_asn_max_threat_score`	Highest single-prefix threat score within the ASN
`whisper_asn_avg_threat_score`	Average threat score across the ASN's prefixes
`whisper_asn_has_threatening_prefixes`	Boolean: ASN contains at least one high-risk prefix

Null-safe fields: ASN threat fields are only present in enrichment output when the API returns a non-null value. Use isnotnull(whisper_asn_threat_level) in SPL to keep only events where the API provided ASN reputation data.

Additional fields from explain() (when called):

Field	Description
`whisper_threat_explanation`	Human-readable threat summary
`whisper_threat_factors`	Contributing factors (multivalue)
`whisper_threat_sources`	Per-feed source data (list of dicts)
`whisper_threat_feed_ids`	Feed identifiers for ES `threat_key`
`whisper_threat_breakdown`	Component scores from the explain API
`whisper_threat_available`	Whether threat data is available
`whisper_threat_cached`	Whether the explain response was cached

Score range: whisper_threat_score is an unbounded float (typically 0-100+), not a 0-1 fraction. Thresholds: >= 50 is high confidence, >= 10 is moderate.
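
As an illustration of the two thresholds, the helper below buckets a score by confidence. The function name and the "low" and "unknown" labels are assumptions; only the 50 and 10 cut-offs come from the note above, and this is not the rule used to derive whisper_threat_level.

```python
from typing import Optional


def score_confidence(score: Optional[float]) -> str:
    """Bucket an unbounded threat score using the documented thresholds."""
    if score is None:
        return "unknown"  # assumed label for a missing score
    if score >= 50:
        return "high"
    if score >= 10:
        return "moderate"
    return "low"  # assumed label for scores below 10
```

Because the score is unbounded, the comparison uses `>=` against the thresholds rather than normalizing to a 0-1 range first.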

WHOIS enrichment

Domain enrichment automatically includes WHOIS data when available:

HOSTNAME → HAS_REGISTRAR → REGISTRAR
HOSTNAME → REGISTERED_BY → ORGANIZATION
HOSTNAME → HAS_EMAIL → EMAIL
HOSTNAME → HAS_PHONE → PHONE
HOSTNAME → PREV_REGISTRAR → REGISTRAR (previous)

Example:

index=dns sourcetype=dns
| whisperlookup field=query type=domain
| table query whisper_registrar whisper_registrant_org whisper_registrant_email whisper_organization

Output fields:

Field	Description
`whisper_registrar`	Domain registrar name
`whisper_registrant_org`	Registrant organization
`whisper_registrant_email`	Registrant contact email
`whisper_registrant_phone`	Registrant phone number
`whisper_registration_date`	Domain registration date
`whisper_expiration_date`	Domain expiration date
`whisper_prev_registrar`	Previous registrar (registrar change detection)
`whisper_organization`	Registrant organization via REGISTERED_BY edge

Sparse WHOIS data: WHOIS data varies by domain. Not all fields will be populated for every domain. Fields use OPTIONAL MATCH and will be absent (not empty) when data is unavailable.

GeoIP city-level enrichment

IP enrichment automatically includes city-level geolocation:

IPV4 → LOCATED_IN → CITY

CITY nodes contain latitude, longitude, and country code embedded in the name.

Example:

index=firewall sourcetype=pan:traffic
| whisperlookup field=dest_ip type=ip
| table dest_ip whisper_geo_city whisper_geo_country whisper_geo_latitude whisper_geo_longitude

Output fields:

Field	Description
`whisper_geo_city`	City name (e.g., "Mountain View")
`whisper_geo_country`	Country code extracted from city name (e.g., "US")
`whisper_geo_latitude`	City latitude (decimal degrees)
`whisper_geo_longitude`	City longitude (decimal degrees)

Anycast IPs: Anycast IPs (e.g., 1.1.1.1) may not have a single LOCATED_IN edge. GeoIP fields will be absent for such IPs.

HOSTNAME threat properties

Domain enrichment queries threat properties directly from the HOSTNAME node, independent of any IPV4-derived threat data:

Example:

index=dns sourcetype=dns
| whisperlookup field=query type=domain include_threat_intel=true
| where whisper_hostname_threat_level="HIGH" OR whisper_hostname_threat_level="CRITICAL"
| table query whisper_hostname_threat_score whisper_hostname_threat_level

Output fields: All fields carry the whisper_hostname_ prefix to distinguish them from IP-level threat data:

Field	Description
`whisper_hostname_threat_score`	HOSTNAME node threat score
`whisper_hostname_threat_level`	HOSTNAME threat level (NONE/LOW/MEDIUM/HIGH/CRITICAL)
`whisper_hostname_is_spam`	HOSTNAME is a spam source
`whisper_hostname_is_proxy`	HOSTNAME is a proxy
`whisper_hostname_is_vpn`	HOSTNAME is a VPN exit
(etc.)	All other `is_*` booleans are available with the `whisper_hostname_` prefix

Prefix threat assessment

IP enrichment includes threat data from ANNOUNCED_PREFIX and REGISTERED_PREFIX nodes:

IPV4 → ANNOUNCED_BY → ANNOUNCED_PREFIX (BGP routing)
IPV4 → BELONGS_TO → REGISTERED_PREFIX (RIR allocation)

Output fields:

Field	Description
`whisper_announced_prefix`	BGP announced prefix name
`whisper_ap_threat_score`	Announced prefix threat score
`whisper_ap_threat_level`	Announced prefix threat level
`whisper_ap_is_threat`	Announced prefix is threat
`whisper_registered_prefix`	RIR registered prefix name
`whisper_rp_threat_score`	Registered prefix threat score
`whisper_rp_threat_level`	Registered prefix threat level
`whisper_rp_is_threat`	Registered prefix is threat

BGP hijack detection

IP enrichment automatically compares the announcing ASN (BGP) with the registered ASN (RIR) to detect potential route hijacking:

Example:

index=firewall sourcetype=pan:traffic
| whisperlookup field=dest_ip type=ip
| where whisper_bgp_hijack_detected="true"
| table dest_ip whisper_bgp_announcing_asn whisper_bgp_registered_asn

Output fields:

Field	Description
`whisper_bgp_hijack_detected`	Boolean: announcing ASN differs from registered ASN
`whisper_bgp_announcing_asn`	ASN currently announcing the prefix via BGP
`whisper_bgp_registered_asn`	ASN registered as the prefix owner with RIR
`whisper_bgp_announced_prefix`	The announced prefix
`whisper_bgp_registered_prefix`	The registered prefix

Warning: BGP hijack detection carries the highest risk score (70 points) in the risk scoring system. A detected hijack means the IP's traffic may be routed through an unauthorized network.
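
The comparison itself is simple; a minimal sketch follows. The function names and the ASN normalization (accepting either "AS13335" or 13335) are assumptions, as is the choice to return None when either side is unknown rather than reporting a hijack.

```python
from typing import Optional, Union

Asn = Union[str, int, None]


def normalize_asn(asn: Asn) -> Optional[int]:
    """Turn 'AS13335', 'as13335', '13335' or 13335 into 13335."""
    if asn is None:
        return None
    text = str(asn).strip().upper()
    if text.startswith("AS"):
        text = text[2:]
    return int(text) if text.isdigit() else None


def detect_hijack(announcing: Asn, registered: Asn) -> Optional[bool]:
    """True when the announcing ASN differs from the registered ASN."""
    a = normalize_asn(announcing)
    r = normalize_asn(registered)
    if a is None or r is None:
        return None  # not enough data to compare
    return a != r
```

Normalizing before comparing avoids a false positive when the two sources format the same ASN differently.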

Web link graph enrichment

Domain enrichment includes web link data from the LINKS_TO relationship (billions of edges):

HOSTNAME → LINKS_TO → HOSTNAME (outbound)
HOSTNAME ← LINKS_TO ← HOSTNAME (inbound)

Example:

index=dns sourcetype=dns
| whisperlookup field=query type=domain
| where whisper_link_count > 0
| table query whisper_link_count whisper_outbound_links whisper_inbound_links

Output fields:

Field	Description
`whisper_linked_domains`	Deduplicated list of all linked domains
`whisper_link_count`	Total unique linked domains
`whisper_suspicious_link_count`	Number of links to/from suspicious or threat-listed domains
`whisper_outbound_links`	Domains this domain links to (up to 25)
`whisper_inbound_links`	Domains that link to this domain (up to 25)

Tip: Domains with many inbound links from legitimate sites are more likely to be trustworthy. Domains with no inbound links or linked only by suspicious sites are flagged in the risk score.
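
The relationship between the link fields can be sketched as follows. The function name, the case-insensitive deduplication, and the decision to compute whisper_link_count before applying the 25-entry cap are assumptions; the source only states that the lists are capped at 25 and that the combined list is deduplicated.

```python
from typing import Dict, Iterable, List, Union


def summarize_links(
    outbound: Iterable[str],
    inbound: Iterable[str],
    limit: int = 25,
) -> Dict[str, Union[List[str], int]]:
    """Deduplicate link lists (order preserved) and cap the per-direction lists."""
    out = list(dict.fromkeys(d.lower() for d in outbound))
    inn = list(dict.fromkeys(d.lower() for d in inbound))
    linked = list(dict.fromkeys(out + inn))  # union without duplicates
    return {
        "whisper_outbound_links": out[:limit],
        "whisper_inbound_links": inn[:limit],
        "whisper_linked_domains": linked,
        "whisper_link_count": len(linked),
    }
```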

CNAME chain enrichment

Follows CNAME alias chains up to 5 hops:

HOSTNAME -[:ALIAS_OF*1..5]-> HOSTNAME

Example:

index=dns sourcetype=dns
| whisperlookup field=query include_cname=true
| where whisper_cname_depth > 0
| table query whisper_cname_chain whisper_cname_target whisper_cname_depth

Output fields: whisper_cname_chain, whisper_cname_depth, whisper_cname_target
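
A bounded chain walk of this kind can be sketched locally as shown below. The graph query performs the traversal server-side; this example only mirrors the 5-hop limit using a plain dictionary of alias edges. The function name, the loop guard, and the interpretation of whisper_cname_target as the last hostname in the chain are assumptions.

```python
from typing import Dict, List, Optional, Union


def follow_cname(
    hostname: str,
    alias_of: Dict[str, str],
    max_hops: int = 5,
) -> Dict[str, Union[List[str], int, Optional[str]]]:
    """Follow alias edges from hostname, stopping at max_hops or on a loop."""
    chain: List[str] = []
    seen = {hostname}
    current = hostname
    while len(chain) < max_hops:
        target = alias_of.get(current)
        if target is None or target in seen:
            break  # end of chain, or a CNAME loop
        chain.append(target)
        seen.add(target)
        current = target
    return {
        "whisper_cname_chain": chain,
        "whisper_cname_depth": len(chain),
        "whisper_cname_target": chain[-1] if chain else None,
    }
```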

Nameserver enrichment

Pulls nameservers for a domain:

HOSTNAME <-[:NAMESERVER_FOR]- HOSTNAME

Example:

index=dns sourcetype=dns
| whisperlookup field=query include_nameserver=true
| table query whisper_nameservers

Output fields: whisper_nameservers (comma-separated list)

CIM field mapping

Enrichment fields are aliased to CIM-compliant names:

Whisper Field	CIM Field	CIM Data Model
`whisper_ip`	`dest_ip`	Network Resolution
`whisper_country`	`dest_country`	Network Resolution
`whisper_asn`	`dest_asn`	Network Resolution
`whisper_threat_score`	`threat_score`	Threat Intelligence
`whisper_threat_level`	`threat_level`	Threat Intelligence
`whisper_risk_score`	`risk_score`	Threat Intelligence
`whisper_risk_level`	`risk_level`	Threat Intelligence
`whisper_is_threat`	`is_threat`	Threat Intelligence
`whisper_is_c2`	`is_c2`	Threat Intelligence
`whisper_is_tor`	`is_tor`	Threat Intelligence
`whisper_is_malware`	`is_malware`	Threat Intelligence
`whisper_is_phishing`	`is_phishing`	Threat Intelligence
`whisper_is_anonymizer`	`is_anonymizer`	Threat Intelligence
`whisper_is_spam`	`is_spam`	Threat Intelligence
`whisper_is_bruteforce`	`is_bruteforce`	Threat Intelligence
`whisper_is_scanner`	`is_scanner`	Threat Intelligence
`whisper_is_blacklist`	`is_blacklist`	Threat Intelligence
`whisper_is_proxy`	`is_proxy`	Threat Intelligence
`whisper_is_vpn`	`is_vpn`	Threat Intelligence
`whisper_is_whitelist`	`is_whitelist`	Threat Intelligence

The sourcetype whisper:enrichment is tagged with CIM tags: network, resolution, dns.

Also set automatically:

vendor = Whisper Security
vendor_product = Whisper Knowledge Graph
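
The alias table can be represented as a simple mapping. The sketch below is illustrative and is not the contents of whisper_field_mapper.py; the function name and the `overwrite` switch (which by default leaves an existing event field such as dest_ip untouched) are assumptions.

```python
from typing import Dict

_FLAGS = (
    "threat", "c2", "tor", "malware", "phishing", "anonymizer", "spam",
    "bruteforce", "scanner", "blacklist", "proxy", "vpn", "whitelist",
)

# Whisper field name -> CIM alias, following the table above.
CIM_ALIASES: Dict[str, str] = {
    "whisper_ip": "dest_ip",
    "whisper_country": "dest_country",
    "whisper_asn": "dest_asn",
    "whisper_threat_score": "threat_score",
    "whisper_threat_level": "threat_level",
    "whisper_risk_score": "risk_score",
    "whisper_risk_level": "risk_level",
    **{f"whisper_is_{flag}": f"is_{flag}" for flag in _FLAGS},
}


def add_cim_aliases(
    fields: Dict[str, object], overwrite: bool = False
) -> Dict[str, object]:
    """Copy each whisper_ field to its CIM alias, keeping the original."""
    out = dict(fields)
    for source, alias in CIM_ALIASES.items():
        if source in fields and (overwrite or alias not in out):
            out[alias] = fields[source]
    return out
```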

Caching

All enrichment results are cached in the whisper_enrichment_cache KV Store collection.

Setting	Default	Description
Cache TTL	3600 seconds (1 hour)	How long cached results are valid
Cache collection	`whisper_enrichment_cache`	KV Store collection name

The cache is keyed by indicator + indicator_type. The "Whisper - Evict Expired Cache Entries" saved search runs hourly (when enabled) to remove expired entries.
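
Keying and expiry can be sketched as below. The key format, the `cached_at` timestamp field, and both function names are assumptions; only the indicator-plus-type key and the 3600-second default TTL come from this section.

```python
import time
from typing import Dict, Optional

CACHE_TTL_SECONDS = 3600  # documented default: 1 hour


def cache_key(indicator: str, indicator_type: str) -> str:
    """Build a key from indicator and type (the format is an assumption)."""
    return f"{indicator_type}:{indicator.strip().lower()}"


def is_fresh(
    entry: Dict[str, float],
    now: Optional[float] = None,
    ttl: int = CACHE_TTL_SECONDS,
) -> bool:
    """True while the entry is younger than the TTL."""
    current = time.time() if now is None else now
    return (current - entry["cached_at"]) < ttl
```

Including the type in the key keeps a domain and an IP that happen to share a string from colliding in the collection.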

To manually flush the cache:

| whisperflush collection=cache

Pre-computed watchlist enrichment

If you have indicators that need instant enrichment (say, for alerts), you can pre-compute results on a schedule:

Create a CSV or KV Store collection with indicators to watch
Configure the Whisper Watchlist Enrichment modular input
Set the enrichment interval (minimum 300 seconds)
Pre-computed results are stored in whisper_precomputed_enrichment

The whisperlookup command checks the pre-computed collection before making API calls.
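
The lookup order can be sketched as follows. The source states only that the pre-computed collection is consulted before the API; placing it ahead of the regular cache, writing API results back to the cache, and the function name are assumptions. Plain dictionaries stand in for the KV Store collections.

```python
from typing import Callable, Dict, Tuple

Result = Dict[str, object]


def lookup(
    key: str,
    precomputed: Dict[str, Result],
    cache: Dict[str, Result],
    call_api: Callable[[str], Result],
) -> Tuple[Result, str]:
    """Return (result, source) using the first store that has the key."""
    for source, store in (("precomputed", precomputed), ("cache", cache)):
        hit = store.get(key)
        if hit is not None:
            return hit, source
    result = call_api(key)  # only reached when both stores miss
    cache[key] = result
    return result, "api"
```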

Performance

Throughput

Scenario	Throughput	Notes
Cache hit	5,000+ events/sec	KV Store lookup only, no API call
Cache miss (IP)	10-30 events/sec	Single API call per unique IP
Cache miss (domain)	8-25 events/sec	Two-stage query (resolve + infrastructure)
Mixed (80% cache hit)	500-2,000 events/sec	Typical production workload

Optimization tips

Specify indicator type: Use type=ip or type=domain instead of type=auto to skip type detection
Disable unused enrichment: Set include_threat_intel=false, include_cname=false, include_nameserver=false to skip API calls you do not need
Filter before enriching: Apply where or search filters before whisperlookup to reduce the number of indicators
Use pre-built macros: The 8 pre-built macros implement common investigation patterns in optimized form
Monitor cache hit rates: Caching reduces API calls by 5-10x for repeated indicators. Run | inputlookup whisper_enrichment_cache | stats count to check the cache size
Pre-compute for alerting: Use the watchlist input for indicators that need instant lookup without API latency

See the Performance and Sizing reference for detailed benchmarks and sizing recommendations.
