Enrichment Pipeline

How the enrichment pipeline works: domain, IP, threat intel, WHOIS, GeoIP, BGP.

Updated April 2026 · Splunk Integration

Enrichment Pipeline Documentation

Overview

The enrichment pipeline adds infrastructure context from the Knowledge Graph (billions of nodes, tens of billions of edges) to your security events.

Architecture

The enrichment pipeline is a streaming search command (whisperlookup) that processes events inline in the SPL pipeline. Each event passes through five stages:

[Diagram: the five pipeline stages]

The pipeline is built from four modules:

Module | Role
whisper_enrichment.py | Orchestrates the pipeline: type detection, cache check, API call, field mapping
whisper_enrichment_queries.py | Builds parameterized Cypher queries for domain and IP enrichment
whisper_enrichment_parsers.py | Parses API responses into flat dictionaries
whisper_field_mapper.py | Maps parsed results to whisper_ prefixed and CIM-aliased fields

All API calls go through WhisperAPIClient, which handles rate limiting, retries, and connection pooling. Results are cached in the whisper_enrichment_cache KV Store collection via whisper_cache.py.

Enrichment pipeline

  1. Type detection -- figures out if the indicator is an IP (IPv4 regex) or domain
  2. Cache check -- looks in the whisper_enrichment_cache KV Store for a cached result
  3. API enrichment -- queries the Knowledge Graph with parameterized Cypher
  4. Field mapping -- maps graph results to CIM-compliant field names with a whisper_ prefix
  5. Event output -- appends the enrichment fields to the original event
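
The five stages above can be sketched in a few lines of Python. This is a minimal illustration, not the code in whisper_enrichment.py: the function names, the dict-based cache, and the query_graph callable are all hypothetical stand-ins for the KV Store and the API client.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# IPv4 dotted-quad pattern; anything that does not match is treated as a domain.
IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)$"
)

CacheKey = Tuple[str, str]
Result = Dict[str, Optional[str]]


def detect_type(indicator: str) -> str:
    """Stage 1: classify the indicator as 'ip' or 'domain'."""
    return "ip" if IPV4_RE.match(indicator.strip()) else "domain"


def enrich_event(
    event: Dict[str, object],
    field: str,
    cache: Dict[CacheKey, Result],
    query_graph: Callable[[str, str], Result],  # hypothetical API stand-in
) -> Dict[str, object]:
    indicator = event.get(field)
    if not indicator:
        return dict(event)  # nothing to enrich
    indicator_type = detect_type(str(indicator))          # stage 1
    key = (str(indicator), indicator_type)
    result = cache.get(key)                               # stage 2
    if result is None:
        result = query_graph(str(indicator), indicator_type)  # stage 3
        cache[key] = result
    # Stage 4: prefix every non-null value with whisper_.
    mapped = {f"whisper_{k}": v for k, v in result.items() if v is not None}
    return {**event, **mapped}                            # stage 5
```

A second event carrying the same indicator is served from the cache, so query_graph runs once per unique (indicator, type) pair.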

Domain enrichment

Domain enrichment runs in two stages to stay within the API's 2-hop depth limit:

  1. Resolve -- maps the hostname to its IP addresses (1 hop)
  2. Infrastructure -- looks up BGP context for the first resolved IP via the ANNOUNCED_BY path (the same path IP enrichment uses)

Stage 1: HOSTNAME → RESOLVES_TO → IPV4
Stage 2: IPV4 → ANNOUNCED_BY → PREFIX ← ROUTES ← ASN
         (then: ASN → HAS_NAME, ASN → HAS_COUNTRY as separate 1-hop queries)
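
As a rough sketch of the two-stage flow, the Python below resolves a hostname and then looks up infrastructure for the first resolved IP only. The two dictionaries are hypothetical in-memory stand-ins for the graph queries (the real pipeline issues Cypher through the API), and the sample values use documentation-reserved addresses and ASNs.

```python
from typing import Dict, List, Optional

# Hypothetical sample data standing in for the two graph queries.
RESOLVES_TO: Dict[str, List[str]] = {
    "example.test": ["192.0.2.10", "192.0.2.11"],
}
ANNOUNCED_BY: Dict[str, Dict[str, str]] = {
    "192.0.2.10": {"prefix": "192.0.2.0/24", "asn": "AS64500"},
}


def resolve(hostname: str) -> List[str]:
    """Stage 1: HOSTNAME -> RESOLVES_TO -> IPV4 (1 hop)."""
    return RESOLVES_TO.get(hostname, [])


def infrastructure(ip: str) -> Optional[Dict[str, str]]:
    """Stage 2: IPV4 -> ANNOUNCED_BY -> PREFIX <- ROUTES <- ASN."""
    return ANNOUNCED_BY.get(ip)


def enrich_domain(hostname: str) -> Dict[str, str]:
    ips = resolve(hostname)
    if not ips:
        return {}  # nothing resolved, so stage 2 is skipped
    first_ip = ips[0]  # only the first resolved IP gets BGP context
    result = {"ip": first_ip}
    infra = infrastructure(first_ip)
    if infra:
        result.update(infra)
    return result
```

Splitting the lookup this way keeps each query within two hops, at the cost of a second round trip per uncached domain.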

The IP node's inline threat properties (threatScore, isThreat, isTor, etc.) are returned directly from Stage 2. A separate explain() call is only made when the inline data is absent.

Example:

index=dns sourcetype=dns
| whisperlookup field=query type=domain
| table query whisper_ip whisper_prefix whisper_asn whisper_asn_name whisper_country

Output fields: whisper_ip, whisper_prefix, whisper_asn, whisper_asn_name, whisper_country, whisper_cohost_count, plus inline threat fields when available (see Threat intelligence enrichment)

IP enrichment

Resolves an IP to its network context using the ANNOUNCED_BY path:

IPV4 → ANNOUNCED_BY → PREFIX ← ROUTES ← ASN → HAS_NAME → ASN_NAME
                                          ASN → HAS_COUNTRY → COUNTRY

Inline threat properties on the IPV4 node are returned in the same query, eliminating a separate explain() call when the data is present.

Example:

index=firewall sourcetype=pan:traffic
| whisperlookup field=dest_ip type=ip
| table dest_ip whisper_prefix whisper_asn whisper_asn_name whisper_country

Output fields: whisper_prefix, whisper_asn, whisper_asn_name, whisper_country, whisper_reverse_dns_count, whisper_cohost_count, inline threat fields (see below), and ASN threat reputation fields when available

Note: Private IP addresses (RFC 1918) are automatically skipped.
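
One way to implement an RFC 1918 check is with Python's standard ipaddress module. The sketch below is an assumption about how such a filter could look, not the pipeline's actual code. It tests membership in the three RFC 1918 blocks explicitly, because the module's is_private attribute also covers loopback, link-local, and other reserved ranges.

```python
import ipaddress

# The three private address blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(value: str) -> bool:
    """Return True when value is an IPv4 address inside an RFC 1918 block."""
    try:
        addr = ipaddress.IPv4Address(value)
    except ipaddress.AddressValueError:
        return False  # not a valid IPv4 address
    return any(addr in net for net in RFC1918_NETWORKS)
```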

Threat intelligence enrichment

Threat data is returned in two ways:

  1. Inline: threatScore, isThreat, isTor, isC2, and related boolean flags are properties on IPV4 nodes in the graph. The infrastructure queries (domain and IP) return these directly. No extra API call is needed.
  2. explain(): a richer assessment including explanation text, contributing factors, per-feed sources, and first/last seen dates. The explain() call is skipped when inline data is already present (threat_score is non-null).

index=proxy sourcetype=squid
| whisperlookup field=dest_host include_threat_intel=true include_feeds=true
| where whisper_threat_score > 30
| table dest_host whisper_threat_level whisper_threat_score whisper_feed_names whisper_threat_explanation
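
The rule for skipping explain() can be expressed as a small guard. In this sketch the function name and the explain callable are hypothetical; only the condition (threat_score is non-null) comes from the description above.

```python
from typing import Any, Callable, Dict


def collect_threat_fields(
    inline: Dict[str, Any],
    explain: Callable[[], Dict[str, Any]],  # hypothetical wrapper around explain()
) -> Dict[str, Any]:
    """Return inline threat data, calling explain() only when it is absent."""
    fields = dict(inline)
    if fields.get("threat_score") is None:
        # Inline data is missing, so fall back to the richer assessment.
        fields.update(explain())
    return fields
```

Passing explain as a callable means the extra API call is never made unless the guard requires it.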

Output fields from inline data:

Field | Description
whisper_threat_score | Numeric threat score (0-100+, unbounded float)
whisper_threat_level | NONE/INFO/LOW/MEDIUM/HIGH/CRITICAL (derived from score when API returns null)
whisper_is_threat | Boolean: indicator is known threat
whisper_is_tor | Boolean: Tor exit node
whisper_is_c2 | Boolean: command-and-control server
whisper_is_malware | Boolean: malware distribution
whisper_is_phishing | Boolean: phishing host
whisper_is_spam | Boolean: spam source
whisper_is_bruteforce | Boolean: brute-force source
whisper_is_scanner | Boolean: network scanner
whisper_is_blacklist | Boolean: on public blacklist
whisper_is_proxy | Boolean: open proxy
whisper_is_vpn | Boolean: known VPN exit
whisper_is_anonymizer | Boolean: anonymization service
whisper_is_whitelist | Boolean: explicitly whitelisted
whisper_threat_sources_count | Number of threat intelligence sources listing this indicator
whisper_threat_first_seen | Earliest date this indicator appeared in any feed
whisper_threat_last_seen | Most recent date this indicator appeared in any feed

ASN threat reputation fields (returned with IP and domain enrichment):

Field | Description
whisper_asn_threat_level | ASN overall threat level: NONE/LOW/MEDIUM/HIGH/CRITICAL
whisper_asn_threat_score | ASN composite threat score (numeric)
whisper_asn_max_threat_score | Highest single-prefix threat score within the ASN
whisper_asn_avg_threat_score | Average threat score across the ASN's prefixes
whisper_asn_has_threatening_prefixes | Boolean: ASN contains at least one high-risk prefix

Null-safe fields: ASN threat fields are only present in enrichment output when the API returns a non-null value. Use isnotnull(whisper_asn_threat_level) in SPL to filter only events where the API provided ASN reputation data.

Additional fields from explain() (when called):

Field | Description
whisper_threat_explanation | Human-readable threat summary
whisper_threat_factors | Contributing factors (multivalue)
whisper_threat_sources | Per-feed source data (list of dicts)
whisper_threat_feed_ids | Feed identifiers for ES threat_key
whisper_threat_breakdown | Component scores from the explain API
whisper_threat_available | Whether threat data is available
whisper_threat_cached | Whether the explain response was cached

Score range: whisper_threat_score is an unbounded float (typically 0-100+), not a 0-1 fraction. Thresholds: >= 50 is high confidence, >= 10 is moderate.
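
The two thresholds can be applied with a simple bucketing function. This is an illustrative sketch only: the bucket labels "low" and "unknown" are assumptions added for completeness, and the function does not reproduce how whisper_threat_level is derived.

```python
from typing import Optional


def confidence_bucket(score: Optional[float]) -> str:
    """Bucket an unbounded threat score using the documented thresholds."""
    if score is None:
        return "unknown"   # assumed label for a missing score
    if score >= 50:
        return "high"      # >= 50: high confidence
    if score >= 10:
        return "moderate"  # >= 10: moderate
    return "low"           # assumed label for scores below 10
```

Because the score is unbounded, values above 100 fall into the "high" bucket rather than being rejected.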

WHOIS enrichment

Domain enrichment automatically includes WHOIS data when available:

HOSTNAME → HAS_REGISTRAR → REGISTRAR
HOSTNAME → REGISTERED_BY → ORGANIZATION
HOSTNAME → HAS_EMAIL → EMAIL
HOSTNAME → HAS_PHONE → PHONE
HOSTNAME → PREV_REGISTRAR → REGISTRAR (previous)

Example:

index=dns sourcetype=dns
| whisperlookup field=query type=domain
| table query whisper_registrar whisper_registrant_org whisper_registrant_email whisper_organization

Output fields:

Field | Description
whisper_registrar | Domain registrar name
whisper_registrant_org | Registrant organization
whisper_registrant_email | Registrant contact email
whisper_registrant_phone | Registrant phone number
whisper_registration_date | Domain registration date
whisper_expiration_date | Domain expiration date
whisper_prev_registrar | Previous registrar (registrar change detection)
whisper_organization | Registrant organization via REGISTERED_BY edge

Sparse WHOIS data: WHOIS coverage varies by domain, so not every field is populated for every domain. The queries use OPTIONAL MATCH, so a field with no data is absent from the event rather than present with an empty value.

GeoIP city-level enrichment

IP enrichment automatically includes city-level geolocation:

IPV4 → LOCATED_IN → CITY

CITY nodes contain latitude, longitude, and country code embedded in the name.

Example:

index=firewall sourcetype=pan:traffic
| whisperlookup field=dest_ip type=ip
| table dest_ip whisper_geo_city whisper_geo_country whisper_geo_latitude whisper_geo_longitude

Output fields:

Field | Description
whisper_geo_city | City name (e.g., "Mountain View")
whisper_geo_country | Country code extracted from city name (e.g., "US")
whisper_geo_latitude | City latitude (decimal degrees)
whisper_geo_longitude | City longitude (decimal degrees)

Anycast IPs: Anycast IPs (e.g., 1.1.1.1) may not have a single LOCATED_IN edge. GeoIP fields will be absent for such IPs.

HOSTNAME threat properties

Domain enrichment queries threat properties directly from the HOSTNAME node, independent of any IPV4-derived threat data.

Example:

index=dns sourcetype=dns
| whisperlookup field=query type=domain include_threat_intel=true
| where whisper_hostname_threat_level="HIGH" OR whisper_hostname_threat_level="CRITICAL"
| table query whisper_hostname_threat_score whisper_hostname_threat_level

Output fields: All fields use the whisper_hostname_ prefix to distinguish them from IP-level threat data:

Field | Description
whisper_hostname_threat_score | HOSTNAME node threat score
whisper_hostname_threat_level | HOSTNAME threat level (NONE/LOW/MEDIUM/HIGH/CRITICAL)
whisper_hostname_is_spam | HOSTNAME is a spam source
whisper_hostname_is_proxy | HOSTNAME is a proxy
whisper_hostname_is_vpn | HOSTNAME is a VPN exit
(etc.) | All is_* booleans are available with the whisper_hostname_ prefix

Prefix threat assessment

IP enrichment includes threat data from ANNOUNCED_PREFIX and REGISTERED_PREFIX nodes:

IPV4 → ANNOUNCED_BY → ANNOUNCED_PREFIX (BGP routing)
IPV4 → BELONGS_TO → REGISTERED_PREFIX (RIR allocation)

Output fields:

Field | Description
whisper_announced_prefix | BGP announced prefix name
whisper_ap_threat_score | Announced prefix threat score
whisper_ap_threat_level | Announced prefix threat level
whisper_ap_is_threat | Announced prefix is threat
whisper_registered_prefix | RIR registered prefix name
whisper_rp_threat_score | Registered prefix threat score
whisper_rp_threat_level | Registered prefix threat level
whisper_rp_is_threat | Registered prefix is threat

BGP hijack detection

IP enrichment automatically compares the announcing ASN (BGP) with the registered ASN (RIR) to detect potential route hijacking.

Example:

index=firewall sourcetype=pan:traffic
| whisperlookup field=dest_ip type=ip
| where whisper_bgp_hijack_detected="true"
| table dest_ip whisper_bgp_announcing_asn whisper_bgp_registered_asn

Output fields:

Field | Description
whisper_bgp_hijack_detected | Boolean: announcing ASN differs from registered ASN
whisper_bgp_announcing_asn | ASN currently announcing the prefix via BGP
whisper_bgp_registered_asn | ASN registered as the prefix owner with RIR
whisper_bgp_announced_prefix | The announced prefix
whisper_bgp_registered_prefix | The registered prefix

Warning: BGP hijack detection carries the highest risk score (70 points) in the risk scoring system. A detected hijack means the IP's traffic may be routed through an unauthorized network.
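
The comparison behind whisper_bgp_hijack_detected amounts to checking whether two ASNs differ. The sketch below shows that idea under stated assumptions: the normalization of the "AS" prefix and the choice to emit no fields when either ASN is unknown are illustrative, not confirmed pipeline behavior.

```python
from typing import Dict, Optional, Union

Asn = Union[str, int]


def normalize_asn(asn: Asn) -> str:
    """Reduce 'AS64500', 'as64500', or 64500 to the bare number string."""
    text = str(asn).strip().upper()
    return text[2:] if text.startswith("AS") else text


def detect_bgp_hijack(
    announcing_asn: Optional[Asn], registered_asn: Optional[Asn]
) -> Dict[str, object]:
    # Assumption: without both ASNs there is nothing to compare.
    if announcing_asn is None or registered_asn is None:
        return {}
    announcing = normalize_asn(announcing_asn)
    registered = normalize_asn(registered_asn)
    return {
        "whisper_bgp_hijack_detected": announcing != registered,
        "whisper_bgp_announcing_asn": announcing,
        "whisper_bgp_registered_asn": registered,
    }
```

A mismatch is a signal to investigate rather than proof of a hijack, which is why the text above calls it "potential" route hijacking.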

Web link enrichment

Domain enrichment includes web link data from the LINKS_TO relationship (billions of edges):

HOSTNAME → LINKS_TO → HOSTNAME (outbound)
HOSTNAME ← LINKS_TO ← HOSTNAME (inbound)

Example:

index=dns sourcetype=dns
| whisperlookup field=query type=domain
| where whisper_link_count > 0
| table query whisper_link_count whisper_outbound_links whisper_inbound_links

Output fields:

Field | Description
whisper_linked_domains | Deduplicated list of all linked domains
whisper_link_count | Total unique linked domains
whisper_suspicious_link_count | Number of links to/from suspicious or threat-listed domains
whisper_outbound_links | Domains this domain links to (up to 25)
whisper_inbound_links | Domains that link to this domain (up to 25)

Tip: Domains with many inbound links from legitimate sites are more likely to be trustworthy. Domains with no inbound links or linked only by suspicious sites are flagged in the risk score.
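
To show how the link fields relate to each other, here is a small sketch that derives the deduplicated list and count from the outbound and inbound lists. It assumes the 25-item cap is applied per direction before merging and that domains are compared case-insensitively; neither detail is confirmed by this page.

```python
from typing import Dict, Iterable, List


def dedupe(domains: Iterable[str]) -> List[str]:
    """Lower-case and deduplicate while preserving first-seen order."""
    return list(dict.fromkeys(d.lower() for d in domains))


def summarize_links(
    outbound: Iterable[str], inbound: Iterable[str], limit: int = 25
) -> Dict[str, object]:
    out_links = dedupe(outbound)[:limit]  # assumed: cap applied per direction
    in_links = dedupe(inbound)[:limit]
    linked = dedupe(out_links + in_links)  # a domain in both lists counts once
    return {
        "whisper_outbound_links": out_links,
        "whisper_inbound_links": in_links,
        "whisper_linked_domains": linked,
        "whisper_link_count": len(linked),
    }
```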

CNAME chain enrichment

Follows CNAME alias chains up to 5 hops:

HOSTNAME -[:ALIAS_OF*1..5]-> HOSTNAME

Example:

index=dns sourcetype=dns
| whisperlookup field=query include_cname=true
| where whisper_cname_depth > 0
| table query whisper_cname_chain whisper_cname_target whisper_cname_depth

Output fields: whisper_cname_chain, whisper_cname_depth, whisper_cname_target
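
The bounded traversal can be illustrated with a plain dictionary standing in for ALIAS_OF edges. This is a sketch under assumptions: the loop guard and the shape of the returned fields are illustrative, and the real lookup is a variable-length Cypher match rather than a Python loop.

```python
from typing import Dict, List, Optional


def follow_cname_chain(
    hostname: str, alias_of: Dict[str, str], max_hops: int = 5
) -> Dict[str, object]:
    """Walk ALIAS_OF edges from hostname, stopping after max_hops or on a loop."""
    chain: List[str] = []
    seen = {hostname}
    current = hostname
    while len(chain) < max_hops:
        target: Optional[str] = alias_of.get(current)
        if target is None or target in seen:
            break  # end of chain, or a cycle that would never terminate
        chain.append(target)
        seen.add(target)
        current = target
    return {
        "whisper_cname_chain": chain,
        "whisper_cname_depth": len(chain),
        "whisper_cname_target": chain[-1] if chain else None,
    }
```

A hostname with no alias yields a depth of 0, which matches the `where whisper_cname_depth > 0` filter in the example above.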

Nameserver enrichment

Pulls nameservers for a domain:

HOSTNAME <-[:NAMESERVER_FOR]- HOSTNAME

Example:

index=dns sourcetype=dns
| whisperlookup field=query include_nameserver=true
| table query whisper_nameservers

Output fields: whisper_nameservers (comma-separated list)

CIM field mapping

Enrichment fields are aliased to CIM-compliant names:

Whisper Field | CIM Field | CIM Data Model
whisper_ip | dest_ip | Network Resolution
whisper_country | dest_country | Network Resolution
whisper_asn | dest_asn | Network Resolution
whisper_threat_score | threat_score | Threat Intelligence
whisper_threat_level | threat_level | Threat Intelligence
whisper_risk_score | risk_score | Threat Intelligence
whisper_risk_level | risk_level | Threat Intelligence
whisper_is_threat | is_threat | Threat Intelligence
whisper_is_c2 | is_c2 | Threat Intelligence
whisper_is_tor | is_tor | Threat Intelligence
whisper_is_malware | is_malware | Threat Intelligence
whisper_is_phishing | is_phishing | Threat Intelligence
whisper_is_anonymizer | is_anonymizer | Threat Intelligence
whisper_is_spam | is_spam | Threat Intelligence
whisper_is_bruteforce | is_bruteforce | Threat Intelligence
whisper_is_scanner | is_scanner | Threat Intelligence
whisper_is_blacklist | is_blacklist | Threat Intelligence
whisper_is_proxy | is_proxy | Threat Intelligence
whisper_is_vpn | is_vpn | Threat Intelligence
whisper_is_whitelist | is_whitelist | Threat Intelligence

The sourcetype whisper:enrichment is tagged with CIM tags: network, resolution, dns.

Also set automatically:

  • vendor = Whisper Security
  • vendor_product = Whisper Knowledge Graph
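
The aliasing step can be pictured as a dictionary lookup over a subset of the table above. This sketch is not the code in whisper_field_mapper.py. In particular, leaving an existing event field untouched when it collides with an alias is an assumption made here for safety.

```python
from typing import Any, Dict

# Subset of the alias table, for illustration.
CIM_ALIASES = {
    "whisper_ip": "dest_ip",
    "whisper_country": "dest_country",
    "whisper_asn": "dest_asn",
    "whisper_threat_score": "threat_score",
    "whisper_threat_level": "threat_level",
}


def add_cim_aliases(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Copy whisper_ fields to their CIM names and set the vendor fields."""
    out = dict(fields)
    for source, alias in CIM_ALIASES.items():
        if source in fields:
            # Assumption: never overwrite a value the event already carries.
            out.setdefault(alias, fields[source])
    out["vendor"] = "Whisper Security"
    out["vendor_product"] = "Whisper Knowledge Graph"
    return out
```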

Caching

All enrichment results are cached in the whisper_enrichment_cache KV Store collection.

Setting | Default | Description
Cache TTL | 3600 seconds (1 hour) | How long cached results are valid
Cache collection | whisper_enrichment_cache | KV Store collection name

The cache is keyed by indicator + indicator_type. The Whisper - Evict Expired Cache Entries saved search runs hourly (when enabled) to clean up expired entries.
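
The keying and expiry behavior can be modeled with an in-memory class. This is a sketch only: the real cache lives in the KV Store via whisper_cache.py, and the class name, the tuple key, and the injectable clock are hypothetical choices made so the example is easy to test.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class EnrichmentCache:
    """In-memory model of a TTL cache keyed by (indicator, indicator_type)."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def put(self, indicator: str, indicator_type: str, result: Any) -> None:
        self._entries[(indicator, indicator_type)] = (self._clock(), result)

    def get(self, indicator: str, indicator_type: str) -> Optional[Any]:
        entry = self._entries.get((indicator, indicator_type))
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            return None  # expired; left in place until the next eviction pass
        return result

    def evict_expired(self) -> int:
        """Remove expired entries, like the hourly saved search; return the count."""
        now = self._clock()
        expired = [key for key, (stored_at, _) in self._entries.items()
                   if now - stored_at >= self.ttl_seconds]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

Including the indicator type in the key keeps an IP result and a domain result for the same string from overwriting each other.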

To manually flush the cache:

| whisperflush collection=cache

Pre-computed watchlist enrichment

If you have indicators that need instant enrichment (say, for alerts), you can pre-compute results on a schedule:

  1. Create a CSV or KV Store collection with indicators to watch
  2. Configure the Whisper Watchlist Enrichment modular input
  3. Set the enrichment interval (minimum 300 seconds)
  4. Pre-computed results are stored in whisper_precomputed_enrichment

The whisperlookup command checks the pre-computed collection before making API calls.

Performance

Throughput

Scenario | Throughput | Notes
Cache hit | 5,000+ events/sec | KV Store lookup only, no API call
Cache miss (IP) | 10-30 events/sec | Single API call per unique IP
Cache miss (domain) | 8-25 events/sec | Two-stage query (resolve + infrastructure)
Mixed (80% cache hit) | 500-2,000 events/sec | Typical production workload

Optimization tips

  • Specify indicator type: Use type=ip or type=domain instead of type=auto to skip type detection
  • Disable unused enrichment: Set include_threat_intel=false, include_cname=false, include_nameserver=false to skip API calls you do not need
  • Filter before enriching: Apply where or search filters before whisperlookup to reduce the number of indicators
  • Use pre-built macros: Common investigation patterns are already optimized in the 8 pre-built macros
  • Monitor cache hit rates: Caching reduces API calls by 5-10x for repeated indicators. Check | inputlookup whisper_enrichment_cache | stats count to monitor cache size
  • Pre-compute for alerting: Use the watchlist input for indicators that need instant lookup without API latency

See the Performance and Sizing reference for detailed benchmarks and sizing recommendations.