Best Practices

The habits that make WhisperGraph queries return in milliseconds, plus the do-this-not-that table and the graph's known limitations.

Updated July 2026

A slow or empty WhisperGraph query almost always traces to the same handful of mistakes: an unanchored scan over a billion-node label, a traversal with no LIMIT, a synthesized edge buried in a variable-length pattern, or an edge walked in the wrong direction. None are subtle once you know them. The difference between an instant answer and a timeout is almost always whether you anchored and bounded the query.

The graph holds roughly 7.4 billion nodes and 39 billion edges, so the engine relies on you to start narrow. Anchor on an indexed name, bound every fan-out, and let the pre-joined structure do the work. For the full model see the Graph Schema; for ready-made recipes, the Use Cases.

The performance habits

Anchor every query with {name: "value"} on billion-node labels. HOSTNAME and IPV4 are too large to scan. An anchored lookup is an indexed, instant operation that bounds every downstream hop. Inline {name: "..."} and WHERE h.name = "..." are not equivalent at this scale — the inline form is what triggers the index. Unanchored scans on these labels do not finish.
Bound high-fan-out intermediates with WITH ... LIMIT before expanding. Anchor, narrow to a handful of nodes, then traverse outward. A trailing LIMIT caps the output, but the engine still expands every intermediate row first, so cut a wide middle hop down mid-pattern.
Write synthesized edges as explicit single hops. ROUTES, ANNOUNCED_BY, LISTED_IN, HAS_NAME, PEERS_WITH, and CONFLICTS_WITH are computed at query time. Split them out with WITH; combining a synthesized hop inside a variable-length [*..] pattern silently returns zero rows.
Use STARTS WITH and selective, leading-dot ENDS WITH over regex. Both are index-backed on .name. Avoid CONTAINS on ASN.name — it scans.
Use OPTIONAL MATCH for sparse WHOIS fields. Many domains have partial or redacted registration data; a plain MATCH drops the whole row when one piece is missing.
Always add a LIMIT, including on CALL ... YIELD ... RETURN. Use count() to size a result before pulling it, and never run a global edge count — use the stats endpoint instead.
Use UNWIND for batch lookups. Each element stays an anchored lookup, so a batch of a few hundred indicators is fast.
Pick the right IP-to-prefix edge. BELONGS_TO gives the allocated (RIR) prefix; ANNOUNCED_BY gives the BGP-announced prefix. Use ANNOUNCED_BY when you intend to walk on to the routing ASN.
Remember mail and nameserver edges point server → domain. A domain's mail servers are (domain)<-[:MAIL_FOR]-(mx).
Anchor LINKS_TO traversals. The hyperlink graph is the largest edge set; an unanchored walk through it is expensive.
Reach for the procedures first. explain(), whisper.origins(), whisper.variants(), and whisper.history() answer the hardest questions in one call — usually faster and cleaner than a hand-written deep traversal.

Anchor, then bound

MATCH (ip:IPV4 {name: "203.0.113.10"})<-[:RESOLVES_TO]-(sib:HOSTNAME)
WITH sib LIMIT 200
MATCH (sib)-[:LISTED_IN]->(f:FEED_SOURCE)
RETURN sib.name, collect(f.name) AS feeds LIMIT 50

The WITH sib LIMIT 200 caps the co-tenant set before the second hop fans out again. Without it, the second hop runs against the entire unbounded co-tenant set first. Use count() when you don't know how wide a node fans out — check cardinality before you pull the rows.

Use the stats endpoint for global counts

A whole-graph aggregate like MATCH ()-[r]->() RETURN count(r) tries to touch every edge and times out. For live node and edge totals, hit the stats endpoint instead — the per-type counts are precomputed and instant.

curl -s -A "whisper-client/1.0" https://graph.whisper.security/api/query/stats

Do this, not that

Do this	Not that
`MATCH (h:HOSTNAME {name: "x.com"})`	`MATCH (h:HOSTNAME) WHERE h.name CONTAINS "x"`
Anchor, then `WITH ... LIMIT`, then expand	One deep all-in-one pattern
`ENDS WITH ".cloudflare.com"` (selective)	`ENDS WITH "cloudflare.com"` (broad scan)
`count(p)` for a magnitude question	`count(DISTINCT p)` when you just need order of magnitude
`GET /api/query/stats` for totals	`MATCH ()-[r]->() RETURN count(r)`
`OPTIONAL MATCH` for WHOIS/geo	`MATCH` that silently drops sparse rows
`(domain)<-[:MAIL_FOR]-(mx)`	`(domain)-[:MAIL_FOR]->(mx)` (wrong direction)
`(ip)<-[:RESOLVES_TO]-(h)` for reverse DNS	`(ip)-[:RESOLVES_TO]->(h)` (forward-only edge)
`[:CHILD_OF*1..3]` (bounded)	`[:CHILD_OF*]` (unbounded)
`CALL explain("AS13335")`	Scan `ASN → PREFIX → IP → LISTED_IN` (times out)
`CALL whisper.variants("brand.com")`	Manual `STARTS WITH` lookalike sweeps

Mind the edge directions

Walking an edge the wrong way returns zero rows with no error — the single most common cause of a "correct-looking" query that comes back empty. The directions that trip people up:

Edge	Stored direction	To go the other way
`RESOLVES_TO`	HOSTNAME → IP (forward only)	reverse DNS: `(ip)<-[:RESOLVES_TO]-(h)`
`NAMESERVER_FOR`	server → domain	a domain's nameservers: `(d)<-[:NAMESERVER_FOR]-(ns)`
`MAIL_FOR`	server → domain	a domain's MX: `(d)<-[:MAIL_FOR]-(mx)`
`CHILD_OF`	child → parent	a parent's children: `(parent)<-[:CHILD_OF]-(child)`
`LOCATED_IN`	IP → CITY (then `HAS_COUNTRY`)	chain it: `(ip)-[:LOCATED_IN]->(:CITY)-[:HAS_COUNTRY]->(:COUNTRY)`
`PEERS_WITH`	ASN ↔ ASN (symmetric)	matches either arrow

Don't hand-roll what a procedure already does

Several investigations look like a tempting multi-hop scan but are far faster, and safer, as a CALL. Procedures wrap the expensive logic server-side and won't time out the way an unanchored walk will.

Instead of hand-rolling…	Call this
Walking `ASN → PREFIX → IP → LISTED_IN` to score a network	`CALL explain("AS13335")` — scored verdict + factors + sources
`STARTS WITH` / regex sweeps for lookalike domains	`CALL whisper.variants("brand.com")`
Reconstructing WHOIS or BGP timelines by hand	`CALL whisper.history("indicator")`
Chasing real IPs behind a CDN	`CALL whisper.origins("domain.com")`

explain() in particular replaces the most dangerous pattern there is — a scan down a large ASN's prefixes to find threat listings, which reliably times out. Let the procedure do it. Full signatures are in the Procedures reference.

A clean or NONE verdict from explain() means "not listed at this granularity," not "safe" — no data is not the same as benign. Read the coverage before you treat an indicator as clean.

Known limitations

These are practical edges to work around, not bugs.

Variable-length traversals do not follow synthesized edges. ANNOUNCED_BY, LISTED_IN, ROUTES into ANNOUNCED_PREFIX, and the virtual side of BELONGS_TO are computed at query time and will not expand inside a [*..] pattern. Write them as explicit single hops joined with WITH.
ENDS WITH on a broad suffix is a full scan. ENDS WITH "google.com" (no leading dot, very common suffix) times out. Use a selective, leading-dot suffix (ENDS WITH ".cloudflare.com"), or for subdomain enumeration traverse CHILD_OF from the anchored parent instead.
CONTAINS on ASN.name times out. ASN.name is the AS number (AS13335), and a bare CONTAINS over the ASN label scans it. Anchor the ASN exactly ({name: "AS13335"}), reach it through ROUTES / HAS_NAME from a known prefix, or filter the network name on ASN_NAME. CONTAINS on HOSTNAME.name is fast when the query is already anchored.
CONTAINS and regex (=~) on a large label are scans. They run, but slowly, and a regex over a billion-node label will time out. Regex is full-match, capped at 1000 characters, with no nested quantifiers. Prefer STARTS WITH / ENDS WITH, and always add a LIMIT.
UNWIND into CALL explain() makes one backend call per item. It works and is the right way to score a list, but runtime scales with the list length — keep the unwound list short.
BGP routing history over a large network is slow. whisper.history() on a big ASN can take many seconds. Keep a LIMIT and expect a longer round trip.
shortestPath requires a bounded path length. Always cap the variable-length range, for example [*1..6], within your hop cap.
Queries are time-capped. A query that runs too long returns HTTP 408. If you hit it, anchor the query, add a LIMIT, or replace a scan with a procedure.
Subdomain WHOIS history is empty. WHOIS is captured per registrable domain — whisper.history("www.cloudflare.com") returns nothing. Use the parent domain (cloudflare.com).
Anycast and large-CDN IPs often lack geolocation. A geo query returns no rows rather than a wrong city; read the owning ASN's country instead. Don't draw geographic conclusions from GeoIP on anycast, mobile-carrier NAT, or VPN-exit IPs.
MOAS conflicts need context. Real hijacks and legitimate anycast both produce a multi-origin (MOAS) conflict. WhisperGraph reports it via CONFLICTS_WITH; interpretation depends on RPKI status, ASN reputation, and history.
A query that "returns nothing" is often a bad label or edge name. The graph uses HOSTNAME (never Domain or FQDN), IPV4 / IPV6, and ASN / PREFIX, and the property is always name. Check CALL db.labels() and CALL db.relationshipTypes() when a result looks empty — they don't count against quota.

Mind your hop budget

Traversal depth is capped by plan: anonymous access (keyless) is 2 hops, a free key raises it to 3, and paid plans go deeper. A query with more hops than your cap returns HTTP 400 with a query-depth-exceeded error. If a deep traversal returns nothing on the anonymous tier, you have likely hit the cap, not written a bad query. Check your tier and remaining quota with CALL whisper.quota() — it doesn't count against your quota. See the pricing page for the tiers and Getting Started to grab a free key.

Where to go next

Use Cases — copy-paste recipes that already follow these rules, organized by workflow.
Graph Schema — every label, edge, direction, and property.
Procedures — full signatures for explain, variants, history, and origins.
Cheat Sheet — the one-page dense reference.

PreviousFunctions NextCheat Sheet