Cypher Best Practices

Performance rules, common pitfalls, and the do-this / not-that quick-reference table for Cypher on WhisperGraph.

Updated May 2026 · Cypher

Performance and correctness guidance for writing Cypher queries against WhisperGraph. Apply these rules to every query: an unanchored scan over a billion-node label will time out.

General rules

  • Anchor your starting node. MATCH (n:HOSTNAME {name: "example.com"}) does an indexed lookup. MATCH (n:HOSTNAME) scans billions of nodes.
  • Always use LIMIT, especially on traversals that can fan out (LINKS_TO, RESOLVES_TO on CDN IPs). Start small and increase if needed.
  • Use OPTIONAL MATCH for WHOIS fields. Not every domain has a registrar, email, or phone. A required MATCH on a missing field returns zero rows and hides other results.
  • Use count() before pulling large result sets. Check cardinality first to avoid unexpectedly large responses.
  • Use UNWIND for batch lookups. Pass lists of indicators in a single request rather than making one request per indicator.
  • Specify edge types explicitly. [:RESOLVES_TO] is faster than [r] because the engine does not need to check all edge types.
  • Use ANNOUNCED_BY for current BGP routing. Use BELONGS_TO for the registered RIR allocation. They return different prefix types and may give different results.
  • Anchor LINKS_TO queries. The web link graph is one of the largest datasets. Queries without an anchored starting node will time out.
  • Avoid CONTAINS with special characters. Characters like & cause slow full-text scans. Use STARTS WITH or exact match when possible.
  • Anchor before traversing virtual edges. ANNOUNCED_BY, ROUTES into ANNOUNCED_PREFIX, HAS_NAME, BELONGS_TO into REGISTERED_PREFIX, LISTED_IN, and CONFLICTS_WITH are synthesized at query time. They only resolve when the source node of the traversal is anchored — an unanchored probe like MATCH (a:ASN)-[:HAS_NAME]->(n) returns nothing.
  • Don't put virtual edges in a variable-length pattern. A pattern like [:ANNOUNCED_BY*1..3] will not work — variable-length expansion only walks stored edges. Use fixed-length hops for ANNOUNCED_BY, LISTED_IN, and BELONGS_TO into REGISTERED_PREFIX.
  • Use GET /api/query/stats for global counts. A query that counts every edge in the graph — MATCH ()-[r]->() RETURN count(r) — will time out. The stats endpoint answers global node and edge counts instantly.
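
Several of the rules above can be sketched as short queries. These are illustrative only: each numbered query is meant to be sent on its own, the edge directions shown are assumptions, HAS_REGISTRAR is a hypothetical edge name standing in for whichever WHOIS relationship you need, and the ASN property key and value format are guesses. Check the schema before relying on any of them.

```cypher
// 1. Anchored start, explicit edge type, and a LIMIT.
//    Assumes RESOLVES_TO runs from a HOSTNAME to an IP node.
MATCH (h:HOSTNAME {name: "example.com"})-[:RESOLVES_TO]->(ip)
RETURN ip
LIMIT 25

// 2. Check cardinality with count() before pulling rows.
//    Assumes LINKS_TO runs outward from the anchored hostname.
MATCH (h:HOSTNAME {name: "example.com"})-[:LINKS_TO]->(target)
RETURN count(target) AS outbound_links

// 3. OPTIONAL MATCH for a sparse WHOIS field.
//    HAS_REGISTRAR is a hypothetical edge name used for illustration.
//    The hostname row is kept even when no registrar exists.
MATCH (h:HOSTNAME {name: "example.com"})
OPTIONAL MATCH (h)-[:HAS_REGISTRAR]->(reg)
RETURN h.name AS hostname, reg
LIMIT 10

// 4. Batch lookup with UNWIND: one request, several anchored lookups.
UNWIND ["example.com", "example.org", "example.net"] AS name
MATCH (h:HOSTNAME {name: name})-[:RESOLVES_TO]->(ip)
RETURN name, collect(ip) AS ips
LIMIT 100

// 5. Virtual edge from an anchored node, single fixed-length hop.
//    The property key and the "AS64500" value format are assumptions.
MATCH (a:ASN {name: "AS64500"})-[:HAS_NAME]->(n)
RETURN n
LIMIT 10
```

Query 2 is the cheap probe to run before query 1 when the fan-out is unknown: if the count is large, tighten the pattern or keep the LIMIT small.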

Do this / not that

| Do this | Not that | Why |
|---|---|---|
| MATCH (h:HOSTNAME {name: "example.com"}) | MATCH (h:HOSTNAME) WHERE h.name = "example.com" | Inline property gets an indexed lookup |
| Always add LIMIT | Open-ended traversals | Prevents timeout on billion-scale labels |
| OPTIONAL MATCH for WHOIS fields | MATCH for sparse relationships | Avoids losing rows when fields are missing |
| MATCH (sub)-[:CHILD_OF]->(h {name: "x.com"}) | WHERE h.name ENDS WITH ".x.com" | CHILD_OF uses an indexed edge; ENDS WITH scans |
| STARTS WITH "www.goo" | =~ "www\\.goo.*" | STARTS WITH uses the FST index; regex does not |
| Pre-compute in WITH, then ORDER BY alias | COUNT{} or COLLECT{} in ORDER BY | Subquery expressions in ORDER BY are not supported |
| GET /api/query/stats for global counts | MATCH ()-[r]->() RETURN count(r) | A whole-graph edge count times out |
| Anchor one end, then traverse a virtual edge | Unanchored or variable-length virtual-edge traversal | Virtual edges are synthesized at query time from the anchored node |
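
Three of the "do this" patterns are easier to see written out in full. The following is a rough sketch, not a reference implementation: it assumes that subdomain nodes carry a name property like HOSTNAME nodes do, and that RESOLVES_TO runs from a hostname to an IP node. Each query is meant to be sent separately.

```cypher
// Subdomains through the CHILD_OF edge instead of an ENDS WITH scan.
MATCH (sub)-[:CHILD_OF]->(h:HOSTNAME {name: "example.com"})
RETURN sub.name AS subdomain
LIMIT 50

// Prefix search with STARTS WITH instead of a regular expression.
MATCH (h:HOSTNAME)
WHERE h.name STARTS WITH "www.goo"
RETURN h.name AS hostname
LIMIT 20

// Pre-compute the aggregate in WITH, then ORDER BY its alias,
// rather than placing a COUNT{} subquery inside ORDER BY.
MATCH (sub)-[:CHILD_OF]->(h:HOSTNAME {name: "example.com"})
OPTIONAL MATCH (sub)-[:RESOLVES_TO]->(ip)
WITH sub, count(ip) AS ip_count
RETURN sub.name AS subdomain, ip_count
ORDER BY ip_count DESC
LIMIT 20
```

In the last query, OPTIONAL MATCH keeps subdomains that resolve to nothing, so they appear with an ip_count of 0 instead of dropping out of the result.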