Cypher Best Practices
Performance rules, common pitfalls, and the do-this / not-that quick-reference table for Cypher on WhisperGraph.
Updated May 2026
Performance and correctness guidance for writing Cypher queries against WhisperGraph. Apply these rules to every query: an unanchored scan of a billion-node label will time out.
General rules
- Anchor your starting node. `MATCH (n:HOSTNAME {name: "example.com"})` does an indexed lookup. `MATCH (n:HOSTNAME)` scans billions of nodes.
- Always use LIMIT, especially on traversals that could fan out (LINKS_TO, RESOLVES_TO on CDN IPs). Start small and increase if needed.
- Use OPTIONAL MATCH for WHOIS fields. Not every domain has a registrar, email, or phone. A required MATCH on a missing field returns zero rows and hides other results.
- Use count() before pulling large result sets. Check cardinality first to avoid unexpectedly large responses.
- Use UNWIND for batch lookups. Pass lists of indicators in a single request rather than making one request per indicator.
- Specify edge types explicitly. `[:RESOLVES_TO]` is faster than `[r]` because the engine does not need to check all edge types.
- Use ANNOUNCED_BY for current BGP routing. Use BELONGS_TO for the registered RIR allocation. They return different prefix types and may give different results.
- Anchor LINKS_TO queries. The web link graph is one of the largest datasets. Queries without an anchored starting node will time out.
- Avoid CONTAINS with special characters. Characters like `&` cause slow full-text scans. Use STARTS WITH or exact match when possible.
- Anchor before traversing virtual edges. `ANNOUNCED_BY`, `ROUTES` into `ANNOUNCED_PREFIX`, `HAS_NAME`, `BELONGS_TO` into `REGISTERED_PREFIX`, `LISTED_IN`, and `CONFLICTS_WITH` are synthesized at query time. They only resolve when the source node of the traversal is anchored; an unanchored probe like `MATCH (a:ASN)-[:HAS_NAME]->(n)` returns nothing.
- Don't put virtual edges in a variable-length pattern. A pattern like `[:ANNOUNCED_BY*1..3]` will not work, because variable-length expansion only walks stored edges. Use fixed-length hops for `ANNOUNCED_BY`, `LISTED_IN`, and `BELONGS_TO` into `REGISTERED_PREFIX`.
- Use `GET /api/query/stats` for global counts. A query that counts every edge in the graph, `MATCH ()-[r]->() RETURN count(r)`, will time out. The stats endpoint answers global node and edge counts instantly.
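Several of the rules above can be combined in a few short queries. The following is a minimal sketch, not a definitive reference: each statement is meant to be sent as its own query, the edge directions shown for `RESOLVES_TO` and `LINKS_TO` are assumptions, and the `name` key and `"AS64496"` value format on `ASN` nodes are hypothetical placeholders for whatever property anchors an ASN in your data.

```cypher
// 1. Anchored lookup with an explicit edge type and a small LIMIT.
//    The inline {name: ...} map anchors the starting node.
MATCH (h:HOSTNAME {name: "example.com"})-[:RESOLVES_TO]->(ip)
RETURN ip
LIMIT 25;

// 2. Check cardinality before pulling rows from a high-fan-out edge.
//    (Direction of LINKS_TO is assumed here.)
MATCH (h:HOSTNAME {name: "example.com"})-[:LINKS_TO]->(target)
RETURN count(target) AS link_count;

// 3. Batch lookup: one request for several indicators via UNWIND.
UNWIND ["example.com", "example.org"] AS indicator
MATCH (h:HOSTNAME {name: indicator})-[:RESOLVES_TO]->(ip)
RETURN indicator, ip
LIMIT 100;

// 4. Virtual edge with an anchored source node and a fixed-length hop.
//    The ASN property key and value format are assumptions.
MATCH (a:ASN {name: "AS64496"})-[:HAS_NAME]->(n)
RETURN n
LIMIT 10;
```

If the count in query 2 comes back large, keep the LIMIT low or narrow the pattern before requesting the rows themselves.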
Do this / not that
| Do this | Not that | Why |
|---|---|---|
| `MATCH (h:HOSTNAME {name: "example.com"})` | `MATCH (h:HOSTNAME) WHERE h.name = "example.com"` | Inline property gets an indexed lookup |
| Always add LIMIT | Open-ended traversals | Prevents timeout on billion-scale labels |
| OPTIONAL MATCH for WHOIS fields | MATCH for sparse relationships | Avoids losing rows when fields are missing |
| `MATCH (sub)-[:CHILD_OF]->(h {name: "x.com"})` | `WHERE h.name ENDS WITH ".x.com"` | CHILD_OF uses an indexed edge; ENDS WITH scans |
| `STARTS WITH "www.goo"` | `=~ "www\\.goo.*"` | STARTS WITH uses the FST index; regex does not |
| Pre-compute in WITH, then ORDER BY alias | `COUNT{}` or `COLLECT{}` in ORDER BY | Subquery expressions in ORDER BY are not supported |
| `GET /api/query/stats` for global counts | `MATCH ()-[r]->() RETURN count(r)` | A whole-graph edge count times out |
| Anchor one end, then traverse a virtual edge | Unanchored or variable-length virtual-edge traversal | Virtual edges are synthesized at query time from the anchored node |
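The "pre-compute in WITH" row is the least obvious of these, so here is one way it might look. This sketch assumes subdomain nodes carry a `name` property like other `HOSTNAME` nodes and that `RESOLVES_TO` points from a hostname to its IP; neither is confirmed by the table above.

```cypher
// Rank subdomains of an anchored hostname by how many IPs they resolve to.
MATCH (sub)-[:CHILD_OF]->(h:HOSTNAME {name: "example.com"})
// OPTIONAL MATCH keeps subdomains that have no resolution at all.
OPTIONAL MATCH (sub)-[:RESOLVES_TO]->(ip)
// Aggregate in WITH so ORDER BY only sees a plain alias,
// instead of a COUNT{} subquery expression.
WITH sub, count(ip) AS ip_count
RETURN sub.name AS subdomain, ip_count
ORDER BY ip_count DESC
LIMIT 50;
```

The aggregation happens before the sort, so the ORDER BY clause never needs a subquery expression, and the anchored `CHILD_OF` match replaces an `ENDS WITH` scan.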