Vector Search for Graph Workloads
Every major graph database now offers vector search: store externally computed embeddings as node properties, build an index, and run similarity queries. The problem is that those embeddings know nothing about your graph.
WWKG does something fundamentally different. It derives embeddings from the graph structure itself — no external model, no API key, no embedding pipeline. The graph teaches itself what “similar” means.
Bolt-On Vector Search
- Embeddings from external model, external pipeline
- Text similarity only, graph structure ignored
- Vector indexes not versioned or encrypted
- Combining with graph queries requires middleware
WWKG Vector Search
- Graph-derived embeddings from relationships
- Structural similarity, not just text similarity
- Versioned, encrypted, branch-aware indexes
- Single query combines vector and graph traversal
What you could not do before
In every other graph database that supports vector search:
- Embeddings come from an external service — OpenAI, Cohere, or a model you host yourself. You manage an embedding pipeline alongside the database.
- Those embeddings capture text similarity, not graph similarity. Two entities with identical labels but completely different relationships get identical embeddings. The structure that makes a knowledge graph valuable is ignored.
- Vector indexes are not versioned. You cannot ask “what were the similarity results last Tuesday?” or run experimental embeddings on a branch without affecting production.
- Embeddings are stored as plaintext properties. Anyone with storage access can reconstruct entity relationships from the vectors.
- Combining vector search with graph traversal requires middleware — an external vector database, REST calls between systems, and result stitching in application code.
The result: vector search is a separate system bolted onto the side of the graph, with its own lifecycle, its own security gaps, and its own operational burden.
What WWKG enables
Graph-derived similarity. WWKG computes embeddings directly from the relationships in your data. Two cities that share similar trade routes, population patterns, and climate zones are recognized as similar — even if their names and descriptions have nothing in common. This is graph similarity, not text similarity, and it captures the relational structure that makes a knowledge graph worth building.
External embeddings too. You can also bring text-based embeddings from external models when text similarity is what you need. Both types compose in a single query — structural similarity and text similarity side by side.
Similarity as reasoning. Vector search is not a separate feature. It is a reasoning mode, composable with inference in a single query plan. Ask for “cities similar to Berlin that are also regional capitals” and the engine handles the similarity search and the inference together — one query, one result.
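The Berlin example can be sketched as a single logical plan: a similarity ranking composed with an inference filter. The data, scores, and the "regional capital" rule below are all invented for illustration — this models the shape of the composition, not WWKG's planner.

```python
# Toy data (invented): precomputed similarity scores to "Berlin",
# plus capital_of edges consumed by a simple inference rule.
similar_to_berlin = {"Munich": 0.91, "Hamburg": 0.88, "Leipzig": 0.74}
capital_of = {"Munich": "Bavaria", "Hamburg": "Hamburg", "Dresden": "Saxony"}
regions = {"Bavaria", "Saxony"}  # in this toy model, city-states are not regions

def is_regional_capital(city):
    """Inference rule: city is capital_of some entity typed as a region."""
    return capital_of.get(city) in regions

# One plan: similarity ranking and inference filter evaluated together.
results = sorted(
    (city for city in similar_to_berlin if is_regional_capital(city)),
    key=lambda c: -similar_to_berlin[c],
)
print(results)  # ['Munich']
```

Hamburg ranks high on similarity but is filtered out by the inference step; neither stage runs in a separate system, which is the point of "one query, one result".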
Versioned vector indexes. Every embedding is pinned to a specific point in your data’s history. This enables capabilities no other graph database offers:
- Time-travel similarity queries: query embeddings as they were at any historical point — useful for auditing how similarity results changed over time.
- Branch-specific embeddings: retrain embeddings on an experimental branch without affecting the main branch. Merge when you are satisfied.
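One way to picture versioned, branch-aware indexes is an index keyed by (branch, version), where a query pins a snapshot instead of reading "latest". This is a toy model of the semantics described above, not WWKG's storage design; all names and vectors are invented.

```python
# Toy versioned index: embeddings stored per (branch, version) snapshot.
class VersionedIndex:
    def __init__(self):
        self.snapshots = {}  # (branch, version) -> {node: embedding}

    def commit(self, branch, version, embeddings):
        self.snapshots[(branch, version)] = dict(embeddings)

    def at(self, branch, version):
        """Time-travel read: embeddings exactly as of the given snapshot."""
        return self.snapshots[(branch, version)]

idx = VersionedIndex()
idx.commit("main", 1, {"Berlin": [0.1, 0.9]})
idx.commit("main", 2, {"Berlin": [0.2, 0.8]})        # retrained on main
idx.commit("experiment", 1, {"Berlin": [0.9, 0.1]})  # branch-local retrain

# Historical and branch-local reads never see each other's changes.
print(idx.at("main", 1)["Berlin"])        # [0.1, 0.9]
print(idx.at("experiment", 1)["Berlin"])  # [0.9, 0.1]
```

Auditing "what were the similarity results last Tuesday?" reduces to querying against the snapshot that was current then, and an experimental branch is just a separate key space until it is merged.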
Encrypted embeddings. Embeddings are encrypted with the same workspace key that protects everything else — before storage and before leaving the node. In other systems, embeddings are plaintext, which means anyone with storage access can reverse-engineer entity relationships from the vectors.
Smart incremental updates. Not every data change requires recomputing all embeddings. Adding new entities and relationships triggers an incremental update from existing results. Only structural schema changes trigger a full recomputation. The system classifies the update mode automatically.
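The classification rule described here — data additions update incrementally, structural schema changes force a full recomputation — can be sketched as a small decision function. The change-type names below are invented for the example; only the rule from the text is modeled.

```python
from enum import Enum

class UpdateMode(Enum):
    INCREMENTAL = "incremental"
    FULL = "full"

# Invented change-type names, illustrative only.
SCHEMA_CHANGES = {"class_added", "class_removed", "property_redefined"}
DATA_CHANGES = {"entity_added", "relationship_added"}

def classify(changes):
    """Pick the cheapest update mode that safely covers all changes."""
    if any(c in SCHEMA_CHANGES for c in changes):
        return UpdateMode.FULL
    return UpdateMode.INCREMENTAL

print(classify(["entity_added", "relationship_added"]))  # UpdateMode.INCREMENTAL
print(classify(["entity_added", "class_added"]))         # UpdateMode.FULL
```

A single schema change anywhere in the batch escalates the whole batch to a full recomputation; otherwise the existing embeddings are extended in place.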
Single-query GraphRAG. One query, no middleware. Combine text similarity, graph similarity, and graph traversal in a single SPARQL, Cypher, or GQL query. No external vector database, no REST calls between systems, no result stitching in application code.
A concrete scenario
A pharmaceutical company builds a knowledge graph of compounds, targets, pathways, and clinical outcomes. A researcher asks: “find compounds structurally similar to this candidate that also target the same pathway family.”
In a traditional setup, that requires an external embedding service for the compounds, a separate vector database, custom middleware to join vector results with graph traversals, and manual work to keep the embeddings in sync when the graph changes.
In WWKG, it is one query. The graph-derived embeddings capture compound similarity based on the full relational context — target interactions, pathway memberships, side-effect profiles — not just molecular descriptions. The query combines similarity search with graph traversal in a single execution. The results reflect the current state of the branch the researcher is working on.
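What that single execution has to do can be sketched as one pass over in-memory toy data: blend a text-similarity signal with a structural-similarity signal, and keep only candidates that survive a graph-traversal filter. The compounds, scores, and targets below are invented, and a real engine would of course plan this over indexes rather than dictionaries.

```python
# Toy single-pass retrieval (invented data): two similarity signals
# plus a one-hop traversal filter, evaluated together.
text_sim   = {"aspirin": 0.9, "ibuprofen": 0.8, "warfarin": 0.3}
struct_sim = {"aspirin": 0.7, "ibuprofen": 0.9, "warfarin": 0.8}
targets = {
    "aspirin":   {"COX-1"},
    "ibuprofen": {"COX-1", "COX-2"},
    "warfarin":  {"VKORC1"},
}

def retrieve(pathway_targets, top_k=2):
    """One pass: blend both similarity signals, keep traversal matches."""
    candidates = [
        (0.5 * text_sim[c] + 0.5 * struct_sim[c], c)
        for c in text_sim
        if targets[c] & pathway_targets  # graph traversal filter
    ]
    return [c for _, c in sorted(candidates, reverse=True)[:top_k]]

print(retrieve({"COX-1"}))  # ['ibuprofen', 'aspirin']
```

Warfarin is pruned by the traversal filter despite high structural similarity, and ibuprofen outranks aspirin once both signals are blended — the kind of result stitching that otherwise lives in middleware.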
What makes this different
The fundamental difference is where the embeddings come from. Every other graph database treats vector search as storage plus indexing: you compute embeddings elsewhere, you store them as properties, you query them. The graph structure — the thing that makes a knowledge graph different from a document store — plays no role in the similarity computation.
WWKG derives similarity from the graph itself. The embeddings encode relationships, not just text. This means the similarity results actually reflect the knowledge you have modeled — which is the entire point of building a knowledge graph.
Combined with versioning, encryption, and native query integration, this is not bolt-on vector search. It is similarity as a native property of the knowledge graph.
Next steps
Related features: Three Query Languages, Reasoning and Validation, and Peer-to-Peer Distribution. See the Vocabulary section for public terms used across the WWKG product and docs.