Using natural language to explore databases unlocks new opportunities for non-technical users. This tool, built with Neo4j and LangChain, combines semantic search and graph queries to provide intuitive, code-free access to data insights.
Imagine wanting to search a festival database with only a vague memory: “This American guitar player who performed solo, and it must have been the year when two well-known Scottish bands played as well.” To answer such queries, I created a tool using Neo4j and LangChain that combines semantic search and graph queries for intuitive data exploration.
Why combine graph and semantic search approaches? Consider what it would take to find that memorable performance in a typical database:
You might start with something like `FIND Events WHERE Performer.Country = "USA" AND Location = "Katowice" AND Date BETWEEN "2020-06-01" AND "2020-08-31"`. Checking for a solo performance could be straightforward in a relational database. Identifying the Scottish context would be much harder, because it depends not on the performer at all but on who else played that year.
This is where graphs excel: they are designed for complex queries over networks of relationships. Think of challenges like finding events with the same group of participants, or music shows featuring specific instruments across multiple genres. Such queries, where relationships between entities play a crucial role, are much more natural to express in a graph database.
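To make the contrast concrete, here is a minimal sketch of the festival lookup as a two-hop graph traversal over a tiny in-memory graph. The `Artist` type, its properties (`country`, `is_solo`, `instrument`), the `PERFORMED_AT` edges, and all artist and event names are hypothetical. They are not the project's actual schema or data. The commented Cypher shows how the same pattern might be written for Neo4j under that assumed schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Artist:
    name: str
    country: str
    is_solo: bool
    instrument: Optional[str] = None


# Hypothetical toy data: artist nodes keyed by name.
ARTISTS: Dict[str, Artist] = {
    a.name: a
    for a in [
        Artist("Guitarist X", "USA", True, "guitar"),
        Artist("Singer Y", "USA", True, "vocals"),
        Artist("Band A", "Scotland", False),
        Artist("Band B", "Scotland", False),
        Artist("Band C", "Poland", False),
    ]
}

# Hypothetical PERFORMED_AT edges: (artist name, event id, year).
PERFORMED_AT: List[Tuple[str, str, int]] = [
    ("Guitarist X", "e1", 2019),
    ("Band A", "e1", 2019),
    ("Band B", "e2", 2019),
    ("Singer Y", "e3", 2021),
    ("Band A", "e3", 2021),
    ("Band C", "e3", 2021),
]

# Roughly equivalent Cypher under the same hypothetical schema:
#   MATCH (s:Artist {country: 'Scotland', isSolo: false})-[:PERFORMED_AT]->(e:Event)
#   WITH e.year AS festivalYear, count(DISTINCT s) AS scottishBands
#   WHERE scottishBands >= 2
#   MATCH (g:Artist {country: 'USA', isSolo: true, instrument: 'guitar'})
#         -[:PERFORMED_AT]->(e2:Event {year: festivalYear})
#   RETURN g.name, e2.id, festivalYear


def years_with_scottish_bands(min_bands: int = 2) -> Set[int]:
    """Hop 1: years in which at least `min_bands` distinct Scottish bands played."""
    bands_per_year: Dict[int, Set[str]] = defaultdict(set)
    for artist, _event, year in PERFORMED_AT:
        a = ARTISTS[artist]
        if a.country == "Scotland" and not a.is_solo:
            bands_per_year[year].add(artist)
    return {y for y, bands in bands_per_year.items() if len(bands) >= min_bands}


def us_solo_guitarists(years: Set[int]) -> List[Tuple[str, str, int]]:
    """Hop 2: US solo guitar players who performed in any of the given years."""
    return sorted(
        (artist, event, year)
        for artist, event, year in PERFORMED_AT
        if year in years
        and ARTISTS[artist].country == "USA"
        and ARTISTS[artist].is_solo
        and ARTISTS[artist].instrument == "guitar"
    )


print(us_solo_guitarists(years_with_scottish_bands()))  # [('Guitarist X', 'e1', 2019)]
```

The key step is grouping by year rather than by event: the Scottish bands only need to have played in the same year, not on the same stage. That indirect constraint maps naturally onto a two-hop pattern, whereas in a relational schema it would take self-joins over the performances table.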
That’s why I chose a knowledge graph, enriching simple tabular data with multiple relationships. But how can our tool translate a vague festival memory into a precise database query? This is where LLMs come into play, transforming your natural language requests into Neo4j’s Cypher queries.
For example, the query “Find events where Alice participated but Bob did not” results in the following Cypher:
```cypher
MATCH (g1:Guest)-[:PARTICIPATES_IN]->(e:Event)
WHERE toLower(g1.name) = 'alice'
  AND NOT EXISTS {
    MATCH (g2:Guest)-[:PARTICIPATES_IN]->(e)
    WHERE toLower(g2.name) = 'bob'
  }
RETURN e.name AS eventName, e.id AS eventId
```
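If a Cypher example like this one is embedded as a few-shot example in an LLM prompt template, its literal braces need escaping. The sketch below assumes a plain Python `str.format` template; LangChain's `PromptTemplate` follows the same f-string-style rules by default, so `{{` and `}}` render as single braces. The schema string, template wording, and `build_cypher_prompt` helper are hypothetical, not the project's actual prompt.

```python
# Hypothetical schema description; the real project's schema may differ.
SCHEMA = "(:Guest {name})-[:PARTICIPATES_IN]->(:Event {id, name})"

# Only {schema} and {question} are placeholders. The literal braces of
# Cypher's NOT EXISTS { ... } are doubled so that format() keeps them as text.
PROMPT_TEMPLATE = """You translate questions into Cypher for a Neo4j database.
Schema: {schema}

Example question: Find events where Alice participated but Bob did not
Example Cypher:
MATCH (g1:Guest)-[:PARTICIPATES_IN]->(e:Event)
WHERE toLower(g1.name) = 'alice'
  AND NOT EXISTS {{
    MATCH (g2:Guest)-[:PARTICIPATES_IN]->(e)
    WHERE toLower(g2.name) = 'bob'
  }}
RETURN e.name AS eventName, e.id AS eventId

Question: {question}
Cypher:"""


def build_cypher_prompt(question: str, schema: str = SCHEMA) -> str:
    """Fill the template. Braces inside the *arguments* are never parsed,
    so the schema string can contain single braces safely."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question)


print(build_cypher_prompt("Which events did Carol attend?"))
```

Forgetting to double the braces makes `format()` try to treat the subquery body as a placeholder name and fail.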
While this clearly demonstrates how powerful LLMs are, you might wonder about the role of semantic search in this whole process. Cypher and other graph query languages still depend on exact keywords and property values. Semantic search, however, grasps the general meaning of a phrase, so it can match “American guitar player” to one specific artist among dozens of US performers who played that night alongside our Scottish stars, Mogwai and Primal Scream.
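As a minimal illustration of what semantic matching means here, the sketch below ranks performer descriptions by cosine similarity to a query vector. The three-dimensional vectors, their axes, and the descriptions are made up; a real system would obtain much higher-dimensional embeddings from an embedding model, and which model the project uses is not covered here.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_rank(
    query_vec: Sequence[float], docs: Dict[str, Sequence[float]], top_k: int = 3
) -> List[str]:
    """Return the top_k document texts, most similar to the query first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Toy "embeddings" with made-up axes: (guitar, vocals, electronic).
DOCS = {
    "solo blues guitarist from Texas": [0.9, 0.1, 0.0],
    "US singer-songwriter": [0.2, 0.9, 0.0],
    "American electronic producer": [0.0, 0.1, 0.9],
}
QUERY = [1.0, 0.0, 0.1]  # stand-in for the embedding of "American guitar player"

print(semantic_rank(QUERY, DOCS, top_k=1))  # ['solo blues guitarist from Texas']
```

None of the descriptions contains the word “player”, yet the guitarist ranks first, which is the point: similarity is measured on meaning (here, position in vector space), not on shared keywords.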
There are several implementation options: conducting both searches in parallel and combining results, or prioritizing graph search with semantic search as a fallback. For example, you could retrieve all relevant American performers from that year, then use semantic search to identify the specific guitar player you’re looking for.
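The second option, graph search first with semantic search on top, can be sketched as follows. A Python predicate stands in for the Cypher query that would pre-filter candidates (for example, US performers from a given year), and the embedding vectors are toy values; the `hybrid_search` name and the performer records are hypothetical. If the graph filter returns nothing, the sketch falls back to ranking all performers semantically.

```python
import math
from typing import Any, Callable, Dict, List, Sequence

Performer = Dict[str, Any]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(
    query_vec: Sequence[float],
    performers: List[Performer],
    graph_filter: Callable[[Performer], bool],
    top_k: int = 3,
) -> List[str]:
    # Step 1: hard constraints (in the real tool, a Cypher query against Neo4j).
    candidates = [p for p in performers if graph_filter(p)]
    # Fallback: if the graph constraints match nothing, search everything semantically.
    if not candidates:
        candidates = performers
    # Step 2: order the surviving candidates by semantic similarity to the query.
    ranked = sorted(candidates, key=lambda p: cosine(query_vec, p["embedding"]), reverse=True)
    return [p["name"] for p in ranked[:top_k]]


# Hypothetical records; toy 2-D embeddings with axes (guitar, vocals).
PERFORMERS: List[Performer] = [
    {"name": "Guitarist X", "country": "USA", "year": 2019, "embedding": [0.9, 0.1]},
    {"name": "Singer Y", "country": "USA", "year": 2019, "embedding": [0.1, 0.9]},
    {"name": "Guitarist Z", "country": "Canada", "year": 2019, "embedding": [0.95, 0.05]},
    {"name": "Guitarist W", "country": "USA", "year": 2021, "embedding": [1.0, 0.0]},
]
QUERY = [1.0, 0.0]  # stand-in for "American guitar player"

print(hybrid_search(QUERY, PERFORMERS, lambda p: p["country"] == "USA" and p["year"] == 2019))
# ['Guitarist X', 'Singer Y']
```

Keeping the graph filter strict and letting the semantic ranking handle the fuzzy part mirrors the festival example: hard facts (country, year) narrow the pool, while a vague description like “guitar player” only reorders it. Note that Guitarist W, the closest semantic match overall, is correctly excluded because it played in the wrong year.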
Once you’ve obtained potential matches (which can be reranked for accuracy using Jina AI’s reranker model), they’re passed to an LLM for final judgment. The tradeoff is the added latency of these LLM calls, but enabling sophisticated queries by non-technical users seems worth it.
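The reranking and final-judgment steps can be sketched without any network calls. The sketch assumes Jina AI’s rerank API takes a JSON body with `model`, `query`, `documents`, and `top_n`, and returns a `results` list of entries with `index` and `relevance_score`; check this against Jina’s current API reference. The endpoint constant, helper names, and judgment prompt wording are hypothetical.

```python
import json
from typing import Any, Dict, List

JINA_RERANK_URL = "https://api.jina.ai/v1/rerank"  # assumed endpoint; not called here
MODEL = "jina-reranker-v2-base-multilingual"


def build_rerank_payload(query: str, documents: List[str], top_n: int = 3) -> str:
    """JSON body for the (assumed) rerank request."""
    return json.dumps({"model": MODEL, "query": query, "documents": documents, "top_n": top_n})


def apply_rerank(documents: List[str], response: Dict[str, Any]) -> List[str]:
    """Map the (assumed) response back to document texts, highest score first.
    Sorting again is defensive in case results are not already ordered."""
    results = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in results]


def build_judgment_prompt(question: str, candidates: List[str]) -> str:
    """Prompt asking the LLM to make the final call among reranked candidates."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    return (
        "A user is trying to recall a festival performance.\n"
        f"User's description: {question}\n"
        f"Candidate matches (most relevant first):\n{numbered}\n"
        "Pick the best match, or answer 'none' if no candidate fits."
    )


# Usage with a hand-written fake response instead of a real API call.
docs = ["Canadian folk duo", "US solo guitarist", "Scottish post-rock band"]
fake_response = {
    "results": [
        {"index": 1, "relevance_score": 0.92},
        {"index": 2, "relevance_score": 0.10},
        {"index": 0, "relevance_score": 0.35},
    ]
}
ranked = apply_rerank(docs, fake_response)
print(build_judgment_prompt("American guitar player who performed solo", ranked))
```

In a live setup the payload would be POSTed to the endpoint with an API key in a bearer `Authorization` header, and the resulting prompt sent to the LLM. Only the LLM call adds the latency mentioned above; the reranker narrows the list so that the prompt stays short.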
This hybrid retrieval strategy combines the best of both worlds: broad semantic understanding with the ability to process complex relationship-based queries (something LLMs can struggle with, especially for complex or nested queries involving multiple entities).
Check out the repository, particularly the README file, to learn more about the implementation details.
Tools I’ve used:
- Gemini 2.0 Flash
- Jina AI: jina-reranker-v2-base-multilingual
- LangChain
- Neo4j Graph Database