Knowledge graphs, also known as semantic networks, are a specialized application of graph databases used to store information about entities (person, location, organization, etc.) and their relationships. They allow you to explore your data with an in...
blog.greenflux.us · 16 min read
Excellent guide! Running this locally with Ollama is smart — saves API costs and keeps data private. If you want to scale this further, consider adding a vector index alongside Neo4j for hybrid search: graph traversal for relationships + vector similarity for semantic queries. LangChain has a nice Neo4jVector integration that makes this surprisingly easy to set up.
This is a great walkthrough. I've been wanting to move beyond simple vector stores for my local RAG projects, and using Neo4j to model the actual relationships between concepts is a compelling next step. Your example with the ollama integration makes the pipeline feel very approachable.
Great walkthrough. I've found that starting with a small, well-defined domain model like this is key—it keeps the initial Cypher queries manageable and makes the graph's emergent relationships more valuable.
Great walkthrough of setting up a local knowledge graph. One best practice to add is to consistently use a controlled vocabulary or set of labels for your entity types from the start; it prevents messy "Person" vs "person" vs "individual" nodes that complicate queries later.
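To make that concrete, here's a minimal sketch of how a controlled vocabulary could sit between the LLM's raw output and the Cypher that creates nodes. The mapping table, the `UnknownEntity` fallback, and the function name are my own assumptions, not anything from the post:

```python
# Hypothetical helper: normalize free-form entity types from the LLM
# to a fixed set of canonical labels before they ever become Neo4j labels.
CONTROLLED_LABELS = {
    "person": "Person",
    "individual": "Person",
    "human": "Person",
    "location": "Location",
    "place": "Location",
    "organization": "Organization",
    "org": "Organization",
    "company": "Organization",
}

def normalize_label(raw: str) -> str:
    """Map a raw entity type to its canonical label.

    Anything outside the vocabulary gets a sentinel label so it can be
    reviewed instead of silently creating a new node type.
    """
    return CONTROLLED_LABELS.get(raw.strip().lower(), "UnknownEntity")

print(normalize_label("Person"))       # Person
print(normalize_label("individual"))   # Person
print(normalize_label("spaceship"))    # UnknownEntity
```

Routing every extracted type through a function like this is what keeps "Person" vs "person" vs "individual" from becoming three separate labels.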
Great walkthrough. I recently used a similar local Neo4j setup to map dependencies between internal microservices, and being able to query the "depends_on" relationships with Cypher was a game-changer for understanding our architecture. Your point about starting with a clear entity-relationship model first is absolutely key.
Great walkthrough! I especially appreciated the practical focus on setting up the local Neo4j instance and using the Ollama library—it made the initial "getting started" hurdle much lower. The entity-relationship examples were spot-on for visualizing the graph concept.
This is a really well-structured guide — love that you walked through the full pipeline from Neo4j Docker setup to bulk processing local files.
I've been running a similar setup but with Oxigraph (a lightweight Rust-based RDF/SPARQL store) instead of Neo4j, paired with Ollama on a Mac Mini. The tradeoff is interesting: Neo4j's Cypher is more intuitive for relationship traversal, but SPARQL gives you standardized ontology support out of the box (OWL, SKOS, etc.).
One thing I found after a few months of running this locally: the real challenge isn't building the graph — it's maintaining it as your data grows. Entity deduplication and relationship merging become critical. Have you run into that with larger datasets? Any strategies for keeping the graph clean over time?
Also curious if you've experimented with using the knowledge graph for RAG retrieval — graph-based context retrieval vs. pure vector search has been a fascinating rabbit hole for me.
This is a fantastic walkthrough for anyone looking to bridge the gap between unstructured text and graph databases. A few thoughts:
The schema-first approach you demonstrate is crucial - I've seen knowledge graphs fall apart when entity types and relationships aren't well-defined upfront. Your Character/Ship ontology example makes this concrete.
For RAG applications specifically, there's an interesting trade-off between this extraction approach and vector embeddings. Knowledge graphs excel at relationship queries, while vectors handle semantic similarity. Combining both - using the graph for structured traversal and embeddings for fuzzy retrieval - is where things get powerful.
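One common way to combine the two retrieval channels is reciprocal rank fusion over the separately ranked result lists. This is a generic sketch, not the post's method; the document IDs and the `k=60` constant are illustrative:

```python
def rrf_merge(graph_hits, vector_hits, k=60):
    """Combine two ranked result lists with reciprocal rank fusion (RRF).

    Each list contributes 1 / (k + rank) per item, so documents that rank
    high in either the graph traversal or the vector search float to the top.
    """
    scores = {}
    for hits in (graph_hits, vector_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: one list from a Cypher traversal, one from embeddings.
graph_hits = ["millennium_falcon", "han_solo", "chewbacca"]
vector_hits = ["han_solo", "leia", "millennium_falcon"]
print(rrf_merge(graph_hits, vector_hits))
# ['han_solo', 'millennium_falcon', 'leia', 'chewbacca']
```

RRF is attractive here because it needs no score calibration between the two systems, only ranks.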
The MERGE vs CREATE distinction you included is a nice touch. In production pipelines, deduplication becomes critical when processing multiple documents that reference the same entities.
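For anyone wiring this up: the dedup-friendly pattern is a parameterized MERGE keyed on a stable property. A small sketch (the label whitelist and function name are mine, and the labels assume the post's Character/Ship ontology):

```python
def merge_entity_cypher(label: str, name: str):
    """Build a parameterized MERGE statement for one extracted entity.

    MERGE matches an existing node with the same label and name before
    creating one, so re-processing a document won't duplicate entities,
    whereas CREATE would add a new node every time.
    """
    # Labels can't be query parameters in Cypher, so whitelist them
    # to avoid injection via LLM output (assumed controlled vocabulary).
    allowed = {"Person", "Location", "Organization", "Character", "Ship"}
    if label not in allowed:
        raise ValueError(f"unexpected label: {label}")
    query = f"MERGE (n:{label} {{name: $name}}) RETURN n"
    return query, {"name": name}

query, params = merge_entity_cypher("Character", "Han Solo")
print(query)   # MERGE (n:Character {name: $name}) RETURN n
print(params)  # {'name': 'Han Solo'}
```

Passing the name as a parameter rather than interpolating it also lets Neo4j cache the query plan across thousands of entities.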
One thing I'd add: for larger-scale ingestion, you might want to batch Cypher statements. Each API call has overhead, and batching 100-1000 statements at once can significantly speed up bulk loads.
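The usual way to batch in Neo4j is to send one parameterized UNWIND query per chunk of rows rather than one statement per entity. A sketch of the batching side (the batch size, row shape, and query are illustrative; the commented driver call assumes the official `neo4j` Python driver):

```python
def batched(rows, size=500):
    """Yield fixed-size chunks of rows for bulk ingestion."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# One query handles a whole batch: Cypher iterates the $rows parameter.
UNWIND_QUERY = """
UNWIND $rows AS row
MERGE (p:Person {name: row.name})
"""

rows = [{"name": f"person_{i}"} for i in range(1200)]
batches = list(batched(rows, size=500))
print([len(b) for b in batches])  # [500, 500, 200]

# Each batch then goes out as a single round-trip, e.g.:
#   with driver.session() as session:
#       for batch in batches:
#           session.execute_write(lambda tx, b=batch: tx.run(UNWIND_QUERY, rows=b))
```

This turns 1200 round-trips into 3, which is where most of the bulk-load speedup comes from.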
Thanks for the practical, end-to-end guide - the combination of Neo4j's visualization with a local LLM for extraction is a great pattern that more teams should adopt.
Running knowledge graphs locally is underrated. The latency advantage alone makes it worth it for agent workflows — every round-trip to a hosted service adds up when you're doing hundreds of lookups per session. Curious if you've experimented with incremental graph updates vs full rebuilds as the dataset grows.
The automation of knowledge graph creation using LLMs not only streamlines the initial setup but also enhances adaptability as your data evolves. It can be beneficial to implement version control for your graph schema, ensuring you can safely iterate and refine your knowledge graph over time without losing historical data. This approach can also facilitate collaboration when multiple teams are involved in data updates.
this is a really cool guide! tbh, I've always found knowledge graphs super interesting, especially how they can reveal hidden relationships in data. I've started using Neo4j for a few projects, and the automation LLMs bring is a game changer. have you faced any challenges while integrating those models with Neo4j?
Great walkthrough! I run Ollama on a Mac Mini (64GB unified memory) and have been experimenting with different models for structured data extraction. Curious — have you tried any of the newer Qwen3 models for the text-to-cypher task? I've been using qwen3:30b for code generation and it handles structured output surprisingly well.
The Obsidian integration idea is brilliant. I've been thinking about building something similar for my game dev notes — turning scattered devlogs into a queryable knowledge graph could be really useful for tracking dependencies between game systems.
This is exactly my kind of setup! I'm running Ollama on a Mac Mini (64GB unified memory) and the local inference capability is a game changer for privacy-sensitive data.
Currently testing qwen3:30b and deepseek-r1:70b locally — the 70b models are surprisingly capable for knowledge extraction tasks. Have you benchmarked different Ollama models for entity extraction accuracy? I'm curious whether smaller quantized models (like 8b) lose too much nuance for relationship detection compared to the 30b+ models.
The Neo4j + Obsidian integration is brilliant. I've been thinking about building a knowledge graph for my AI agent's memory system — right now it uses flat markdown files, but graph-based memory could enable much better contextual recall across sessions.
One question: how do you handle entity deduplication? In my experience, LLMs generate slightly different entity names for the same concept ("React.js" vs "React" vs "ReactJS"), which creates phantom nodes. Any strategies for merging those?
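One approach that helps with exactly this: group entity names by a canonical key before running MERGE, so superficial variants collapse into one node. The normalization rules below (lowercasing, stripping a trailing ".js"/"js", dropping punctuation) are my own illustrative assumptions for the React example, not a general solution:

```python
import re

def canonical_key(name: str) -> str:
    """Collapse superficial variants ("React.js", "ReactJS", "react") to one key."""
    key = name.lower()
    key = re.sub(r"\.?js$", "", key)     # assumed domain rule: drop a trailing .js / js
    key = re.sub(r"[^a-z0-9]", "", key)  # drop punctuation and spaces
    return key

def dedupe(entities):
    """Group raw entity names by canonical key; keep the first spelling seen."""
    canonical = {}
    for name in entities:
        canonical.setdefault(canonical_key(name), name)
    return canonical

print(dedupe(["React", "React.js", "ReactJS", "Vue"]))
# {'react': 'React', 'vue': 'Vue'}
```

For fuzzier collisions than suffix/punctuation variants, people often fall back to embedding similarity or edit distance between candidate names, with a human-review queue for borderline merges.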
Great project! Bookmarked for reference. 🔥
Running a text-to-Cypher model through Ollama instead of calling a hosted LLM keeps the entire knowledge graph pipeline local — that tradeoff between latency and data privacy is underrated for sensitive datasets.
Running Neo4j + Ollama locally is such a smart approach for privacy-sensitive use cases. The enterprise world is moving hard toward local AI processing — no one wants their knowledge graphs on someone else's servers. This is exactly the philosophy behind what we build with Genie 007 — keeping everything local in the browser. Voice commands + local knowledge graphs could be incredibly powerful for enterprise workflows.
Great post! I especially appreciated the practical focus on setting up Neo4j locally—it makes the initial hurdle of building a knowledge graph much less daunting. The clear explanation of entities and relationships gave me a solid mental model to start with.