Knowledge graphs, also known as semantic networks, are a specialized application of graph databases used to store information about entities (person, location, organization, etc.) and their relationships.
blog.greenflux.us · 16 min read
This is a fantastic walkthrough for anyone looking to bridge the gap between unstructured text and graph databases. A few thoughts:
The schema-first approach you demonstrate is crucial - I've seen knowledge graphs fall apart when entity types and relationships aren't well-defined upfront. Your Character/Ship ontology example makes this concrete.
For RAG applications specifically, there's an interesting trade-off between this extraction approach and vector embeddings. Knowledge graphs excel at relationship queries, while vectors handle semantic similarity. Combining both - using the graph for structured traversal and embeddings for fuzzy retrieval - is where things get powerful.
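A rough sketch of that hybrid pattern, assuming you already have ranked snippets from a graph traversal and from a vector index (the interleaving strategy here is just one option):

```python
from itertools import zip_longest

def hybrid_context(graph_hits: list[str], vector_hits: list[str], k: int = 5) -> list[str]:
    """Interleave structured graph-traversal results with fuzzy vector
    matches, dropping duplicates, so the prompt gets both kinds of context."""
    seen: set[str] = set()
    merged: list[str] = []
    for graph_item, vector_item in zip_longest(graph_hits, vector_hits):
        for item in (graph_item, vector_item):
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:k]
```

Round-robin interleaving is a naive baseline; reciprocal rank fusion or a learned re-ranker usually does better once you have actual relevance scores from both sides.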
The MERGE vs CREATE distinction you included is a nice touch. In production pipelines, deduplication becomes critical when processing multiple documents that reference the same entities.
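For anyone new to the distinction, a minimal sketch of the two statement shapes (the Character label and name property are just illustrative):

```python
def create_stmt(name: str) -> str:
    # CREATE always adds a new node, even if one with this name already exists.
    return f"CREATE (:Character {{name: '{name}'}})"

def merge_stmt(name: str) -> str:
    # MERGE matches an existing node on the pattern or creates it exactly once,
    # so re-ingesting a document that mentions the same entity is idempotent.
    return f"MERGE (:Character {{name: '{name}'}})"
```

In a real pipeline you'd pass names as query parameters rather than interpolating them into the string, both for injection safety and so Neo4j can cache the query plan.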
One thing I'd add: for larger-scale ingestion, you might want to batch Cypher statements. Each API call has overhead, and batching 100-1000 statements at once can significantly speed up bulk loads.
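Something like this, where the batch size of 500 is an arbitrary starting point to tune:

```python
from typing import Iterable, Iterator

def batched(statements: Iterable[str], size: int = 500) -> Iterator[list[str]]:
    """Group Cypher statements into fixed-size batches so each driver
    round-trip carries many writes instead of one."""
    batch: list[str] = []
    for stmt in statements:
        batch.append(stmt)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Each batch can then go out in a single transaction (with the official neo4j Python driver, one `session.execute_write` per batch), or via a single `UNWIND` over a parameter list for even less overhead.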
Thanks for the practical, end-to-end guide - the combination of Neo4j's visualization with a local LLM for extraction is a great pattern that more teams should adopt.
Running knowledge graphs locally is underrated. The latency advantage alone makes it worth it for agent workflows — every round-trip to a hosted service adds up when you're doing hundreds of lookups per session. Curious if you've experimented with incremental graph updates vs full rebuilds as the dataset grows.
Great walkthrough. I've found that starting with a small, well-defined domain like "local restaurants and their cuisines" makes the Cypher query learning curve much more manageable when you're first modeling.
Great walkthrough. I recently used a similar local Neo4j setup to map dependencies between internal microservices, and being able to query the "depends_on" relationships with Cypher was a game-changer for understanding our architecture. Your point about starting with a clear entity model is spot on.
The automation of knowledge graph creation using LLMs not only streamlines the initial setup but also enhances adaptability as your data evolves. It can be beneficial to implement version control for your graph schema, ensuring you can safely iterate and refine your knowledge graph over time without losing historical data. This approach can also facilitate collaboration when multiple teams are involved in data updates.
This is a really cool guide! TBH, I've always found knowledge graphs super interesting, especially how they can reveal hidden relationships in data. I've started using Neo4j for a few projects, and the automation LLMs bring is a game changer. Have you faced any challenges while integrating those models with Neo4j?
Great walkthrough! I run Ollama on a Mac Mini (64GB unified memory) and have been experimenting with different models for structured data extraction. Curious — have you tried any of the newer Qwen3 models for the text-to-cypher task? I've been using qwen3:30b for code generation and it handles structured output surprisingly well.
The Obsidian integration idea is brilliant. I've been thinking about building something similar for my game dev notes — turning scattered devlogs into a queryable knowledge graph could be really useful for tracking dependencies between game systems.
This is exactly my kind of setup! I'm running Ollama on a Mac Mini (64GB unified memory) and the local inference capability is a game changer for privacy-sensitive data.
Currently testing qwen3:30b and deepseek-r1:70b locally — the 70b models are surprisingly capable for knowledge extraction tasks. Have you benchmarked different Ollama models for entity extraction accuracy? I'm curious whether smaller quantized models (like 8b) lose too much nuance for relationship detection compared to the 30b+ models.
The Neo4j + Obsidian integration is brilliant. I've been thinking about building a knowledge graph for my AI agent's memory system — right now it uses flat markdown files, but graph-based memory could enable much better contextual recall across sessions.
One question: how do you handle entity deduplication? In my experience, LLMs generate slightly different entity names for the same concept ("React.js" vs "React" vs "ReactJS"), which creates phantom nodes. Any strategies for merging those?
Great project! Bookmarked for reference. 🔥
Running a text-to-Cypher model through Ollama instead of calling a hosted LLM keeps the entire knowledge graph pipeline local — that tradeoff between latency and data privacy is underrated for sensitive datasets.
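For reference, the whole extraction step is just one POST to Ollama's local endpoint; the model name and prompt wording below are my assumptions, not from the post:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(text: str, model: str = "llama3") -> dict:
    """Payload for a single non-streaming text-to-Cypher call."""
    return {
        "model": model,
        "prompt": (
            "Extract entities and relationships from the text below and "
            "return only Cypher MERGE statements.\n\n" + text
        ),
        "stream": False,
    }

def extract_cypher(text: str, model: str = "llama3") -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(text, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # stays on localhost
        return json.loads(resp.read())["response"]
```

Nothing in that round-trip ever leaves the machine, which is the whole point for sensitive datasets.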
Running Neo4j + Ollama locally is such a smart approach for privacy-sensitive use cases. The enterprise world is moving hard toward local AI processing — no one wants their knowledge graphs on someone else's servers. This is exactly the philosophy behind what we build with Genie 007 — keeping everything local in the browser. Voice commands + local knowledge graphs could be incredibly powerful for enterprise workflows.
That’s sick — local knowledge graphs with LLMs feels like giving your data a conspiracy-board upgrade.
Are you planning to run it mostly for fun experiments (like mapping your own notes) or for a real use case like rec engines / fraud detection?
This is EXACTLY what I was looking for, even though I didn't know it!
Thank you so much for putting this out there when you did! I will pay it forward once I finish the project I'm working on by posting an article of my own and linking back to this one — thanks 🙏
This is a really well-structured guide — love that you walked through the full pipeline from Neo4j Docker setup to bulk processing local files.
I've been running a similar setup but with Oxigraph (a lightweight Rust-based RDF/SPARQL store) instead of Neo4j, paired with Ollama on a Mac Mini. The tradeoff is interesting: Neo4j's Cypher is more intuitive for relationship traversal, but SPARQL gives you standardized ontology support out of the box (OWL, SKOS, etc.).
One thing I found after a few months of running this locally: the real challenge isn't building the graph — it's maintaining it as your data grows. Entity deduplication and relationship merging become critical. Have you run into that with larger datasets? Any strategies for keeping the graph clean over time?
Also curious if you've experimented with using the knowledge graph for RAG retrieval — graph-based context retrieval vs. pure vector search has been a fascinating rabbit hole for me.