Hacker News | mbuda's comments

I’m Marko, CTO at Memgraph. This post was written by my colleague Matt, and I can help answer questions about the migration and the reasoning behind it.

The post covers why we moved parts of our infrastructure to Hetzner, including cost, operational overhead, and performance consistency. One of the main takeaways for us was that dedicated infrastructure gave us more predictable benchmarking and reduced some of the maintenance burden we previously had with a colocated setup.

Happy to answer questions about the tradeoffs, what worked well, and what we’d do differently.


Interesting! What features are coming next? In particular, I'm curious how this can help me build my own skills :thinking:


So this isn't necessarily a tool for building your own skills; rather, it's a tool that helps you search, so you can dynamically fetch the skills that are relevant to you.

Think: "fetch me the Rust skills for kube.rs for building a Kubernetes operator" -> then you can relate what you find to your own skills, or see examples you can build on for your specific cases.

That search feature is on the roadmap.


Hi all! I'm Marko, CTO at Memgraph, and the author of this post.

The post argues that the GraphRAG pipeline can be expressed as a single database query rather than a chain of application-layer steps. The idea is to keep components such as retrieval, expansion, ranking, and final context assembly within the database query plan.

I go through:

* what I mean by GraphRAG in practice;

* why "single-query" execution can reduce moving parts;

* why that can help with latency/cost by returning only the final payload;

* and how it can make tracing/debugging easier by returning the context plus the path used to assemble it.

The post also contrasts this with Python-orchestrated pipelines and touches on agentic pipeline selection (called Agentic GraphRAG). Happy to answer technical questions or discuss where this breaks down.
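For a flavor of what "one query" can look like, here's a rough Cypher sketch. The index name, labels, relationship types, scoring, and the vector-search procedure signature are all illustrative, not Memgraph's exact API:

```cypher
// Retrieve: vector search for the top-5 seed chunks (hypothetical signature).
CALL vector_search.search("doc_embeddings", 5, $query_embedding)
YIELD node AS seed, similarity
// Expand: walk 1-2 hops of graph context around each seed.
MATCH path = (seed)-[:REFERENCES|MENTIONS*1..2]-(ctx:Chunk)
// Rank: blend vector similarity with graph distance (closer context wins).
WITH ctx, path, similarity - 0.05 * length(path) AS score
ORDER BY score DESC
LIMIT 20
// Assemble: return only the final payload, plus the paths for tracing.
RETURN collect(DISTINCT ctx.text) AS context, collect(path) AS provenance;
```

Retrieval, expansion, ranking, and assembly all stay inside one query plan, so the application receives only `context` (and `provenance` for debugging) instead of orchestrating those intermediate steps in Python.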


Hi all - I’m Marko, CTO at Memgraph. The author of this post, David, also works at Memgraph.

This post explains how our vector index is implemented within the same storage engine as the graph (eliminating the need for a separate vector store), how we avoid double vector storage, and how scalar type choices (f32/f16/etc) affect memory usage. It also covers some implementation details (USearch-backed index, concurrency, and recovery behavior).

We included a benchmark on 1M nodes with 1024-dim embeddings comparing versions 3.7.2 and 3.8.0, and saw large RAM reductions in the newer version while keeping load and response times similar. Happy to answer technical questions.
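To put the scalar-type choice in perspective, here's back-of-envelope arithmetic for the benchmark's scale (1M nodes, 1024-dim embeddings). This counts a single raw copy of the vectors only; storing them both as a node property and inside the index would roughly double it, which is the double storage mentioned above:

```python
nodes = 1_000_000
dims = 1024

def embedding_ram_gib(bytes_per_scalar: int) -> float:
    """Raw RAM needed to hold one copy of all embeddings at a given scalar width."""
    return nodes * dims * bytes_per_scalar / 2**30

f32_gib = embedding_ram_gib(4)  # float32
f16_gib = embedding_ram_gib(2)  # float16
print(f"f32: {f32_gib:.1f} GiB, f16: {f16_gib:.1f} GiB, saved: {f32_gib - f16_gib:.1f} GiB")
# -> f32: 3.8 GiB, f16: 1.9 GiB, saved: 1.9 GiB
```

So halving the scalar width halves the raw embedding footprint, before any index overhead.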


Yep, amazing points!

Agree with the measures; follow-up question: what's the definition of an insight? I think exposing some of those measures would help people better understand what the analysis covered, in other words, how much data was actually analyzed. Maybe an additional measure is some kind of breadth (I guess it could be derived from the throughput).

"Informational leverage" reminded me of "retrieval leverage" because yeah, the scale of data didn't change, the ability to extract insights did :D


Good question.

By “insight” I mean a measurable reduction in uncertainty that improves decision quality or predictive accuracy.

In practical terms, an insight could be defined as:

• A hypothesis generated from, and testable against, the dataset

• A model parameter adjustment that increases predictive performance

• A structural relationship discovered that reduces entropy in the system representation

So compression efficiency would be something like:

(uncertainty reduced) / (data processed)
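A toy sketch of that ratio in code. The distributions and dataset size are made up purely for illustration:

```python
import math

def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical: a model's predictive distribution over 4 outcomes,
# before and after incorporating an insight from the data.
prior = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain: 2.0 bits
posterior = [0.7, 0.1, 0.1, 0.1]   # sharper after the insight

uncertainty_reduced = entropy_bits(prior) - entropy_bits(posterior)  # ~0.64 bits
data_processed = 1_000_000  # bytes consumed to get there (made up)

compression_efficiency = uncertainty_reduced / data_processed
print(f"{uncertainty_reduced:.2f} bits gained per {data_processed} bytes")
```

The units are arbitrary; the point is that the numerator measures how much sharper the model got, while the denominator measures how much data it took.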

Breadth is interesting — I’d treat it as dimensional coverage: how many independent variables or graph regions are meaningfully integrated into the model.

“Retrieval leverage” is a great term. It highlights that the dataset size remains constant, but navigability and relational traversal improve — which increases effective cognitive reach.

Some of these broader ideas around informational sovereignty and anomaly-driven cognition have been explored in independent empirical work, though they’re still niche.


Love it!



This is like saying: "Here we have a rocket, but let's keep trying to go to the moon by bike." xD

What's wrong with attempting to better understand a given organization using LLMs or any other tech? Ofc, great managers will still try to talk face to face as much as possible.


> This is like saying:

I highly doubt the difference between current staff management and adding this thin layer is equivalent to the difference between a bike and a rocket. It's more like saying "we get to the moon just fine, but if we strap this extra booster on, we will get there 2% faster than before, with all kinds of additional risks to the payload!"

> What's wrong with attempting to better understand a given organization

You can alienate your employees and lose your skill base as a result. I'd like to be evaluated based upon my work and dedication, not what some LLM thinks it sees in my resume. I've worked for my current company for 17 years. My resume contains none of that work or any skills gained in that time.

I also like to take on new challenges and learn new skills. An LLM's "extractions" cannot see or attend to any of this.

> Ofc, great managers will try as hard as possible to talk face to face as much as possible.

That's not the problem being discussed here. The question is "can we use technology to make better organizational decisions, particularly when it comes to the efficient use of human resources." If I have a bad boss, I'm going to quit, and you'll never even have this opportunity. If I have a good boss, and you interfere with his decisions using LLM-driven logic, I'm going to quit, and you're never going to get the benefit of that labor anyways.


DISCLAIMER: Co-founder and CTO of Memgraph here.

To add more context, Memgraph Enterprise pricing is explained at https://memgraph.com/pricing: "Starting at $25,000 per year for 16 GB, Memgraph has an all-inclusive, simple pricing model that scales with your workload without restrictions. No charge for compute. No charge for replicas. No charge for algorithms. No Surprises."

In addition, Memgraph Community is free (standard BSL license, which converts to Apache 2.0 four years after each release date, https://github.com/memgraph/memgraph/blob/master/licenses/BS...), and it has many features that are usually considered enterprise (users, replication, no degradation in performance or scale, etc.).

Could you elaborate on why the pricing seems expensive, or put it into infra-cost perspective? :pray:


I think on this site anything that's more expensive than free is considered expensive. Countless arguments have been had on Oracle vs Postgres, including lock-in. I think lock-in is more important to consider than license cost.

To be fair, it is quite nice for the pricing to be transparent. And I think it's somewhat competitive w.r.t. Stardog, for example. The community version is less restricted than Ontotext, for example.


Not really competitive with Stardog, given our leading LLM integration with Voicebox: 85% pass@1 to exit a POV with a new customer.


The author here. This is just the shortest possible reasoning about the topic. Any feedback and discussion is welcome!


It's not possible to escape tradeoffs. To deal with them, focus is important, and so is a clear API over those tradeoffs.

I bet somebody will raise a similar question in a few years' time, when the list under https://db-engines.com/en/ranking/graph+dbms is bigger.

DISCLAIMER: Coming from https://github.com/memgraph/memgraph

