In working to reduce the cost and time taken to generate Global GraphRAG results, I found a marked improvement across the board: accuracy went up while cost and time went down. I'm passing this along because it is very much a win-win-win scenario.
Global Graph RAG Overview
Retrieval-Augmented Generation (RAG) over graphs, often called GraphRAG, enhances RAG searches by focusing more on the relationships between items. In the common library analogy, if a standard RAG AI hands the librarian a few select books chosen because they contain phrases semantically similar to our query, then Global GraphRAG hands the librarian summaries of whole groups of books to review. These answers necessarily have greater breadth, since they are more likely to pull in different subjects, but they have less detail, since summarizing necessarily removes detail.
The Common Approach
The Global GraphRAG implementations from Microsoft and LlamaIndex both operate by passing every query to an LLM along with each community summary, one at a time. This means that if you have 200 communities, each query requires 201 calls to the LLM: one per community, plus one to amalgamate the results into a single answer. This produces accurate answers, but it is expensive in both cost and performance. Depending on how many concurrent calls you can make to the LLM and how fast they process, a query can take five minutes or more, making this approach unviable for real-time queries.
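The per-community flow above can be sketched as a simple map-reduce loop. This is an illustrative sketch, not Microsoft's or LlamaIndex's actual code: `llm` is a hypothetical stand-in for a real chat-completion call (here it just counts invocations so the call volume is visible), and the prompts are made up.

```python
# Sketch of the standard one-community-per-call ("map-reduce") flow.
# `llm` is a hypothetical placeholder for a real LLM call.

CALL_COUNT = 0

def llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call; counts invocations."""
    global CALL_COUNT
    CALL_COUNT += 1
    return f"partial answer ({len(prompt)} chars of context)"

def global_graph_rag(query: str, community_summaries: list[str]) -> str:
    # Map step: one LLM call per community summary.
    partials = [
        llm(f"Answer '{query}' using only this community summary:\n{s}")
        for s in community_summaries
    ]
    # Reduce step: one final call amalgamates the partial answers.
    return llm(f"Combine these partial answers to '{query}':\n" + "\n".join(partials))

summaries = [f"summary of community {i}" for i in range(200)]
global_graph_rag("What are the main themes?", summaries)
print(CALL_COUNT)  # 200 map calls + 1 reduce call = 201
```

With a concurrency limit on the LLM endpoint, those 201 calls are what drive the multi-minute query times.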
A Better Approach
I decided to see how much less accurate the results would be if, instead of passing the query to the LLM with a single community summary, I batched the summaries. In other words, send a group of communities with each query instead of just one.
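Batching only changes the map step: each call carries a slice of summaries rather than a single one. Again a hypothetical sketch with an invocation-counting `llm` placeholder; the prompts and names are illustrative.

```python
# The same map-reduce flow, but with batched community summaries.
# `llm` is a hypothetical placeholder for a real LLM call.

CALL_COUNT = 0

def llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call; counts invocations."""
    global CALL_COUNT
    CALL_COUNT += 1
    return "partial answer"

def global_graph_rag_batched(query: str, community_summaries: list[str],
                             batch_size: int = 10) -> str:
    partials = []
    for i in range(0, len(community_summaries), batch_size):
        batch = community_summaries[i:i + batch_size]
        # Map step: one LLM call per *batch* of summaries.
        partials.append(
            llm(f"Answer '{query}' using these community summaries:\n"
                + "\n---\n".join(batch))
        )
    # Reduce step: one final call amalgamates the batched partial answers.
    return llm(f"Combine these partial answers to '{query}':\n" + "\n".join(partials))

summaries = [f"summary of community {i}" for i in range(200)]
global_graph_rag_batched("What are the main themes?", summaries, batch_size=10)
print(CALL_COUNT)  # 20 map calls + 1 reduce call = 21, down from 201
```

At a batch size of 10, the same 200 communities need 21 calls instead of 201, which is where the cost and latency savings come from.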
I am a strong proponent of automating your RAG AI evaluations so you can easily change models and try new things. Therefore, I was in a good position to see what effect various levels of batching have on the evaluation score.
My hypothesis was that I would see a minor drop in the evaluation score but a huge decrease in time and cost. My goal was to find the point where the costs were minimized while maximizing the evaluation score.
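A sweep like this is straightforward once evaluation is automated. The sketch below assumes two hypothetical helpers: `run_query`, which would call the batched pipeline at a given batch size, and `judge`, which would be an LLM-as-judge scoring each answer from 0 to 10 (both stubbed here).

```python
# Minimal harness for sweeping batch sizes against an automated evaluation.
# `run_query` and `judge` are hypothetical stubs, not a real pipeline.

def run_query(query: str, batch_size: int) -> str:
    """Stub: would run the batched Global GraphRAG pipeline."""
    return f"answer to {query} at batch size {batch_size}"

def judge(query: str, answer: str) -> float:
    """Stub: a real LLM-as-judge would score the answer 0-10."""
    return 8.0

def sweep(queries: list[str], batch_sizes: list[int]) -> dict[int, float]:
    """Average evaluation score per batch size."""
    results = {}
    for b in batch_sizes:
        scores = [judge(q, run_query(q, b)) for q in queries]
        results[b] = sum(scores) / len(scores)
    return results

scores = sweep(["q1", "q2"], [1, 5, 10, 25])
```

Plotting the resulting scores against the per-query cost at each batch size makes the cost/accuracy trade-off easy to read off.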
The Surprising Findings
To my surprise, larger batches tended to increase the evaluation score! This meant that by batching, I was improving accuracy while reducing both time and cost. A real win-win-win.
| | GPT-4o | Batch of 1 | Batch of 5 | Batch of 10 | Batch of 25 |
|---|---|---|---|---|---|
| Evaluation score | 7 | 9 | 9.2 | 9.5 | Error: exceeded max tokens |
Since the community summaries were already part of a graph database, it was easy to cluster the summaries and pass them in as groups of like summaries. However, I found no difference between passing in clusters of like summaries and passing in randomly assembled batches.
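The two grouping strategies I compared can be sketched as below. Both functions are hypothetical illustrations: one keeps summaries from the same graph cluster together, the other shuffles everything before slicing into batches.

```python
import random

def clustered_batches(summaries_by_cluster: dict[str, list[str]],
                      batch_size: int) -> list[list[str]]:
    """Keep like summaries together: fill batches cluster by cluster."""
    ordered = [s for cluster in summaries_by_cluster.values() for s in cluster]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

def random_batches(summaries: list[str], batch_size: int,
                   seed: int = 0) -> list[list[str]]:
    """Ignore clusters: shuffle first, then slice into batches."""
    shuffled = summaries[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

clusters = {"finance": ["s1", "s2", "s3"], "health": ["s4", "s5"]}
batches = clustered_batches(clusters, batch_size=2)
# batches → [["s1", "s2"], ["s3", "s4"], ["s5"]]
```

In my evaluations the two strategies scored the same, so the simpler random batching is the one worth keeping.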
Conclusion
Using automated evaluation and trying a few new approaches, I made a major improvement in the Global GraphRAG evaluations for my product. The reduction in cost alone is a significant benefit. I would strongly recommend that anyone trying out GraphRAG build in automated evaluation and find the batching level that works best for their implementation. You may find, as I did, that you can achieve greater accuracy while also reducing time and cost, yielding a more efficient and effective system.
