This blog post explores why GraphDB can help lower the cost of LLMs in production compared to a pure vector-database-based approach.

Introduction
While 2023 was all about ChatGPT and large language models (LLMs), in 2024 the rage has shifted to Retrieval Augmented Generation (RAG). Building a RAG prototype is relatively easy, but making it production-ready is hard, and organizations routinely get stuck in experimentation mode. Meeting KPIs for RAG applications, such as latency and relevance of results, incurs a high total cost of ownership (TCO) when transitioning from prototype to production.
Why not vanilla RAG?
In the process of chasing “RAG everything” or “plugging LLM integration into everything”, organizations often lose sight of the high compute cost and low ROI of traditional RAG. Using GPT and embeddings for similarity and relevance-based retrieval doesn’t always perform better in terms of latency and cost.
Though vector embeddings and high-dimensional mapping to the Vector Space Model (VSM) have gained prominence, mindshare, and usage with the advent of GenAI, VSM has been a key information retrieval technique for over two decades. Popular full-text search engines have been leveraging it for years.
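To ground the idea, here is a minimal VSM sketch, assuming scikit-learn as the tooling (an illustrative choice, not something the article prescribes): documents and a query become term-weighted vectors, and cosine similarity ranks the documents, the same principle full-text engines have applied for decades.

```python
# Minimal Vector Space Model sketch: TF-IDF vectors + cosine similarity.
# The tooling (scikit-learn) is an illustrative assumption, not part of GraphDB.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "GraphDB stores RDF triples and answers SPARQL queries.",
    "Vector databases index dense embeddings for similarity search.",
    "Full-text search engines rank documents with term weighting.",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(documents)          # documents -> sparse TF-IDF vectors

query_vector = vectorizer.transform(["rank documents by similarity"])
scores = cosine_similarity(query_vector, doc_vectors)[0]   # cosine score against each document

# Rank documents by relevance to the query
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```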
Effective retrieval is foundational to RAG. Production RAG applications, however, run into challenges with the speed of retrieval and with getting the “right” chunks to feed LLMs, leading to less contextually relevant responses. Out-of-the-box RAG struggles to connect the dots for questions that require traversing disparate chunks of data. It is less effective for structured data and performs poorly when semantic concepts and relationships must be understood across documents or chunks. Accuracy also degrades for complex, aggregate-type queries that require understanding the relationships between entities in a user query.
Retrieval quality depends on similarity checking, chunking, and the context window, while the balancing act of self-hosting versus API-based deployments significantly affects latency and performance. It also leads to cost overruns due to multiple complex GPT-4 queries. Each of these elements requires careful consideration to optimize retrieval accuracy and system efficiency.
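As a rough illustration of the chunking knob alone, the sketch below (the values and the word-based splitting strategy are assumptions, not recommendations) shows how chunk size and overlap control how many pieces get indexed, and therefore how much context, and cost, each downstream LLM call carries.

```python
# Illustrative chunking sketch: chunk size and overlap are two of the knobs that
# drive how many chunks exist, how precise retrieval can be, and how much
# context (and cost) each LLM call consumes. The values here are assumptions.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    step = chunk_size - overlap  # assumes overlap < chunk_size
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

document = " ".join(["word"] * 1000)  # stand-in for a long source document
chunks = chunk_text(document, chunk_size=200, overlap=40)

# Fewer, larger chunks -> fewer retrievals but noisier context per LLM call;
# smaller chunks -> sharper retrieval but more pieces to stitch back together.
print(f"{len(chunks)} chunks to index")
```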
What is the Graphwise GraphDB approach?
Similarity search and ranking are easily accomplished with Graphwise’s out-of-the-box GraphDB-based Graph RAG implementation. This can be applied across various use cases for better economies of scale, avoiding expensive and complex GPT-4 queries.
Graph RAG involves several sub-tasks like:
- Retrieval of relevant document chunks
- Retrieval of relevant entities
- Identifying concrete entities mentioned in user queries (entity linking)
- Retrieval of data from a graph database
- Answering natural language questions based on relevant texts and data
LLMs are a must only for the last task. They can handle the others too, but with subpar quality and efficiency. GraphDB allows experimentation with and optimization of each of these tasks.
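A minimal sketch of that division of labor, with every name below an illustrative assumption rather than an actual GraphDB or Graphwise API: the first three sub-tasks stay in the retrieval and graph layers, and only the final step pays for an LLM call.

```python
# Sketch of a Graph RAG pipeline: of the sub-tasks listed above, only the final
# answer-generation step needs an LLM. All names here are illustrative assumptions.
def retrieve_chunks(question: str) -> list[str]:
    """Full-text / similarity retrieval of relevant document chunks (no LLM needed)."""
    return ["...relevant chunk text..."]

def link_entities(question: str) -> list[str]:
    """Map entity mentions in the question to graph IRIs (no LLM needed)."""
    return ["http://example.org/resource/Paris_Hilton"]

def query_graph(entity_iris: list[str]) -> list[dict]:
    """Fetch facts about the linked entities from the graph database (no LLM needed)."""
    return [{"subject": iri, "predicate": "...", "object": "..."} for iri in entity_iris]

def call_llm(prompt: str) -> str:
    """Placeholder for a single chat-completion call to whichever LLM is in use."""
    raise NotImplementedError("plug in the chat-completion client of your choice")

def answer(question: str) -> str:
    """The only step where an LLM is genuinely required: composing the final answer."""
    chunks = retrieve_chunks(question)
    facts = query_graph(link_entities(question))
    prompt = f"Question: {question}\nGraph facts: {facts}\nContext: {chunks}"
    return call_llm(prompt)
```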
GraphDB’s semantic similarity allows the use of graph embeddings, leveraging the benefits of graph connectivity and topology instead of flat text. In this way, it optimizes cost and latency, reserving LLMs only for where they add value. Its integration with Elasticsearch provides a lower TCO. For example, Elasticsearch applies “stemming” to normalize “knives” to “knife”, whereas a vector-database-based approach to traditional RAG treats them as different words.
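To illustrate the stemming point, a light English stemmer can be wired into an Elasticsearch analyzer and exercised through the standard _analyze API; the host, index name, and choice of the kstem filter below are assumptions, and the exact tokens returned depend on the stemmer configured.

```python
# Sketch: checking how an Elasticsearch analyzer normalizes a query term.
# The URL, index name, and the "kstem" light English stemmer are assumptions.
import requests

index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "light_english": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "kstem"],
                }
            }
        }
    }
}
requests.put("http://localhost:9200/docs", json=index_settings)

# Ask the analyzer what tokens a term produces; per the article's example,
# "knives" should normalize toward "knife", which an embedding-only lookup
# would treat as a distinct string.
resp = requests.get(
    "http://localhost:9200/docs/_analyze",
    json={"analyzer": "light_english", "text": "knives"},
)
print([token["token"] for token in resp.json()["tokens"]])
```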
Out of the box, GraphDB provides accurate entity linking, an essential component of Graph RAG for improving relevancy. For example, a mention of “Paris Hilton” in a user query should not be confused with the French capital. The vanilla RAG approach with vector databases and ChatGPT cannot handle this well.
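A hedged sketch of what that disambiguation looks like at the graph layer, assuming a GraphDB-style SPARQL endpoint on the default port and DBpedia-like labels and types (all illustrative): once the mention resolves to a typed entity, a person and a city cannot be confused downstream.

```python
# Sketch: resolving an ambiguous surface form against the graph rather than raw text.
# The repository URL and the label/type vocabulary are illustrative assumptions.
import requests

SPARQL_ENDPOINT = "http://localhost:7200/repositories/my-repo"  # assumed GraphDB repository

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?type WHERE {
  ?entity rdfs:label "Paris Hilton"@en ;
          a ?type .
}
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
# Each candidate comes back with its type, so "Paris Hilton" (a person) is never
# conflated with "Paris" (a populated place) in later graph lookups.
for row in resp.json()["results"]["bindings"]:
    print(row["entity"]["value"], "->", row["type"]["value"])
```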
Leveraging GraphDB’s capabilities results in a simpler architecture with faster and cheaper indexing and query answering. Combining LLM-based embeddings with a vector database is not necessarily better than simply using Elasticsearch with GraphDB. The GraphDB approach reduces time to market and allows the performance and relevance of results to be evaluated against the datasets in use and the types of questions being asked.
What are the benefits of the latest GraphDB 10.8?
GraphDB v10.8 offers a palette of AI models with a toolbox of pre-integrated engines, analytic capabilities, and tools. Architects can choose the best option for each task, cutting costs, improving performance, and reducing integration risks. Some of the advantages of GraphDB 10.8 include:
- Configurable Talk to Your Graph for asking natural language questions, with persistent and contextual conversations
- AI-driven, configurable assistant tailored to the specific context of the dataset and user query, using a chain-of-thought approach to dispatch the most appropriate tool for each query
- No embedding generation or vector stores required, avoiding time-consuming configuration and a complicated architecture
- Adequate tooling for customization, extensibility, and tuning, leveraging GraphDB’s existing retrieval abilities
Conclusion
Selecting the right technical infrastructure and architecture, and ensuring data quality and security, is key to successful RAG deployment in production. Scaling RAG from POC to production requires tactical investment in architecture, technology, and planning to overcome the associated challenges. The answer is not “no LLM”, but making a good judgment call so that LLMs are leveraged only for the minority of user queries that truly need them.