
Precise Semantic Retrieval: Implementing Chunk-level Vector Search with GraphDB and Elasticsearch

A step-by-step technical guide to implementing chunk-level vector search using GraphDB and Elasticsearch, explaining how to split large documents into semantically indexed nested objects for more accurate and reliable retrieval.

Main Takeaways

  • Document-level search isn't precise enough — large documents must be split into chunks and embedded individually, or you risk truncation and mixed-topic retrieval.
  • Nested fields are the key to accuracy — without nesting, Elasticsearch flattens chunk data, breaking the link between a chunk's text, position, and vector, leading to false matches.
  • GraphDB bridges RDF and vector search — the connector maps structured knowledge graph data into Elasticsearch, automatically generating embeddings per chunk via the Graphwise Transformer.
  • kNN searches the right level — similarity is scored per chunk, not per parent document, so results are precise even when multiple chunks from the same document match.

When working with large documents, semantic search at the document level is often insufficient. Large texts frequently contain multiple distinct topics, and indexing them as a single unit can cause truncation. Truncation can happen either because the payload sent to the Graphwise Transformer client exceeds 4 MB (larger documents do not fit within the gRPC message size limit) or because of input size restrictions of the embedding model used by the transformer.

A more precise approach is to split documents into smaller logical segments (chunks) and generate embeddings per chunk.

With the introduction of vector search within nested fields in GraphDB 11.3, it is now possible to model and index these chunks as structured nested objects in Elasticsearch while preserving their semantic integrity. This enables accurate chunk-level vector similarity search without losing the relationship between different object fields.

How to set it up

The following steps describe the complete process — from preparing chunked RDF data, through configuring the Graphwise Transformer client and GraphDB, to creating the Elasticsearch connector and executing nested vector search queries.

Step 1: Generate chunked RDF data

Use the chunker.py script to generate an RDF dataset from a TSV file. The script can be found at https://github.com/Ontotext-AD/document-chunker-rdf.

As mentioned earlier, if the source documents are very long, Elasticsearch may index only part of the content. To avoid this behavior, split each long document into smaller chunks and index each chunk separately.
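To make the shape of chunked RDF concrete, here is a minimal Python sketch. It is not the chunker.py script: the function names, the fixed-size word windows with overlap, and the `http://example.org/` IRI scheme are assumptions made for illustration. Only the properties schema:hasPart, schema:text, and schema:position are taken from the connector definition used in Step 6.

```python
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def split_into_chunks(text, max_words=50, overlap=10):
    """Split text into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal.

    Chunks are rebuilt from text.split(), so they contain no newlines or tabs.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


def document_to_ntriples(doc_iri, text, max_words=50, overlap=10):
    """Return N-Triples lines for one schema:CreativeWork and its chunks."""
    lines = [f"<{doc_iri}> <{RDF_TYPE}> <{SCHEMA}CreativeWork> ."]
    for position, chunk in enumerate(split_into_chunks(text, max_words, overlap)):
        chunk_iri = f"{doc_iri}/chunk/{position}"  # hypothetical IRI scheme
        lines.append(f"<{doc_iri}> <{SCHEMA}hasPart> <{chunk_iri}> .")
        lines.append(f'<{chunk_iri}> <{SCHEMA}text> "{escape_literal(chunk)}" .')
        lines.append(
            f'<{chunk_iri}> <{SCHEMA}position> "{position}"^^<{XSD_INTEGER}> .'
        )
    return lines
```

Calling `document_to_ntriples("http://example.org/doc/1", long_text)` yields one type triple for the document plus three triples per chunk. Word-window splitting is only the simplest option; splitting on paragraphs or headings usually keeps chunks more topically coherent.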

Step 2: Start the Graphwise Transformer client and Elasticsearch

The Graphwise Transformer is a Python server that serves sentence embeddings over gRPC. By default, it listens on port 5050. Build and start it with Docker as follows:

docker build -t graphwise-transformer:$(git rev-parse --short HEAD) .

docker run --rm -p 5050:5050 -e GRAPHWISE_CONFIG=/app/config.properties graphwise-transformer:$(git rev-parse --short HEAD)

This uses the default configuration: port 5050 and the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 embedding model. Both can be changed in config.properties. Further information about the client is available in the Graphwise documentation.

Step 3: Start GraphDB with the required properties configured

GraphDB can be started using the official Docker Hub image or the provided distributions for Windows, Linux, and macOS.

Start GraphDB with the following properties configured (see the configuration reference in our documentation):

  • graphwise.transformer.address=localhost:5050
  • graphwise.transformer.batch.size=128
  • graphwise.transformer.embedding.model.name=<model_name>

The third property is only required if the default model has been changed. More detail on embedding model configuration is available in the documentation.

Step 4: Create a general repository in GraphDB

Open GraphDB Workbench and navigate to the repository management section. Create a new general repository. No additional configuration is required at this stage — the default settings are sufficient to proceed.

Step 5: Import the generated RDF data into the repository

Once the repository is set up, navigate to the Import menu within GraphDB Workbench. Select the RDF file generated in the previous steps and import it into the newly created repository. After the import completes, verify that the data has been loaded correctly by checking the Explore menu.

Step 6: Create an Elasticsearch index configured for vector embeddings

Each chunk will be indexed as an individual document with its own embedding, using the native:nested datatype. This enables semantic similarity search across all chunks while keeping each chunk’s fields correctly associated with one another.

 
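PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>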

INSERT DATA {
	inst:chunks :createConnector '''
{
  "fields": [
    {
      "fieldName": "chunks",
      "propertyChain": ["http://schema.org/hasPart"],
      "datatype": "native:nested",
      "objectFields": [
        {
          "fieldName": "text_content",
          "propertyChain": ["http://schema.org/text"]
        },
        {
          "fieldName": "index",
          "propertyChain": ["http://schema.org/position"]
        },
        {
          "fieldName": "vector",
          "propertyChain": ["http://schema.org/text"],
          "datatype": "vector"
        }
      ]
    }
  ],
  "languages": [],
  "types": ["http://schema.org/CreativeWork"],
  "readonly": false,
  "detectFields": false,
  "importGraph": false,
  "skipInitialIndexing": false,
  "elasticsearchClusterSniff": false,
  "elasticsearchNode": "http://localhost:9200",
  "manageIndex": true,
  "manageMapping": true,
  "bulkUpdateBatchSize": 5000,
  "embeddingModel" : "com.ontotext.embeddings.GraphwiseTransformerClient"
}
''' .
}

This SPARQL INSERT DATA update creates an Elasticsearch connector named “chunks” in GraphDB, together with the corresponding index in Elasticsearch. It indexes RDF resources of type schema:CreativeWork as Elasticsearch documents, each holding its chunks in a field, also named “chunks”, with datatype native:nested.

For each CreativeWork, the connector follows schema:hasPart to extract chunk resources and maps their schema:text and schema:position properties into nested object fields (text_content, index). The vector field is also derived from schema:text, but because its datatype is “vector”, GraphDB generates an embedding using the configured model com.ontotext.embeddings.GraphwiseTransformerClient, which connects to the graphwise-transformer service.

With manageIndex and manageMapping enabled, GraphDB automatically creates and maintains the Elasticsearch index and mapping, performs initial indexing of existing data, and pushes updates in bulk batches. The result is a chunk-level nested vector index that supports semantic similarity search.
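The connector definition can be run from the Workbench SPARQL editor, but it can also be submitted over HTTP. The sketch below assumes GraphDB's RDF4J-compatible REST endpoint, where updates are POSTed to /repositories/<id>/statements as an `update` form parameter. The base URL `http://localhost:7200`, the repository id `chunks-repo`, and the helper name `build_update_request` are illustrative assumptions, not values from this guide.

```python
import urllib.parse
import urllib.request


def build_update_request(base_url, repository_id, sparql_update):
    """Build a POST request that sends a SPARQL update to a repository.

    Assumes the RDF4J REST protocol: form-encoded 'update' parameter
    posted to <base_url>/repositories/<repository_id>/statements.
    """
    url = "{}/repositories/{}/statements".format(
        base_url.rstrip("/"), urllib.parse.quote(repository_id, safe="")
    )
    body = urllib.parse.urlencode({"update": sparql_update}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Usage (requires a running GraphDB instance, so it is left commented out):
# request = build_update_request(
#     "http://localhost:7200", "chunks-repo", connector_update
# )
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Here `connector_update` stands for the full INSERT DATA text shown above. Building the request separately from sending it makes the URL and body easy to inspect before anything touches the server.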

Step 7: Search for documents containing the most similar chunk

The following query searches for documents containing the chunk most semantically similar to a given input:

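PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX conn: <http://www.ontotext.com/connectors/elasticsearch#>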

SELECT ?doc ?score WHERE {
    ?search a inst:chunks ;
            conn:query '''{
                "size": 5,
                "query": {
                    "nested": {
                        "path": "chunks",
                        "query": {
                            "knn": {
                                "field": "chunks.vector",
                                "query_vector": "Graph Database, then import  query data with the (OpenRDF) GraphDB Workbench, and finally explore and visualise data with the build in visualisation tools.",
                                "k": 5,
                                "num_candidates": 50
                            }
                        }
                    }
                }
            }''' ;
            conn:entities ?doc .
    ?doc conn:score ?score .
}

The query executes a vector similarity search through the Elasticsearch connector inst:chunks defined in GraphDB and returns the top 5 RDF documents ranked by semantic relevance. The conn:query JSON block is passed directly to Elasticsearch and performs a nested k-nearest neighbors (kNN) search on the chunks.vector field. This means that it searches inside the nested chunks objects created by the connector.

The query_vector is supplied as raw text rather than a numeric vector. GraphDB automatically converts this text into an embedding using the configured embedding model (via the Graphwise Transformer client) before sending the kNN request to Elasticsearch. The search retrieves the 5 most similar chunk vectors (k: 5) while evaluating 50 candidates for better recall (num_candidates: 50).
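Because the query text, k, and num_candidates usually vary per request, it can help to assemble the query programmatically. The following sketch builds the same nested kNN body and embeds it in the SPARQL query. The function names are hypothetical, and the prefix IRIs are the Elasticsearch connector namespaces as I understand them from the GraphDB documentation, so verify them against your version. The JSON must be escaped before it goes into a SPARQL ''' literal, because SPARQL would otherwise consume the backslashes that JSON needs.

```python
import json


def build_knn_body(text, k=5, num_candidates=50, size=5):
    """Build the nested kNN request body passed through conn:query."""
    if num_candidates < k:
        # Elasticsearch rejects num_candidates smaller than k.
        raise ValueError("num_candidates must be at least k")
    return {
        "size": size,
        "query": {
            "nested": {
                "path": "chunks",
                "query": {
                    "knn": {
                        "field": "chunks.vector",
                        # Raw text: GraphDB embeds it before calling Elasticsearch.
                        "query_vector": text,
                        "k": k,
                        "num_candidates": num_candidates,
                    }
                },
            }
        },
    }


def sparql_long_literal(value):
    """Quote a string as a SPARQL ''' literal, escaping backslashes and apostrophes."""
    return "'''" + value.replace("\\", "\\\\").replace("'", "\\'") + "'''"


def build_search_query(text, k=5, num_candidates=50, size=5):
    """Return the full SPARQL query for the 'chunks' connector."""
    body = json.dumps(build_knn_body(text, k, num_candidates, size), indent=2)
    return f"""PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
PREFIX conn: <http://www.ontotext.com/connectors/elasticsearch#>

SELECT ?doc ?score WHERE {{
    ?search a inst:chunks ;
            conn:query {sparql_long_literal(body)} ;
            conn:entities ?doc .
    ?doc conn:score ?score .
}}"""
```

For example, `build_search_query("import RDF data with the Workbench", k=3, num_candidates=30)` returns a query string that can be pasted into the Workbench SPARQL editor.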

The connector then maps the matching Elasticsearch documents back to their corresponding RDF resources (conn:entities ?doc) and exposes their relevance score via conn:score ?score. In this way, the SPARQL result contains each matching RDF document along with its vector similarity score. Note that while each individual chunk can be returned only once, the same parent document (here, a schema:CreativeWork) may appear multiple times in the results if several of its chunks independently match the query.
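If an application needs one row per document, the repeated rows can be collapsed on the client side. The sketch below keeps the best score for each document. The input shape, a list of (doc, score) pairs, and the function name are assumptions for illustration; adapt them to however your SPARQL client returns bindings.

```python
def best_score_per_document(rows):
    """Collapse (doc, score) rows to one entry per doc, keeping the highest score.

    Returns a list of (doc, score) pairs sorted by score, best first.
    """
    best = {}
    for doc, score in rows:
        if doc not in best or score > best[doc]:
            best[doc] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Example rows with hypothetical document IRIs and scores
rows = [
    ("http://example.org/doc/1", 0.91),
    ("http://example.org/doc/2", 0.88),
    ("http://example.org/doc/1", 0.80),
]
ranked = best_score_per_document(rows)
```

Keeping the maximum treats a document as being as relevant as its best chunk. Other aggregations, such as summing or averaging chunk scores, are equally easy to substitute when several matching chunks should count for more.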

Why nested fields matter

Each schema:CreativeWork instance contains multiple logical sub-units (chunks), and each chunk carries its own text, position, and embedding vector. Declaring chunks as a nested field in Elasticsearch ensures that each chunk is stored and queried as an independent inner document, rather than being flattened into the parent document.

Without nesting, Elasticsearch flattens arrays of objects into parallel arrays of fields, breaking field associations. For example, chunks.text_content, chunks.index, and chunks.vector would lose their one-to-one relationship. A vector match from one chunk could then incorrectly combine with metadata from a different chunk inside the same parent document. This leads to false positives and incorrect scoring. The official Elasticsearch documentation provides a good explanation of this topic.

By using nested objects, Elasticsearch internally creates hidden child documents for each chunk. The nested query evaluates similarity per chunk, keeping scoring isolated within each specific chunk context. In the kNN query above, similarity is computed against each individual chunks.vector — not against the parent document as a whole — which is what makes chunk retrieval both precise and reliable.
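The difference can be simulated in a few lines of Python. This is a simplified model, not Elasticsearch code: exact value comparison stands in for real queries, and the function names and sample document are made up for illustration. It shows how a flattened document matches a combination of values that never occurred together in any single chunk, while per-chunk (nested) evaluation does not.

```python
def flatten(doc):
    """Mimic non-nested indexing: one list of values per field path."""
    flat = {}
    for chunk in doc["chunks"]:
        for field, value in chunk.items():
            flat.setdefault(f"chunks.{field}", []).append(value)
    return flat


def matches_flattened(doc, text_content, index):
    """Each condition is checked against the whole document's value lists."""
    flat = flatten(doc)
    return (
        text_content in flat.get("chunks.text_content", [])
        and index in flat.get("chunks.index", [])
    )


def matches_nested(doc, text_content, index):
    """Both conditions must hold within the same chunk."""
    return any(
        chunk["text_content"] == text_content and chunk["index"] == index
        for chunk in doc["chunks"]
    )


doc = {
    "chunks": [
        {"text_content": "installation", "index": 0},
        {"text_content": "querying", "index": 1},
    ]
}

# "installation" is at index 0, so asking for it at index 1 should not match.
false_positive = matches_flattened(doc, "installation", 1)  # True
correct_result = matches_nested(doc, "installation", 1)     # False
```

The flattened check succeeds because "installation" and 1 both appear somewhere in the document, even though they belong to different chunks.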


What is Semantic Search

Semantic search leverages knowledge graphs and text analysis to understand the intent and context behind a query, rather than just matching keywords. By bridging the language gap between humans and machines, it delivers highly accurate, personalized, and contextually relevant results.


FAQ


Chunk-level vector search is a foundational retrieval technique that involves partitioning large, unstructured documents into smaller, manageable segments—or "chunks"—and converting them into high-dimensional vector embeddings for storage in a vector database. This process is critical for Retrieval-Augmented Generation (RAG) because it enables precise semantic retrieval, allowing the system to identify and supply only the most relevant snippets of information to a Large Language Model (LLM) while adhering to its limited input context window. By matching user queries to specific chunks based on meaning rather than exact keywords, chunk-level search ensures that AI-generated responses are grounded in pertinent factual context, significantly improving accuracy and reducing the noise associated with processing entire documents.

In semantic search, Elasticsearch’s nested field type works by indexing each object within an array as a separate, hidden document, which preserves the individual context of multiple vector embeddings within a single parent document. This approach prevents "cross-object" pollution—a common issue in standard object arrays where search criteria from different objects might incorrectly combine to trigger a match—by ensuring that similarity scores are calculated independently for each segment (e.g., individual paragraphs or sections). By treating each constituent part as a distinct searchable unit, the nested type allows for precise segment-level retrieval, enabling a large document to be accurately matched based on the specific semantic relevance of its sub-components rather than a single, diluted document-level average.

The best chunking strategy for RAG with large documents involves moving beyond arbitrary fixed-size token splits toward a structural or semantic approach, such as DOM Graph RAG or componentized content. Instead of treating documents as flat blobs of text, this method leverages the Document Object Model (DOM) to capture the document’s inherent hierarchy—including headings, tables, and lists—to preserve vital context and relationships. By breaking content into meaningful, structured components rather than random chunks, organizations can ensure that the retrieval process maintains the semantic integrity of the information, enabling more accurate multi-hop reasoning and significantly reducing hallucinations in complex or domain-specific environments.

To connect a knowledge graph to Elasticsearch for semantic retrieval, you typically use a specialized connector—such as the GraphDB Elasticsearch Connector—to synchronize RDF data into Elasticsearch indices at the entity level. This process involves mapping RDF property chains to Elasticsearch fields via SPARQL configuration, allowing the graph's structured relationships to be indexed alongside full-text content. For true semantic retrieval, you can leverage vector search by integrating an embedding model within the connector, which transforms graph entities into vector representations stored in Elasticsearch for similarity-based discovery. Once connected, you can perform hybrid queries that combine Elasticsearch's high-speed faceted search and vector similarity with the knowledge graph's complex reasoning capabilities through SPARQL, ensuring that retrieval is both contextually aware and semantically precise.

AI systems hallucinate when documents are too long primarily because of their probabilistic nature and the limitations of their context windows, which cause them to lose track of nuanced information in "lengthy interactions." When documents exceed a model's effective processing range, the system may fail to distinguish between relevant and irrelevant details, leading it to fill information gaps with statistical "guesses" or fabrications that sound plausible but lack factual grounding. This problem is often exacerbated by standard retrieval-augmented generation (RAG) methods that break long documents into arbitrary text chunks; this fragmentation strips away the structural and relational context necessary for the AI to understand the document as a whole, resulting in "context rot" where the model relies on patterns rather than verified facts.

RDF (Resource Description Framework) is a W3C-standardized graph data model that represents information as "triples"—structured statements consisting of a subject, predicate, and object. This framework provides the foundational structure for modern knowledge graphs, enabling disparate data sources to be integrated into a machine-readable network of interlinked concepts with formal semantics. In AI knowledge retrieval, specifically for systems like Retrieval-Augmented Generation (RAG), RDF-based knowledge graphs serve as a deterministic "ground truth" that complements probabilistic methods like vector search. By leveraging the explicit relationships and automated reasoning capabilities inherent in RDF, AI systems can retrieve highly precise, semantically accurate context, which improves the reliability and explainability of generated responses.