Talk to Your Graph on top of Nature FIRST Biodiversity Knowledge Graph

August 21, 2025

We demonstrate how the GraphDB Talk to Your Graph feature enables natural language querying of the Nature FIRST Biodiversity Knowledge Graph. It uses LLMs to translate questions into SPARQL queries, combining the strengths of knowledge graphs for factual accuracy with LLMs for natural language understanding.

This work was done jointly with Robert David and Jan-Kees Schakel, CEO at SensingClues.

In this blog post, we explore the experimental Talk to Your Graph (TTYG) feature, an off-the-shelf feature of Graphwise GraphDB, applied to the Nature FIRST Biodiversity Knowledge Graph [1]. This functionality enables a more intuitive and conversational way to interact with complex graph data.

The TTYG component is designed to lower the barrier to accessing and retrieving factual knowledge from a knowledge graph. It allows users to ask questions in natural language rather than requiring expertise in SPARQL. Behind the scenes, the questions are translated into SPARQL by the configured large language models (LLMs), guided by a defined ontology that provides the necessary semantic context.

The GraphRAG approach

Importantly, users can inspect the generated SPARQL queries, which provides transparency and interpretability. This makes TTYG a practical example of a neuro-symbolic approach to question answering, often referred to as GraphRAG. Unlike traditional Retrieval-Augmented Generation (RAG), which uses LLMs both to interpret the query and to generate the answer, GraphRAG uses LLMs only for query translation and answer summarization. The actual retrieval of facts is performed by executing the SPARQL query directly on the knowledge graph.

This separation of concerns helps mitigate the risk of LLM hallucinations: natural language questions are systematically translated into SPARQL queries aligned with the user’s intent, and the facts in the answer come from the knowledge graph rather than from the model. We combine the strengths of knowledge graphs and LLMs to deliver accurate and context-aware answers.
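The separation of concerns described above can be sketched as three stages wired together. This is a minimal illustration and not the TTYG implementation: the function names, the stub stages, the `urn:example:eats` property, and the sample rows are all hypothetical placeholders, used only so the sketch runs without an LLM or a triple store.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def answer_question(
    question: str,
    translate: Callable[[str], str],
    execute: Callable[[str], List[Row]],
    summarize: Callable[[str, List[Row]], str],
) -> dict:
    """Run the three stages and keep the SPARQL query for inspection."""
    sparql = translate(question)      # LLM: natural language -> SPARQL
    rows = execute(sparql)            # triple store: the only source of facts
    text = summarize(question, rows)  # LLM: result rows -> readable answer
    return {"sparql": sparql, "rows": rows, "answer": text}


# Hypothetical stand-ins for the LLM and the triple store.
def fake_translate(question: str) -> str:
    return "SELECT ?food WHERE { ?bear <urn:example:eats> ?food }"


def fake_execute(sparql: str) -> List[Row]:
    return [{"food": "nuts"}, {"food": "berries"}]  # placeholder data


def fake_summarize(question: str, rows: List[Row]) -> str:
    return "Foods found: " + ", ".join(row["food"] for row in rows)


result = answer_question(
    "What does the brown bear eat?", fake_translate, fake_execute, fake_summarize
)
print(result["sparql"])
print(result["answer"])
```

Returning the query alongside the answer is what lets a user check how the answer was obtained.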

The knowledge graph provides trusted, domain-specific information, along with relevant synonyms, while the LLM excels at interpreting natural language, especially general or domain-agnostic terms. To illustrate how these strengths complement each other in practice, consider the question “What does the Ursus arctos eat in the period of fall?”. The term Ursus arctos refers to the brown bear species and is best resolved using the knowledge graph. The word “fall”, meanwhile, is a seasonal term that the LLM can easily interpret as synonymous with “autumn”.

The Nature FIRST Knowledge Graph

We focus on the Nature FIRST Knowledge Graph, which integrates diverse datasets related to biodiversity and conservation. These include information on Natura 2000 sites, habitat data from multiple versions (2012, 2017, 2021), species data from EUNIS and the IUCN Red List, CORINE Land Cover classifications, and more.

Our attention will center on a specific segment of this graph, the Ecological Knowledge Model (EKM), where the brown bear serves as the primary ecological stakeholder. This model illustrates how complex, cross-domain data can be linked to support ecological reasoning and decision-making [2].

Fig. 1 Nature FIRST Knowledge graph [1]

EKM captures the seasonal needs and behaviors of the brown bear, mapping them to the habitats and species present in those habitats as represented in the Nature FIRST Knowledge Graph. Developed and maintained using PoolParty Thesaurus Manager and PoolParty Ontology Management, the EKM is accessible at sensingclues.poolparty.biz/EcologicalKnowledgeModel.html, alongside the SPARQL endpoint.

As a guiding principle, it is important to create concise, unambiguous definitions of ontological terms, that is, classes, properties, and attributes, together with their labels and descriptions, which translate to rdfs:label and rdfs:comment, respectively. Whenever possible, we include the domain/range in the label so that the LLM can leverage it; for example, for the :preysOn relation we create the rdfs:label “fauna preys on”. In specific cases we also use rdfs:comment to list the values that the property objects can take; for example, for :groupBehavior we have specified values like “colonial-living, eusocial, group-living, herding, pair-living, solistic”. This can be crucial for the LLM in picking the right property during translation, in this case :groupBehavior instead of the (incorrect) :hasLifestyle.

The model reuses core ontologies from the Nature FIRST Knowledge Graph, such as those for Habitats and Species, and also integrates well-established external ontologies, including OBO Relations and Darwin Core, ensuring semantic consistency and interoperability across domains.

Fig. 2 Ecological Knowledge Model (EKM) describing the brown bear

Technical configuration and setup

EKM has been exported to an external GraphDB instance, where the TTYG agent is configured. This agent is configured with both SPARQL-based and full-text search query methods. Full-text search plays an important role in identifying relevant IRIs based on labels — an essential step in accurate SPARQL query generation from natural language.
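To make the role of label-based lookup concrete, here is a minimal in-memory sketch of resolving phrases in a question to IRIs. It is not how GraphDB's full-text search works internally; the `urn:example:` IRIs, the label list, and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (IRI, label) pairs standing in for skos:prefLabel / skos:altLabel values.
LABELS = [
    ("urn:example:BrownBear", "brown bear"),
    ("urn:example:BrownBear", "bear"),
    ("urn:example:BrownBear", "Ursus arctos"),
    ("urn:example:WildBoar", "wild boar"),
    ("urn:example:Autumn", "autumn"),
]


def normalize(text):
    """Lower-case, turn punctuation into spaces, and pad with spaces for whole-word matching."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " " + " ".join(cleaned.split()) + " "


def build_index(labels):
    """Map each normalized label to the set of IRIs that carry it."""
    index = defaultdict(set)
    for iri, label in labels:
        index[normalize(label)].add(iri)
    return index


def find_iris(question, index):
    """Return the IRIs whose label occurs as a whole phrase in the question."""
    padded = normalize(question)
    hits = set()
    for label, iris in index.items():
        if label in padded:
            hits |= iris
    return hits


INDEX = build_index(LABELS)
print(sorted(find_iris("What does the Ursus arctos eat in the period of fall?", INDEX)))
```

In this sketch “Ursus arctos” resolves to the brown bear IRI through its alternative label, while “fall” matches nothing, which is the part left to the LLM to map to “autumn”.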

While TTYG allows for customization of LLM parameters such as model type, temperature, and top-p sampling, in our setup we rely on the default settings. For further configuration details, you can refer to the official GraphDB TTYG documentation.

A critical component of the setup is providing an appropriate input ontology that describes the knowledge graph and guides the SPARQL translation performed by the LLM (in our case gpt-4o). Since the Nature FIRST Knowledge Graph includes a broad mix of ontologies, many of which cover domains not immediately relevant to the EKM, we focus the model’s attention using a subset view of the ontologies. This view is derived from PoolParty’s custom schemes and materialized via a bespoke SPARQL query, resulting in an ontology stored in a separate named graph, urn:demo-ontology. This ontology includes only the classes, properties, and attributes used in the EKM, which helps the LLM generate more accurate and contextually relevant SPARQL queries and discourages it from inventing new terms. Furthermore, where a property has multiple domain or range classes, expressed in RDF as a blank node pointing to an RDF list, the query collapses them into a single string of the form “owl:unionOf (Class1 Class2 ... Class N)”. This is intended to spare the LLM from traversing the blank nodes that make up such multi-class domains and ranges. The query is as follows:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT { ?s ?p ?o . ?o ?q ?r . ?r ?r1 ?r2 . ?r rdfs:domain ?domains . ?r rdfs:range ?ranges }
FROM <https://sensingclues.poolparty.biz/system/customschemas>
WHERE {
    # 1. Copy everything except domain/range declarations that point to blank nodes.
    {
        ?s a rdfs:Container ; ?p ?o . ?o ?q ?r .
        OPTIONAL { ?r ?r1 ?r2 . }
        FILTER NOT EXISTS { ?r rdfs:domain|rdfs:range ?r2 . FILTER (isBlank(?r2)) }
    }
    # 2. Collapse each blank-node domain union into a single string.
    UNION {
        ?s a rdfs:Container ; ?p ?o . ?o ?q ?r .
        ?r rdfs:domain ?r2 . FILTER (isBlank(?r2))
        {
            SELECT ?r2 ?class (GROUP_CONCAT(?class2; separator=" ") AS ?group)
            {
                ?s a rdfs:Container ; ?p ?o . ?o ?q ?r .
                ?r rdfs:domain ?r2 . FILTER (isBlank(?r2))
                ?r2 owl:unionOf/rdf:first ?class .
                ?r2 owl:unionOf/rdf:rest+/rdf:first ?class2 . FILTER (!isBlank(?class2))
            } GROUP BY ?r2 ?class
        }
        # BIND follows the subquery so that ?class and ?group are already bound.
        BIND (CONCAT("owl:unionOf ( ", STR(?class), " ", ?group, " )") AS ?domains)
    }
    # 3. Do the same for blank-node range unions (bound to ?ranges, as used in the template).
    UNION {
        ?s a rdfs:Container ; ?p ?o . ?o ?q ?r .
        ?r rdfs:range ?r2 . FILTER (isBlank(?r2))
        {
            SELECT ?r2 ?class (GROUP_CONCAT(?class2; separator=" ") AS ?group)
            {
                ?s a rdfs:Container ; ?p ?o . ?o ?q ?r .
                ?r rdfs:range ?r2 . FILTER (isBlank(?r2))
                ?r2 owl:unionOf/rdf:first ?class .
                ?r2 owl:unionOf/rdf:rest+/rdf:first ?class2 . FILTER (!isBlank(?class2))
            } GROUP BY ?r2 ?class
        }
        BIND (CONCAT("owl:unionOf ( ", STR(?class), " ", ?group, " )") AS ?ranges)
    }
}
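The effect of the query above on a single property can be illustrated outside SPARQL. The following sketch walks an RDF list (rdf:first / rdf:rest) hanging off a blank node and produces the compact “owl:unionOf ( ... )” string. The triples, the :Flora class, and the function names are hypothetical examples and not part of the EKM export.

```python
# Hypothetical triples: the domain of :preysOn is a blank node whose
# owl:unionOf points to an RDF list with two classes.
TRIPLES = [
    (":preysOn", "rdfs:domain", "_:d"),
    ("_:d", "owl:unionOf", "_:l1"),
    ("_:l1", "rdf:first", ":Fauna"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", ":Flora"),
    ("_:l2", "rdf:rest", "rdf:nil"),
]


def collection_members(triples, head):
    """Walk an RDF list from its head node and return the members in order."""
    lookup = {(s, p): o for s, p, o in triples}
    members = []
    node = head
    while node is not None and node != "rdf:nil":
        members.append(lookup[(node, "rdf:first")])
        node = lookup.get((node, "rdf:rest"))
    return members


def compact_union(triples, class_node):
    """Render the union behind a blank class node as one flat string."""
    lookup = {(s, p): o for s, p, o in triples}
    head = lookup[(class_node, "owl:unionOf")]
    return "owl:unionOf ( " + " ".join(collection_members(triples, head)) + " )"


print(compact_union(TRIPLES, "_:d"))
```

The flat string carries the same class names as the list, but the LLM reads it in one place instead of following three blank nodes.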

Prompt configuration and query translation

Another key input is the prompt configuration. Within the TTYG interface’s Additional Instructions, we describe the datasets in use and instruct the LLM to rely on SKOS labels (for example, brown bear/bear/ursus arctos for prefLabel/altLabels) for retrieving answers. To further improve translation accuracy — especially given the use of SKOS and OWL, which adds complexity to query construction — we also provide a one-shot example.

This helps guide the LLM in translating natural language into precise SPARQL queries by demonstrating the desired structure and vocabulary. This focused approach is especially critical for ensuring that the correct ontological terms — classes and properties — are used with their proper corresponding prefixes. Since the knowledge graph integrates multiple ontologies with (possibly) overlapping or similar terms, using incorrect or mismatched prefixes can result in invalid queries that return no results.

As an example, here is a question paired with its SPARQL translation:
What does the brown bear eat in the period of autumn?

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?foodPrefLabel WHERE {
  ?bear a <https://sensingclues.poolparty.biz/Ecological-Knowledge-Model-Custom-Ontology/Fauna> ;
        skos:prefLabel|skos:altLabel|rdfs:label ?bearLabel ;
        <https://sensingclues.poolparty.biz/Ecological-Knowledge-Model-Custom-Ontology/eatsMostly> ?food .
  ?food skos:prefLabel|skos:altLabel|rdfs:label ?foodLabel .
  FILTER(LANG(?foodLabel) = 'en')
  FILTER(lcase(str(?bearLabel)) = "brown bear")
  ?autumn a <https://sensingclues.poolparty.biz/Ecological-Knowledge-Model-Custom-Ontology/Season> ;
          skos:prefLabel|skos:altLabel|rdfs:label ?seasonLabel .
  FILTER(lcase(str(?seasonLabel)) = "autumn")
  ?food <https://sensingclues.poolparty.biz/Ecological-Knowledge-Model-Custom-Ontology/inPeriod> ?autumn .
  ?food skos:prefLabel|rdfs:label ?foodPrefLabels .
  BIND (str(?foodPrefLabels) AS ?foodPrefLabel)
}

Fig. 3 The configuration of Nature FIRST TTYG agent using SPARQL and full-text search

Testing with Competency Questions

With the agent set up, we can ask competency questions about the EKM:

  • What is the habitat in which the brown bear and wild boar occur together?
  • Apart from nuts what does the brown bear eat as well?
  • Which species are solistic?
  • Which habitat has the most species?
  • Which habitat has the most typical species?
  • Which species are nocturnal?
  • Which point of interest attracts some species, and repels others?
  • Which species has the highest average distance per day in km?
  • Can you show me the species and their max speed?
  • What does the bear eat in spring?
  • Which species compete for food in the period of autumn?
  • Which species have the highest home range?
  • Which attribute values do rhinos have?
  • What are some of the attributes that they share based on the generic rhino concept?
  • Provide me with the typical species of habitats that the bear may occur in? 

As a first example, consider the translation of the question “What is the habitat in which the brown bear and wild boar occur together?”. The generated SPARQL query can be inspected as an explanation of the retrieved answer.

As another example, we ask TTYG the question “Which species compete for food in the season of autumn?”. Although “compete for food” does not exist as a relation in the ontology, the LLM still captures the correct intent and expresses it using the relevant ontological properties.

Note that the translated SPARQL query has been augmented with the filter expression FILTER (str(?species1Label) < str(?species2Label) && ...). The LLM adds it by following the instructions explicitly provided in the Additional Instructions, in order to remove symmetric pairs.
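The idea behind this filter is that a strict “less than” comparison keeps only one ordering of each pair and also drops self-pairs. A small sketch of the same idea on plain label pairs follows; the species pairs are made-up sample rows, not results from the knowledge graph, and Python string comparison is used as an approximation of the SPARQL string comparison.

```python
def unordered_pairs(rows):
    """Keep one row per unordered pair, mirroring FILTER(str(?a) < str(?b))."""
    return sorted({(a, b) for a, b in rows if a < b})


# Made-up sample rows containing both orderings and a self-pair.
ROWS = [
    ("brown bear", "wild boar"),
    ("wild boar", "brown bear"),
    ("brown bear", "brown bear"),
    ("red deer", "wild boar"),
    ("wild boar", "red deer"),
]

for first, second in unordered_pairs(ROWS):
    print(first, "/", second)
```

Without the comparison, every pair would appear twice in the answer, once in each direction.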

As a next step, we may ask TTYG to remove synonyms. In this case, no queries are posed against the triple store; only the LLM’s summarization of the previously returned answer is used.

Advanced query processing and multi-hop reasoning

As a final example, we show the multi-hop query “Provide me with the typical species of habitats that the bear may occur in?”, which spans the EKM and the Nature FIRST Knowledge Graph.

To wrap it up

With the examples presented, we have demonstrated the GraphRAG approach implemented via TTYG, where a large language model translates natural language questions into SPARQL, guided by the provided ontology, in order to retrieve factual knowledge from the Nature FIRST Knowledge Graph.

To improve the accuracy and reliability of this translation process, we found the following practices essential:

  1. Enhance ontology definitions with clear rdfs:label, rdfs:comment, and explicit domain/range declarations to provide context for the LLM. For multiple domain/range classes, we have used a more compact notation that avoids blank nodes.
  2. Limit the ontology scope to a relevant subset—or “view”—that reflects only the parts actually used in the data. This minimizes ambiguity and prevents confusion from overlapping term definitions.
  3. As working with SKOS-based vocabularies can pose additional challenges, we include a one-shot example in the prompt configuration. This helps guide the LLM in accurately interpreting and translating competency questions that involve SKOS structures.

Taken together, these strategies enable a more robust, interpretable, and low-hallucination method of querying complex, semantically rich knowledge graphs using natural language, as exemplified by the Nature FIRST Knowledge Graph.

You can try out the TTYG feature on top of the Nature FIRST Biodiversity Knowledge Graph in our Graphwise Sandbox environment.

Footnotes

  • 1
    Ahmeti, A., Schakel, J.-K., David, R., & Revenko, A. (2023). Towards preserving Biodiversity using Nature FIRST Knowledge Graph with Crossovers. In I. Fundulaki, K. Kozaki, D. Garijo, & J. M. Gómez-Pérez (Eds.), Proceedings of the ISWC 2023 Posters, Demos and Industry Tracks: From Novel Ideas to Industrial Practice co-located with 22nd International Semantic Web Conference (ISWC 2023), Athens, Greece, November 6-10, 2023 (Vol. 3632). CEUR-WS.org. https://ceur-ws.org/Vol-3632/ISWC2023_paper_458.pdf
  • 2
    Ahmeti, A., David, R., Revenko, A., & Schakel, J. K. (2025). A neuro-symbolic data architecture to modeling and preserving nature: Predicting brown bear movement based on knowledge graphs. Available at SSRN 5169300. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5169300.
