Bridging Legal Data and AI: Our Experience with Talk to Your Graph and GraphDB

April 17, 2025
This article shares insights from Cognizone’s collaborative experience with GraphDB and its Talk to Your Graph interface, provided by Ontotext (now part of Graphwise).

Legal information is vast, complex, and heavily structured. While this structure is essential for ensuring precision, it makes querying legal data a difficult task. SPARQL[1] is a powerful query language for structured data but is largely inaccessible to those without technical expertise. Meanwhile, natural language queries, though intuitive, struggle with precision when applied to legal data.

Imagine querying complex legal databases as easily as having a casual conversation over coffee. This approachable user experience was our goal when integrating Talk to Your Graph with GraphDB — empowering anyone, from legal experts to non-technical users, to effortlessly find the precise information they need.

We set out to find a balance: one that allows users to ask questions in everyday language while retrieving structured and reliable results. Our journey with Talk to Your Graph (TTYG)[2] and GraphDB[3] aimed to make this possible. By leveraging AI to generate SPARQL queries dynamically, we created a system that could retrieve legal information efficiently without requiring users to write complex database queries.

What we quickly discovered, however, was that AI does not inherently understand the nuances of legal relationships, document classifications, or metadata requirements[4]. Without guidance, it often retrieved incomplete or irrelevant information. While these early challenges highlighted important realities about the limits of current AI technology, each hurdle became an opportunity to refine our approach and better align the AI’s understanding with our domain-specific ontology. The process of making AI useful for querying legal data required refining ontologies, defining constraints, and allowing AI to learn iteratively from its mistakes[5].

This article outlines the practical steps we took — from configuring GraphDB to structuring AI-generated queries — and the key lessons we learned in making legal knowledge more accessible through AI.

The foundation of any AI-powered query system is the data itself. Our project relied on Swiss Fedlex data[6], a comprehensive dataset of legal acts, amendments, classifications, and metadata. This dataset is structured following FRBR principles[7], which define legal documents at multiple levels: Work, Expression, Manifestation, and Item[8].

To make this data usable, we loaded it into GraphDB, ensuring that it was indexed and classified properly. One of the first steps was enabling full-text search[9] as a fallback for when AI queries did not return the expected results.
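The fallback idea can be sketched as follows. This is a minimal illustration, not the project's actual code: `structured_search` and `full_text_search` are hypothetical callables standing in for the AI-generated SPARQL path and the keyword index, and treating a failed query like an empty result is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]  # one result row, e.g. {"title": "..."}


def query_with_fallback(
    question: str,
    structured_search: Callable[[str], List[Row]],  # hypothetical: AI-generated SPARQL path
    full_text_search: Callable[[str], List[Row]],   # hypothetical: keyword search path
) -> Tuple[str, List[Row]]:
    """Try the structured path first; use keyword search if it yields nothing."""
    try:
        rows = structured_search(question)
    except Exception:
        rows = []  # assumption: a failed query is handled like an empty result
    if rows:
        return "sparql", rows
    return "full_text", full_text_search(question)
```

Returning the name of the path that produced the rows lets a caller tell the user when results come from keyword matching rather than from a structured query.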

Despite these efforts, early AI-generated queries struggled. AI did not always recognize the correct property mappings between legal acts and their versions, and it frequently returned excessive or irrelevant results. The complexity of legal relationships required additional fine-tuning in order to guide AI-generated SPARQL queries effectively.

A major hurdle in using AI for legal queries was ontology interpretation[10]. While GraphDB stored structured legal data correctly, the AI did not always understand which fields to prioritise when forming queries. One recurring issue was its assumption that legal document titles were stored under rdfs:label, when in reality, they were under jolux:title[11]. Because of this, many initial queries returned no results at all. Beyond mislabelling, AI also struggled to distinguish between legal acts, amendments, and supporting documents[12]. Without guidance, it frequently retrieved obsolete versions of laws or included draft texts instead of enacted ones.

Another key challenge emerged from the sheer size and complexity of the ontology itself. The complete JOLux ontology proved too large and intricate for TTYG to process effectively in its entirety, leading to performance degradation and query inaccuracies. To address this, we streamlined the ontology, focusing only on the core classes, properties, and relationships that were most frequently queried. By reducing complexity and explicitly instructing AI through carefully tailored Additional Instructions, we improved the system’s accuracy.
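As a rough illustration of the streamlining step, the sketch below keeps only ontology statements about a chosen set of core terms. It assumes the ontology is available as plain (subject, predicate, object) URI tuples; the term `RarelyQueriedClass` is hypothetical, and the types assigned to the real JOLux terms are illustrative only.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain URI strings

JOLUX = "http://data.legilux.public.lu/resource/ontology/jolux#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"


def trim_ontology(triples: Iterable[Triple], core_terms: Set[str]) -> List[Triple]:
    """Keep only statements whose subject is one of the core classes or properties."""
    return [t for t in triples if t[0] in core_terms]


full_ontology = [
    (JOLUX + "ConsolidationAbstract", RDF_TYPE, OWL_CLASS),
    (JOLUX + "isRealizedBy", RDF_TYPE, OWL_OBJECT_PROPERTY),  # type is illustrative
    (JOLUX + "RarelyQueriedClass", RDF_TYPE, OWL_CLASS),      # hypothetical term
]
core = {JOLUX + "ConsolidationAbstract", JOLUX + "isRealizedBy"}
trimmed = trim_ontology(full_ontology, core)
```

Which terms count as core is a judgment call; in our case it was driven by what users queried most often.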

These instructions included:

  • Always use full URIs instead of namespace prefixes.
  • Recognize FRBR relationships to appropriately structure queries for legal amendments.
  • Filter out non-enacted versions of laws unless specifically requested.
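The first instruction can also be enforced after generation. The sketch below expands prefixed names into full URIs. It is a simplified illustration under stated assumptions: the prefix table is ours, the regular expression covers only simple local names, and IRIs already in angle brackets and double-quoted literals are left untouched.

```python
import re

# Assumed prefix table; the namespaces are the ones used in the query shown in this article.
PREFIXES = {
    "jolux": "http://data.legilux.public.lu/resource/ontology/jolux#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Alternatives are tried in order: a full IRI, a double-quoted literal, then prefix:local.
# Only the third alternative has capture groups, so the first two pass through unchanged.
_TOKEN = re.compile(r'<[^<>\s]*>|"[^"\n]*"|\b([A-Za-z][\w-]*):([A-Za-z_][\w-]*)')


def expand_prefixes(query: str, prefixes=PREFIXES) -> str:
    """Replace known prefix:local names with <full URI>; leave everything else as is."""
    def repl(match):
        if match.group(1) is None:  # matched an IRI or a literal
            return match.group(0)
        namespace = prefixes.get(match.group(1))
        if namespace is None:       # unknown prefix: do not guess
            return match.group(0)
        return "<" + namespace + match.group(2) + ">"
    return _TOKEN.sub(repl, query)
```

For example, `expand_prefixes("?s jolux:title ?t .")` yields the same pattern with `jolux:title` written out as a full URI in angle brackets.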

Throughout this refinement process, we particularly appreciated TTYG’s rapid and user-friendly setup, which allowed even team members without technical expertise to quickly engage with the data and actively contribute. While these adjustments dramatically improved query accuracy, we found that legal queries inherently remain complex, requiring ongoing refinements and additional safeguards to achieve consistently precise results.

Once the ontology was refined, we focused on improving how AI structured its queries. In theory, AI could generate SPARQL queries based on natural language input, but without constraints, it often constructed inefficient or overly broad queries[13].

One of the most common issues was namespace confusion[14]. AI sometimes attempted to mix different legal document references, leading to queries that either timed out due to excessive data retrieval or failed entirely due to missing relationships.

To address these challenges, we explicitly reinforced AI’s use of full URIs[15] instead of relying on prefixes. We also followed an iterative debugging process, where we fed failed SPARQL attempts back into the system, allowing AI to self-correct and refine its query structure over time.
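The feedback loop described above might look roughly like this. It is a sketch only: `generate` and `execute` are hypothetical callables (the model call and the query runner), and the attempt limit and the decision to count an empty result as a failure are assumptions, not documented TTYG behaviour.

```python
from typing import Callable, List, Optional, Tuple

Failure = Tuple[str, str]  # (failed query text, error message)


def generate_with_feedback(
    question: str,
    generate: Callable[[str, List[Failure]], str],  # hypothetical: model call that sees past failures
    execute: Callable[[str], list],                 # hypothetical: runs SPARQL, returns rows
    max_attempts: int = 3,                          # assumed limit
) -> Tuple[Optional[list], List[Failure]]:
    """Regenerate the query, passing earlier failures back, until one returns rows."""
    failures: List[Failure] = []
    for _ in range(max_attempts):
        query = generate(question, failures)
        try:
            rows = execute(query)
        except Exception as exc:
            failures.append((query, str(exc)))
            continue
        if rows:
            return rows, failures
        # Assumption: an empty result is also fed back as a failure.
        failures.append((query, "query returned no results"))
    return None, failures
```

Keeping the list of failures is useful beyond the retry itself: recurring errors point to instructions or ontology details that need to be made explicit.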

Even with these improvements, we still saw inconsistency in how legal filters were applied. AI sometimes retrieved all versions of a legal act rather than focusing on currently valid ones[16]. To fix this, we had to introduce structured constraints that would guide query formation at a deeper level.

One of the most significant breakthroughs in improving AI-generated legal queries was the use of SHACL Application Profiles (APs)[17]. SHACL provided a way to define specific constraints for query structure, preventing AI from generating ambiguous or incorrect queries.

By implementing SHACL constraints, we ensured that queries:

  • Matched the FRBR semantics of legislation.
  • Only returned currently valid legal acts unless otherwise specified.
  • Correctly linked amendments to their respective legal texts.
  • Filtered results based on publication date and document format.
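One simple way to picture the effect of an application profile is an allow-list check on generated queries. The sketch below is a deliberately simplified stand-in and not SHACL itself, nor a description of how TTYG consumes shapes: it assumes the profile has been reduced to a flat set of permitted JOLux terms (here, the ones that appear in the query shown in this article) and reports any JOLux URI in a query that falls outside that set.

```python
import re

JOLUX = "http://data.legilux.public.lu/resource/ontology/jolux#"

# Assumed, flattened view of a profile: the JOLux terms a query is allowed to use.
ALLOWED_TERMS = {
    JOLUX + name
    for name in (
        "ConsolidationAbstract",
        "isRealizedBy",
        "inForceStatus",
        "classifiedByTaxonomyEntry",
        "title",
    )
}


def terms_outside_profile(query: str, allowed=ALLOWED_TERMS, namespace=JOLUX):
    """Return the sorted JOLux URIs used in the query that the profile does not list."""
    used = set(re.findall(r"<([^<>\s]+)>", query))
    return sorted(u for u in used if u.startswith(namespace) and u not in allowed)
```

An empty list means the query stays within the profile's vocabulary; a non-empty list can be passed back to the model as a concrete correction hint.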

For instance, if a user asked, “Give me some cc with titles and identifiers with status In force,” the AI-generated query was initially too broad, returning irrelevant legal drafts and supporting documents.

After introducing SHACL constraints, the same query became:

SELECT ?consolidationAbstract ?title ?identifier WHERE {
  ?consolidationAbstract a <http://data.legilux.public.lu/resource/ontology/jolux#ConsolidationAbstract> ;
    <http://data.legilux.public.lu/resource/ontology/jolux#isRealizedBy> ?expression ;
    <http://data.legilux.public.lu/resource/ontology/jolux#inForceStatus> ?status ;
    <http://data.legilux.public.lu/resource/ontology/jolux#classifiedByTaxonomyEntry>/<http://www.w3.org/2004/02/skos/core#notation> ?identifier .
  ?status <http://www.w3.org/2004/02/skos/core#prefLabel> "In force"@en .
  ?expression <http://data.legilux.public.lu/resource/ontology/jolux#title> ?title .
} LIMIT 10
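To try a query like this outside the chat interface, it can be sent to the repository's SPARQL endpoint over HTTP. The sketch below only builds the request, following the standard SPARQL 1.1 Protocol (form-encoded POST with a `query` parameter). The base URL and the repository name `fedlex` are assumptions for illustration and need to be replaced with your own.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(base_url: str, repository: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol POST request for a repository endpoint."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/repositories/" + repository,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",  # ask for JSON results
        },
        method="POST",
    )


# Assumed local instance and hypothetical repository name.
request = build_sparql_request(
    "http://localhost:7200", "fedlex", "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
)
```

Passing the request to `urllib.request.urlopen` and decoding the response as JSON gives the result rows under `results` and then `bindings`, as defined by the SPARQL JSON results format.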

By adding these SHACL constraints, our AI-driven queries transformed from merely functional to reliable — allowing legal professionals to trust results and reduce time spent on manual searching. These improvements made queries increasingly practical for everyday legal research tasks.

Our implementation of AI-driven legal queries demonstrated how AI can enhance access to structured data, especially through the rapid, user-friendly setup offered by TTYG — something even non-developers can readily leverage. However, we also found that achieving precise and consistently valuable results requires careful fine-tuning and ongoing refinement. Critically, the effectiveness of AI-driven queries relies heavily on having a structured and explicit ontology supported by well-defined SHACL Application Profiles.

Combining structured SPARQL queries with a full-text search fallback also substantially increased the system’s robustness. This hybrid approach improved query accuracy and provided greater flexibility when handling complex or ambiguous requests. The key lessons from our experience include:

  • AI alone is not enough — it needs explicit constraints, well-defined ontologies, and structured guidance to produce useful SPARQL queries.
  • SHACL is an indispensable tool for validating and refining AI-generated queries, ensuring that results remain accurate and relevant.
  • Iterative improvement is necessary — debugging AI-generated queries is not a one-time process; it requires ongoing testing, refinement, and adaptation based on user behaviour.

We see potential in TTYG’s future — not only as a scalable tool that integrates smoothly into diverse platforms via APIs but also as an intelligent assistant capable of dynamically loading SHACL constraints and adapting effortlessly to new legal domains. Imagine being able to query newly added regulations, contracts, or cases immediately, with accuracy steadily improving as you refine your ontologies and constraints.

By expanding its capabilities and refining AI’s understanding of structured data, we can make complex legal and regulatory information genuinely accessible to everyone, whether they are lawyers, policymakers, researchers, or everyday citizens seeking legal clarity. Ultimately, bridging legal data and AI through tools like TTYG and GraphDB isn’t merely a technological advancement — it’s a powerful way to democratise access to legal clarity, enabling informed decisions faster than ever.

If you’re considering enhancing your data accessibility or have questions about integrating similar AI solutions, we encourage you to reach out to our team — we’d be happy to discuss how these approaches could benefit your organisation.

Footnotes

  • [1] SPARQL – SPARQL Protocol and RDF Query Language, a query language for RDF databases.
  • [2] Talk to Your Graph (TTYG) – AI-powered natural language interface for querying graph databases.
  • [3] GraphDB – A graph database optimized for semantic data and knowledge graphs.
  • [4] Legal metadata requirements – Metadata in legal documents include classifications, references, and validity status.
  • [5] AI learning from mistakes – AI iteratively refines its responses through error correction and feedback loops.
  • [6] Fedlex – Swiss federal legislative database.
  • [7] FRBR – Functional Requirements for Bibliographic Records, a model for organizing legal and bibliographic information.
  • [8] Work/Expression/Manifestation/Item – Different levels of abstraction in FRBR.
  • [9] Full-text search – A method to retrieve documents based on keyword matching.
  • [10] Ontology interpretation – How AI understands and processes relationships between data entities.
  • [11] rdfs:label vs. jolux:title – RDF properties used for labeling entities, with jolux:title used explicitly in the Fedlex ontology for legal document titles.
  • [12] Legal acts, amendments, and supporting documents – Different types of legislative documents that require distinct handling in SPARQL queries.
  • [13] Inefficient or overly broad queries – AI-generated queries that return excessive data or fail to apply necessary constraints.
  • [14] Namespace confusion – AI-generated queries that incorrectly mix ontology namespaces, leading to query failures or incorrect results.
  • [15] Full URIs vs. prefixes – Using fully qualified URIs in SPARQL to avoid incorrect namespace references.
  • [16] Version control in legal queries – Ensuring AI retrieves currently valid legal acts instead of all historical versions.
  • [17] SHACL (Shapes Constraint Language) – A W3C standard for defining constraints on RDF data, improving SPARQL query accuracy.
