Blog post

Semantization of Regulatory Documents in AECO

November 29, 2024

Learn how knowledge graphs combined with the latest natural language processing technologies aid regulatory information extraction and processing from unstructured data. The result is augmented traditional human-centric workflows that improve efficiency and enable the next generation of machine-ready applications for automated compliance checking.

Introduction

Unlike our not-so-distant hunter-gatherer ancestors, today most of us live in a built environment. According to the World Bank, about 56% of the world’s population (4.4 billion inhabitants) reside in cities. This trend is expected to continue and by 2050 the urban population is likely to be more than double its current size, with nearly 7 of 10 people living in cities. 

The rapid urbanization trend has serious implications for everyone inhabiting the planet and the rest of the biosphere. If we want to overcome the challenges of such transformations in sustainable ways, we need to look for solutions from multidimensional perspectives.

The role of knowledge graphs in AECO transformation

At present, knowledge graphs are among the best-known technologies for moving beyond existing data silos in a decentralized way. They enable the interlinking of various data sources and provide deeper insights by taking multiple points of interest into account. One central junction of domains where a knowledge graph can address some urbanization challenges is Architecture, Engineering, Construction, and Operations (AECO).

With a few exceptions, most cities did not appear overnight but have been rebuilt and redesigned repeatedly over the centuries. Some constructions have replaced previously demolished ones. Others have been repurposed multiple times or gained heritage protection status. The practices and processes of planning, construction, renovation, and demolition have also evolved significantly over time. So have the related laws and regulations.

In addition, digitalization has seriously changed the game for AECO, as it has for so many other industries. But even though technologies like Building Information Modelling (BIM) have finally introduced symbolic representation, AECO still clings in many ways to outdated, analog practices and documents.

Bridging AECO’s digital gaps with the ACCORD Project 

That is why projects like ACCORD (Automated Compliance Checks for Construction, Renovation or Demolition Works) couldn’t come at a better time. ACCORD is funded by the European Union’s Horizon Europe research and innovation programme (call HORIZON-CL4-2021-TWIN-TRANSITION-01).

The project aims to close some of the existing digitization gaps within the AECO industry by developing a semantic framework for European Digital Building Permitting. The framework will aid the digitization of permitting and compliance processes. This will improve the productivity and quality of the design and construction processes and support a sustainable built environment.

To achieve its goals, the project focuses on formalizing permitting and compliance processes by using semantics. It covers use cases from Finland, Estonia, the UK, Germany, and Spain, addressing many aspects such as building accessibility, carbon footprint, fire safety, and urban planning. Here, one of the challenges involves digitizing the national specifics of regulatory documents and building codes in multiple languages. 

Another important aspect of digitization and automation of permitting compliance checks is standardization. ACCORD uses GraphDB to perform automated compliance checks, since RDF (Resource Description Framework) knowledge graphs comply with W3C standards. Their strong rooting in formal logic also ensures a reliable way of automating manual permitting procedures. For automation to happen, the existing regulatory documents have to be converted from their original textual form into structured data and linked to the models to which they apply.

Since the first digitization attempts were made, the modeling of built environments has also evolved. This has resulted in heterogeneous models created in various applications and stored in multiple data formats. RDF is widely believed to be a universal standard that could facilitate the integration of other data standards, practices, and models. 

For example, it can aid the integration of BIM and Geographic Information Systems (GIS). Semantization of building permits involves, at least partially, converting the existing building and land parcel models into more universal and homogeneous RDF datasets, where the different parts are interlinked. Storing this interlinked data in a knowledge graph provides rich querying capabilities, supported by GeoSPARQL, such as compliance checks that involve complex geometries.

Rule formalization process with knowledge graphs and the latest NLP technologies 

Over the past few decades, digitization within AECO was mainly designed for a human domain expert as the information consumer. This target user could understand and interconnect information stored in various formats and extract what was relevant for them to perform the necessary checks. 

However, the most recent advancements in natural language processing (NLP) have allowed software designers to take this one step further and also target machines as information consumers. ACCORD consortium partners have elaborated a pipeline to automate the process of extracting information from multiple documents in PDF into a form that allows machines to link relevant information. It is outlined in the diagram below (taken from the ACCORD Deliverable D2.3):

The first step is content extraction and conversion to a structured data format, in which the selected regulations are formalized into clause form. In the next step, the clauses are identified and their logical relationships are formalized, either automatically by an NLP process using a large language model (LLM) or manually by annotation and tagging with the RASE method. The manual route also requires expression authoring. Due to the inherent complexity of regulatory texts and the present limitations of NLP techniques, the rule formalization process could not be fully automated.

The RASE schema used in the manual part of the process is a formally defined markup format compatible with HTML. It renders normative, definitive, and descriptive text documents machine-interpretable. 

From the data flow point of view, the data transformation looks like the following:

The Ontotext research team chose YAML as the initial language for data serialization because it is much easier for humans to read. Its flavor YAML-LD adds context to the parsed regulations. Currently, YAML-LD cannot be loaded directly into GraphDB, so an intermediate step (YAML-LD to JSON-LD conversion) is applied to get the regulation clauses into RDF.
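To make the intermediate step more concrete, here is a minimal Python sketch of one thing such a conversion might involve. It assumes that the `$id` and `$type` keys seen in the project's YAML-LD files stand in for the JSON-LD keywords `@id` and `@type`; the post does not say how this mapping is actually handled, so the alias table and the `to_jsonld` helper are hypothetical. The input is a hand-written dictionary of the kind a YAML parser would return.

```python
import json

# Assumption: "$"-prefixed keys are aliases for JSON-LD keywords.
KEYWORD_ALIASES = {"$id": "@id", "$type": "@type"}


def to_jsonld(node):
    """Recursively rename aliased keys in a parsed YAML-LD structure."""
    if isinstance(node, dict):
        return {KEYWORD_ALIASES.get(key, key): to_jsonld(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [to_jsonld(item) for item in node]
    return node


# Hand-written fragment mirroring what a YAML parser would produce.
fragment = {
    "$id": "FI/Government_Decree_on_Accessibility_of_building/2",
    "$type": ["DocumentSubdivision"],
    "title": "Passageway leading to a building",
}

jsonld_text = json.dumps(to_jsonld(fragment), indent=2)
print(jsonld_text)
```

The resulting JSON-LD text is what would then be loaded into the triplestore. A real pipeline could equally resolve the aliases through the JSON-LD context rather than by renaming keys.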

Documents with rules formalized by the process described above were turned into RDF triples using the Architecture, Engineering and Construction Compliance Checking and Permitting Ontology (AEC3PO), developed by the ACCORD consortium partners. This ontology allows professionals to explore, query, and understand various aspects of the compliance and permitting processes more comprehensively. You can check out these examples of applying the ontology to use cases from Finland, Estonia, Spain, and the UK. They cover regulation checks concerning accessibility, emissions, safety, a cultural center, and timber structures, and also demonstrate answers to the competency questions set for the ontology.
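As a rough illustration of what "turned into RDF triples" means for a nested document structure, the following Python sketch flattens a small nested dictionary into subject, predicate, object tuples. It is a simplification under stated assumptions: the `flatten` helper and the sample document are hypothetical, predicates are left as plain terms rather than expanded to AEC3PO IRIs, and a real pipeline would rely on a JSON-LD processor instead. The base IRI is the `@base` value used in the post's YAML-LD example.

```python
# Base IRI taken from the '@base' entry of the YAML-LD example in this post.
BASE = "https://regulations.accordproject.eu/"


def flatten(node, triples=None):
    """Collect (subject, predicate, object) tuples from a nested node.

    Every dictionary is assumed to carry an '@id'; nested dictionaries
    become linked resources and are flattened recursively.
    """
    if triples is None:
        triples = []
    subject = BASE + node["@id"]
    for key, value in node.items():
        if key == "@id":
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                triples.append((subject, key, BASE + item["@id"]))
                flatten(item, triples)
            else:
                triples.append((subject, key, item))
    return triples


# Hypothetical, heavily shortened document structure.
document = {
    "@id": "FI/Doc",
    "title": "Decree",
    "hasPart": [
        {"@id": "FI/Doc/2", "title": "Passageway leading to a building"},
    ],
}

for triple in flatten(document):
    print(triple)
```

Each nested part becomes its own resource, and the `hasPart` link keeps the document hierarchy queryable once the data is in the graph.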

From theory to practice: an example

Below you can see a fragment of the Finnish Government Decree on Accessibility of Buildings that describes a limit value check: “a passageway of a ramp should be at least 1,200 mm width, with a smooth, hard and non-slippery surface”:

The YAML-LD file corresponding to that rule is the following:
---
'@base': https://regulations.accordproject.eu/
'@context':
- https://w3id.org/lbd/aec3po/aec3po.jsonld
- terms: https://identifier.buildingsmart.org/uri/accord/ACCORD-1.0/
  functions: https://functions.accordproject.eu/
$type:
- Document
subject: ANY
coverage: FI
title: Government Decree on Accessibility of building
issued: 2017-01-01
identifier: Government_Decree_on_Accessibility_of_building
$id: FI/Government_Decree_on_Accessibility_of_building
hasPart:
...
- $id: FI/Government_Decree_on_Accessibility_of_building/2
  identifier: "2"
  $type:
  - DocumentSubdivision
  title: Passageway leading to a building
  hasPart:
  - $type:
    - Statement
    - RequirementStatement
    $id: FI/Government_Decree_on_Accessibility_of_building/2.1
    identifier: "2.1"
    hasInlinePart:
    - $id: FI/Government_Decree_on_Accessibility_of_building/2.1.1
      identifier: 2.1.1
      asText: There shall be an easily noticeable
      $type:
      - Statement
    - asText: passageway
      $id: FI/Government_Decree_on_Accessibility_of_building/2.1.2
      identifier: 2.1.2
      isOperationalizedBy:
        $id: FI/Government_Decree_on_Accessibility_of_building/2.1.2_method
        identifier: 2.1.2_method
        hasBSDDTarget:
          $id: terms:type
          $type:
          - $id
        hasComparator: CheckMethodComparator-eq
        hasBSDDValue:
          $id: terms:Passageway
          $type:
          - $id
        $type:
        - CategoryCheckMethod
      $type:
      - CheckStatement
      - ApplicationStatement
    - $id: FI/Government_Decree_on_Accessibility_of_building/2.1.3
      identifier: 2.1.3
      asText: with a width of at least
      $type:
      - Statement
    - asText: "1,200 millimetres"
      $id: FI/Government_Decree_on_Accessibility_of_building/2.1.4
      identifier: 2.1.4
      isOperationalizedBy:
        $id: FI/Government_Decree_on_Accessibility_of_building/2.1.4_method
        identifier: 2.1.4_method
        hasBSDDTarget:
          $id: terms:Width
          $type:
          - $id
        hasComparator: CheckMethodComparator-gt
        $type:
        - NumericalCheckMethod
        hasValue: "1.2"
      $type:
      - CheckStatement
      - RequirementStatement
    - $id: FI/Government_Decree_on_Accessibility_of_building/2.1.5
      identifier: 2.1.5
      asText: and a
      $type:
      - Statement
    - asText: "smooth,"
      $id: FI/Government_Decree_on_Accessibility_of_building/2.1.6
      identifier: 2.1.6
      isOperationalizedBy:
        $id: FI/Government_Decree_on_Accessibility_of_building/2.1.6_method
        identifier: 2.1.6_method
        hasBSDDTarget:
          $id: terms:Smooth
          $type:
          - $id
        hasComparator: CheckMethodComparator-eq
        $type:
        - BooleanCheckMethod
        hasValue: "true"
      $type:
      - CheckStatement
      - RequirementStatement
    - asText: hard
      $id: FI/Government_Decree_on_Accessibility_of_building/2.1.7
      identifier: 2.1.7
      isOperationalizedBy:
        $id: FI/Government_Decree_on_Accessibility_of_building/2.1.7_method
        identifier: 2.1.7_method
        hasBSDDTarget:
          $id: terms:Hard
          $type:
          - $id
        hasComparator: CheckMethodComparator-eq
        $type:
        - BooleanCheckMethod
        hasValue: "true"
      $type:
      - CheckStatement
      - RequirementStatement
    - $id: FI/Government_Decree_on_Accessibility_of_building/2.1.8
      identifier: 2.1.8
      asText: and
      $type:
      - Statement
    - asText: non-slippery
      $id: FI/Government_Decree_on_Accessibility_of_building/2.1.9
      identifier: 2.1.9
      isOperationalizedBy:
        $id: FI/Government_Decree_on_Accessibility_of_building/2.1.9_method
        identifier: 2.1.9_method
        hasBSDDTarget:
          $id: terms:NonSlip
          $type:
          - $id
        hasComparator: CheckMethodComparator-eq
        $type:
        - BooleanCheckMethod
        hasValue: "true"
      $type:
      - CheckStatement
      - RequirementStatement
After conversion into RDF/Turtle, the same portion of text looks like this:
@prefix aec3po: <https://w3id.org/lbd/aec3po/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.com/exampleOntology#> .


# Instantiate the Finnish Accessibility Document as a Document
ex:Finnish_Accessibility_Doc a aec3po:Document, owl:NamedIndividual ;
   dct:identifier <https://ym.fi/documents/1410903/35099218/Accessibility+of+Buildings.pdf/56f06cd3-4a27-6ee3-e553-e35731ffa70b/Accessibility+of+Buildings.pdf?t=1680607572789> ;
   dct:issued "2017-04-05"^^xsd:date ;
   dct:references <https://ym.fi/en/the-national-building-code-of-finland> ;
   dct:coverage aec3po:Finland .


# Instantiate the DocumentSubdivision that contains the Ramp example as a DocumentSubdivision
ex:Finnish_Accessibility_DocSubdivision_S2_Sub_2 a aec3po:DocumentSubdivision, owl:NamedIndividual ;
   dct:identifier "Finnish Accessibility/Section 2" ;
   dct:title "Passageway leading to a building" .


# Link the DocumentSubdivision to the Document using hasPart property
ex:Finnish_Accessibility_Doc dct:hasPart ex:Finnish_Accessibility_DocSubdivision_S2_Sub_2 .


# Instantiate the statement
ex:ramp_statement a aec3po:Statement , owl:NamedIndividual ;
   aec3po:asText "There shall be an easily noticeable passageway with a width of at least 1200 millimetres and a smooth, hard and non-slippery surface...." .


# Link the statement to the DocumentSubdivision using hasPart property
ex:Finnish_Accessibility_DocSubdivision_S2_Sub_2 dct:hasPart  ex:ramp_statement .


#-------Instantiate the checkStatements expressed in the statement


# Instantiate the ramp_humanEvaluatedCheckStatement_Noticeable as a subClassOf CheckStatement
ex:ramp_humanEvaluatedCheckStatement_Noticeable a aec3po:HumanEvaluatedCheckStatement , owl:NamedIndividual ;
   aec3po:asText "There shall be an easily noticeable passageway" ;
   rdf:type aec3po:CheckStatement .


# Instantiate the ramp_humanEvaluatedCheckStatement_Surface as a subClassOf CheckStatement
ex:ramp_humanEvaluatedCheckStatement_Surface a aec3po:HumanEvaluatedCheckStatement , owl:NamedIndividual;
   aec3po:asText "...and a smooth, hard and non-slippery surface" ;
   rdf:type aec3po:CheckStatement .


# Instantiate the ramp_numericalCheckStatement_Width as a subClassOf CheckStatement
ex:ramp_numericalCheckStatement_Width a aec3po:NumericalCheckStatement , owl:NamedIndividual;
   rdf:type aec3po:CheckStatement .


...


#-------Required Data informed by the Statement


# Instantiate the ramp as a FeatureOfInterest
ex:ramp a aec3po:FeatureOfInterest , owl:NamedIndividual ;
   # Note: the ifcOWL prefix is not declared in this excerpt
   rdf:type ifcOWL:ifcRamp .
   #aec3po:hasContext "https://search.bsdd.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcRamp" .


# Define the properties of the ramp and constraints based on the Statement
ex:rampNoticeable a aec3po:Property , owl:NamedIndividual ;
   aec3po:hasValue "true"^^xsd:boolean .


ex:rampSurface a aec3po:Property , owl:NamedIndividual;
   aec3po:asText "smooth, hard and non-slippery surface" ;
   aec3po:hasValue "true"^^xsd:boolean .


ex:rampWidth a aec3po:Property , owl:NamedIndividual ;
   aec3po:hasValue 1200 ;
   qudt:hasUnit unit:MilliM ;
   aec3po:hasComparator aec3po:CheckMethodComparisonOperator-ge .
...


ex:ramp_statement aec3po:hasRequiredData
   ex:rampNoticeable ,
   ex:rampStraight ,
   ex:rampSurface ,
   ex:rampWidth ,
   ...
.

Digitized regulation clauses are then available in the knowledge graph, where they can be combined with knowledge graphs of 3D building information models and 2D land use parcels to perform sophisticated SPARQL queries.
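To illustrate what a limit value check amounts to once the clause is machine-readable, here is a minimal Python sketch that applies the width requirement to a measured value. It is not the ACCORD implementation: the comparator table and the `passes` function are hypothetical, and the only inputs taken from the example above are the limit (1,200 mm) and the "at least" semantics, expressed here as a greater-than-or-equal comparison.

```python
import operator

# Assumption: comparator suffixes such as "ge" or "eq" map onto ordinary comparisons.
COMPARATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


def passes(measured, comparator, limit):
    """Return True if the measured value satisfies the limit under the comparator.

    Both values are expected to be in the same unit (millimetres here).
    """
    return COMPARATORS[comparator](measured, limit)


# "a width of at least 1,200 millimetres" read as width >= 1200 mm
RAMP_WIDTH_LIMIT_MM = 1200

print(passes(1500, "ge", RAMP_WIDTH_LIMIT_MM))  # a 1.5 m wide passageway
print(passes(1000, "ge", RAMP_WIDTH_LIMIT_MM))  # a 1.0 m wide passageway
```

In a knowledge graph setting, the measured value would come from the building model and the limit and comparator from the digitized clause, with the comparison itself expressed in a SPARQL query rather than in application code.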

Getting it ready for the next generation of applications within AECO

The described rule formalization process within AECO showcases the use of a semantic knowledge graph combined with other technologies, like LLM-based NLP. This helps improve existing digitization practices and enables more automated extraction and linking of data contained in unstructured documents. Such data could be used for more automated checks, taking digitization beyond the traditionally human domain expert-targeted workflows and siloed data in single-purpose applications. 

It shows that this technology can play a key role in efficient information processing at scale to keep up with the pace of contemporary rapid urbanization. The building-related regulatory information extracted from unstructured text is stored in the knowledge graph and can be further linked to industry-specific models, such as BIM, and geolocated with GIS models. In this way, AECO data can be linked with data produced by other industries, covering various aspects like energy, transportation, the environment, and more.

In our next posts, we’ll look at how regulatory information transformed into a knowledge graph could be linked to 2D land parcel models and how geolocated 3D building information models can be converted into a knowledge graph conformant to Open Geospatial Consortium standards.

So stay tuned!

Meanwhile, you can download GraphDB and use the documentation to see some examples of how this enriched data could be retrieved with GeoSPARQL!

The ACCORD project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101056973. Views and opinions expressed are however those of the author only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
