How AI and Taxonomy Builder Support the Building of Taxonomies

Why taxonomy building is harder than it looks, how generative AI addresses its core challenges, and why Graphwise's Taxonomy Builder delivers better, more workflow-integrated results than using public LLM tools directly.

Main Takeaways

  • Taxonomy building is harder than it looks — anticipating synonyms, avoiding bias, and writing machine-readable definitions for every concept is far more work than designing a simple hierarchy.
  • Use LLMs for branches, not the whole tree — AI handles synonym generation, term comparison, and gap-filling well; full taxonomy generation still produces messy, classification-style output.
  • Prompting is a skill most taxonomists don't have — purpose-built tools remove the barrier of crafting expert prompts every time you need a usable result.
  • Being inside the workflow is the real advantage — reviewing and accepting suggestions without leaving the tool beats copying and pasting from ChatGPT every time.

Taxonomies and ontologies are both fundamental semantic components of knowledge graphs, semantic layers, and the semantic backbone for enterprise data, information, and knowledge access. Taxonomy modeling, especially compared with ontology modeling, is relatively simple, since it is based on just hierarchies. Taxonomy building, however, is not so easy, and the apparent ease of taxonomy modeling may result in too few resources being devoted to the actual building of the taxonomy.

Taxonomy building involves various challenges, including: 

  • Understanding the detailed concepts of a subject domain
  • Anticipating all the variant (alternative) names or labels for a concept that may occur in sources or searches
  • Knowing how best to construct unbiased, logical hierarchies that are machine-readable and can support various applications, instead of an individual’s navigation design
  • Supporting the various context-supporting requirements of AI systems, such as GraphRAG, including writing definitions for all concepts
  • Having sufficient time and resources to create a full taxonomy, especially when it’s part of a larger project with competing demands on those resources

How AI addresses the challenges of building taxonomies

Generative AI and LLMs can help with all of these challenges. In general, taxonomies connect varied users to varied content, so anticipating the text strings that may appear in searches and in target text is an important part of creating concepts and their labels. LLMs are good at anticipating text strings, which makes them well suited to assisting with taxonomy creation. It is also best to get viewpoints from multiple participants or stakeholders when creating a taxonomy, and LLMs can serve as “additional viewpoints,” providing even more suggestions when a generation request is repeated.

Understanding the subject domain

LLMs are trained on huge data sets spanning most subject domains, so they can define, relate, and compare concepts almost instantly. In comparison, human taxonomists have limited subject matter expertise and, without AI, need a lot of time to research concepts.

Generating sufficient variant/alternative labels for concepts

LLMs can similarly suggest a wide range of synonyms or alternative labels for a concept, whereas an individual human may think of far fewer and needs to spend time researching multiple sources to come up with more.

Knowing how best to construct hierarchies

Those who are not taxonomy experts tend to create taxonomy hierarchies modeled on navigation hierarchies or classification schemes that make sense to them alone but are not valid taxonomies for general use. LLMs generally interpret prompts for taxonomies correctly and do a reasonably good and consistent job of building hierarchies.

Creating definitions for concepts 

This is a tedious task for taxonomists or subject matter experts, who don’t usually spend their time writing definitions. LLMs can draw on multiple sources to generate original definitions that are usually accurate, which is especially helpful for the requirements of newer AI applications and GraphRAG systems.

Having the time and resources to create a full taxonomy

Taxonomies are increasingly built as part of a larger system, such as a semantic layer, a knowledge graph, or a GraphRAG system. There, greater resources are spent on other technical components, leaving limited time and resources for taxonomy development. LLMs and generative AI are a great time saver.

Why full taxonomy generation falls short

Taxonomists and subject matter experts have been experimenting with generative AI for taxonomy creation and related tasks ever since these tools became available. Early attempts at generating full taxonomies were not acceptable. The LLMs either extracted from multiple sources and pieced the results together in an inappropriate and inconsistent manner (resulting in, for example, the same concept appearing at different levels of the hierarchy or with different preferred labels), or they reproduced original copyrighted taxonomies.
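
One of the inconsistencies described above, the same concept appearing at different levels of the hierarchy, is easy to check for mechanically before a human reviews a generated taxonomy. The following is a minimal sketch, assuming the generated output has already been parsed into a nested dictionary of labels; the function names and the sample data are illustrative only and do not come from any particular tool.

```python
from collections import defaultdict


def collect_depths(tree, depth=0, seen=None):
    """Walk a nested {label: children} dict and record every depth at which
    each label occurs. Labels are compared case-insensitively."""
    if seen is None:
        seen = defaultdict(set)
    for label, children in tree.items():
        seen[label.strip().lower()].add(depth)
        collect_depths(children, depth + 1, seen)
    return seen


def concepts_at_multiple_levels(tree):
    """Return the labels that occur at more than one hierarchy level."""
    seen = collect_depths(tree)
    return {label: sorted(depths) for label, depths in seen.items() if len(depths) > 1}


# Hypothetical generated output: "Spreadsheets" appears at two levels.
generated = {
    "Software": {
        "Productivity software": {"Spreadsheets": {}, "Word processors": {}},
        "Spreadsheets": {},
    }
}

print(concepts_at_multiple_levels(generated))
```

A check like this only catches exact (case-insensitive) label matches. Concepts that reappear under a different preferred label still need human review.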

Although taxonomy generation results have improved over the intervening years, generating a full taxonomy from the top is never perfectly suited for a specific use case. No matter how detailed your prompt is, it cannot sufficiently explain everything. It’s best to manually design the highest structure of the taxonomy, the concept schemes, to take into consideration your users, your data and content, and your implementation.

Where LLMs excel: taxonomy sub-tasks

LLMs have proven more successful at tasks other than building full taxonomies. Taxonomy owners usually use LLMs to assist with creating smaller parts of taxonomies, such as specific hierarchy branches. Taxonomists also use LLMs for various related tasks.

These tasks have included, among other uses:

  • Generating a list of narrower concepts for a specific concept or category, especially in generic (non-proprietary) subject domains, such as science, technology or medicine.
  • Generating a list of synonyms/alternative labels for a concept
  • Comparing two technical (unfamiliar) terms to determine if they are synonyms, one is broader than the other, or if they are overlapping and/or related
  • Structuring a flat list of terms, such as terms gathered from text analytics, search logs, or user brainstorming, into a 2-3 level hierarchy
  • Generating SPARQL queries with instructions in order to generate reports or batch edit taxonomies in SKOS, which is based on the RDF data model
  • Developing stakeholder interview questions for the taxonomy design phase
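
To make the SPARQL item in the list above more concrete, here is a minimal sketch of the kind of report query an LLM might be asked to produce: it lists the concepts in a SKOS concept scheme that have no alternative labels yet. The query is wrapped in a small Python helper so the scheme URI and language can be varied. The helper name and the example URI are assumptions for illustration; only the standard SKOS properties are taken as given.

```python
def concepts_without_alt_labels_query(scheme_uri: str, lang: str = "en") -> str:
    """Build a SPARQL SELECT query that reports SKOS concepts in one concept
    scheme that have a preferred label but no alternative label.

    Note: scheme_uri and lang are inserted as-is, so only pass trusted values.
    """
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?prefLabel
WHERE {{
  ?concept a skos:Concept ;
           skos:inScheme <{scheme_uri}> ;
           skos:prefLabel ?prefLabel .
  FILTER (lang(?prefLabel) = "{lang}")
  FILTER NOT EXISTS {{ ?concept skos:altLabel ?alt }}
}}
ORDER BY ?prefLabel"""


# Hypothetical concept scheme URI, for illustration only.
query = concepts_without_alt_labels_query("https://example.org/scheme/software")
print(query)
```

The resulting text can be pasted into whatever SPARQL endpoint or query editor your taxonomy system provides. As with any generated query, it is worth running it against a small test project first.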


Using AI generation for sub-tasks, rather than for the entire taxonomy, allows more opportunity for human review. The LLM is also more likely to cite its sources for limited-scope tasks, whereas there is little transparency about sources when generating an entire taxonomy.

What is also interesting is that both novice and experienced taxonomists, along with subject matter experts who are not taxonomists, use LLMs and Generative AI to assist with taxonomy creation and development. Thus, all can benefit from the integration of LLMs into taxonomy management systems, such as Graphwise Graph Modeling (PoolParty). 

What Graph Modeling’s Taxonomy Builder feature can do

Graphwise began developing ways to integrate LLMs into the taxonomy creation workflow of PoolParty as soon as LLMs became publicly available. The initial focus was on using LLMs to make suggestions for (or advise on) specific sub-tasks of taxonomy generation: narrower concepts, alternative labels, and definitions for a concept. This feature, called Taxonomy Advisor, was introduced in PoolParty 2024.

The ability to use AI to generate a full hierarchical taxonomy, whether as a branch from any starting node or even from the start of an empty taxonomy project, was introduced earlier this year as the Taxonomy Builder feature with PoolParty 10.1. What was previously called Taxonomy Advisor has become the “Extend Your Taxonomy” function of Taxonomy Builder in PoolParty 10.2.

Taxonomy Builder builds or extends taxonomies that are customized for your domain by taking into account the context of the taxonomy you have created so far and the additional optional settings and instructions you provide. Alternative labels and definitions can be generated at the same time as the concept hierarchy generation or can be generated from an individual selected concept as part of its details. All generated concepts, alternative labels, and definitions are first presented as suggestions for the user to accept, edit, or deselect before committing to the taxonomy. 
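
The accept, edit, or deselect step described above can be pictured as a simple review pass over a list of suggestions. The sketch below is purely illustrative and is not how Taxonomy Builder is implemented: the `Suggestion` class, its fields, and the assumption that suggestions start out selected are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    kind: str               # "concept", "alt_label", or "definition"
    text: str
    accepted: bool = True   # assumption: suggestions start selected


def commit(suggestions):
    """Return only the suggestions that are still accepted after review,
    skipping any whose text was edited down to nothing."""
    return [(s.kind, s.text.strip()) for s in suggestions if s.accepted and s.text.strip()]


suggestions = [
    Suggestion("concept", "Spreadsheet software"),
    Suggestion("alt_label", "Enterprise Software"),
    Suggestion("definition", "Software for organising data in rows and columns."),
]

suggestions[0].text = "Spreadsheets"   # the reviewer edits a label
suggestions[1].accepted = False        # the reviewer deselects a label that is too broad

committed = commit(suggestions)
print(committed)
```

The point of the sketch is that nothing reaches the taxonomy until the reviewer has had a chance to change or reject it.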

Taxonomy Builder advantages over publicly accessible LLMs

It’s true that Taxonomy Builder does not do everything that LLMs can possibly do with respect to taxonomy development (although more features for it are planned). But using Taxonomy Builder as a part of Graph Modeling has distinct advantages and benefits over simply going to a public generative AI web tool (ChatGPT, Claude.ai, Google Gemini, etc.).

Incorporation into the taxonomy building workflow

It’s far easier and quicker to select from among generated suggestions right within the Graph Modeling system. With a few clicks the user can add AI-generated concepts, alternative labels or definitions to the appropriate taxonomy node. This is more efficient than switching to another application and then copying and pasting results or reformatting generated answers into an importable spreadsheet. As a consequence, you are more likely to use LLMs for taxonomy creation when using Taxonomy Builder than if you had to go out to a web URL of a generative AI platform. This means you are more likely to build out a fuller taxonomy.

Use of the optimal combination of LLMs

On your own, you would need to select which generative AI tool to use, and you might not be sure which is best for the task. Taxonomy Builder utilizes an appropriate combination of LLMs, because different LLMs are better at different tasks pertaining to taxonomy development.

No need for writing complex, detailed prompts

A major obstacle to getting suitable generative AI results is the challenge of writing sufficiently clear and detailed prompts. Prompting for various taxonomy development tasks is not something that most taxonomists do often enough to get good at. 

Taxonomy Builder facilitates the task by combining information from multiple sources (project name and description, concept scheme name and description, taxonomy context, and selections from among the settings, in addition to a user’s instructions) with its own prescriptive prompt templates to create the optimal set of instructions.
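
To illustrate the general idea of assembling a prompt from several sources, here is a minimal sketch. It is not Graphwise's implementation or its actual templates: the `PromptInputs` fields mirror the sources named above, while the class, the function, the wording of the template lines, and the `max_depth` setting are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PromptInputs:
    project_name: str
    project_description: str
    scheme_name: str
    scheme_description: str
    context_labels: list = field(default_factory=list)  # existing nearby concepts
    settings: dict = field(default_factory=dict)         # hypothetical option names
    user_instructions: str = ""


def build_prompt(p: PromptInputs) -> str:
    """Combine fixed template lines with whatever inputs are available."""
    parts = [
        "You are helping to extend a taxonomy for tagging and retrieval.",
        f"Project: {p.project_name}. {p.project_description}",
        f"Concept scheme: {p.scheme_name}. {p.scheme_description}",
    ]
    if p.context_labels:
        parts.append("Existing concepts: " + ", ".join(p.context_labels))
    for key, value in sorted(p.settings.items()):
        parts.append(f"{key}: {value}")
    if p.user_instructions:
        parts.append("Additional instructions: " + p.user_instructions)
    return "\n".join(parts)


prompt = build_prompt(
    PromptInputs(
        project_name="Software catalog",
        project_description="Products described on the company website",
        scheme_name="Software types",
        scheme_description="Kinds of software for tagging product pages",
        context_labels=["Office software", "Design software"],
        settings={"max_depth": 2},
        user_instructions="Use plural nouns for labels",
    )
)
print(prompt)
```

Even in this toy form, the user only supplies one short instruction. Everything else comes from information the taxonomy project already holds.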

Taking existing taxonomy context into consideration

Taxonomy Builder instructs the LLMs to build out a taxonomy that follows the example of existing taxonomy concepts that are broader, narrower, or sibling concepts of the selected concept from which the taxonomy is generated. Context informs the label format style and the specificity of concepts. For example, a hierarchy of food concepts will differ depending on whether its context is agricultural production, grocery store offerings, recipes, or nutritional recommendations.
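
The broader, narrower, and sibling context mentioned above can be gathered from a taxonomy with a few lines of code. The sketch below assumes a simplified model in which each concept has at most one broader concept, stored in a plain dictionary. Real SKOS taxonomies allow multiple broader concepts, and the function name and sample data here are hypothetical.

```python
def concept_context(broader_of, concept):
    """Collect the broader, narrower, and sibling concepts of one concept.

    broader_of maps each concept label to its single broader concept,
    or to None for top concepts (a simplifying assumption).
    """
    parent = broader_of.get(concept)
    narrower = sorted(c for c, b in broader_of.items() if b == concept)
    if parent is None:
        siblings = []
    else:
        siblings = sorted(
            c for c, b in broader_of.items() if b == parent and c != concept
        )
    return {"broader": parent, "narrower": narrower, "siblings": siblings}


# Hypothetical grocery-store style hierarchy.
broader_of = {
    "Produce": None,
    "Fruit": "Produce",
    "Vegetables": "Produce",
    "Apples": "Fruit",
    "Pears": "Fruit",
}

print(concept_context(broader_of, "Fruit"))
```

Passing labels like these along with a generation request gives the LLM concrete examples of the label style and level of detail already in use.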

Quickly generating usable starter taxonomies

The “Build Your Taxonomy” feature in Taxonomy Builder very quickly generates a multi-level taxonomy with suitable concepts, along with alternative labels and definitions, to which you may add more depth or more sibling concepts. By contrast, generative AI web tools, in their free versions, may take too long, time out and give up, or require too many follow-up instructions (using up the daily free allowance) before generating results in a usable form.

More suitable AI results with Taxonomy Builder

Comparison tests of full taxonomy generation, using identical prompts for the same subject in ChatGPT, Claude, Google Gemini, and Graphwise’s Taxonomy Builder, showed that Taxonomy Builder produced taxonomies with concepts more suitable for tagging. In comparison, the other generative AI applications produced results that were more like classification schemes, with comprehensive categories at each level. In the case of Claude, the hierarchical levels were even named (Domain, Category, and Tag).

It seems as if Taxonomy Builder has a better understanding of “taxonomy” for information retrieval. The other generative AI services also sometimes suggest incomplete or adjectival (and thus ambiguous) labels, such as Enterprise, Creative, and Infrastructure, to refer to types of software.

In comparing results for generated alternative labels in the public generative AI tools (with the prompt for “synonyms or alternative labels”) and Taxonomy Builder, the public generative AI tools provide many more results, but these are less suitable and often have broader meanings. For example, for Productivity Software, synonyms generated included Information Worker Software, Professional Tools, Collaboration & Productivity Tools, Workforce Software, and Enterprise Software. By contrast, the alternative label suggestions from Taxonomy Builder are tighter and more focused: Business software, Business tools, Office applications, Office programs, Office software, Productivity tools, and Task management software.

Furthermore, the long, detailed answers that generative AI services now tend to provide can take time to read through. Taxonomy Builder provides more concise responses that can quickly be selected from.

To wrap it up

Taxonomy Builder provides appropriate results, of a suitable number, that can efficiently be reviewed and selectively included within the taxonomy creation and editing workflow. It assists subject matter experts, novice taxonomists, and experienced taxonomists. The result is a fuller taxonomy, built in less time, with better coverage — and an AI system that can actually use it.

FAQ

Any Questions? Look Here

AI accelerates taxonomy construction by leveraging Generative AI and Large Language Models (LLMs) to automate labor-intensive tasks such as hierarchy creation, synonym generation, and definition drafting. By using a "human-in-the-loop" approach, tools like the Graphwise Taxonomy Builder provide domain-specific suggestions that experts can review and approve, which can greatly reduce specialist labor and eliminate the "cold start" problem in new knowledge modeling projects.

A taxonomy is a hierarchically structured controlled vocabulary that organizes concepts into broader/narrower relationships to facilitate information retrieval and navigation. In contrast, an ontology is a more complex semantic model that defines a wider variety of relationships, attributes, and logical rules between entities, enabling machine-readable reasoning and a deeper representation of domain knowledge beyond a simple tree structure.

A good taxonomy for AI and search systems is defined by robust vocabulary control and clear hierarchical structure, serving as a "canonical truth" that unifies synonyms under unambiguous concepts. By organizing concepts into logical relationships and facets, it provides a semantic backbone that enables AI models to reduce hallucinations, supports semantic reasoning for contextually grounded answers, and bridges the gap between diverse user search terms and structured data for precise, relevant retrieval.

To keep a taxonomy up to date as your business evolves, it must be treated as a dynamic, ongoing process supported by a robust governance framework and documented policies for continuous revision. This involves establishing a collaborative workflow between taxonomists and subject matter experts using specialized management software to ensure terminology remains relevant across all departments. By leveraging automated capabilities such as corpus analysis and AI-driven concept suggestions, organizations can proactively identify emerging trends and concepts from current data, maintaining the taxonomy as an agile "single source of truth" that accurately reflects the changing business landscape.

Taxonomy quality is a critical factor in the success of Retrieval-Augmented Generation and AI retrieval, serving as the semantic backbone that ensures precise, context-aware information discovery. A high-quality taxonomy resolves ambiguity and normalizes language by mapping varied search terms to canonical concepts, which significantly reduces AI hallucinations and can increase retrieval accuracy from approximately 40% to nearly 80% in enterprise settings. By providing a robust structure of hierarchies and associative relationships, a well-governed taxonomy, extended with an ontology, enables advanced AI capabilities such as multi-hop reasoning and structured filtering, transforming "flat" document silos into interconnected knowledge graphs that ground AI outputs in trusted domain expertise.

The most common mistakes in building taxonomies include using vague or poorly distinguished terms, creating concepts that are either too granular or too broad for the specific use case, creating incorrect hierarchies, and failing to provide appropriate synonyms. Additionally, organizations often struggle with siloed taxonomies managed in inefficient tools like spreadsheets, a lack of robust governance for long-term maintenance, and a failure to collaborate among interdisciplinary stakeholders.