

What is a taxonomy?


We hear a lot about knowledge graphs, semantic layers, GraphRAG, and increasingly even ontologies as methods for managing and retrieving complex enterprise information. Taxonomies may get less attention, even though they typically serve as the backbone knowledge organization structure within these larger systems.

Whether your goal is to build an enterprise knowledge graph, a GraphRAG application, or an improved semantic search application, a taxonomy is a key component in any system that enables users to find or discover the information they need.

What taxonomies are

A taxonomy is a controlled, organized set of unambiguous concepts that describe content, information, or data and that users may want to query. A taxonomy links users to the information they seek by matching the varied terms users search with the terms that occur in the content or data. A taxonomy often provides the values for metadata properties. Whether browsed or searched, a taxonomy primarily serves as an intermediary between users and the content they seek, although it may have other uses.

In a taxonomy, varied terms or labels for the same thing are brought together in a single concept, and the concepts are grouped into categories of similar types, often further arranged into hierarchies. This organization supports browsing for and identifying the desired concept within its context, whether for tagging or for retrieving information. Before modern taxonomies emerged in digital environments, indexes at the back of printed books and in collected volumes of journal articles served a similar role.

Because the term “taxonomy” can refer to various kinds of knowledge organization systems, it helps to clarify what a taxonomy is not.

What taxonomies are not

Although taxonomies have become increasingly common within enterprises and on websites, they are not always well understood by those who want to implement them. To understand a taxonomy, it also helps to distinguish it from similar systems:

  • Classification systems: Classification schemes (for example, industrial or medical classification codes) have mutually exclusive classes to which items are assigned, which supports comparison, analysis, identification, location, and other actions. Taxonomies, by contrast, allow a concept to appear in multiple places to aid discovery.
  • Navigation systems: While taxonomies can be navigated, they are not the same as navigation systems. Taxonomies are more like indexes (one concept, many tags): they can be browsed and searched for their concepts, and they can grow and change continuously as needed. Navigation systems, common in websites and web applications, act more like a table of contents (one link per page) and are not frequently or easily updated.

History of taxonomies for information retrieval

Taxonomies used by businesses for information management and retrieval have their roots in library knowledge organization systems, such as the Library of Congress Subject Headings (LCSH), which help patrons find books and media by subject. Information retrieval thesauri emerged in the mid-20th century for the more specialized needs of government agencies, scientific publishers, and large engineering and technology companies, introducing hierarchical (broader/narrower term), associative (related term), and equivalence (synonym) relationships.

The use of the word “taxonomy” for a hierarchical structure of terms for tagging and retrieval became popular in the 1990s, driven largely by advances in software and web interfaces that enabled interactive browsing of hierarchies. The combination of interactive interfaces with database technology also gave rise to faceted taxonomies, which support dynamic filtering of search results by selected taxonomy terms and have since become a dominant implementation across many use cases.

To support machine-readability and interoperability of taxonomies in web-based applications, the World Wide Web Consortium (W3C) published the Simple Knowledge Organization System (SKOS) standard in 2009. Most taxonomies and thesauri today are based on SKOS, including those managed in Graphwise Modeling, which is also interoperable with ontology data models and standards.
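As a rough illustration of what SKOS looks like, the sketch below serializes two concepts as SKOS Turtle using only the Python standard library. The `skos:` terms (`Concept`, `ConceptScheme`, `prefLabel`, `altLabel`, `broader`, `inScheme`) come from the W3C SKOS vocabulary; the `example.org` namespace, the concept IDs, and the `Concept` and `to_turtle` helpers are hypothetical. In practice, a taxonomy management tool or an RDF library would generate this output.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/taxonomy/"  # hypothetical namespace for this example


@dataclass
class Concept:
    """A SKOS concept: one preferred label, optional synonyms, optional parent."""
    concept_id: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    broader: str | None = None  # concept_id of the broader (parent) concept
    lang: str = "en"


def to_turtle(concepts: list[Concept], scheme_id: str) -> str:
    """Serialize concepts as SKOS Turtle. Sketch only: label text is not escaped."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{scheme_id} a skos:ConceptScheme .",
    ]
    for c in concepts:
        props = [
            f"skos:inScheme ex:{scheme_id}",
            f'skos:prefLabel "{c.pref_label}"@{c.lang}',
        ]
        # Synonyms become alternative labels on the same concept (vocabulary control).
        props += [f'skos:altLabel "{alt}"@{c.lang}' for alt in c.alt_labels]
        # The parent link expresses the hierarchy.
        if c.broader:
            props.append(f"skos:broader ex:{c.broader}")
        body = " ;\n    ".join(props)
        lines.append(f"\nex:{c.concept_id} a skos:Concept ;\n    {body} .")
    return "\n".join(lines)


vehicles = Concept("vehicles", "Vehicles")
cars = Concept("cars", "Cars", alt_labels=["Automobiles"], broader="vehicles")
print(to_turtle([vehicles, cars], scheme_id="transport"))
```

The output states vocabulary control (`altLabel`) and hierarchy (`broader`) as machine-readable statements that a SKOS-aware application could load.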

Taxonomy features

A taxonomy is defined by two primary features: Vocabulary Control and Structure. Together, these transform a simple list of words into a powerful information service tool:

  • Vocabulary Control: Brings together synonyms (for example, “car” = “automobile”) under a single concept. This ensures that a search on either term retrieves content tagged with that concept, whichever term the content itself uses.
  • Structure (Hierarchies and Facets): Allows users to browse from broad to specific without guessing keywords. A faceted taxonomy structure additionally supports searching for or filtering results on multiple aspects simultaneously, giving the user more control over complex search queries. Hierarchies also have the benefit of providing context for more accurate tagging.
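The two features can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical in-memory taxonomy in which each concept has one preferred label, a list of alternative labels, and at most one broader concept. The label index implements vocabulary control, while the inverted `broader` links support browsing from broad to specific and show a concept in its hierarchical context.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical mini-taxonomy: concept ID -> preferred label, synonyms, broader concept.
TAXONOMY = {
    "vehicles": {"pref": "Vehicles", "alt": [], "broader": None},
    "cars": {"pref": "Cars", "alt": ["Automobiles", "Autos"], "broader": "vehicles"},
    "trucks": {"pref": "Trucks", "alt": ["Lorries"], "broader": "vehicles"},
    "electric-cars": {"pref": "Electric cars", "alt": ["EVs"], "broader": "cars"},
}


def build_label_index(taxonomy: dict) -> dict[str, str]:
    """Vocabulary control: every preferred and alternative label points to one concept."""
    index = {}
    for cid, concept in taxonomy.items():
        for label in [concept["pref"], *concept["alt"]]:
            index[label.casefold()] = cid
    return index


def narrower_map(taxonomy: dict) -> dict[str, list[str]]:
    """Invert the broader links so the hierarchy can be browsed from the top down."""
    children: dict[str, list[str]] = defaultdict(list)
    for cid, concept in taxonomy.items():
        if concept["broader"]:
            children[concept["broader"]].append(cid)
    return children


def descendants(cid: str, children: dict[str, list[str]]) -> list[str]:
    """The concept plus everything beneath it, broad to specific (depth-first)."""
    result = [cid]
    for child in children.get(cid, []):
        result.extend(descendants(child, children))
    return result


def path_to_root(cid: str | None, taxonomy: dict) -> list[str]:
    """Preferred labels from the top of the hierarchy down to the concept (its context)."""
    path = []
    while cid is not None:
        path.append(taxonomy[cid]["pref"])
        cid = taxonomy[cid]["broader"]
    return path[::-1]


index = build_label_index(TAXONOMY)
children = narrower_map(TAXONOMY)
print(index["automobiles"])                   # cars
print(descendants(index["autos"], children))  # ['cars', 'electric-cars']
print(path_to_root(index["evs"], TAXONOMY))   # ['Vehicles', 'Cars', 'Electric cars']
```

A real taxonomy may let a concept have more than one broader concept (polyhierarchy), which this sketch does not handle.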

Additional, optional features of taxonomies include related-concept relationships (borrowed from thesauri), notes, and definitions for concepts.

Graphwise Modeling screenshot showing hierarchy structure in the left pane and vocabulary control in the right pane.

Taxonomy benefits

Taxonomies have multiple uses beyond providing hierarchies or facets for users to browse. Most fundamentally, taxonomies support consistent and comprehensive tagging. Once content is properly tagged (or annotated, assigned metadata, or made “structured”), the benefits extend far beyond traditional keyword matching:

  • Semantic search: User queries are matched to the meaning (semantics) of taxonomy concepts rather than to the words found in the text of the searched content. This significantly reduces “noise” in search results.
  • Knowledge discovery: Users can find valuable information they didn’t know existed, because hierarchical browsing surfaces related concepts.
  • Consistent descriptive metadata: Because taxonomy concepts are often applied to content as metadata, taxonomies give structure to unstructured content and enable not just search and retrieval but also deep analysis, comparison, and automated workflows.
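The contrast between keyword matching and concept-based retrieval can be shown with a small, hypothetical example. Each document below carries one concept tag per facet (topic, region, type), and a hypothetical label index resolves user terms to concept IDs. The keyword search both misses a relevant document and returns an irrelevant one, while the concept search matches on tags and can combine facets; the facet counts hint at how the same metadata supports analysis.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical content items, each tagged with one concept ID per facet.
DOCS = [
    {"id": "d1", "text": "Quarterly review of automobile sales",
     "topic": "cars", "region": "emea", "type": "report"},
    {"id": "d2", "text": "Proposal to electrify the car fleet",
     "topic": "cars", "region": "apac", "type": "proposal"},
    {"id": "d3", "text": "Freight logistics update",
     "topic": "trucks", "region": "emea", "type": "report"},
    {"id": "d4", "text": "Career fair recap",
     "topic": "hiring", "region": "amer", "type": "report"},
]

# Hypothetical label index: preferred and alternative labels resolved to concept IDs.
LABELS = {
    "cars": "cars", "automobiles": "cars", "trucks": "trucks", "lorries": "trucks",
    "emea": "emea", "europe": "emea", "apac": "apac",
    "report": "report", "proposal": "proposal",
}


def keyword_search(term: str) -> list[str]:
    """Baseline: substring match on the words in the text."""
    return [d["id"] for d in DOCS if term.casefold() in d["text"].casefold()]


def concept_search(**facets: str) -> list[str]:
    """Resolve each facet's label to a concept, then match tags; all facets must match."""
    wanted = {facet: LABELS[label.casefold()] for facet, label in facets.items()}
    return [d["id"] for d in DOCS if all(d[f] == cid for f, cid in wanted.items())]


def facet_counts(facet: str, doc_ids: list[str]) -> Counter:
    """Count results per facet value, as shown next to facet filters."""
    return Counter(d[facet] for d in DOCS if d["id"] in doc_ids)


print(keyword_search("car"))                          # ['d2', 'd4']: misses d1, matches "Career"
print(concept_search(topic="Automobiles"))            # ['d1', 'd2']
print(concept_search(topic="Cars", region="Europe"))  # ['d1']
print(dict(facet_counts("region", concept_search(topic="cars"))))  # {'emea': 1, 'apac': 1}
```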

Taxonomy applications

A well-built taxonomy has a wide range of applications that scale across the entire organization:

  • Content management (CMS/DAM): Setting up curated content in alerts, feeds, or info boxes. Managing and retrieving content for reuse (for example, proposals, presentations, reports, marketing content, images, and video files).
  • Internal operations: Supporting data management systems, intranets, learning management systems, and HR systems (for example, connecting employees with the right training and compliance documentation).
  • Governance and workflows: Driving automated workflows for document retention, audience targeting, and intellectual property rights management. Organizing company information, research information for R&D, and document management for regulatory compliance.
  • Combined with ontologies: Supporting unified search and discovery for content and data, especially when implemented as part of a semantic layer.
  • GraphRAG: Providing semantic context to large language models (LLMs) in support of retrieval-augmented generation (RAG).
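One simple way to provide semantic context for RAG is to find the taxonomy concepts mentioned in a question and add their labels, broader and related concepts, and definitions to the prompt. The sketch below assumes a hypothetical in-memory taxonomy and uses naive whole-word label matching; it is not how any particular GraphRAG product works, and the call to an LLM is deliberately left out.

```python
from __future__ import annotations

import re

# Hypothetical taxonomy entries: synonyms, broader concept, related concepts, definition.
TAXONOMY = {
    "vehicles": {"pref": "Vehicles", "alt": [], "broader": None,
                 "related": [], "definition": ""},
    "cars": {"pref": "Cars", "alt": ["Automobiles"], "broader": "vehicles",
             "related": ["fuel"], "definition": "Road vehicles for carrying passengers."},
    "fuel": {"pref": "Fuel", "alt": ["Petrol", "Gasoline"], "broader": None,
             "related": ["cars"], "definition": "Material burned to power engines."},
}


def find_concepts(question: str) -> list[str]:
    """Naive concept linking: whole-word, case-insensitive match on any label."""
    found = []
    for cid, c in TAXONOMY.items():
        for label in [c["pref"], *c["alt"]]:
            if re.search(rf"\b{re.escape(label)}\b", question, re.IGNORECASE):
                found.append(cid)
                break  # one matching label per concept is enough
    return found


def concept_context(cid: str) -> str:
    """Render one concept's semantics as a line of plain-text context."""
    c = TAXONOMY[cid]
    parts = [c["pref"]]
    if c["alt"]:
        parts.append("synonyms: " + ", ".join(c["alt"]))
    if c["broader"]:
        parts.append("broader: " + TAXONOMY[c["broader"]]["pref"])
    if c["related"]:
        parts.append("related: " + ", ".join(TAXONOMY[r]["pref"] for r in c["related"]))
    if c["definition"]:
        parts.append("definition: " + c["definition"])
    return "- " + "; ".join(parts)


def build_prompt(question: str) -> str:
    """Prepend taxonomy context to the question; sending it to an LLM is out of scope."""
    context = "\n".join(concept_context(cid) for cid in find_concepts(question))
    return f"Context from the taxonomy:\n{context}\n\nQuestion: {question}"


print(build_prompt("How have petrol prices affected demand for automobiles?"))
```

A production system would more likely use proper entity linking and pull this context from a knowledge graph rather than a Python dictionary.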

Benefits of combining taxonomies with ontologies

Taxonomies are especially valuable when combined with an ontology that enriches their semantics. An ontology is a model of a knowledge domain that defines the types, properties, and interrelationships (classes, attributes, and relations) of the entities in that domain, where the relations themselves carry meaning, that is, they are “semantic.”

Combining or extending a taxonomy with an ontology provides many capabilities beyond what taxonomies alone can do:

  • Complex querying: Supports multi-part searches across diverse data sets, not just text-based content.
  • Advanced logic: Enables reasoning and inference (for example, the system can “infer” that a specific document belongs to a project based on related metadata). It also enables better visualization of concepts and their semantic relationships.
  • Deep personalization: Uses the explicit relationships between products, users, and topics to power better personalization and recommendation systems, as well as other AI-based applications.
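The project inference mentioned above can be sketched as forward chaining over subject-predicate-object triples. The predicates `authoredBy`, `memberOf`, `belongsTo`, and `taggedWith`, and the rules themselves, are hypothetical; `broader` and `broaderTransitive` mirror the SKOS properties of the same names. Real systems would more likely state such rules declaratively, for example as OWL property chain axioms evaluated by a reasoner, than in hand-written Python.

```python
from __future__ import annotations

from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts: document metadata, organizational relations, and taxonomy links.
FACTS: Set[Triple] = {
    ("doc:42", "authoredBy", "person:ana"),
    ("person:ana", "memberOf", "project:apollo"),
    ("doc:42", "taggedWith", "concept:electric-cars"),
    ("concept:electric-cars", "broader", "concept:cars"),
    ("concept:cars", "broader", "concept:vehicles"),
}


def join(kb: Set[Triple], first: str, second: str, inferred: str) -> Set[Triple]:
    """Rule template: (x first y) and (y second z) imply (x inferred z)."""
    return {
        (x, inferred, z)
        for (x, p, y) in kb if p == first
        for (y2, q, z) in kb if q == second and y2 == y
    }


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Forward chaining: apply every rule until no new triples appear (a fixpoint)."""
    kb = set(facts)
    while True:
        new = {(s, "broaderTransitive", o) for (s, p, o) in kb if p == "broader"}
        new |= join(kb, "broaderTransitive", "broaderTransitive", "broaderTransitive")
        new |= join(kb, "authoredBy", "memberOf", "belongsTo")
        if new <= kb:
            return kb
        kb |= new


def docs_about(kb: Set[Triple], concept: str) -> Set[str]:
    """Documents tagged with the concept or with any concept beneath it."""
    return {
        d for (d, p, t) in kb
        if p == "taggedWith" and (t == concept or (t, "broaderTransitive", concept) in kb)
    }


kb = infer(FACTS)
print(("doc:42", "belongsTo", "project:apollo") in kb)  # True: inferred, never stated
print(docs_about(kb, "concept:vehicles"))                # {'doc:42'}
```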

Taxonomies, in turn, bring linguistic features that ontologies alone lack. Alternative labels for the same concept let users search for a concept by different names and support consistent tagging of a concept that different sources name differently. Multilingual labels let users search in their own language and allow content in different languages to be tagged for retrieval.
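Alternative and multilingual labels can be sketched as language-tagged labels on each concept, similar in spirit to SKOS `prefLabel` and `altLabel` with language tags. The concepts and labels below are hypothetical. Every label in every language resolves to the same concept ID, and the displayed label follows the user's language, falling back to English.

```python
from __future__ import annotations

# Hypothetical concepts with language-tagged preferred and alternative labels.
CONCEPTS = {
    "cars": {
        "pref": {"en": "Cars", "de": "Autos", "fr": "Voitures"},
        "alt": {"en": ["Automobiles"], "de": ["Kraftfahrzeuge", "PKW"], "fr": ["Automobiles"]},
    },
    "trucks": {
        "pref": {"en": "Trucks", "de": "Lastwagen"},
        "alt": {"en": ["Lorries"], "de": ["LKW"]},
    },
}


def label_index(concepts: dict) -> dict[str, str]:
    """Map every label in every language to its concept ID (case-insensitive)."""
    index = {}
    for cid, c in concepts.items():
        for label in c["pref"].values():
            index[label.casefold()] = cid
        for labels in c["alt"].values():
            for label in labels:
                index[label.casefold()] = cid
    return index


def display_label(cid: str, lang: str, fallback: str = "en") -> str:
    """Preferred label in the user's language, or in the fallback language if missing."""
    pref = CONCEPTS[cid]["pref"]
    return pref.get(lang, pref[fallback])


index = label_index(CONCEPTS)
print(index["pkw"])                               # cars
print(display_label(index["lkw"], "fr"))          # Trucks (no French label, so English)
print(display_label(index["automobiles"], "de"))  # Autos
```

This sketch assumes each label names only one concept; real taxonomies also have to handle labels shared by different concepts, for example by using the hierarchy to disambiguate.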

Conclusion

Taxonomies are far more than simple lists of terms. At their core, they serve as the fundamental connective structure between users and the information they seek, bringing order to complexity, consistency to content, and meaning to data that would otherwise remain siloed and hard to surface. As enterprises increasingly invest in AI-driven applications, the taxonomy’s role has not diminished; if anything, it has become more critical, providing the semantic backbone that makes larger systems work.

A taxonomy is also the foundation on which broader semantic architectures are built. Combined with a knowledge graph, it enables unified search and seamless discovery across an enterprise’s repositories, platforms, and applications, and together they form the core of a semantic layer that exposes meaning to users and tools across the organization. Whether the goal is better search, a GraphRAG application, or a full enterprise knowledge graph, the taxonomy is where it all begins.
