Introduction to Metadata

Our understanding of the world is facilitated by our ability to associate things, to compare and contrast, to categorize, and to form abstract relationships. To help others understand information, we deliberately describe it; in doing so we give it shape and create new forms of knowledge. When communicating with computers, we can do this using metadata.

Metadata is simply information that describes other information. For example, let’s look at a headline and its accompanying text from nytimes.com:

Bush Continues to Push Congress for Resolution on Iraq

By THE ASSOCIATED PRESS 12:30 PM ET

President Bush today kept up pressure on Congress to approve action against Iraq amid new criticism from Democrats.

  • Video: Bush Speaks on Iraq Issues
  • C.I.A. and F.B.I. Defend Counterterrorism

The data in this case is the headline and summary:

Bush Continues to Push Congress for Resolution on Iraq

President Bush today kept up pressure on Congress to approve action against Iraq amid new criticism from Democrats.

The metadata is the surrounding information that helps us understand the data’s context or categorize it:

Published by: THE ASSOCIATED PRESS

Publish time: 12:30 PM ET

Related information:

  • Video: Bush Speaks on Iraq Issues
  • C.I.A. and F.B.I. Defend Counterterrorism

There may also be other metadata that isn’t displayed but helps the system display or organize the data:

Desk: National

Information Type: News

Format: Column
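The split between data and metadata can be sketched in code. The Python below is a minimal illustration, assuming a hypothetical `NewsItem` class with made-up field names (this is not how nytimes.com actually stores stories): the headline and summary are the data, and everything else, displayed or hidden, sits in a separate metadata dictionary. The hypothetical `items_on_desk` helper shows how hidden metadata such as Desk lets a system organize items without reading the content itself.

```python
from dataclasses import dataclass, field


@dataclass
class NewsItem:
    """Hypothetical container that keeps content (data) apart from metadata."""
    # The data: what the reader came for.
    headline: str
    summary: str
    # The metadata: information about the data (field names are illustrative).
    metadata: dict = field(default_factory=dict)


item = NewsItem(
    headline="Bush Continues to Push Congress for Resolution on Iraq",
    summary=("President Bush today kept up pressure on Congress to approve "
             "action against Iraq amid new criticism from Democrats."),
    metadata={
        # Displayed alongside the story
        "published_by": "THE ASSOCIATED PRESS",
        "publish_time": "12:30 PM ET",
        "related": [
            "Video: Bush Speaks on Iraq Issues",
            "C.I.A. and F.B.I. Defend Counterterrorism",
        ],
        # Not displayed, but used to organize the story
        "desk": "National",
        "information_type": "News",
        "format": "Column",
    },
)


def items_on_desk(items: list[NewsItem], desk: str) -> list[NewsItem]:
    """Select items by their hidden 'desk' metadata; the content is never read."""
    return [i for i in items if i.metadata.get("desk") == desk]


print([i.headline for i in items_on_desk([item], "National")])
# → ['Bush Continues to Push Congress for Resolution on Iraq']
```

Keeping metadata in its own structure means new fields can be added for organizing or display without changing the story text.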

To allow readers to search or browse their news, the New York Times might collect one taxonomy of terms (a form of metadata) and display all of these terms together. For example, the Desk taxonomy looks like this:

  • International
  • National
  • Politics
  • Business
  • Technology
  • Science
  • Health
  • Sports
  • New York Region
  • Education
  • Weather
  • Obituaries

This collection is called a metadata schema, meaning a systematic combination of elements.
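A taxonomy like this works as a fixed list of allowed terms. As a minimal sketch (the names `DESK_TAXONOMY`, `validate_desk`, and `browse_menu` are hypothetical), the Python below stores the Desk taxonomy as a tuple, rejects any Desk value that is not in it, and renders the terms as a simple browse menu.

```python
# The Desk taxonomy above, as an ordered tuple of allowed terms.
DESK_TAXONOMY = (
    "International", "National", "Politics", "Business", "Technology",
    "Science", "Health", "Sports", "New York Region", "Education",
    "Weather", "Obituaries",
)


def validate_desk(value: str) -> str:
    """Accept a Desk value only if it is one of the taxonomy's terms."""
    if value not in DESK_TAXONOMY:
        raise ValueError(f"{value!r} is not a term in the Desk taxonomy")
    return value


def browse_menu() -> str:
    """Render the taxonomy as a plain-text browse menu, one term per line."""
    return "\n".join(f"- {term}" for term in DESK_TAXONOMY)
```

Here `validate_desk("National")` returns `"National"`, while `validate_desk("Domestic")` raises `ValueError`. Rejecting unlisted terms is what keeps the vocabulary controlled: every story tagged with a Desk ends up under one of the twelve browsable terms.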

Metadata can describe other things as well, such as people or places.

There are several types of schemes that can be used when organizing metadata:

[Chart: types of metadata schemes, adapted from “Levels of Control” and from Deborah McGuinness’s “An Ontology Spectrum”]

Essentially, the benefits of these metadata schemas are:

  • improved browsing and searching, by making it easy for the users of a system to find information
  • improved communication among people, by creating a common vocabulary
  • simpler maintenance, by reducing inconsistent use of language

Here are some basic definitions to help tell the different kinds of schemas apart:


  • Synonym Ring: A group of words or phrases treated as equivalent. A search engine might use a synonym ring to find relevant information when someone searches on any one of the related terms.
  • Glossary: A collection of terms and definitions within a particular domain. A glossary could be used simply to help people understand and agree on a common terminology.
  • Taxonomy: An arrangement and naming of metadata, usually hierarchical. A taxonomy might be a list of category names.
  • Faceted Taxonomy: A taxonomy with attributes and attribute values. If News is a term, then an attribute could be Country, and an attribute value of Country could be France.
  • Thesaurus: A taxonomy that also includes terms that are associated and terms that are related. For example, the term Newspaper is associated with the term Journal and related to the term Town Crier.

The schemes above are often referred to as “controlled vocabularies”. If we go beyond controlled vocabularies and try to formalize our knowledge of a subject, this is known as “knowledge representation”.

  • Ontology: The specification of one’s conceptualization of a knowledge domain. Ontologies resemble faceted taxonomies but use richer semantic relationships among terms and attributes, as well as strict rules about how to specify terms and relationships.
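To see how a synonym ring helps search, here is a minimal Python sketch. The rings themselves, `expand_query`, and `search` are hypothetical, and matching is a naive lowercase substring test rather than what a real search engine does: a query for any term in a ring is expanded to every term in that ring before matching.

```python
# Hypothetical synonym rings: each set groups terms treated as equivalent.
SYNONYM_RINGS = [
    {"c.i.a.", "cia", "central intelligence agency"},
    {"f.b.i.", "fbi", "federal bureau of investigation"},
]


def expand_query(term: str) -> set[str]:
    """Return the term plus every equivalent term from its synonym ring."""
    key = term.lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return set(ring)
    return {key}


def search(documents: list[str], term: str) -> list[str]:
    """Naive substring search matching any term in the expanded query."""
    terms = expand_query(term)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


docs = [
    "C.I.A. and F.B.I. Defend Counterterrorism",
    "Bush Continues to Push Congress for Resolution on Iraq",
]
print(search(docs, "Central Intelligence Agency"))
# → ['C.I.A. and F.B.I. Defend Counterterrorism']
```

The query text never appears in the matching headline; the ring bridges the gap. A production engine would tokenize words instead of matching substrings, since a short term like "cia" also occurs inside words such as "social".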

It might help to define some related terms:


Controlled Vocabulary – a defined set of preferred terms. Types of controlled vocabularies include Synonym Rings, Authority Files, Taxonomies, Faceted Taxonomies, and Thesauri. Ontologies are not usually considered a form of controlled vocabulary but rather a form of knowledge representation.

Attribute – an aspect of an object, such as the publisher name. Attributes are alternatively called “facets” when applied to taxonomies, “slots” when applied to ontologies, or “fields” when applied to databases.

Attribute Value – a value assigned to an attribute. For example, the attribute “Publisher Name” can have a value of “New York Times”.
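Attributes and attribute values are what make faceted browsing possible. In this minimal sketch, each record is a hypothetical dictionary mapping attributes to values (the story titles and values are made up for illustration). The hypothetical `facet_filter` keeps the records that match every selected attribute value, and `facet_counts` tallies the values of one attribute, the kind of numbers a faceted browse interface often shows beside each option.

```python
from collections import Counter

# Hypothetical records: each maps attributes (facets) to attribute values.
RECORDS = [
    {"Title": "Story A", "Information Type": "News",
     "Country": "France", "Publisher Name": "New York Times"},
    {"Title": "Story B", "Information Type": "News",
     "Country": "Iraq", "Publisher Name": "THE ASSOCIATED PRESS"},
    {"Title": "Story C", "Information Type": "News",
     "Country": "France", "Publisher Name": "THE ASSOCIATED PRESS"},
]


def facet_filter(records: list[dict], selected: dict[str, str]) -> list[dict]:
    """Keep records whose values match every selected attribute value."""
    return [
        r for r in records
        if all(r.get(attr) == value for attr, value in selected.items())
    ]


def facet_counts(records: list[dict], attribute: str) -> Counter:
    """Count how many records carry each value of one attribute."""
    return Counter(r[attribute] for r in records if attribute in r)


print(facet_counts(RECORDS, "Country"))
# → Counter({'France': 2, 'Iraq': 1})
selected = {"Country": "France", "Publisher Name": "New York Times"}
print([r["Title"] for r in facet_filter(RECORDS, selected)])
# → ['Story A']
```

Each added attribute value narrows the result set, which is why facets combine well: a reader can pick Country, then Publisher Name, in any order.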


A note on metatags: metadata and metatags are related but different things. Metatags appear within markup code (such as the HTML of a web page) and identify certain attributes of that page. Metadata goes *into* metatags, but metadata has many other uses as well.
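As a minimal sketch of putting metadata *into* metatags, the Python below renders name/content pairs as HTML `<meta>` elements. `author`, `description`, and `keywords` are standard HTML meta names; the helper `to_meta_tags` and the keyword values chosen here are hypothetical, for illustration only.

```python
from html import escape


def to_meta_tags(metadata: dict[str, str]) -> str:
    """Render name/content pairs as HTML <meta> elements, one per line.

    Both name and content are HTML-escaped, so characters such as & and "
    in a value cannot break the surrounding markup.
    """
    return "\n".join(
        f'<meta name="{escape(name)}" content="{escape(content)}">'
        for name, content in metadata.items()
    )


print(to_meta_tags({
    "author": "THE ASSOCIATED PRESS",
    "description": ("President Bush today kept up pressure on Congress to "
                    "approve action against Iraq amid new criticism from "
                    "Democrats."),
    "keywords": "Iraq, Congress",  # hypothetical keyword choice
}))
```

The same metadata dictionary could just as well feed a search index or a database record, which is the point of the note above: metatags are only one place metadata can go.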