Introduction to Metadata
Our understanding of the world is facilitated by our ability to associate things, to compare and contrast, to categorize, and to form abstract relationships. To help others understand information, we deliberately describe and shape it, creating new forms of knowledge. When communicating with computers, we can do this using metadata.
Metadata is simply a piece of information that describes other information. For example, let's look at some text, a headline from nytimes.com:
Bush Continues to Push Congress for Resolution on Iraq
By THE ASSOCIATED PRESS 12:30 PM ET
President Bush today kept up pressure on Congress to approve action against Iraq amid new criticism from Democrats.
The data in this case is the headline and summary:
Bush Continues to Push Congress for Resolution on Iraq
President Bush today kept up pressure on Congress to approve action against Iraq amid new criticism from Democrats.
The metadata is the surrounding information that helps us understand the context of the data or categorize it:
Published by: THE ASSOCIATED PRESS
Publish time: 12:30 PM ET
Related information:
There may also be other metadata that isn't displayed but which helps the system display or organize the data:
Desk: National
Information Type: News
Format: Column
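One way to picture the split between data and metadata is a small record type. The sketch below is only illustrative: the `Article` class and its field names are assumptions made for this example, not a description of how nytimes.com actually stores its stories.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # The data: what the reader actually came to read.
    headline: str
    summary: str
    # The metadata: information describing the data.
    # Key names are hypothetical, chosen to mirror the lists above.
    metadata: dict = field(default_factory=dict)


article = Article(
    headline="Bush Continues to Push Congress for Resolution on Iraq",
    summary=(
        "President Bush today kept up pressure on Congress to approve "
        "action against Iraq amid new criticism from Democrats."
    ),
    metadata={
        "published_by": "THE ASSOCIATED PRESS",
        "publish_time": "12:30 PM ET",
        "desk": "National",
        "information_type": "News",
        "format": "Column",
    },
)

# Metadata shown to the reader versus metadata used only by the system.
displayed = {k: article.metadata[k] for k in ("published_by", "publish_time")}
hidden = {k: v for k, v in article.metadata.items() if k not in displayed}
```

Keeping the metadata apart from the headline and summary makes it easy to decide, per key, what is displayed and what is only used for organizing.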
To allow readers to search or browse their news, the New York Times might collect one taxonomy of terms - a form of metadata - and display all these terms together. For example, the Desk taxonomy looks like this:
International
National
Politics
Business
Technology
Science
Health
Sports
New York Region
Education
Weather
Obituaries
This collection is called a metadata schema, meaning a systematic combination of elements.
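To make the idea concrete, here is a minimal sketch of the Desk taxonomy used as a fixed set of terms for validating and browsing articles. The function names `validate_desk` and `browse_by_desk` are hypothetical, and the dictionary layout of an article is an assumption made for this example.

```python
# The Desk taxonomy from the list above, as a fixed set of allowed terms.
DESKS = (
    "International", "National", "Politics", "Business", "Technology",
    "Science", "Health", "Sports", "New York Region", "Education",
    "Weather", "Obituaries",
)


def validate_desk(term: str) -> str:
    """Return the term if it belongs to the Desk taxonomy, else raise."""
    if term not in DESKS:
        raise ValueError(f"{term!r} is not a term in the Desk taxonomy")
    return term


def browse_by_desk(articles: list[dict]) -> dict[str, list[str]]:
    """Group headlines under each desk term so readers can browse them."""
    index = {desk: [] for desk in DESKS}
    for item in articles:
        index[validate_desk(item["desk"])].append(item["headline"])
    return index


# Sample input (assumed shape); the headline comes from the example above.
articles = [
    {
        "headline": "Bush Continues to Push Congress for Resolution on Iraq",
        "desk": "National",
    },
]
index = browse_by_desk(articles)
```

Because every article must use a term from the same list, the browse index always has the same headings, even for desks with no articles yet.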
Metadata can describe other things as well, such as people or places.
<!--
There are several types of schemes that can be used when organizing metadata:
[ insert chart ]
adapted from "Levels of Control" from and "An Ontology Spectrum" from Deborah McGuinness
-->
Essentially, the benefits of these metadata schemas are:
Here are some basic definitions to help tell the different kinds of schema apart:
The above are often referred to as "controlled vocabularies". If we try to go beyond formal vocabularies and formalize our knowledge of a subject, this is known as "knowledge representation".
It might help to define some related terms:
Controlled Vocabularies - a defined set of preferred terms. Types of controlled vocabularies include Synonym Rings, Authority Files, Taxonomies, Faceted Taxonomies, and Thesauri. Ontologies are not usually considered a form of controlled vocabulary but rather a form of knowledge representation.
Attribute - an aspect of an object, such as the publisher name. Attributes are alternatively called "facets" when applied to taxonomies, "slots" when applied to ontologies, or "fields" when applied to databases.
Attribute Value - a value assigned to an attribute. For example, the attribute "Publisher Name" can have a value of "New York Times".
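As a rough illustration of attributes and attribute values, the sketch below stores each record as a mapping from attribute to value and then filters by facet. The sample records and the helper names `attribute_values` and `filter_by` are hypothetical, invented for this example.

```python
# Each record maps attributes (facets, slots, or fields) to attribute values.
# These sample records are made up for illustration.
records = [
    {"Publisher Name": "New York Times", "Desk": "National", "Information Type": "News"},
    {"Publisher Name": "New York Times", "Desk": "Sports", "Information Type": "News"},
]


def attribute_values(records, attribute):
    """Collect the distinct values assigned to one attribute, sorted."""
    return sorted({r[attribute] for r in records if attribute in r})


def filter_by(records, criteria):
    """Keep records that match every attribute/value pair in criteria."""
    return [
        r for r in records
        if all(r.get(attr) == value for attr, value in criteria.items())
    ]


desks_in_use = attribute_values(records, "Desk")
sports_only = filter_by(records, {"Desk": "Sports"})
```

Listing the distinct values of an attribute is what a faceted browse interface would show as its filter options; `filter_by` then narrows the records to the chosen values.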
A note on metatags: metadata and metatags are related, but they are different things. Metatags are found within markup (such as HTML pages) and identify certain attributes of that information. Metadata goes *into* metatags, but metadata has many other uses as well.
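To show how metadata goes into metatags, here is a small sketch that renders attribute/value pairs as HTML `<meta>` elements. The function name `to_meta_tags` and the attribute names are assumptions for this example; real pages choose their own names.

```python
from html import escape


def to_meta_tags(metadata: dict) -> str:
    """Render each attribute/value pair as an HTML <meta> element."""
    lines = []
    for name, value in metadata.items():
        # Escape quotes and angle brackets so values cannot break the markup.
        lines.append(
            f'<meta name="{escape(name, quote=True)}" '
            f'content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)


# Hypothetical attribute names, loosely based on the hidden metadata above.
tags = to_meta_tags({"desk": "National", "information-type": "News"})
print(tags)
```

The same dictionary could just as well feed a search index or a database row, which is the sense in which metadata has uses beyond metatags.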
Saturday, September 28, 2002 | Filed in Knowledge Base-Driven Systems