Noise Between Stations

Category: Controlled Vocabulary

Falseonomies

The Clay Shirky spin on the ethnoclassification discussion is amusing. So much of the community manages to discuss a classification technique without explicitly illustrating how it aids retrieval, which misses the whole point of classification. But I’ve already covered that ground, so let’s move along to the more interesting, subtle developments.

At one point along that spectrum of control (’cause it ain’t just a dichotomy of ethnoclassification vs. controlled vocabulary) is Peterme suggesting a Flickr tag for a specific purpose. We might say he’s playing librarian and exerting control over this tag and bending others to his preferred method of classification. Or, we might say he’s a user heroically and socially distributing this tag in a cheap, scalable way. But really, in 2005, what’s the difference? We have authority control exerted on Flickr, and distributed contributions to traditional thesauri. Once again, we see balance is a good thing, and to each context its own balance.

1/21/2005
Ethnoclassification and retrieval
I’ve been doing some product development consulting for a content-centric company, and started to wonder if ethnoclassification could benefit them somehow. I was hard-pressed to think of how, after the content was tagged, the tags would lead to better sets of content. Although they have a lot of content, they only have a few authors who do the classification. When I returned to what others have written on the topic, it struck me that the idea of ethnoclassification was so enthralling that we didn’t look closely at how it facilitates retrieval, or at its relationship to the content set and the authoring process.

It seems that selecting a classification technique should factor in how much control we want over retrieval, how much content there is, and how many authors there are. Ethnoclassification becomes more useful when there’s a ton of authors and you can start to see relationships emerge among tags.

For example:
- Large content set, many authors, low control over retrieval: Ethnoclassification makes sense when serendipitous retrieval is desired and a critical mass of authors is generating tags, like on Flickr
- Large content set, many authors, high control over retrieval: A robust structure is needed to ensure end users can find information, like the ontology-driven Cisco.com
- Medium content set, medium authors, medium control over retrieval: This situation allows for interesting hybrids, like the Pratt Talent site that balances ethnoclassification with a controlled vocabulary
- Medium content set, medium authors, high control over retrieval: Traditional controlled vocabularies/taxonomies make sense, and scale down to the point where there is so little content/few authors that no classification is needed
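
To make the list above concrete, here is a minimal sketch of it as a lookup table. The function name, the labels ("large", "many", "low" and so on) and the wording of the recommendations are my own assumptions for illustration; only the four combinations listed above are encoded, and anything else returns None rather than a guess.

```python
from typing import Optional

# Hypothetical lookup table keyed by
# (content size, number of authors, control over retrieval).
# Only the four combinations from the list are encoded.
RECOMMENDATIONS = {
    ("large", "many", "low"): "ethnoclassification",
    ("large", "many", "high"): "robust structure such as an ontology",
    ("medium", "medium", "medium"): "hybrid of ethnoclassification and controlled vocabulary",
    ("medium", "medium", "high"): "traditional controlled vocabulary or taxonomy",
}


def suggest_technique(content: str, authors: str, control: str) -> Optional[str]:
    """Return the suggested technique, or None for combinations the list does not cover."""
    return RECOMMENDATIONS.get((content, authors, control))


if __name__ == "__main__":
    print(suggest_technique("large", "many", "low"))
    print(suggest_technique("small", "few", "low"))
```

A real decision would weigh these factors on a continuum rather than in three buckets, so treat the table as a conversation starter, not a rule.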
12/4/2004
Massaging social classification

There’s been some great conversation on social classification among myself, Jess, Stewart, Gene and Alex.

I just realized that James already built a system that combines the best of social and constructed categories, for example by creating equivalent associations on the backend to correct for too many similar categories. His design has the virtue of extremely nifty interaction design that lets users type what they’re thinking while showing them what’s already there. Less cognitive load for the users, more yummy findability in the end.
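
I haven’t seen the code behind James’s system, so the following is only a rough sketch of the two ideas as I understand them: an equivalence table that folds variant categories into one preferred form, and a type-ahead lookup that shows existing categories as the user types. The table contents and the function names are invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical equivalence table: variant category -> preferred category.
EQUIVALENTS: Dict[str, str] = {
    "nyc": "new york",
    "new york city": "new york",
    "newyork": "new york",
}


def normalize(tag: str) -> str:
    """Lowercase and trim a tag, then map it to its preferred form if one exists."""
    cleaned = " ".join(tag.lower().split())
    return EQUIVALENTS.get(cleaned, cleaned)


def suggest(prefix: str, existing: Set[str]) -> List[str]:
    """Return existing categories that start with what the user has typed so far."""
    cleaned = " ".join(prefix.lower().split())
    if not cleaned:
        return []
    return sorted(c for c in existing if c.startswith(cleaned))


if __name__ == "__main__":
    categories = {"new york", "newark", "paris"}
    print(normalize("  NYC "))
    print(suggest("new", categories))
```

The users still type freely; the control happens quietly on the back end and through the suggestions.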

8/24/2004
Faceted Metadata and Choosing the Right User Interface

Tanya’s revealing of her technique for setting facets for her blog was the catalyst that reminded me of a comment Peterme made regarding facets: ‘The system can never know which particular strategy a given user wants to employ — so why not avail them of them all?’ I don’t think he really means all the facets we could imagine. I assume he means all the facets that correspond to the most important metadata fields for a type of information.

I’m thinking about this because I’m working on a system that stores the information using a number of facets, but which is presented with a hierarchical browsing user interface: it displays information pre-filtered by a couple of facets, and as you select items it displays more items further filtered by the facet you selected. I didn’t design it, but I must compliment those who did as it probably (usability testing will confirm this) maps to the mental model of the user. And yet the faceted scheme on the back end keeps the data set flexible and available to display using alternate schemes.
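
This is not the system I’m working on, just a minimal sketch of the general idea: records carry flat facet values, and the hierarchy the user browses is nothing more than an ordering of facets chosen at display time. The records, facet names and function names are all invented for illustration.

```python
from typing import Dict, List

Item = Dict[str, str]

# Hypothetical records; the facet names and values are made up.
ITEMS: List[Item] = [
    {"title": "Router setup guide", "type": "guide", "product": "router", "audience": "admin"},
    {"title": "Router data sheet", "type": "data sheet", "product": "router", "audience": "buyer"},
    {"title": "Switch setup guide", "type": "guide", "product": "switch", "audience": "admin"},
]


def filter_items(items: List[Item], selections: Dict[str, str]) -> List[Item]:
    """Keep items whose facet values match every selection made so far."""
    return [i for i in items if all(i.get(f) == v for f, v in selections.items())]


def next_choices(items: List[Item], selections: Dict[str, str], facet_order: List[str]) -> List[str]:
    """Return the values to show at the next level of the browsing hierarchy.

    facet_order is the presentation decision: the same records can be browsed
    product-first or type-first without changing how they are stored.
    """
    remaining = [f for f in facet_order if f not in selections]
    if not remaining:
        return []
    facet = remaining[0]
    return sorted({i[facet] for i in filter_items(items, selections) if facet in i})


if __name__ == "__main__":
    order = ["product", "type"]
    print(next_choices(ITEMS, {}, order))
    print(next_choices(ITEMS, {"product": "router"}, order))
```

Swapping the order to ["type", "product"] gives a different hierarchy over the same data, which is the flexibility the faceted back end buys.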

And I mention this for two reasons. First, I’ve noticed a tendency to want to use every facet we have in the user interface instead of relying on our knowledge of users to serve them only the facets that are useful, without extra clutter. Second, even experienced IAs want to literally represent the back end structure in the user interface. I see this especially with semantic networks of information stored as nodes and connections, where people want to rely on Thinkmap-like interfaces. The information model just might be a better way to store and manage information that, in user interface form, corresponds to users’ built-in understanding of categories.

(Out of courtesy I should mention this is not a criticism of either person or site mentioned above; they were just catalysts.)

8/12/2003
Radio Alphabet

A Alpha

B Bravo

C Charlie

D Delta

E Echo

F Foxtrot

G Golf

H Hotel

I India

J Juliet

K Kilo

L Lima

M Mike

N November

O Oscar

P Papa

Q Quebec

R Romeo

S Sierra

T Tango

U Uniform

V Victor

W Whiskey

X X-ray

Y Yankee

Z Zulu

Another alphabet is sometimes used, for example by the California Police:

A Adam

B Boy

C Charles

D David

E Edward

F Frank

G George

H Henry

I Ida

J John

K King

L Lincoln

M Mary

N Nora

O Ocean

P Paul

Q Queen

R Robert

S Sam

T Tom

U Union

V Victor

W William

X X-Ray

Y Yellow

Z Zebra
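
If you wanted to use either alphabet in a program, a minimal sketch might look like this. The word lists are copied from the two lists above; the function names and the choice to pass digits through unchanged and drop whitespace are my own assumptions.

```python
import string
from typing import Dict, List

# Word lists copied from the two alphabets above, in A-Z order.
RADIO_WORDS = (
    "Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliet Kilo Lima Mike "
    "November Oscar Papa Quebec Romeo Sierra Tango Uniform Victor Whiskey X-ray Yankee Zulu"
).split()
POLICE_WORDS = (
    "Adam Boy Charles David Edward Frank George Henry Ida John King Lincoln Mary "
    "Nora Ocean Paul Queen Robert Sam Tom Union Victor William X-Ray Yellow Zebra"
).split()


def build_table(words: List[str]) -> Dict[str, str]:
    """Pair each letter A-Z with its code word, refusing incomplete lists."""
    if len(words) != 26:
        raise ValueError("expected exactly 26 code words")
    return dict(zip(string.ascii_uppercase, words))


RADIO = build_table(RADIO_WORDS)
POLICE = build_table(POLICE_WORDS)


def spell(text: str, table: Dict[str, str]) -> List[str]:
    """Spell out letters with the table; skip whitespace, pass other characters through."""
    return [table.get(ch.upper(), ch) for ch in text if not ch.isspace()]


if __name__ == "__main__":
    print(" ".join(spell("CV", RADIO)))
    print(" ".join(spell("ia 2", POLICE)))
```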

2/20/2003
Controlled Vocabularies in the Trenches

Has anyone written about what it’s like to create controlled vocabularies (CVs) in the context of actual project work? I can’t think of anyone who has. Below are some scattered notes from my recent experience, probably not understandable to anyone else ’cause I don’t have time to instruct. We’re racing to deadlines and I’m driving as fast as I can…

The exercise and deliverables can be rather abstract for some folks, especially if they view the world through a technology lens. It helps to create illustrations and screen shots to show what is meant and how terms are used.

CVs become very helpful as a way of recording the tacit knowledge of an organization, helping everyone communicate and use their information. The process of creating the terms helped clarify their use for the team in understanding the business. (It’s tempting at this point to ramble on about language being symbols for meaning, so we’re actually controlling meaning, and those who control meaning have the power to define reality (the power to name – Jansen).)

If the CV is just informing other artifacts and not something that will be maintained or otherwise carried forward, state that so there is no confusion (e.g. do users of a CMS have to look at a thesaurus to discern meaning, or some other more user-friendly artifact?).

Illustrating the CV within the process: Business’s understanding of reality -> CV -> CMS Manual and CMS user interface -> data and metadata -> UI (e.g. web pages). The CV helps build a bridge between the organization and the user interface.

In all the talk of technology, IA, etc., ultimately people are coming for content. If the content sucks, the site sucks. (Content is king). Given that content creation and migration is also expensive, to properly honor the content developer’s work we need to put considerable thought into how content is created. A CV helps ensure the builders understand the subject domain and can communicate it to future content authors.

There’s never a perfectly controlled vocabulary; the number of terms is finite and language can only be clarified to a certain extent. It may be necessary to state this fact to set expectations. Creating a CV involves judgment calls regarding which terms to control, how to control them (e.g. supplying the definition vs. restricting use via the user interface), and how to define them.

As with any design exercise, you need a scope, users and user intentions to guide the work. When defining the scope of the CV, settling on the number of terms may seem arbitrary, but it may be necessary given other factors like time, money, and user response to the system.

CVs for a CMS may be used in various ways: to populate menus in the UI (“hard” control), to offer examples, to define terms for a manual (“soft” control), to determine metadata relationships, etc. Specify what you’re using it for explicitly to show its value.
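
A rough sketch of the difference between “hard” and “soft” control, assuming a tiny vocabulary of content types. The terms, definitions and function names are hypothetical and not taken from any particular CMS.

```python
from typing import Dict, List

# Hypothetical vocabulary: preferred term -> definition.
CONTENT_TYPES: Dict[str, str] = {
    "article": "A signed piece of editorial content.",
    "press release": "An official announcement issued by the organization.",
    "product page": "A description of a single product.",
}


def menu_options(vocabulary: Dict[str, str]) -> List[str]:
    """Hard control: the only values an author can pick from a menu."""
    return sorted(vocabulary)


def validate_hard(value: str, vocabulary: Dict[str, str]) -> str:
    """Hard control: reject any value that is not a term in the vocabulary."""
    if value not in vocabulary:
        raise ValueError(f"'{value}' is not a controlled term")
    return value


def manual_entries(vocabulary: Dict[str, str]) -> List[str]:
    """Soft control: terms and definitions formatted for a manual; nothing is enforced."""
    return [f"{term}: {definition}" for term, definition in sorted(vocabulary.items())]


if __name__ == "__main__":
    print(menu_options(CONTENT_TYPES))
    print("\n".join(manual_entries(CONTENT_TYPES)))
```

The same vocabulary feeds both functions, which is one way to show a team that a single CV can serve the UI and the manual at once.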

Not creating a CV before building a system can lead to expensive design and technology changes later if the designer’s conception of reality doesn’t match the users’.

10/29/2002