We’ll Reach the Semantic Web One Small Step at a Time

XFN and FOAF were two small steps in that direction, and Google just built on them with the Social Graph API (watch the friendly little video intro).

Any day now we’ll see an application that not only helps us generate XFN and FOAF data, but does so in a way that manages our online identities, particularly with regard to search. It’ll tip the balance of art and science in SEO toward science.

Top-down semantic web visions were judged by skeptical-but-realistic critics to be overly systematic. Well, yes, but if we get there a piece at a time, helping people understand, implement, experiment, and capitalize with each little piece, we’ll get there in an organic way.

Time to go generate some XFN…
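For anyone who hasn’t played with it yet, XFN is nothing more exotic than rel values on ordinary hyperlinks. A minimal sketch of generating one (the names and URLs here are invented for illustration):

```python
# Minimal sketch of generating an XFN link. XFN encodes human
# relationships as space-separated rel values on ordinary <a> tags.
# The URL and name below are made-up examples.

def xfn_link(url: str, name: str, rels: list[str]) -> str:
    """Return an HTML anchor carrying XFN relationship values."""
    return f'<a href="{url}" rel="{" ".join(rels)}">{name}</a>'

link = xfn_link("http://example.com/jess", "Jess", ["friend", "met", "colleague"])
# Produces: <a href="http://example.com/jess" rel="friend met colleague">Jess</a>
```

Crawlers like the Social Graph API pick these rel values up from plain HTML, which is exactly why the approach can spread one small page at a time.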

Links to what others are saying.

Did you know there’s a vaccine for chickenpox?

I had no idea until a friend of mine, an adult, just caught chickenpox (she actually knew about the vaccine but her doctor told her she’d been ‘exposed’ and didn’t need it). Apparently the vaccine was approved in the U.S. in 1995. If you have never had chickenpox and haven’t been vaccinated, you can no longer say no one told you.

In 1995 I had my first New York City apartment and whatever I was doing, it wasn’t reading Prevention magazine. But this is the case with many people: it’s impossible to keep up with all the things we should know. And with all the hoopla over bottom-up information organization these days, my natural inclination is to go the other way and ask: how about a radically top-down, people-centric approach? What does that look like? Maybe something like this:

Need To Know
Our educational system has failed to adapt to the changing information needs of the 21st century. The typical education lacks crucial information on such issues as managing our health, finances, and careers. Need To Know is a radically top-down approach to education. A massive research project is collecting data on millions of Americans to determine what most people do most of the time, and therefore what information they need to fruitfully live their lives. We will use this information to produce a one-hour television program composed of 360 ten-second video tutorials.

Ten seconds may not sound like a lot, but it’s enough to say, “A vaccine for chickenpox was released in 1995. If you were born before 1977 and have never had chickenpox, go to a doctor and get the vaccine.”

Better generalizations: finding true relationships between categories and traits

Malcolm Gladwell’s Troublemakers extends his Blink thinking to how we generalize. The takeaway is: “It doesn’t work to generalize about a relationship between a category and a trait when that relationship isn’t stable — or when the act of generalizing may itself change the basis of the generalization.”

In the article he asks whether pit bulls are dangerous dogs. It turns out they are only dangerous if bred, trained, or raised to be dangerous. A better indication of whether a dog may attack is if the dog displays aggressive behavior and has a negligent owner. Not an earth-shaking conclusion, but one we don’t always take the time to investigate.

Massaging social classification

There’s been some great conversation on social classification among myself, Jess, Stewart, Gene and Alex.

I just realized that James already built a system that combines the best of social and constructed categories, for example by creating equivalent associations on the back end to correct for too many similar categories. His design has the virtue of extremely nifty interaction design that lets users type what they’re thinking while showing them what’s already there. Less cognitive load for the users, more yummy findability in the end.
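The back-end idea is simple enough to sketch. This is my own toy illustration, not James’s actual system, and the tags are invented: users type whatever category comes to mind, and an equivalence table quietly folds near-duplicates into one canonical category.

```python
# Toy sketch of "equivalent associations" on the back end: users type
# freely, the system maps near-duplicate tags to a canonical category.
# All tags below are invented examples.

EQUIVALENTS = {
    "nyc": "new york city",
    "new york": "new york city",
    "ny": "new york city",
}

def canonical(tag: str) -> str:
    """Normalize a user-typed tag to its canonical category."""
    tag = tag.strip().lower()
    return EQUIVALENTS.get(tag, tag)

# "NYC", "new york", and "new york city" all land in one category,
# while unknown tags pass through untouched.
```

The interaction-design half of the trick, showing users what’s already there as they type, is what keeps the equivalence table from growing without bound in the first place.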

Faceted Metadata and Choosing the Right User Interface

Tanya revealing her technique for setting facets for her blog was the catalyst that reminded me of a comment Peterme made regarding facets: “The system can never know which particular strategy a given user wants to employ — so why not avail them of them all?” I don’t think he really means all facets we could imagine; I assume he means all facets that correspond to the most important metadata fields for a type of information.

I’m thinking about this because I’m working on a system that stores information using a number of facets, but presents it with a hierarchical browsing user interface: it displays information pre-filtered by a couple of facets, and as you select items it displays more items further filtered by the facet you selected. I didn’t design it, but I must compliment those who did, as it probably (usability testing will confirm this) maps to the mental model of the user. And yet the faceted scheme on the back end keeps the data set flexible and available to display using alternate schemes.

And I mention this for two reasons. First, I’ve noticed a tendency to want to use every facet we have in the user interface, instead of relying on our knowledge of users to serve only the facets that are useful, without extra clutter. Second, even experienced IAs want to literally represent the back-end structure in the user interface. I see this especially with semantic networks of information stored as nodes and connections, where people want to rely on Thinkmap-like interfaces. The information model might just be a better way to store and manage the information, while the user interface should correspond to users’ built-in understanding of categories.

(Out of courtesy I should mention this is not a criticism of either person or site mentioned above, they were just catalysts.)

Radio Alphabet

A Alpha

B Bravo

C Charlie

D Delta

E Echo

F Foxtrot

G Golf

H Hotel

I India

J Juliet

K Kilo

L Lima

M Mike

N November

O Oscar

P Papa

Q Quebec

R Romeo

S Sierra

T Tango

U Uniform

V Victor

W Whiskey

X X-ray

Y Yankee

Z Zulu

Another alphabet, sometimes used by the California police:

A Adam

B Boy

C Charles

D David

E Edward

F Frank

G George

H Henry

I Ida

J John

K King

L Lincoln

M Mary

N Nora

O Ocean

P Paul

Q Queen

R Robert

S Sam

T Tom

U Union

V Victor

W William

X X-Ray

Y Yellow

Z Zebra

Bottom Up

Peter Morville has an interesting new article at New Architect: Bottoms Up. In it he advocates more bottom-up design, reacting against overly reductionist top-down methods. But I think his philosophy is actually a balance of the two, and he’s trying to advocate this balance. Take this excerpt:

I have been flabbergasted in recent months by taxonomy construction projects in Fortune 500 companies. Some completely lack user research, and there is often a fierce resistance to discussing how the taxonomy will be used. Let’s just focus on the taxonomy, they say. We don’t want to get distracted by implementation details.

Interestingly, I’ve been experiencing the opposite scenario. Recently I’ve been meeting people, usually technologists toking at the XML pipe, who only want to do bottom-up design. When I ask, “Who are the users? What are their intentions? What is the scope of your project?” I find a lack of solid answers. Balance (of top-down and bottom-up) is my new rallying cry.

Ontology Building: A Survey of Editing Tools

Excerpts from Ontology Building: A Survey of Editing Tools:

With databases virtually all of the semantic content has to be captured in the application logic. Ontologies, however, are often able to provide an objective specification of domain information by representing a consensual agreement on the concepts and relations characterizing the way knowledge in that domain is expressed.

All ontologies have a part that historically has been called the terminological component. This is roughly analogous to what we know as the schema for a relational database or XML document. It defines the terms and structure of the ontology’s area of interest. The second part, the assertional component, populates the ontology further with instances or individuals that manifest that terminological definition. This extension can be separated in implementation from the ontology and maintained as a knowledge base.
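The survey’s two-part split, terminological component (schema-like) versus assertional component (instances), can be made concrete with a tiny sketch; the domain, class names, and instances below are my own invented examples:

```python
# Sketch of the two-part split the survey describes.
# The terminological component defines terms and structure,
# roughly like a database schema; the assertional component
# populates it with instances. All names are invented.

terminology = {
    "classes": ["Book", "Author"],
    "relations": [("Book", "writtenBy", "Author")],
}

assertions = [
    ("MobyDick", "is-a", "Book"),
    ("Melville", "is-a", "Author"),
    ("MobyDick", "writtenBy", "Melville"),
]

# As the survey notes, the assertional part can live separately
# from the ontology, maintained as a knowledge base.
books = [s for (s, p, o) in assertions if p == "is-a" and o == "Book"]
```

The separation matters in practice because the terminology changes slowly and by consensus, while the knowledge base of assertions can grow continuously.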

CL – Common Logic is the emerging successor to the KIF ontology construction language.

The wide array of information residing on the Web has given ontology use an impetus, and ontology languages increasingly rely on W3C technologies like RDF Schema as a language layer, XML Schema for data typing, and RDF to assert data.

…tools, like Microsoft’s Visio for Enterprise Architects, use an object-oriented specification language to model an information domain (in this case, the Object Role Modeling language). These tools presently lack useful export capabilities, although independent tools to convert between UML and ontology languages like DAML+OIL are under development.

Methodology…in today’s tools…explicit support for a particular knowledge engineering methodology (like KADS) is not common.

Interoperability…Ontologies are for sharing…One consideration in the enterprise realm, for example, is the ability of a domain ontology to accommodate specialized XML languages and controlled vocabularies being adopted as standards in various industries. None of the current ontology editors address this capability. Interoperability, instead, is being addressed simply through an editor’s ability to import and export ontologies in different language serializations.

Usability…The standard approach is the use of multiple tree views with expanding and contracting levels. A graph presentation is less common, although it can be quite useful for actual ontology editing functions that change concepts and relations. The more effective graph views provide local magnification to facilitate browsing ontologies of any appreciable size. The hyperbolic viewer included with the Applied Semantics product, for example, magnifies the center of focus on the graph of concepts (without labeled relations). Other approaches like the Jambalaya plug-in for Protégé-2000 achieve a kind of graphical zooming that nests child concepts inside their parents and allow the user to follow relations by jumping to related concepts. Some practitioners however, such as GALEN users, indicate a preference for non-graphic views for complex ontologies.

Controlled Vocabularies in the Trenches

Has anyone written about what it’s like to create controlled vocabularies (CVs) in the context of actual project work? I can’t think of any. Below are some scattered notes from my recent experience, probably not understandable to anyone else ’cause I don’t have time to instruct. We’re racing to deadlines and I’m driving as fast as I can…

The exercise and deliverables can be rather abstract for some folks, especially if they view the world through a technology lens. It helps to create illustrations and screen shots to show what is meant and how terms are used.

CVs are very helpful as a way of recording the tacit knowledge of an organization, helping everyone communicate and use their information. The process of creating the terms helped clarify their use for the team in understanding the business. (It’s tempting at this point to ramble on about language being symbols for meaning, so we’re actually controlling meaning, and those who control meaning have the power to define reality (the power to name – Jansen).)

If the CV is just informing other artifacts and not something that will be maintained or otherwise carried forward, state that so there is no confusion (e.g. do users of a CMS have to look at a thesaurus to discern meaning, or some other more user-friendly artifact?).

Illustrating the CV within the process: Business’s understanding of reality -> CV -> CMS Manual and CMS user interface -> data and metadata -> UI (e.g. web pages). The CV helps build a bridge between the organization and the user interface.

In all the talk of technology, IA, etc., ultimately people are coming for content. If the content sucks, the site sucks. (Content is king.) Given that content creation and migration are also expensive, to properly honor the content developers’ work we need to put considerable thought into how content is created. A CV helps ensure the builders understand the subject domain and can communicate it to future content authors.

There’s never a perfectly controlled vocabulary; the number of terms is finite and language can only be clarified to a certain extent. It may be necessary to state this fact to set expectations. It involves judgment calls regarding which terms to control, how to control them (e.g. supplying the definition vs. restricting use via the user interface), and how to define them.

As with any design exercise, you need a scope, users, and user intentions to guide the work. When defining the scope of the CV, determining the number of terms may seem arbitrary, but it may be necessary given other factors like time, money, and user response to the system.

CVs for CMS may be used in various ways: to populate menus in the UI (“hard” control), to offer examples, to define terms for a manual (“soft” control), to determine metadata relationships, etc. Specify what you’re using it for explicitly to show its value.
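The hard/soft distinction can be illustrated with a small sketch; the terms, definitions, and synonyms below are invented examples, not from the actual project:

```python
# Sketch of a CV entry serving both "hard" control (populating a CMS
# menu with preferred terms only) and "soft" control (definitions and
# synonym lookup for a manual). All terms are invented examples.

vocabulary = {
    "press release": {
        "definition": "An official statement issued to news media.",
        "use_for": ["news release", "press statement"],  # non-preferred
    },
    "white paper": {
        "definition": "An authoritative report on a business topic.",
        "use_for": ["whitepaper"],
    },
}

# Hard control: the CMS menu offers only the preferred terms.
menu_options = sorted(vocabulary)

# Soft control: an author can look up a non-preferred term
# and be pointed to the preferred one.
synonym_index = {syn: term
                 for term, entry in vocabulary.items()
                 for syn in entry["use_for"]}
```

One structure, two uses: the same entries drive the restrictive UI menus and the advisory manual, which is exactly the kind of explicit value worth spelling out to the team.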

Not creating a CV before building a system can lead to expensive design and technology changes later if the designers’ conception of reality doesn’t match the users’.

Looking into Storytelling

I’m designing a site for my sister’s fiancé’s business; he does interior design and construction contracting. While my idea for the IA and navigation is simple and workable, it just ain’t compelling. About Us, Services, Portfolio [yawn]… I asked the designer to spruce it up and she was like, um, ok…

What do customers get from him in person? I think they get something like a story. Not told in the fashion of a story of course, but he tells them about what the company does (plot), where they do it (setting), who they are (characters), and – since he’s good at what he does – a happy ending.

I’ve attended an IBM seminar on using storytelling in design and caught the flavor of it, and thought it might work perfectly here. So, what do I need?

Some characteristics:

  • A single theme, clearly defined
  • A well developed plot
  • Style: vivid word pictures, pleasing sounds and rhythm
  • Characterization
  • Faithful to source
  • Dramatic appeal
  • Appropriateness to listeners

Dan Gruen from Lotus describes them as

  • Fleshed-out Characters
  • Detailed Settings
  • Goals and Obstacles
  • Causality
  • Dramatic Elements

Whereas he applies them as user-centered designers would use scenarios, I’m more interested in how the knowledge management people would use them, to actually convey information quickly and effectively, as well as compellingly. For example, I’ve heard that when the U.S. Secret Service needs to convey a lot of important information quickly – say, briefing the Secretary of State during a ride across town – they use a storytelling format.

I’m imagining the general storytelling format might make the basic information more interesting and perhaps easier to digest, even if the actual presentation – a few web pages in my case – doesn’t actually build up a whole lot of “dramatic elements”. I’ll retain the usual navigation so visitors can bypass the story or get more details at the end.

A simple mapping resulting in four web pages:

  • Setting -> “about us” type content, the where, who and what with a sense of character development
  • Action -> A summary of the services, in language that describes the activities
  • Suspense -> A challenge to imagine how this could benefit you, and a challenge to the visitor’s conviction
  • Resolution -> Testimonials that reflect happy endings, a list of references

Michael points to the article on narrative voice and I remember the advice I’m always giving others: remove that cold, corporate tone by writing in the first person. (It’s harder to write ridiculous happy talk in the first person because it sounds ridiculous even to numb marketing types.)

Article: The Semantic Website

Get a pot of coffee or two in me and out pops another article, this time in Digital Web under the unwieldy title of Smarter Content Publishing: Building a semantic website to increase the efficiency and usability of publishing systems. Don’t bother reading it; I’ll sum it up for you:

  1. Manually marking up HTML is lame, computers should automatically do that for us
  2. The content management system trend is making publishing easier and less expensive (see Movable Type)
  3. CMS still has a lot of inefficiencies, requiring business users to think about web design instead of business
  4. Metadata to the rescue! Just keep layering the stuff until you’re talking business terms instead of design terms
  5. It’s hard, but that never kept us from using technology before
  6. It’s actually an application of the Semantic Web, but aren’t you glad I didn’t tell you that at the beginning?

If you do bother to read it, hopefully you won’t agree with this guy from the Web Ontology Working Group who wrote in to tell me it was great. ‘Cause that doesn’t help me improve. I need some constructive feedback. Did you like the topic but thought the level was too hard or too easy? Do you want more examples? More theory? Too long, too short? Tell me.

Introduction to Ontologies from McGuinness

Deborah L. McGuinness, ontology goddess, released Ontologies Come of Age, a chapter of an upcoming book. It’s a relatively gentle introduction; along the way she illustrates the difference between controlled vocabularies and ontologies: the former have implicit is-a relationships and the latter have explicit is-a relationships (e.g. in a taxonomy a Merlot is a narrower term than Red Wine, whereas in an ontology a Merlot is-a Red Wine). Expressing those relationships explicitly helps computers understand what we understand. So it’s more like knowledge representation, though it relies on the classification techniques of controlled vocabularies.
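The implicit/explicit distinction is easy to see in code. This is my own toy rendering of her wine example, not anything from the chapter: once the is-a links are asserted explicitly, a program can actually walk them.

```python
# Toy rendering of the explicit is-a idea using the wine example.
# In a taxonomy the hierarchy is merely positional (narrower/broader
# terms); here each is-a link is asserted explicitly, so a program
# can reason over the chain.

is_a = {
    "merlot": "red wine",
    "red wine": "wine",
    "chardonnay": "white wine",
    "white wine": "wine",
}

def ancestors(term: str) -> list[str]:
    """Walk the explicit is-a links from a term up to the root."""
    chain = []
    while term in is_a:
        term = is_a[term]
        chain.append(term)
    return chain

# ancestors("merlot") walks merlot -> red wine -> wine, which is the
# inference a plain controlled vocabulary leaves implicit.
```

That little walk is the whole point: a thesaurus displays the hierarchy to a human, while the ontology lets the machine conclude that a Merlot is a wine without a human in the loop.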

She’s done hardcore research at Rutgers, AT&T, Lucent, & Stanford and seems to be looking for wider applications of this work via Sandpiper Software.