Tag clouds are teh suck

Zeldman discusses several of the problems with tag clouds, but I thought I'd hit on a couple more from a different viewpoint.

First, as a primer, here is a tag cloud (this one is from my Flickr account, but you can also see them on sites like del.icio.us (experimental) and 43things):

 700   animal   ape2005   architecture   armstrong   beach   bike   bird   blue   boulders   bridge   buddha   bunny   cacti   california   castro   cave   chaparral   child   christinethornburg   christmas   cliff   condor   contrail   cute   cycling   deyoung   ekimov   endangered   evil   flight   flower   football   gate   gehry   getty   goldengatepark   green   halloween   herzog   house   incredibles   iris   japanesemaple   japaneseteagarden   lamb   lancearmstrong   landscape   leaves   licenseplate   lights   lizard   losangeles   maple   metaldetector   meuron   momiji   moon   morganhill   mountain   nationalpark   nerd   orange   pagoda   paulmccartney   peligro   pinnacles   pipes   rabbit   race   railing   red   richardmeier   robonexus   rock   sanfrancisco   sanfransciscograndprix   santamonica   sfmoma   sidewalk   sign   silhouette   sonoma   spiderman   spire   spires   stonelantern   stones   sunset   tattoo   teagarden   tmobile   tonybennett   tree 

Tag clouds follow a very basic principle: the font size of a word scales linearly with the number of times the tag has been used.
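As a minimal sketch of that principle, the function below maps use counts to font sizes on a straight line between a smallest and a largest size. The function name, the pixel bounds, and the example counts are all made up for illustration; this is not how Flickr or any other site actually computes its cloud.

```python
def font_sizes(tag_counts, min_px=10.0, max_px=36.0):
    """Map each tag's use count to a font size that grows linearly with the count.

    min_px and max_px are assumed bounds, chosen only for this illustration.
    """
    lo = min(tag_counts.values())
    hi = max(tag_counts.values())
    span = hi - lo
    sizes = {}
    for tag, count in tag_counts.items():
        # If every tag has the same count, fall back to the smallest size.
        fraction = (count - lo) / span if span else 0.0
        sizes[tag] = min_px + fraction * (max_px - min_px)
    return sizes


# Hypothetical counts, not the real numbers behind the cloud above.
example_counts = {"architecture": 40, "ape2005": 10, "house": 25}
example_sizes = font_sizes(example_counts)
```

With these made-up counts, the least-used tag gets the smallest size, the most-used tag gets the largest, and a tag halfway between them in count lands halfway between them in size.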

At first glance, there appear to be several things right with this sort of display. You can see, for example, that I have a ton of photos tagged "Richard Meier", and that I have a lot more "architecture" photos than "ape2005" photos. IMHO, however, this is all fluff -- it has the appearance of being a statistical visualization but instead conveys information crudely and inaccurately. For example, for each of these pairs, answer the question, "Which do I have more photos tagged with?"

  • japaneseteagarden or goldengatepark?
  • richardmeier or architecture?
  • sanfranciscograndprix or house?

With close examination you will probably get these right, but my point is that it takes a bit of thought (and you have a chance of getting it wrong). One of the fundamental problems is that the "tag cloud" display uses the size of a word to convey how many items are tagged with it. However, the size of the word depends on both (a) the number of characters in the word (sanfranciscograndprix vs. house) and (b) the font size of the word, which grows in two dimensions. Instead of conveying:

size of word ~= (# of tagged items)

we instead have the relation

size of word ~= (# of tagged items * length of word)^2

So as a statistical display, it's bunk -- appearing to help you understand relative tag distribution, but not in an accurate manner.
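To put rough numbers on that distortion, here is a small sketch that estimates the screen area a tag occupies. It assumes the font size is directly proportional to the use count, that each character is about 0.6 em wide, and that the line height equals the font size. Those constants, the function name, and the counts are all invented for illustration, so treat the output as a ballpark and not a measurement of any real cloud.

```python
def word_area(tag, count, px_per_use=1.0, char_width_em=0.6):
    """Estimate the on-screen area (in square pixels) of a tag in a cloud.

    Assumptions (illustrative only): font size is px_per_use * count,
    every character is char_width_em ems wide, and the word is as tall
    as its font size.
    """
    font_px = px_per_use * count
    width = len(tag) * char_width_em * font_px   # grows with length and count
    height = font_px                             # grows with count
    return width * height


# Hypothetical counts: the long tag is used LESS often than the short one.
long_tag_area = word_area("sanfranciscograndprix", 10)
short_tag_area = word_area("house", 15)
```

Under these assumptions the long tag covers more of the screen than the short one even though it has fewer photos, and doubling a tag's count quadruples its area instead of doubling it.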

Aesthetically, trying to convey this pseudo-statistical information throws the list completely out of whack: lines grow to arbitrary heights, you lose the ability to scan quickly across the entire list, and large words constantly draw your attention away from smaller words. And, to borrow from Zeldman, navigation skews towards popularity rather than findability.

The fact that "richardmeier" is one of my most prominent tags comes down entirely to two things: (a) I took a ton of photos of the Getty one day, and (b) I was testing out my new Flickr Pro upload limits. They are not my "best" category of photos, I don't frequently take "richardmeier" photos, and they are not the photos I most want people to see. But the tag cloud design dictates that visitors will forever feel the gravitational force of "richardmeier" (that is, until I go crazy with another photo upload).

My own tag/category display could use some work, but I offer it here as a comparison (feel free to critique in the comments):

histogram

This page contains a single entry from kwc blog posted on May 4, 2005 10:10 PM.