Tufte talk at Stanford


I don't have too much to say about this talk, as it was mostly Tufte recounting his life as an academic all the way from his undergraduate education at Stanford to posts at both Yale and Princeton, followed by a slideshow of his more recent sculpture work. There was one very fun moment for us GG followers:

Edward Tufte, master of the display of statistical information, has put up a slide with sparklines and Galileo's description of Saturn. Suddenly, he gasps -- the projection screen is too far away! How will he ever properly point to Galileo's illustration of Saturn's rings?!?!?

Never fear, GadgetGuy is here -- sitting in the third row, actually, taking notes on his MacBook. GadgetGuy quickly reaches into his pocket and produces his trusty green laser pointer. The talk is saved!

Tag clouds are teh suck


Zeldman discusses several of the problems with tag clouds, but I thought I'd hit on a couple more from a different viewpoint.

First, as a primer, a tag cloud (as seen on my Flickr account, but also seen on sites like (experimental) and 43things):

 700   animal   ape2005   architecture   armstrong   beach   bike   bird   blue   boulders   bridge   buddha   bunny   cacti   california   castro   cave   chaparral   child   christinethornburg   christmas   cliff   condor   contrail   cute   cycling   deyoung   ekimov   endangered   evil   flight   flower   football   gate   gehry   getty   goldengatepark   green   halloween   herzog   house   incredibles   iris   japanesemaple   japaneseteagarden   lamb   lancearmstrong   landscape   leaves   licenseplate   lights   lizard   losangeles   maple   metaldetector   meuron   momiji   moon   morganhill   mountain   nationalpark   nerd   orange   pagoda   paulmccartney   peligro   pinnacles   pipes   rabbit   race   railing   red   richardmeier   robonexus   rock   sanfrancisco   sanfransciscograndprix   santamonica   sfmoma   sidewalk   sign   silhouette   sonoma   spiderman   spire   spires   stonelantern   stones   sunset   tattoo   teagarden   tmobile   tonybennett   tree 

Tag clouds follow a very basic principle: the font size of a word scales linearly with the number of times the tag has been used.
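To make the principle concrete, here is a minimal sketch of linear font scaling. The counts, the point-size range, and the function name are all made up for illustration; this is not how Flickr or any other site actually computes its cloud.

```python
def font_size(count, min_count, max_count, min_pt=10.0, max_pt=32.0):
    """Linearly map a tag's use count onto a font size in points.

    min_pt and max_pt are arbitrary illustrative bounds.
    """
    if max_count == min_count:
        # Every tag is used equally often, so there is nothing to scale.
        return min_pt
    fraction = (count - min_count) / (max_count - min_count)
    return min_pt + fraction * (max_pt - min_pt)


# Hypothetical use counts, not the real numbers behind the cloud above.
tag_counts = {"architecture": 40, "richardmeier": 60, "ape2005": 5}

low, high = min(tag_counts.values()), max(tag_counts.values())
sizes = {tag: font_size(n, low, high) for tag, n in tag_counts.items()}

for tag, size in sorted(sizes.items()):
    print(f"{tag}: {size:.1f}pt")
```

The least-used tag gets the smallest size, the most-used tag gets the largest, and everything else falls proportionally in between.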

At first glance, there appear to be several things right with this sort of display. You can see, for example, that I have a ton of photos tagged "Richard Meier", and that I have a lot more "architecture" photos than "ape2005" photos. IMHO, however, this is all fluff -- it has the appearance of being a statistical visualization but instead conveys information crudely and inaccurately. For example, for each of these pairs, answer the question, "Which do I have more photos tagged with?"

  • japaneseteagarden or goldengatepark?
  • richardmeier or architecture?
  • sanfranciscograndprix or house?

With close examination you will probably get these right, but my point is that it takes a bit of thought (and you could still get it wrong). One of the fundamental problems is that the "tag cloud" display uses the size of the word to convey how many items carry that tag. However, the size of the word depends on (a) the number of characters in the word (sanfranciscograndprix vs. house) and (b) the font size of the word, which grows in two dimensions. Instead of trying to convey:

size of word ~= (# of tagged items)

we instead have the relation

size of word ~= (# of tagged items * length of word)^2

So as a statistical display, it's bunk -- appearing to help you understand relative tag distribution, but not in an accurate manner.
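A quick way to see the word-length effect is to estimate how wide each word renders. This is a rough sketch under a loud assumption: every glyph is treated as a fixed fraction of the font size wide (a crude monospace model), and the font sizes are hypothetical, picked so that "house" plays the more-used tag. It only looks at width, so it says nothing about the exact form of the relation above.

```python
def estimated_width(tag, font_pt, char_aspect=0.6):
    """Rough rendered width in points.

    Assumes each glyph is char_aspect * font_pt wide, which is only a
    crude approximation of a real proportional font.
    """
    return len(tag) * char_aspect * font_pt


# Hypothetical font sizes: pretend "house" is the more-used tag, so it is set larger.
house_width = estimated_width("house", font_pt=24.0)
prix_width = estimated_width("sanfranciscograndprix", font_pt=14.0)

print(f"house: {house_width:.1f}pt wide")
print(f"sanfranciscograndprix: {prix_width:.1f}pt wide")
```

Under these made-up numbers the less-used tag still takes up well over twice the horizontal space, which is exactly the kind of false signal that makes the pairs above hard to judge.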

Aesthetically, the attempt to convey this pseudo-statistical information throws the list completely out of whack: lines grow to arbitrary heights, you can no longer scan quickly across the entire list, and large words constantly draw your attention away from smaller ones. And, to borrow from Zeldman, navigation skews towards popularity rather than findability.

The fact that "richardmeier" is one of my most prominent tags entirely relates to the fact that (a) I took a ton of photos of the Getty one day, and (b) I was testing out my new Flickr Pro upload limits. They are not my "best" category of photos, I don't frequently take "richardmeier" photos, and they are not the photos I most want people to see. But the tag cloud design dictates that visitors will forever feel "richardmeier"'s gravitational force (that is, until I go crazy with another photo upload).

My own tag/category display could use some work, but I offer it here as a comparison (feel free to critique in the comments):


Tufte's sparklines


Tufte has posted some of the material from his upcoming Beautiful Evidence book that covers his concept of the "sparkline." Sparklines are essentially tiny graphics that can convey trends very quickly, and in some cases they can provide very specific data. They aren't earth-shattering -- most of them look like shrunken versions of familiar information graphics such as stock graphs -- but the idea that it is now very easy to embed such high-information-density graphics directly into our text is a compelling one.

Tufte briefly covered these when I saw him speak, but it's nice to see his written text, which I prefer to his speaking.

Edward Tufte: Sparklines

Talk: Tufte


I was really excited to go see Tufte today for his day-long course. His writings and teachings are excellent, and I find them useful whenever I am displaying visual information, even if I cannot live up to the standards that they profess. Sometimes I wish Tufte would sell software with his design principles built in, rather than leaving us with the pie-chart glory of Excel. In fact, I learned during the talk from Peter Norvig (slightly more on this later) that the AutoContent Wizard in PowerPoint started off as a joke by the engineering people to the marketing people, along the lines of "oh yeah, and we can just have the application fill in all the content for you." Clearly, engineers don't understand marketing.

So, on to the talk: my rants and raves are in the extended entry.

You don't need me to tell you that Edward Tufte's The Visual Display of Quantitative Information (amazon) is a good book. I've posted my outline notes below (mostly for my own benefit, as this is useless without Tufte's pretty [and ugly] examples).

I hate powerpoint


I found this great parody of the drain that is PowerPoint: Gettysburg Address in Powerpoint. There's also Tufte's analysis of Boeing's report on Columbia Tile Damage. PowerPoint really takes the crown for robbing the corporate coffers of brain cells. I never thought I would say this, but I miss good ol' overhead transparencies that cost $2/piece to make. At least once they were made, no one would even dare suggest using a synonym for the second word in bullet four on slide three.

Credit: IDblog (entry) for the link to Dan Brown's Understanding Powerpoint: Special Deliverables #5, which provided the GA link.

BTW: Tufte uses ArsDigita Community System. Good stuff, even if you don't like Greenspun.