
CHI Monday notes

Attention and Interruption

How it Works: A Field Study of Non-Technical Users Interacting with an Intelligent System

Joe Tullio, Motorola Labs, USA
Anind K. Dey, Jason Chalecki, Carnegie Mellon University, USA
James Fogarty, University of Washington, USA

We describe a novel field study of how users’ mental models develop around an intelligent system. Designers can use our results to design user interfaces to correct flawed mental models.

Door displays were distinguished by whether or not they included additional icons (up to three) that denoted the sensor input contributing to the interruption state (e.g. > 50 windows open in the past n minutes, talking on the phone). Despite the fact that some displays included this information, users did not include it in their mental models. Users relied on their own observation of the room state to determine their model.

More encouragingly, some users did form mental models that were similar to higher-level machine learning concepts, e.g. decision trees, averages, etc. One possible implication is that designers may (a) wish to use algorithms that align well with these baseline non-technical-user concepts and/or (b) reveal some of these higher-level features to the user.

Different models (a rough sketch of the latter two follows below):

  • Wizard of Oz (1 user): thought a person in the room directly set the state, no software intelligence
  • Simple Rules (2 users): thought the system had if/then-like conditions for setting the display
  • Prioritized Cases (2 users): activity recognition/decision tree model
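
A rough Python sketch of how these two software-based mental models might look as code. This is purely illustrative: the sensor names (phone_active, window_switches, visitor_present) and the threshold are my own assumptions, not the study's actual display logic.

    def simple_rules_model(phone_active, window_switches, visitor_present):
        # "Simple Rules" mental model: a flat list of independent if/then conditions
        if phone_active:
            return "busy"
        if window_switches > 50:
            return "busy"
        return "available"

    def prioritized_cases_model(phone_active, window_switches, visitor_present):
        # "Prioritized Cases" mental model: decision-tree-like, with
        # higher-priority activities checked before weaker evidence
        if phone_active:                # a phone call outranks everything else
            return "busy"
        elif visitor_present:           # a visitor outranks desktop activity
            return "busy"
        elif window_switches > 50:      # recent desktop activity
            return "busy"
        else:
            return "available"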

Other notes:

  • used the Subtle toolkit (Fogarty) for interruption training
  • used static models produced from 3 months of training; chose static models for easier evaluation
  • the # of perceived inputs/sensors decreased over time, while % correct increased over time

Software or Wetware? Discovering When and Why People Use Digital Prosthetic Memory (best paper)

Vaiva Kalnikaite, Steve Whittaker, The University of Sheffield, UK

A laboratory study examining the factors influencing people’s choice of when to use prosthetic memory or organic memory and why. Can assist in developing effective memory aids.

The study looks at when we use organic memory (OM) and prosthetic memory (PM), how and when people exploit PM to remember, and what the relationship is between the two. The experiment focused on memory for conversational speech (i.e. stories). Users were given PM devices with different properties and their ability to answer questions about the stories was compared.

Memex vision: complete and accurate prosthetic memory (PM), assume people will exploit this for retrieval

Key points

  • users prefer efficiency over accuracy
  • PM devices should target where OM is poor (e.g. prospective memory, Alzheimer's)
  • Speaker emphasized synergy of PM and OM
  • Users took fewer notes when using CC (more focus on annotation)

NOTE: the study did not test any PM devices that included speech recognition. I wonder if such a technology could score even higher, given that with PP people felt unsure whether an ink note was correct and knew that not everything was noted.

Experiment

  1. Organic Memory
  2. Pen and Paper (PP): slow input
  3. Dictaphone (DP): complete and accurate
  4. ChittyChatty (CC): merge PP + DP.

ChittyChatty

HP iPaq prototype. Handwritten + speech input with temporal coindexing.

Click on ink input to retrieve the recorded audio indexed at the same time
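
A minimal sketch of how such temporal co-indexing could be implemented (my own illustration, not the actual ChittyChatty code; the lead-in offset is an assumption):

    class CoindexedNotes:
        """Ink strokes stamped with the elapsed audio-recording time."""

        def __init__(self):
            self.strokes = []  # list of (elapsed_seconds, stroke_data)

        def add_stroke(self, elapsed_seconds, stroke_data):
            # called while the audio is being recorded
            self.strokes.append((elapsed_seconds, stroke_data))

        def audio_offset_for(self, stroke_index, lead_in=5.0):
            # seek a few seconds before the stroke so the spoken context is heard
            elapsed, _ = self.strokes[stroke_index]
            return max(0.0, elapsed - lead_in)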

Experiment and results

Users were read 3 stories over 3 sessions and had to answer questions. Seven days later, the users were given back the same device that they used and had to answer different questions. They also repeated the process a month later.

  • PM was more accurate (both objective and subjective).
  • OM was more efficient, though CC was rated highly.
  • PM use related to confidence
  • Users were more likely to use devices that were both efficient and accurate (DP scored worse than PP, even though it was more accurate)

Do Life-Logging Technologies Support Memory for the Past? An Experimental Study Using SenseCam

Abigail Sellen, Andrew Fogg, Microsoft, UK
Mike Aitken, University of Cambridge, UK
Steve Hodges, Carsten Rother, Ken Wood, Microsoft, UK

Experimentally evaluates the efficacy of still images in triggering the remembering of past personal events, having implications for how we conceive of and the claims we make about “life-logging” technologies.

SenseCam: fisheye camera worn around neck, aka "black box" for living

Focus on episodic memory, not semantic memory

Questions

  1. Do we want to remember everything (importance of forgetting)
  2. What do we mean by "digital memory"?
  3. What do we mean by capturing an experience
  4. Can a machine remember

Life-logging technologies provide cues for remembering.

Easier questions (with unsupported claims/hypotheses)

  1. Do lifelogging systems in fact support human memory? Yes
  2. In what ways? By helping us recollect our personal past, retrieve from our past, and reflect on our past
  3. How does this change over time? They will help us in the short, long, and very long term
  4. What kinds of data should we capture, and how? The more data we capture the better, and the more kinds the better. Capture should be effortless and automatic (no user intervention necessary)

Very little HCI literature that provides strong evidence that still images and audio can trigger episodic memory.

Q1. Do SenseCam images improve memory for past personal events?

  • supported by diary studies that show written cues act as triggers (Brewer, Linton, Wagenaar)

Q2. In what ways do images help us connect with our past?

  • Do people recognize their own images?
  • Do images help people recollect or simply know?

Q3. How does cue strength decline over time?

19 undergrads

2 variables of interest:

  • condition: SenseCam vs. control images (each subject was paired with another subject and shown images from that subject as the control)
  • interval: short vs. long (one group tested 3 days after, the other 10 days after; some subjects were also tested 4 months after)

Subjects wore the SenseCam for two consecutive days; the day before and the day after were control days (images from the other paired subject were used as the control).

Recall test

  1. Free recall of events
  2. Viewing and ordering of images
  3. Final free recall to add or amend events in different color pen

Recognition test: presented with images previously not shown (control and Sensecam) and had to distinguish

Results:

  • Novelty effect: prior to viewing images, subjects had better recall for SenseCam days (an experimental flaw; they should have worn a disabled SenseCam on control days)
  • After viewing images and adjusting for the novelty effect, SenseCam was better than control for recall
  • The forgetting curve for recall was the same for SenseCam and control
  • Subjects were still able to 'know' (semantic memory) better with SenseCam over the long term
  • SenseCam was better at supporting semantic memory (knowing something happened) than episodic memory (recalling)
  • Subjects were able to recognize their own images better, and just as well 4 months later
  • The inability to order control images suggests that users were using personal knowledge, not schematic information, to order their own images
  • Active photo taking performed worse for recall than automatic photo taking; the hypothesis is that users took photos of events that were already salient to them

Ubicomp Tools

Momento: Support for Situated Ubicomp Experimentation

Scott Carter, FX Palo Alto Laboratory, USA
Jennifer Mankoff, Carnegie Mellon University, USA
Jeffrey Heer, University of California, Berkeley, USA

We present the iterative design of Momento, which supports situated ubicomp experimentation, and demonstrate its use in three studies. Momento supports remote testing and can gather quantitative and qualitative data.

Needfinding:

  • Surveys
  • Brainstorming
  • Methods: focus groups, diary studies, ESM

Central control interface for entering experiment data. Prefuse-based timeline of events from participants. Feedback can be sent to participants on the fly.

Prototyping:

  • Hardware kits
  • APIs
  • Methods: Wizard of Oz, end-user tools

Needs:

  • Low threshold for iterating
  • Context data is key (location, proximity)
  • Remote communication and control

Mobile client (a hypothetical configuration sketch follows below):

  • study description
  • continuous capture of audio, bluetooth/gps
  • buttons (look and feel, launch capture applications, remotely configurable)
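
The sketch: a hypothetical configuration blob that a Momento-style mobile client could receive (the field names are my own invention, not Momento's actual format):

    study_config = {
        "study": "interruption diary, week 2",   # study description shown to the participant
        "capture": {"audio": True, "bluetooth": True, "gps": True},
        "buttons": [
            {"label": "Log event", "action": "launch_photo_capture"},
            {"label": "Busy now", "action": "send_status", "style": "red"},
        ],
    }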

Subtle: Toolkit Support for Developing and Deploying Sensor-Based Statistical Models of Human Situations

James Fogarty, University of Washington, USA
Scott E. Hudson, Carnegie Mellon University, USA

Presents Subtle, a toolkit that enables sensor-based statistical models of human situations. Subtle focuses research on applications and datasets, instead of the difficulties in collecting sensor data and learning models.

Fully automated technique for taking lower-level sensing features and generating higher-level features for machine learning algorithms.

The Promise of Context-Aware Computing: Sense -> Model -> Interact

  • e.g. auto-toggling audio mute, away status, auto-forwarding e-mail to a phone
  • Sense is hard, Model is hard, Interact is hard
  • Subtle: a toolkit for Sense and Model, leaving Interact to the researcher

Creating and deploying applications that learn personal statistical models of concepts that:

  • can be described by the context Subtle collects
  • are based on instantaneous, nominal labels
  • have labels that can be collected

Not (yet) good at: continuous-valued labels (e.g. how interruptible is a person?), parameterized labels (how interruptible is a person for X?), sharing data and models

Not likely to be good at: quickly prototyping a statistical model (focused on background processing), end-user exploration or modification of a model (auto-generated features are hard to explain)

Desktop analyses, audio, wifi -> filtered by hash-based privacy policy -> database -> timestamped labels added -> model learner
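
A minimal sketch of what a hash-based privacy policy can look like (my own illustration, not Subtle's actual implementation): tokens such as window titles are replaced by one-way hashes before storage, so a learner can still match recurring tokens without the database ever holding readable text.

    import hashlib

    def privacy_filter(tokens, salt="per-user-secret"):
        # identical tokens map to identical hashes, so they remain usable
        # as features, but the original text is never stored
        return [hashlib.sha1((salt + t).encode("utf-8")).hexdigest()[:8]
                for t in tokens]

    # privacy_filter(["budget.xls", "Inbox"]) yields two opaque hex strings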

Iterative feature generation and selection:

  • Create higher-level features automatically from lower-level sensors; higher-level features were shown to be more predictive in the Eclipse experiment
  • Sensed context -> potential features -> filtered potential features -> model (repeat feature/model generation)
  • Type-based feature generation: create new features by applying operations based on a feature's type and history of values (sketch below)
  • Features have to pass one of four correlation filters, then an optimal subset filter
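
A sketch of type-based feature generation in this spirit (the operation names are my own, not the toolkit's API): numeric sensor histories spawn windowed aggregates, boolean histories spawn duration-style features, and the resulting candidates are then run through the correlation and subset filters before model learning.

    def generate_features(name, history, window=60):
        recent = history[-window:]
        if not recent:
            return {}
        if all(isinstance(v, bool) for v in recent):
            # boolean sensors: "was it ever true recently", "how often was it true"
            return {
                f"{name}_any_recent": any(recent),
                f"{name}_fraction_true": sum(recent) / len(recent),
            }
        if all(isinstance(v, (int, float)) for v in recent):
            # numeric sensors: windowed aggregates over the recent history
            return {
                f"{name}_mean": sum(recent) / len(recent),
                f"{name}_max": max(recent),
                f"{name}_delta": recent[-1] - recent[0],
            }
        return {}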

Whistle: app built on Subtle to auto-mute audio

AmIBusy Prompter: app for learning personal model of self-reported interruptibility

Authoring Sensor-Based Interactions by Demonstration with Direct Manipulation and Pattern Recognition (best paper)

Bjoern Hartmann, Leith Abdulla, Stanford University, USA
Manas Mittal, MIT, USA
Scott R. Klemmer, Stanford University, USA

Contributes method and tool for rapidly designing sensor-based interactions by demonstration, emphasizes control of generalization criteria through integrating direct manipulation and pattern recognition, and offers theoretical and first-use lab evaluations.

How would you develop a prototyping toolkit for the Wii? For DDR?

Challenge: converting sensor data into higher-level logic (sensors noisy, gulf of execution)

Idea: program by demonstration

  • leverages tacit knowledge, but the challenge becomes generalization

Exemplar:

  • Cycle: demonstrate - edit (waveform view) - review
  • Exports events so it can be interfaced with Flash and d.tools
  • Left pane: small multiples of individual sensors
  • Left control: filters
  • Central pane: horizontal timeline of sensor data
  • Right pane: processed discrete events
  • The user highlights regions of the sensor data in the central pane; feedback is immediately given by highlighting matching regions elsewhere in the sensor data
  • Thresholds, timeouts, smoothing, rate of change, offset, etc.
  • Can threshold the error graph as well to make generalization easier
  • Dynamic time warping: allows a demonstration signal to match signals of different lengths as long as the shape is the same (see the sketch below)
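
A minimal dynamic time warping sketch (my own illustration of the idea, not Exemplar's implementation): the DTW distance stays small when two signals have the same shape, even if one is stretched or compressed in time.

    def dtw_distance(a, b):
        n, m = len(a), len(b)
        INF = float("inf")
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
        return cost[n][m]

    # A slowed-down copy of a gesture stays close, a different shape does not:
    # dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]) is small,
    # while dtw_distance([0, 1, 2, 1, 0], [2, 0, 2, 0, 2]) is much larger.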

Built in Eclipse

Evaluating Evaluation

The Evolution of Evaluation (30 min)

Joseph ‘Jofish’ Kaye, Phoebe Sengers, Cornell University, USA

We provide a historical context for assessing evaluation methods by explicating the history of evaluation in HCI. We trace the history of evaluation in the field from electrical engineering and computer science, to experimental approaches drawn from cognitive science, to usability’s emphasis on in-situ studies and expertise.

Evaluation part of defining a field

Example: Virtual Intimate Object (VIO)

  • a device for couples in long-distance relationships to communicate intimacy
  • clicking on a dot causes it to light up and then fade
  • it's about the experience, not about the task
  • how do you measure intimacy and the transmission thereof?

Evolution of evaluation:

  • Evaluation by engineers
    • Users: engineers and mathematicians
    • Evaluators: engineers
    • Limiting factor: reliability
  • Evaluation by computer scientists
    • Users: programmers
    • Evaluators: programmers
    • Limiting factor: speed of the machine (still pervasive today, though other computing platforms like cellphones don't use it)
    • First uses of "human-computer interaction" and "computer-human interaction" (though barely used before 1980)
  • Evaluation by experimental psychologists and cognitive scientists
    • Users: users (the computer is a tool, often in offices)
    • Evaluators: cognitive scientists and experimental psychologists (used to measuring things through experiment)
    • Limiting factor: what a human can do
    • ExPsych case study (text editor, cogsci evaluation): three criteria: objectivity, thoroughness, ease of use (of the evaluation method)
    • "Text editors are the white rats of HCI" -- Thomas Green, 1984, in Grudin, 1990
  • Evaluation by HCI professionals
    • Users: often white collar
    • Evaluators: HCI professionals
    • Limiting factor: the user accomplishing the job
    • Believe in expertise over experiment (Nielsen 1984)
    • Made a decision to focus on better results, regardless of whether they were experimentally provable or not
    • Damaged Merchandise setup: early 80s; Wayne Gray panel at CHI '95: "Discount or Disservice? Discount Usability Analysis at a Bargain Price or Simply Damaged Merchandise"; a clash of paradigms: experimentation vs. expertise

  • Experience-focused HCI: possibly emerging sub-field
    • User: people choosing to use technology for the joy of it and to do what they want in everyday life
    • Evaluators: us
    • For example, how do you evaluate a car?

From Mice to Men — 24 Years of Evaluation in CHI (20 min)

Louise Barkhuus, University of Glasgow, UK
Jennifer A. Rode, University of California, Irvine, USA

This paper analyzes trends in the approach to evaluation taken by CHI Papers in the last 24 years. A set of papers was analyzed according to our schema for classifying type of evaluation. Our analysis traces papers’ trend in type and scope of evaluation. Findings include an increase in the proportion of papers that include evaluation and a decrease in the median number of subjects in quantitative studies.

Evaluation is practically required for CHI

Grudin has work studying this role, but nothing empirical

Classified type of evaluation: empirical or analytic, quant or qual

Taxonomy:

  • empirical + qual: ethno studies, think-aloud interviews
  • empirical + quant: lab studies
  • analytic + qual: cognitive walkthroughs, heuristic evaluation
  • analytic + quant: GOMS, analysis of logs

Historical trends:

  • Evaluation has increased dramatically from 1983 to 2006
  • Informal evaluation is virtually non-existent
  • Analytic evaluation accounts for less than 2%
  • There are fewer papers on actual evaluation methods
  • In 2006, most evaluation was quantitative empirical
  • The mixed approach is decreasing
  • The median number of subjects in quant studies has decreased
  • The median number of subjects in qual studies has increased, but possibly at the cost of studying subjects for less time

Student-computer interaction: roughly half of test subjects are students

  • students used to represent typical users, but that is no longer the case

Gender (strong male bias):

  • half of the papers did not mention gender
  • the % of papers using 66%+ men is the same as the % of papers using 33%+ women

Conclusions:

  • evaluation is commonplace
  • quant empirical studies are most common
  • slight increase in qual
  • decrease in the number of participants
  • male bias

Questions going forward:

  • Are we losing groundbreaking research that isn't easily evaluated with common techniques?
  • Are we sacrificing analytic approaches, and not allowing our methods to evolve? Are they not viewed as a practitioner technique/too business-y? Are we worried about expert bias?
  • Are quantitative empirical studies becoming a standard necessary for CHI paper acceptance?

Make Evaluation Poverty History

Gilbert Cockton, University of Sunderland, UK

Argues for the need to ground evaluation in achieved worth rather than established psychological measures, and proposes the use of worth maps, based on approaches from consumer psychology, to do so, providing a shared representation for design and evaluation.

Evaluation is a hungry wolf eating cool design

Makers don't feed Evaluators

Designers fend for themselves

Early HCI evaluators had to snack: count the burgers

Need to shift to worth-centered principles:

  • designs are means to ends
  • "create meaningful connections among people, ideas, art, and technology, shaping the way people understand their relationships with ... new products" - Clement Mok, former AIGA president

Purpose of evaluation is to evaluate achievement of worth for design purpose

The Worth of IM: Meaningful Connections, Aschmoneit and Heitmann, HICSS 2003, Hierarchical Value Model

W/AMs: Worth/aversion maps for each key stakeholder
