Evaluating Usability

Evaluating Usability - PowerPoint Presentation

sherrill-nordquist
Uploaded On 2015-09-21


Presentation Transcript

Slide1

Evaluating Usability

Data & Analysis

Slide2

Types of Data

And how to read them

Slide3

What types of data are relevant to our interests?

When evaluating how usable a design is, there are many kinds of data you may want to take into account: whether a user can complete a task, how long it takes them to complete it, survey responses, etc.

But before thinking about what data you will collect, you must understand the basic types of data that exist and what they can tell you about your interface.

Slide4

Nominal Data

Unordered groups or categories (e.g. apples and oranges; no fruit is inherently better)

Examples:

Binary success (whether a user was or was not able to complete a task)

Some demographic information, such as gender or whether or not the participant owns a smartphone

But not others, such as age or annual household income

Slide5

Ordinal Data

Ordered groups or categories

Examples:

Survey rankings

“Would you describe this website as excellent, good, fair, or poor?”

Levels of task completion (for non-binary success)

If I tell you to draw a circle and you draw an oval, shouldn’t you get partial credit?

Slide6

Caveat for Ordinal Data

“Would you describe this website as excellent, good, fair, or poor?”

Although good is better than fair and fair is better than poor, we have no way of knowing whether the “distance” between good and fair is greater than the “distance” between fair and poor

You cannot do arithmetic with ordinal data, but you can summarize it with histograms.

[Figure: two alternative spacings of GOOD, FAIR, and POOR along a scale; which one is correct is unknown]

Slide7

Interval Data

Data points measured along a scale on which adjacent points are equally spaced

Examples:

5-star ratings such as those used by Yelp, Google Local, etc.

Semantic differentials: “Would you describe these slides as…

Ugly □ □ □ □ □ Beautiful”

Slide8

Caveat for Interval Data

You cannot multiply or divide interval data.

There is no way something can be “twice as beautiful” or “three times as ugly” because there is no meaningful zero point.

Slide9

Ordinal vs. Interval

Interval data provides more opportunity for analysis than nominal or ordinal data do, but the scales used often look the same:

□ Poor □ Fair □ Good □ Excellent

Vs.

Poor □ □ □ □ Excellent

How do these different formats affect your participants’ responses? How do they affect what you can and cannot do with the data?

Slide10

Working with Ordinal and Interval Data

Because the categories of ordinal data are not necessarily evenly spaced along the scale, we cannot treat it like interval data

Remember, you cannot do any arithmetic on ordinal data

This means no averaging!

It may seem pedantic, but you cannot treat the response “fair” the same way as “1/3 of the way between poor and excellent”

Slide11
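To make the "no averaging" rule for ordinal data concrete, here is a minimal sketch of two summaries that rely only on the order of the categories: a histogram and a median category. The scale labels come from the survey question above; the responses and function names are made up for illustration.

```python
from collections import Counter

# Ordered from worst to best; the order is the only structure ordinal data has.
SCALE = ["poor", "fair", "good", "excellent"]


def histogram(responses):
    """Count how many responses fall in each category, in scale order."""
    counts = Counter(responses)
    return [(level, counts[level]) for level in SCALE]


def median_category(responses):
    """Return the middle response after sorting by rank (the lower median)."""
    ranked = sorted(responses, key=SCALE.index)
    return ranked[(len(ranked) - 1) // 2]


# Hypothetical survey responses.
responses = ["good", "fair", "excellent", "good", "poor", "good", "fair"]

for level, count in histogram(responses):
    print(f"{level:<10}{'#' * count}")
print("median:", median_category(responses))
```

Neither summary assumes anything about the distance between categories, which is why both remain valid where an average would not.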

Ratio Data

Like interval data, but with the addition of an inherently meaningful absolute zero

Examples:

Time to task completion

Number of page views, mouse clicks, etc.

Slide12

Interval vs. Ratio Data

The concept of an absolute zero means you can do any type of arithmetic you like with ratio data

You can make relative statements

You can say that one participant took twice as much time to complete a task as another, but you can’t say one movie is twice as good as another

You can take the geometric mean

Jakob Nielsen believes this is important, because it prevents a single big number from skewing the result and it accounts fairly for cases in which some of the metrics are negative (that is, where one design scores below 100% of the other)

Slide13
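As an illustration of the ratio-data points on the preceding slide, the sketch below makes a relative statement and compares the arithmetic and geometric means. The task times are hypothetical; `geometric_mean` is from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import geometric_mean, mean

# Hypothetical task times in seconds; the last participant was much slower.
times = [20.0, 25.0, 30.0, 40.0, 300.0]

# Relative statements are meaningful because zero seconds is a true zero.
ratio = times[3] / times[0]
print(f"The fourth participant took {ratio:.1f} times as long as the first")

# One very large value pulls the arithmetic mean far above the typical time;
# the geometric mean is affected much less.
print("arithmetic mean:", round(mean(times), 1))
print("geometric mean: ", round(geometric_mean(times), 1))
```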

Evaluating a UI

There are two qualities you may wish to gather user data for: performance and satisfaction

Performance data is typically “hard data” relating to how easily a UI can be used to accomplish a given set of tasks; e.g.

Percentage of successfully completed tasks

Average time to task completion

Number of click-throughs

Satisfaction data covers emotional response to a UI and is generally self-reported; e.g.

How aesthetically pleasing the UI was

How easy to use the participant found the UI vs. how easy they thought the task was

Slide14

Performance vs. Satisfaction

How much data you gather about each of those qualities will depend on what type of a system you are building

If you are building a stock-trading application to be used in-house by Goldman Sachs, chances are you care about performance more than satisfaction

If you are building a social game for Zynga, chances are you will care more about satisfaction

Does performance imply satisfaction? Sometimes, a bit of speed can be sacrificed for better user experience (e.g. iPhone animations)

Slide15

Performance metrics

Measuring success and efficiency

Slide16

Task success

To measure success, you must first define a clear desired end-state for your task

“Find the current price of a share of GOOG stock” (clear end-state)

“Research ways to save for retirement” (unclear end-state)

Binary success – tasks which are necessarily pass/fail

Levels of success – tasks which may be partially completed or completed in less-than-optimal ways

Slide17

Looking at levels of success

Let’s say you are evaluating the GIMP interface and one of your tasks is having participants draw a circle

What are the different possible levels of success for this task?

Where is the cutoff for failure?

Slide18

Issues in Measuring Success

Deciding what constitutes success

Deciding when to end a task if the participant is not successful

Tell participants to stop trying at the point where, in the real world, they would give up or seek assistance

Allow participants a certain number of attempts to complete the task

Issue: What constitutes an “attempt”?

Set a time limit

Slide19

Time-On-Task

This data can be analyzed in a number of ways:

Looking at the median or geometric mean (typically less skewed than the mean)

Creating ranges to report the frequency of users falling into each interval

Creating a threshold which models the “acceptable” amount of time to complete a particular task

Looking at the distribution to identify outliers; this is especially important for remote testing, which often yields “noisy” data (e.g., a participant goes and gets a sandwich halfway through a task)

Slide20
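The four time-on-task analyses listed on the preceding slide can be sketched as follows. The times, bin width, and threshold are hypothetical. The slides do not say how to identify outliers, so the 1.5 × IQR rule (Tukey fences) used here is an assumption; it is one common convention, not the only one.

```python
from statistics import geometric_mean, median, quantiles


def summarize_times(times, bin_width, threshold):
    """Summarize task times (in seconds) for one task."""
    # Frequency of users per interval, keyed by the lower bound of each bin.
    bins = {}
    for t in times:
        lower = int(t // bin_width) * bin_width
        bins[lower] = bins.get(lower, 0) + 1
    return {
        "median": median(times),
        "geometric_mean": geometric_mean(times),
        "bins": dict(sorted(bins.items())),
        # Share of participants who finished within the "acceptable" time.
        "share_within_threshold": sum(t <= threshold for t in times) / len(times),
    }


def flag_outliers(times, k=1.5):
    """Return times outside the Tukey fences (an assumed outlier rule)."""
    q1, _, q3 = quantiles(times, n=4)
    spread = q3 - q1
    return [t for t in times if t < q1 - k * spread or t > q3 + k * spread]


# Hypothetical times; the last participant may have gone to get a sandwich.
times = [31, 35, 38, 42, 44, 47, 52, 55, 610]
summary = summarize_times(times, bin_width=10, threshold=60)
print(summary)
print("possible outliers:", flag_outliers(times))
```

A flagged value is a prompt to look at the session recording, not an automatic reason to discard the data point.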

Issues in Measuring Time-On-Task

Should you include time on unsuccessful tasks?

How will including or throwing out this data affect your results?

Will asking users to voice their thoughts while completing the task alter their time to completion?

Will voicing thoughts aloud cause users to complete tasks more quickly/slowly than they would otherwise?

Quantitative methods are only part of the goal for usability testing. The voice-aloud method can provide you with useful qualitative data.

Should you tell participants that their time until completion is being measured?

Slide21

Efficiency

Most people think of efficiency as equivalent to time-on-task

Efficiency can also be considered a measurement of the amount of effort required to complete a task.

Effort is a quantification of the number of actions a user takes (e.g. mouse clicks, page views, keystrokes, etc.)

To measure efficiency, you must define what your units of meaningful action are and what the precise start and endpoints are for your task

Typically, effort is only calculated for successfully completed tasks

Slide22
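A minimal sketch of the effort calculation described on the preceding slide, assuming a hypothetical log in which each attempt records whether it succeeded and how many meaningful actions (here, mouse clicks) it took. The field names are illustrative only.

```python
from statistics import mean, median

# Hypothetical log: one record per participant for a single task.
attempts = [
    {"participant": "P1", "success": True, "actions": 7},
    {"participant": "P2", "success": True, "actions": 12},
    {"participant": "P3", "success": False, "actions": 30},
    {"participant": "P4", "success": True, "actions": 8},
]


def effort_for_successes(attempts):
    """Return action counts for successfully completed attempts only."""
    return [a["actions"] for a in attempts if a["success"]]


effort = effort_for_successes(attempts)
# Action counts are ratio data, so both the mean and the median are valid.
print("mean actions:  ", mean(effort))
print("median actions:", median(effort))
```

Leaving out P3 keeps an abandoned attempt from inflating the effort figure; the failure itself is still reported through the task success rate.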

Lostness

Another measure of efficiency, especially important for websites, is that of “lostness”, which can be modeled using the following formula:
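The formula is not reproduced in this transcript. A widely cited definition of lostness (Smith, 1996) is L = sqrt((N/S - 1)^2 + (R/N - 1)^2); the sketch below assumes this is the formula the slide showed. The parameter names and example numbers are illustrative.

```python
from math import sqrt


def lostness(unique_pages, total_pages, minimum_pages):
    """Lostness L = sqrt((N/S - 1)^2 + (R/N - 1)^2).

    N = unique_pages:  number of different pages visited during the task
    S = total_pages:   total pages visited, counting revisits
    R = minimum_pages: fewest pages that must be visited to finish the task
    """
    n, s, r = unique_pages, total_pages, minimum_pages
    return sqrt((n / s - 1) ** 2 + (r / n - 1) ** 2)


# A perfect path: 4 pages needed, 4 different pages visited, no revisits.
print(lostness(4, 4, 4))
# A wandering path: 4 pages needed, 9 different pages, 15 visits in total.
print(round(lostness(9, 15, 4), 2))
```

A score of 0 means the participant took an optimal path; the score grows as they visit unnecessary pages or revisit pages.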

Slide23

Learnability

We have already discussed learning curves in class

You hope that, the more times a user has used your UI, the less time it will take them to complete a task and the less effort it will require

In this sense, we can model learnability as time-on-task, success rate, and/or efficiency over time

Slide24

Self-Reported Metrics

“These slides are AWESOME!!!11!1”

Slide25

The Importance of Self-Reported Data

So far, we have focused mostly on data which is gathered by observing the user

However, it is sometimes necessary to gather data directly from the user

This is especially common when you are looking to evaluate user satisfaction rather than performance

Slide26

Likert Scales

Likert scales consist of a statement, either positive or negative, followed by a 5-point scale of agreement

Example:

“I found this website easy to use.”

□ Strongly disagree

□ Disagree

□ Neither agree nor disagree

□ Agree

□ Strongly agree

This is an example of ordinal data!

Slide27

Semantic Differential Scales

These are similar to Likert scales, but run on scales anchored by opposing adjectives

Example:

“I would describe this website as…”

Ugly □ □ □ □ □ Beautiful

This is an example of interval data!

Slide28

Expectation Measure

Some experts (Albert and Dixon 2003) believe that the best way to assess the ease or difficulty of a given task is relative to how easy or difficult the participant thought it was going to be

Thus, for each task you might ask participants to rate both how easy/difficult they thought it would be and how easy/difficult it actually was

Doing so allows you to calibrate what is otherwise essentially an arbitrary measure

Remember, individuals may have different ideas of how “easy” and “very easy” compare!

Slide29
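One way to use the expectation measure from the preceding slide without doing arithmetic on the ratings is to compare, per participant, whether the task turned out easier or harder than expected. This is a sketch under assumed names: the 5-point labels and the ratings are hypothetical, and this is not presented as Albert and Dixon's own analysis.

```python
from collections import Counter

# Ordered from hardest to easiest; only the order is used below.
SCALE = ["very difficult", "difficult", "neutral", "easy", "very easy"]


def compare(expected, actual):
    """Classify one participant's experience relative to their expectation."""
    e, a = SCALE.index(expected), SCALE.index(actual)
    if a > e:
        return "easier than expected"
    if a < e:
        return "harder than expected"
    return "as expected"


# Hypothetical (expected, actual) ratings for a single task.
ratings = [
    ("easy", "difficult"),
    ("easy", "neutral"),
    ("very easy", "easy"),
    ("neutral", "neutral"),
    ("difficult", "easy"),
]

summary = Counter(compare(e, a) for e, a in ratings)
print(summary)
```

Because each participant is compared only with their own expectation, it matters less that two people may mean different things by "easy".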

Post-Task vs. Post-Session Evaluation

It is generally helpful to chunk your lab time into a number of tasks

You can gather self-reported data either after each task, after the entirety of the session, or both

The previously described methods can be used either post-task or post-session

Slide30

System Usability Scale

A System Usability Scale can be used for post-session self-reporting

There are many variants on this test. Here is a classic example:

Slide31
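The example questionnaire is not reproduced in this transcript. Assuming the classic ten-item System Usability Scale (odd-numbered statements worded positively, even-numbered statements worded negatively, each rated 1 to 5), its conventional scoring can be sketched as below. The function name and sample responses are illustrative.

```python
def sus_score(responses):
    """Score one participant's SUS questionnaire on a 0-100 scale.

    responses: ten integers from 1 (strongly disagree) to 5 (strongly agree),
    in questionnaire order.
    """
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("expected ten responses, each between 1 and 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += r - 1   # positively worded item: higher is better
        else:
            total += 5 - r   # negatively worded item: lower is better
    return total * 2.5


# Hypothetical participant.
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))
```

Note that this scoring treats agreement ratings as numbers, which is a convention of the SUS itself and sits uneasily with the earlier warning about arithmetic on ordinal data.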

Other Types of Self-Reporting

Self-reported metrics can also be used to evaluate specific elements (e.g. navigation bar) or overall attributes (e.g. visual appeal) of a UI

When evaluating elements, it is helpful to examine gaps between awareness and usefulness

To do this, you might ask, “Were you aware of this functionality prior to this study? (Yes/No)” followed by “On a scale of 1 to 5, how useful is this functionality to you? (1=Not at all useful; 5=Very useful)”
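The gap between awareness and usefulness can be examined per element with a sketch like the following. The element names and answers are hypothetical; the median is used for the usefulness rating so that no averaging of ratings is needed.

```python
from statistics import median_low

# Hypothetical answers per UI element: (aware before the study?, usefulness 1-5).
answers = {
    "navigation bar": [(True, 4), (True, 5), (True, 4), (False, 3), (True, 4)],
    "saved searches": [(False, 5), (False, 4), (True, 5), (False, 5), (False, 4)],
}


def awareness_usefulness(responses):
    """Return (share of participants aware, median usefulness rating)."""
    aware_share = sum(aware for aware, _ in responses) / len(responses)
    typical_usefulness = median_low([rating for _, rating in responses])
    return aware_share, typical_usefulness


for element, responses in answers.items():
    aware_share, usefulness = awareness_usefulness(responses)
    print(f"{element}: {aware_share:.0%} aware, median usefulness {usefulness}")
```

An element with low awareness but high usefulness, like the hypothetical "saved searches" here, is a candidate for being made more discoverable.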