# Completion Rate – an overview

## Quantifying user research

Jeff Sauro, James R. Lewis, in Quantifying the User Experience (Second Edition), 2016

### Completion rates

Completion rates, also called success rates, are the most fundamental of usability metrics (Nielsen, 2001). They are typically collected as a binary measure of task success (coded as a 1) or task failure (coded as 0). You report completion rates on a task by dividing the number of users who successfully complete the task by the total number who attempted it. For example, if eight out of ten users complete a task successfully, the completion rate is 0.8 and usually reported as 80%. You can also subtract the completion rate from 100% and report a failure rate of 20%.
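As a minimal sketch of the arithmetic above, the following Python snippet computes a completion rate from binary-coded outcomes. The function name `completion_rate` and the sample data are illustrative assumptions, not something taken from the book.

```python
def completion_rate(outcomes):
    """Return the proportion of successes in a list of 1 (success) / 0 (failure) codes."""
    if not outcomes:
        raise ValueError("need at least one attempt")
    return sum(outcomes) / len(outcomes)


# Hypothetical data: eight of ten users complete the task, as in the example above.
outcomes = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
rate = completion_rate(outcomes)
print(f"completion rate: {rate:.0%}, failure rate: {1 - rate:.0%}")
```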

It is possible to define criteria for partial task success, but we prefer the simpler binary measure because it lends itself better to statistical analysis. When we refer to completion rates in this book, we will be referring to binary completion rates.

The other nice thing about binary rates is that they are used throughout the scientific and statistics literature. Essentially, the presence or absence of anything can be coded as 1s and 0s and then reported as a proportion or percentage. Whether the measure is the number of users completing tasks on software, patients cured of an ailment, fish recaptured in a lake, or customers purchasing a product, all of them can be treated as binary rates.

URL:

https://www.sciencedirect.com/science/article/pii/B09888888

## Did we meet or exceed our goal?

Jeff Sauro, James R. Lewis, in Quantifying the User Experience (Second Edition), 2016

### Key points

The statistical test you use for completion rates depends on the sample size: A sample size is considered small unless you have more than 15 successes and 15 failures.

For determining whether a certain percentage of users can complete a task, use the mid-probability from the binomial distribution when the sample size is small.

For determining whether a certain percentage of users can complete a task, use the normal approximation to the binomial when the sample size is large.

You can always convert continuous rating scale data into discrete-binary data and test a percentage that agrees with a statement, but in so doing, you lose information.

For comparing a set of satisfaction scores from a survey or questionnaire with a benchmark, use the one-sample t-test for all sample sizes.

For determining whether a task time falls below a benchmark, log-transform the times and then perform a one-sample t-test for all sample sizes.

Table 4.2 provides a list of formulas used in this chapter.

Table 4.2. List of Chapter 4 Formulas

| Type of Evaluation | Basic Formula | Notes |
|---|---|---|
| Binomial probability formula | $p(x)=\frac{n!}{x!\left(n-x\right)!}p^{x}\left(1-p\right)^{\left(n-x\right)}$ | Used in exact and mid-p binomial tests (small sample) |
| Normal approximation to the binomial (Wald) | $z=\frac{\hat{p}-p}{\sqrt{\frac{p\left(1-p\right)}{n}}}$ | Used for large-sample binomial tests (large sample if at least 15 successes and 15 failures) |
| One-sample t-test | $t=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}$ | Used to test continuous data (e.g., satisfaction scores, completion times) |
| t-based confidence interval around the mean | $\overline{x}\pm t_{\left(1-\frac{\alpha}{2}\right)}\frac{s}{\sqrt{n}}$ | Used to construct confidence interval as alternative test against a criterion for continuous data |
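The two binomial rows of Table 4.2 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function names, the one-sided direction of the test (does the rate exceed the benchmark?), and the sample numbers are assumptions. The mid-p value is taken as the probability of a more extreme result plus half the probability of the observed result, and the large-sample cutoff follows the table's "at least 15 successes and 15 failures" wording.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def mid_p_exceeds(successes, n, benchmark):
    """One-sided mid-p value: P(X > x) + 0.5 * P(X = x) under the benchmark rate."""
    upper_tail = sum(binom_pmf(k, n, benchmark) for k in range(successes + 1, n + 1))
    return upper_tail + 0.5 * binom_pmf(successes, n, benchmark)


def z_test_exceeds(successes, n, benchmark):
    """Normal approximation to the binomial; returns (z, one-sided p-value)."""
    p_hat = successes / n
    z = (p_hat - benchmark) / sqrt(benchmark * (1 - benchmark) / n)
    return z, 1 - NormalDist().cdf(z)


def benchmark_test(successes, n, benchmark):
    """Pick the small- or large-sample test using the 15/15 rule of thumb."""
    failures = n - successes
    if successes >= 15 and failures >= 15:
        return "normal approximation", z_test_exceeds(successes, n, benchmark)[1]
    return "mid-p binomial", mid_p_exceeds(successes, n, benchmark)


# Hypothetical studies: 8 of 9 users vs. a 70% goal, and 85 of 100 vs. a 75% goal.
for successes, n, goal in [(8, 9, 0.70), (85, 100, 0.75)]:
    method, p_value = benchmark_test(successes, n, goal)
    print(f"{successes}/{n} vs {goal:.0%}: {method}, p = {p_value:.4f}")
```

A small p-value here suggests the observed completion rate is unlikely if the true rate were only equal to the benchmark.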

URL:

https://www.sciencedirect.com/science/article/pii/B09888888

## Case Studies

Bill Albert, … Allison O’Keefe-Wright, in Beyond the Usability Lab, 2010

### 9.4.2 Results and discussion

These results surprised us: the success and completion rates seemed relatively high in light of the low success rate from the previous qualitative study and some of the open-ended comments from this study, which expressed frustration and confusion about the navigation (see Table 9.6).

Table 9.6. Average success and ratings for all five tasks.

| 5-Task Average: All Users (Scale of 1–7) | Success | Error/Abandon |
|---|---|---|
| Task Success Rate | 76% (avg time 1:33) | 24% |
| **Ratings Weighted for All Users** | | |
| Overall Task | 4.3 (Successful users only: 5.4) | |
| “It was clear how to start searching” | 4.2 (Successful users only: 5.2) | |
| “Finding x information was easy” | 4.2 (Successful users only: 5.2) | |
| “Satisfied with results” | 4.8 (Successful users only: 6.0) | |
| “Increased my confidence in the UCSF Medical Center” | 4 (Successful users only: 5.0) | |
| “Task took a reasonable amount of time” | 4.3 (Successful users only: 5.3) | |

Examining the results further, we determined that there was a large discrepancy between users who used the search function on the site versus users who did not (see Table 9.7). We also found that the proportion of users who employed search during this study was far higher than average, which led us to conclude that, due to our own oversight, users were significantly assisted in their tasks because the tasks and search terms were explicitly shown to them in the UserZoom browser bar. For instance, users were assisted in finding the phone number for returning pediatric cerebrovascular patients simply because the term was right there in front of them, complete with perfect spelling, whereas in normal usage they might not have this exact term to search for and may be forced to navigate the site manually, leading to a lower success rate.

Table 9.7. Average success and ratings for all five tasks for those users who did not use the search functionality.

| 5-Task Average: Users Who Never Used Search (Scale of 1–7) | Success | Error/Abandon |
|---|---|---|
| Task Success Rate | 52% (avg time 1:35) | 48% |
| **Ratings Weighted for All Users** | | |
| Overall Task | 3.1 (Successful users only: 4.9) | |
| “It was clear how to start searching” | 3 (Successful users only: 4.7) | |
| “Finding x information was easy” | 3 (Successful users only: 4.5) | |
| “Satisfied with results” | 3.3 (Successful users only: 5.1) | |
| “Increased my confidence in the UCSF Medical Center” | 2.6 (Successful users only: 4.0) | |
| “Task took a reasonable amount of time” | 3.1 (Successful users only: 4.8) | |

The fact that a significant percentage of users (even successful ones) reported difficulties with browsing- and navigation-related activities corroborated our suspicion that search inflated the success rate artificially.

URL:

https://www.sciencedirect.com/science/article/pii/B09888888

## How precise are our estimates? Confidence intervals

Jeff Sauro, James R. Lewis, in Quantifying the User Experience (Second Edition), 2016

### Chapter review questions

1. Find the 95% confidence interval around the completion rate from a sample of 12 users where 10 completed the task successfully.

2. What is the 95% confidence interval around the median time for the following 11 task times: 198, 220, 136, 162, 143, 130, 199, 99, 136, 188, 199

3. What is the 90% confidence interval around the median time for the following 32 task times: 251, 21, 60, 108, 43, 34, 27, 47, 48, 18, 15, 219, 195, 37, 338, 82, 46, 78, 222, 107, 117, 38, 19, 62, 81, 178, 40, 181, 95, 52, 140, 130

4. Find the 95% confidence interval around the average SUS score for the following fifteen scores from a test of an automotive website: 70, 50, 67.5, 35, 27.5, 50, 30, 37.5, 65, 45, 82.5, 80, 47.5, 32.5, 65

5. With 90% confidence, if 2 out of 8 users experience a problem with a registration element in a web form, what percent of all users could plausibly encounter the problem should it go uncorrected?

URL:

https://www.sciencedirect.com/science/article/pii/B09888888

## How Precise Are Our Estimates? Confidence Intervals

Jeff Sauro, James R. Lewis, in Quantifying the User Experience, 2012

### Chapter Review Questions

1. Find the 95% confidence interval around the completion rate from a sample of 12 users where 10 completed the task successfully.

2. What is the 95% confidence interval around the median time for the following 11 task times: 198, 220, 136, 162, 143, 130, 199, 99, 136, 188, 199

3. What is the 90% confidence interval around the median time for the following 32 task times: 251, 21, 60, 108, 43, 34, 27, 47, 48, 18, 15, 219, 195, 37, 338, 82, 46, 78, 222, 107, 117, 38, 19, 62, 81, 178, 40, 181, 95, 52, 140, 130

4. Find the 95% confidence interval around the average SUS score for the following 15 scores from a test of an automotive website: 70, 50, 67.5, 35, 27.5, 50, 30, 37.5, 65, 45, 82.5, 80, 47.5, 32.5, 65

5. With 90% confidence, if two out of eight users experience a problem with a registration element in a web form, what percent of all users could plausibly encounter the problem should it go uncorrected?

### Answers

1. Use the adjusted-Wald binomial confidence interval. The adjusted proportion is 11.9/15.84 = 0.752:

   $\begin{array}{c}\hat{p}_{\text{adj}}\pm z_{\left(1-\frac{\alpha}{2}\right)}\sqrt{\frac{\hat{p}_{\text{adj}}\left(1-\hat{p}_{\text{adj}}\right)}{n_{\text{adj}}}}=0.752\pm 1.96\sqrt{\frac{0.752\left(1-0.752\right)}{15.84}}\\ =0.752\pm 0.212=95\%\ \text{CI between}\ 54.0\%\ \text{and}\ 96.4\%\end{array}$

2. The log times are: 5.288, 5.394, 4.913, 5.088, 4.963, 4.868, 5.293, 4.595, 4.913, 5.236, and 5.293, which makes the geometric mean $e^{5.08}$, or about 160 seconds. The 95% CI is:

   $\begin{array}{c}\overline{x}_{\log}\pm t_{\left(1-\frac{\alpha}{2}\right)}\frac{s_{\log}}{\sqrt{n}}=5.08\pm 2.23\frac{0.246}{\sqrt{11}}=5.08\pm 0.166\\ =e^{4.91}\ \text{to}\ e^{5.24}=136\ \text{to}\ 189\ \text{seconds}\end{array}$

3. The sample median is 70 seconds. The critical value from the normal distribution is 1.64 for a 90% level of confidence.

   $\begin{array}{l}np\pm z_{\left(1-\frac{\alpha}{2}\right)}\sqrt{np\left(1-p\right)}\\ =32\left(0.5\right)\pm 1.64\sqrt{\left(32\right)\left(0.5\right)\left(1-0.5\right)}\\ =16\pm 1.64\left(2.83\right)\\ =16\pm 4.64=11.36\ \text{and}\ 20.64=\text{the 12th and 21st times}\\ =90\%\ \text{CI between}\ 47\ \text{and}\ 107\ \text{seconds}\end{array}$

4. A t-confidence interval should be constructed using a critical value of $t_{(0.05,\,14)} = 2.14$. The mean and standard deviation are 52.3 and 18.2, respectively:

   $\overline{x}\pm t_{\left(1-\frac{\alpha}{2}\right)}\frac{s}{\sqrt{n}}=52.3\pm 2.14\frac{18.2}{\sqrt{15}}=52.3\pm 10.1$

   The 95% confidence interval for the average SUS score of 52.3 is between 42.2 and 62.4.

5. Compute a 90% adjusted-Wald binomial confidence interval. For 90% confidence, the value of z is 1.64. The adjusted proportion is 3.35/10.71 = 0.313.

   $\begin{array}{ll}\hat{p}_{\text{adj}}\pm z_{\left(1-\frac{\alpha}{2}\right)}\sqrt{\frac{\hat{p}_{\text{adj}}\left(1-\hat{p}_{\text{adj}}\right)}{n_{\text{adj}}}} & =0.313\pm 1.64\sqrt{\frac{0.313\left(1-0.313\right)}{10.71}}\\ & =0.313\pm 0.233\end{array}$

   We can be 90% confident that between 8.0% and 54.6% of all users will encounter this problem if two out of eight encountered it in the lab.
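The interval calculations in the answers can be checked with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the z values come from the standard library's `NormalDist` (so they are slightly more precise than the rounded 1.96 and 1.64 used above), and the t critical value is passed in by hand because the standard library has no t-distribution quantile function.

```python
from math import ceil, exp, log, sqrt
from statistics import NormalDist, mean, stdev


def z_critical(confidence):
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def adjusted_wald_ci(successes, n, confidence=0.95):
    """Adjusted-Wald interval: add z^2/2 successes and z^2 trials, then apply Wald."""
    z = z_critical(confidence)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


def log_time_ci(times, t_crit):
    """t interval on log-transformed times, back-transformed to seconds.

    t_crit is supplied by the caller (2.23 is the value used in answer 2).
    """
    logs = [log(t) for t in times]
    margin = t_crit * stdev(logs) / sqrt(len(logs))
    center = mean(logs)
    return exp(center - margin), exp(center + margin)


def median_ci(times, confidence=0.90):
    """Binomial-based interval for the median: np +/- z*sqrt(np(1-p)) with p = 0.5."""
    z = z_critical(confidence)
    n = len(times)
    margin = z * sqrt(n * 0.5 * 0.5)
    low_rank = max(1, ceil(n * 0.5 - margin))   # ranks rounded up, as in answer 3
    high_rank = min(n, ceil(n * 0.5 + margin))
    ordered = sorted(times)
    return ordered[low_rank - 1], ordered[high_rank - 1]


times_11 = [198, 220, 136, 162, 143, 130, 199, 99, 136, 188, 199]
times_32 = [251, 21, 60, 108, 43, 34, 27, 47, 48, 18, 15, 219, 195, 37, 338, 82,
            46, 78, 222, 107, 117, 38, 19, 62, 81, 178, 40, 181, 95, 52, 140, 130]

print("Q1:", adjusted_wald_ci(10, 12, 0.95))
print("Q2:", log_time_ci(times_11, t_crit=2.23))
print("Q3:", median_ci(times_32, 0.90))
print("Q5:", adjusted_wald_ci(2, 8, 0.90))
```

Because the sketch uses unrounded critical values, its results may differ from the worked answers in the last reported digit.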

URL:

https://www.sciencedirect.com/science/article/pii/B09888888

## Emotional and Social Engagement in a Massive Open Online Course

Lia M. Daniels, … Adam McCaffrey, in Emotions, Technology, and Learning, 2016

### Engagement as a Social Psychological Construct

When MOOC developers talk about engagement, they are talking about completion and participation rates: how many people registered, logged on, want a certificate, etc. This definition is very different from the social psychological approach to engagement in learning. Although many working definitions exist (Appleton, Christenson, & Furlong, 2008; Furlong et al., 2003), most researchers agree engagement is a multidimensional construct most commonly broken down into three components:

- Cognitive engagement (regulation)
- Behavioral engagement (effort, participation, rule following)
- Emotional or affective engagement (positive attitude, interest)

Recently, Klassen, Yerdelen, and Durken (2013) proposed that social engagement is critical in learning environments, because learning is a social task. This may be particularly true in MOOCs, because their massiveness provides an unprecedented opportunity for social connections. Thus, for the purposes of this chapter, we adhere to a modified four-component operationalization of engagement (Fredricks, Blumenfeld, & Paris, 2004; Klassen et al., 2013) that includes the following additional construct:

- Social engagement (connections, belonging)

Cognitive engagement looks at the level of investment learners put into thinking about their tasks. It incorporates the investment of intentional thought required to comprehend complicated ideas and to master the content presented (Fredricks et al., 2004). Some definitions emphasize the importance of psychological investment in learning, while others emphasize a variety of cognitive processes, such as problem solving, positive coping, and desire for learning (Connell & Wellborn, 1991; Newmann, Wehlage, & Lamborn, 1992; Wehlage, Rutter, Smith, Lesko, & Fernandez, 1989). Cognitive engagement is quite similar to intrinsic motivation and positively predicts outcomes, including goal setting and self-regulation (Boekarts, Pintrich, & Zeidner, 2000; Harter, 1981; Zimmerman, 1990). High cognitive engagement in a MOOC may involve seeking out additional information on the material, or preparing for and completing quizzes.

Behavioral engagement is often defined in three ways. The first is associated with positive demeanor (e.g., following rules or school attendance; Finn, 1993; Finn & Rock, 1997). The second is the effort associated with paying attention and concentrating on the learning experience (Birch & Ladd, 1997; Skinner & Belmont, 1993). The third is participation in school-related events (Finn, 1993). Empirically, behavioral engagement has also been positively associated with students’ on-task behaviors and following rules (Karweit, 1989; Peterson, Swing, Stark, & Wass, 1984). This is the psychological element of engagement that is most similar to the notion of engagement discussed currently in the MOOC literature. Although the expectations for behavior in a MOOC may look different from a traditional classroom, they still exist. For example, a student may be able to stay focused on the video or become distracted and surf the web; each represents a qualitatively different level of behavioral engagement.

Emotional engagement relates to the student’s feelings of interest, pleasure, sadness, boredom, and anxiety in the classroom (Fredricks et al., 2004). In other words, emotional engagement looks at the positive and negative reactions that extend from the social and learning environments. Emotional engagement has been positively related to student outcomes including attitudes, emotional experiences, values, and interest (Epstein & McPartland, 1976; Yamamoto, Thomas, & Karns, 1969). An example of emotional engagement is the feeling of excitement a student may experience when watching the lectures or participating in the forums.

Social engagement refers to the willingness to socialize with others and the feeling of belonging. Although social engagement is an emerging construct in the student engagement literature, it has been discussed in the eLearning literature pertaining to social media and Web 2.0 technologies for quite some time (see Rennie & Morrison, 2013, for a more thorough review). From a theoretical perspective, several achievement motivation theories stress some component of social connectedness. For example, self-determination theory (Deci & Ryan, 2000) argues that relatedness is one of three basic psychological needs that, when met, leads to optimal motivation. Similarly, Butler (2012) has recently incorporated relational goals into her achievement goal framework. While specific to online learning, Siemens (2004) argues that the future of online learning lies with the connections students make and the process of social information creation, in order for students to think critically and contribute to knowledge on a global level. Although xMOOCs do not emanate from the connectivist paradigm described above, psychological theory and eLearning pedagogy reinforce that engaging learning of any type should indeed be social (Rennie & Morrison, 2013).

Let’s consider an example. In a traditional classroom, we may determine evidence of engagement when a student asks questions in class (cognitive), completes assignments (behavioral), appears excited about the content (emotional), and shares information with her peers (social). In an xMOOC, we may infer engagement when a student starts a debate on the forums (cognitive), logs on regularly to watch full video segments (behavioral), expresses that the content is relevant (emotional), and joins a related Facebook group (social). From a psychological perspective, this four-pronged operationalization of engagement can be applied to face-to-face and online learning environments without much difficulty. A similar argument was made for transitioning other social psychological theories to help explain and design online learning environments (see Daniels & Stupnisky, 2012).

URL:

https://www.sciencedirect.com/science/article/pii/B09888888

## Introduction and how to use this book

Jeff Sauro, James R. Lewis, in Quantifying the User Experience (Second Edition), 2016

### What test should I use?

The first decision point comes from the type of data you have. See the Appendix for a discussion of the distinction between discrete and continuous data. In general, for deciding which test to use, you need to know if your data are discrete-binary (e.g., pass/fail data coded as 1s and 0s) or more continuous (e.g., task times or rating scale data).

The next major decision is whether you’re comparing data or just getting an estimate of precision. To get an estimate of precision you compute a confidence interval around your sample metrics (e.g., what is the margin of error around a completion rate of 70%—see Chapter 3). By comparing data we mean comparing data from two or more groups (e.g., task completion times for Product A and B—see Chapter 5) or comparing your data to a benchmark (e.g., is the completion rate for Product A significantly above 70%?—see Chapter 4).

If you’re comparing data, the next decision is whether the groups of data come from the same or different users. Continuing on that path, the final decision depends on whether there are two groups to compare or more than two groups.

To find the appropriate section in each chapter for the methods depicted in Figures 1.1 and 1.2, consult Tables 1.1 and 1.2. Note that methods discussed in Chapter 11 are outside the scope of this book, and receive just a brief description in their sections.

Table 1.1. Chapter Sections for Methods Depicted in Fig. 1.1

| Method | Chapter: Section |
|---|---|
| One-Sample t (Log) | 4: Comparing a Task Time to a Benchmark |
| One-Sample t | 4: Comparing a Satisfaction Score to a Benchmark |
| Confidence Interval around Median | 3: Confidence Interval around a Median |
| t (Log) Confidence Interval | 3: Confidence Interval for Task-Time Data |
| t Confidence Interval | 3: Confidence Interval for Rating Scales and Other Continuous Data |
| Paired t | 5: Within-Subjects Comparison (Paired t-Test) |
| ANOVA or Multiple Paired t | 5: Within-Subjects Comparison (Paired t-Test); 9: What If You Need To Run More than One Test? |
| Two-Sample t | 5: Between-Subjects Comparison (Two-Sample t-Test) |
| ANOVA or Multiple Two-Sample t | 5: Between-Subjects Comparison (Two-Sample t-Test); 9: What If You Need To Run More than One Test?; 10: Analysis of Variance |
| Correlation | 10: Correlation |
| Regression Analysis | 10: Regression |

Table 1.2. Chapter Sections for Methods Depicted in Fig. 1.2

| Method | Chapter: Section |
|---|---|
| One-Sample Z-Test | 4: Comparing a Completion Rate to a Benchmark—Large Sample Test |
| One-Sample Binomial | 4: Comparing a Completion Rate to a Benchmark—Small Sample Test |
| McNemar Exact Test | 5: McNemar Exact Test |
| Adjusted Wald Confidence Interval for Difference in Matched Proportions | 5: Confidence Interval around the Difference for Matched Pairs |
| N−1 Two Proportion Test and Fisher Exact Test | 5: N−1 Two Proportion Test; Fisher Exact Test |
| Adjusted Wald Difference in Proportion | 5: Confidence for the Difference between Proportions |
| Correlation | 10: Correlation—Computing the Phi Correlation |

For example, let’s say you wanted to know which statistical test to use if you are comparing completion rates on an older version of a product and a new version where a different set of people participated in each test.

1. Because completion rates are discrete-binary data (1 = Pass and 0 = Fail), we should use the decision map in Fig. 1.2.

2. At the first box “Comparing or Correlating Data” select “Y” because we’re planning to compare data (rather than exploring relationships with correlation or regression).

3. At the next box “Comparing Data?” select “Y” because we are comparing a set of data from an older product with a set of data from a new product.

4. This takes us to the “Different Users in Each Group” box—we have different users in each group, so we select “Y.”

5. Now we’re at the “Three or More Groups” box—we have only two groups of users (before and after) so we select “N.”

6. We stop at the “N−1 Two Proportion Test & Fisher Exact Test” (Chapter 5).
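The walkthrough above can be sketched as a small decision function for discrete-binary data. This is only a partial, hypothetical rendering of the decision map: the function name and parameters are assumptions, the benchmark branch is inferred from Table 1.2 and the 15-successes-and-15-failures rule of thumb, the same-users branch assumes the McNemar exact test because it is the standard test for matched binary pairs, and branches that this excerpt does not describe are reported as such rather than guessed.

```python
def choose_binary_test(comparing_to_benchmark, different_users=True, n_groups=2,
                       successes=0, failures=0):
    """Suggest a test for discrete-binary data (a partial sketch of the decision map)."""
    if comparing_to_benchmark:
        # Large sample if there are at least 15 successes and 15 failures.
        if successes >= 15 and failures >= 15:
            return "One-Sample Z-Test"
        return "One-Sample Binomial"
    if n_groups > 2:
        # The excerpt does not say which method applies here.
        return "not covered in this excerpt"
    if different_users:
        return "N-1 Two Proportion Test and Fisher Exact Test"
    # Assumption: matched (same-user) binary data goes to the McNemar exact test.
    return "McNemar Exact Test"


# The example from the text: old vs. new version, different users in each group.
print(choose_binary_test(comparing_to_benchmark=False, different_users=True, n_groups=2))
```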