
Chapter Nine

Evaluation of Training

Learning Objectives

After reading this chapter, you should be able to:

■ Describe the pros and cons of evaluation and indicate which way to go on the issue.

■ Explain what process evaluation is, and why it is important.

■ Describe the interrelationships among the various levels of outcome evaluation.

■ Describe the costs and benefits of evaluating training.

■ Differentiate between the two types of cost-effectiveness evaluation (cost savings and utility analysis).

■ Describe the various designs that are possible for evaluation and their advantages and disadvantages.

■ Define and explain the importance of internal and external validity (Appendix 9-1).


The city of Palm Desert, California, decided to provide training to improve employees’ attitudes toward their work and to provide them with the skills to be more effective on the job. The two-day seminar involved a number of teaching methods, including a lecture, films, role-plays, and group interaction. Among the topics covered were conflict control, listening, communicating, telephone etiquette, body language, delegation, and taking orders. Throughout the two days, the value of teamwork, creativity, and rational decision making was stressed and integrated into the training.

Before the training was instituted, all 55 nonmanagement employees completed a paper-and-pencil questionnaire to measure both their attitudes toward the job and their perception of their job behaviors. Supervisors also completed a questionnaire assessing each of their employees. All 55 employees were told that they would be receiving the same two-day seminar. The first set of 34 employees was chosen at random.

The 21 employees who did not take the training immediately became a comparison group for evaluating the training. While the first group of employees was sent to the training, the others were pulled off the job, ostensibly to receive training, but they simply took part in exercises not related to any training. Thus, both groups were treated similarly in every way except for the training. Both groups completed attitude surveys immediately after the trained group finished training. Six months later, both groups completed self-report surveys to measure changes in their job behavior. Their supervisors also were asked to complete a similar behavior measure at the six-month mark.
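The logic of this comparison-group design can be sketched in code. The sketch below is illustrative only; the group sizes, scale, and scores are hypothetical, not data from the Palm Desert case:

```python
# Sketch of a pretest/posttest design with a randomly assigned comparison
# group. The untrained group's change serves as a baseline, so influences
# other than training (e.g., a citywide morale shift) are netted out.

def mean(scores):
    return sum(scores) / len(scores)

def training_effect(trained_pre, trained_post, comparison_pre, comparison_post):
    """Change in the trained group minus change in the comparison group."""
    trained_change = mean(trained_post) - mean(trained_pre)
    comparison_change = mean(comparison_post) - mean(comparison_pre)
    return trained_change - comparison_change

# Hypothetical 5-point attitude scores for a few employees in each group.
effect = training_effect(
    trained_pre=[3.0, 3.2, 2.8], trained_post=[3.0, 3.1, 2.9],
    comparison_pre=[3.1, 2.9, 3.0], comparison_post=[3.0, 3.0, 3.0],
)
# An effect near zero would mirror the Palm Desert result: no change
# attributable to training.
```

Because both groups were treated alike except for the training itself, any difference between the two changes can be credited to training with some confidence.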

The data provided some revealing information. For the trained group, no changes in attitude or behavior were indicated, either by the self-report or by supervisor-reported surveys. This result was also true (but expected) for the group not trained.

Was training a failure in the Palm Desert case? Would the training manager be pleased with these results? Was the evaluation process flawed? These types of issues will be addressed in this chapter. We will refer back to the case from time to time to answer these and other questions.


Imagine a business that decided it would not look at its profitability, return on investment (ROI), or productivity. You are a supervisor with this company, but you never look at how well or poorly your subordinates are performing their jobs. This is what training is like when no evaluation is conducted. Good management practice dictates that organizational activities are routinely examined to ensure that they are occurring as planned and are producing the anticipated results. Otherwise, no corrective action can be taken to address people, processes, and products or services that stray “off track.”

Nonetheless, many rationalizations for not evaluating training continue to exist, and evaluation of training is often not done. A 1988 survey of 45 Fortune 500 companies indicated that all of them asked trainees how much they liked training, but only 30 percent assessed how much was learned, and just 15 percent examined behavioral change. 2  Other evidence from that time suggested that only 1 company in 100 used an effective system for measuring the organizational effects and value of training. 3  But this is changing. In a 1996 study, 70 percent assessed learning, 63 percent assessed behavior, and 25 percent assessed organizational results. 4  Evaluation of training at all levels is becoming more common. Nevertheless, the evaluation of training is still not where it needs to be. A 2006 study of 140 businesses of all sizes and types showed that the things organizations view as the most important outcomes of training are still not being measured very often. 5

But, as noted, over the course of time, more organizations are evaluating training. The main reason for this is an increase in accountability. Top management is demanding evidence that training departments are contributing positively to the bottom line. 6  Dave Palm, training director of LensCrafters, knows firsthand about this trend. A frantic regional manager called Dave and told him that executives were looking to improve the bottom line and could not find enough evidence that training programs were providing a quantifiable return on the company’s investment. Yes, they knew that trainees were satisfied with training, but was the company getting bang for its buck? The conversation ended with the regional manager saying, “So, Dave, what are you going to do about it?” Dave got his wake-up call. 7


Training managers can come up with a surprising number of reasons for not evaluating training, including the following:

• There is nothing to evaluate.

• No one really cares about it.

• Evaluation is a threat to my job.

There Is Nothing to Evaluate

For some companies, training is simply a reward for good performance, or something that is mandated so everyone has to attend. 8  The argument here is that training is not expected to accomplish anything, so there is nothing to evaluate.


The first thing we would question here is why the company is spending money on something that has no value. We would argue that even in cases where training is a reward, it is designed with some goals or objectives in mind. Some type of knowledge, skill, or attitude (KSA) change is expected from the participants, even if they just feel more positive about their job or the company. Once this goal or objective is identified, it can be measured. Evaluation is simply measuring the degree to which objectives are achieved. Even when training is mandated, such as safety training, there are still objectives to be achieved in terms of learning, job behavior, and the organization.

No One Really Cares About Evaluating Training

The most common rationale for not conducting training evaluations is that “formal evaluation procedures are too expensive and time-consuming, and no one really cares anyway.” This explanation usually means that no one specifically asked for, demanded, or otherwise indicated a need for assessment of training outcomes.


If an evaluation is not specifically required, this does not mean that training is not evaluated. Important organizational decisions (e.g., budget, staffing, and performance evaluations) are made with data when data exist, but will also be made if the data do not exist. If no formal evaluations of training have taken place, the decision makers will decide on the basis of informal impressions of training’s effectiveness. Even in good economic times, the competition for organizational budget allocations is strong. Departments that can document their contributions to the organization and the return on budget investment are more likely to be granted their budget requests. The question, then, is not whether training should be evaluated, but rather who will do it (training professionals or budget professionals), how it will be done (systematically and formally or through informal impressions), and what data will be used (empirical studies of results or hearsay and personal impressions).

Evaluation Is a Threat to My Job

Think about it. According to the 2011 State of the Industry Report conducted by the American Society for Training & Development, training budgets in the United States totaled over $171.5 billion. 9  Why wouldn’t human resource development (HRD) departments be evaluating their results? Fear of the result is one reason. Football coach Woody Hayes, back in the 1950s, once said that he never liked to throw the forward pass because three things could happen and two of them were bad. The same could be said for evaluation. If time and money are spent on training, and an evaluation determines that no learning occurred, or worse, job performance declined, this doesn’t reflect well on the training provided. Although most managers are not likely to admit this concern publicly, it can be a real problem. When we use the term evaluation, we too often think of a single final outcome at a particular point that represents success or failure—like a report card. This type of evaluation is called an  outcome evaluation . When the focus is on this type of evaluation, managers naturally can be concerned about how documenting the failure of their programs will affect their careers. Consider  Training in Action 9-1 . It provides an example of an evaluation designed to provide feedback so that improvement (through training and practice) can take place. But when the focus shifted from “helping improve” ( process evaluation ) to a “measurement of success or failure” (outcome evaluation), the desire to participate in the process disappeared, and the airline threatened to discontinue it.


Can the airline in  Training in Action 9-1  be blamed for wanting to opt out of the program? It is easy to understand why someone would not want to participate in a program where the information could be used against them. While outcome results are important in making business decisions, the day-to-day purpose of evaluation should be used as a feedback mechanism to guide efforts toward success. 10  While trying to convince a client that the company’s training should be evaluated, one trainer decided not to use the term evaluation.

9-1 Training in Action Evaluation: What It Is Used for Matters 11

For 30 years, British Airways has maintained a system in all its aircraft that monitors everything done by the aircraft and its pilots. This information is examined continuously to determine any faulty aircraft mechanisms and to constantly assess the skill level of the pilots. When a pilot is flagged as having done “steep climbs” or “hard” or “fast landings,” for example, the pilot is targeted for training to alleviate the skill deficiency. The training is used, therefore, as a developmental tool to continuously improve the performance of pilots. The evaluation is not used as a summative measure of performance upon which disciplinary measures might be taken. The result for British Airways, one of the largest airlines in the world, is one of the best safety records in the world.

In the past, one of the major ways of determining problems in the airline industry in North America was to wait until an accident occurred and then examine the black box to find the causes. The findings might indicate pilot error or some problem with the aircraft. This information was then sent to all the major airlines for their information. This form of summative evaluation met with disastrous results. Recently, six major American airlines began a program similar to the one at British Airways. After all, it makes sense to track incidents and make changes (in aircraft design or pilot skill level) as soon as a problem is noticed. In this way, major incidents are more likely to be avoided. In fact, airlines are using the evaluation information gathered as a feedback mechanism to ensure the continuous improvement of performance and not as a summative evaluation of “failure.”

This seemingly effective way of ensuring high performance threatened to come to an end in the United States. The Federal Aviation Administration (FAA) wanted to access this information for possible use as a way to evaluate pilots. The airlines feared that the information given to the FAA could be used to punish both pilots and the airlines. Fortunately, these regulations were never put into place and both the airlines and the FAA continue to use this cockpit information as a means of continuously improving safety and pilot performance by improving the training of pilots.

Instead, he chose the term data tracking. He emphasized tracking attitudes and behaviors over time and supplying feedback based on the findings to the training designers and presenters. This feedback could then be used to modify training and organizational systems and processes to facilitate the training’s success. The term data tracking did not have the same connotation of finality as evaluation. Hence, managers saw it as a tool for improving the likelihood of a successful intervention rather than as a pass/fail grade.

Was the evaluation in the Palm Desert case outcome or process focused? It is difficult to say without actually talking to those involved. If it was used for continuous improvement, assessment of the training process, as well as how much the participants learned, it could be helpful in determining the reason that transfer did not take place. On the basis of this information, the city could design additional interventions to achieve desired outcomes.


On the surface, the arguments for ignoring evaluation of training make some sense, but they are easily countered when more carefully analyzed. However, perhaps the biggest reason for abandoning the resistance to evaluation is its benefit, especially today, when more and more organizations are demanding accountability at all levels. Managers increasingly are demanding from HRD what they demand from other departments: Provide evidence of the value of your activities to the organization. 12  Other factors that influence the need to evaluate training are competitive pressures on organizations requiring a higher focus on quality, continuous improvement, and organizational cost cutting. 13

Sometimes, the image of the training function, especially among line managers, is less than desirable because they see this as a “soft” area, not subject to the same requirements for accountability as their areas. By using the same accountability standards, it is possible to improve the image of training. Furthermore, the technology for evaluating and placing dollar amounts on the value of training has improved in the last several years. However, let us be clear. We do not advocate a comprehensive evaluation of every training program. The value of the information gained must be worth the cost. Sometimes, the cost of different components of an evaluation is simply too high relative to the information gained. 14


Let’s go back to the evaluation phase figure at the beginning of the chapter. Recall from  Chapter 5  that one of the outputs from the design phase is evaluation considerations. These considerations, or more specifically, what is determined important to evaluate, are inputs to the evaluation phase. Organizational constraints and design issues are also inputs to evaluation. Remember that evaluation processes and outcome measures should be developed soon after the design phase output is obtained. The two types of outputs from the evaluation phase are process and outcome evaluation. Process evaluation compares the developed training to what actually takes place in the training program. Outcome evaluation determines how well training has accomplished its objectives.

Process Data

One of the authors has a cottage near a lake, and he often sees people trying unsuccessfully to start their outboard motors. In going to their assistance, he never starts by suggesting that they pull the plugs to check for ignition or disconnect the float to see whether gas is reaching the carburetor. Instead, he asks if the gas line is connected firmly, if the primer ball is pumped up, if the gear shift is in neutral (many will not start in gear), and if the throttle is at the correct position, all of which are process issues. He evaluates the “process” of starting the engine to see whether it was followed correctly. If he assumed that it was followed and tried to diagnose the “problem with the engine,” he might never find it.

It is the same with training. If learning objectives were not achieved, it is pointless to tear the training design apart in trying to fix it. It might simply be a process issue—the training was not set up or presented the way it was intended. By examining the entire training process, it is possible to see all the places where the training might have gone wrong. In the examination of the process, we suggest segmenting the process into two areas: process before training and process during training.


Several steps are required in analyzing the processes used to develop training.  Table 9-1  identifies questions to ask during the analysis of the training process. First, you can assess the effectiveness of the needs analysis from the documentation or report that was prepared. This report should indicate the various sources from which the data were gathered and the KSA deficiencies.

Next, you can assess the training objectives. Are they in line with the training needs? Were objectives developed at all levels: organizational, transfer, learning, and reaction? Are they written clearly and effectively to convey what must be done to demonstrate achievement of the objectives? It is important that you examine the proposed evaluation tools to be sure that they are relevant. On the basis of the needs assessment and resulting objectives, you can identify several tools for assessing the various levels of effectiveness. We discuss the development of these tools later in this chapter. Then evaluate the design of the training. For example, if trainees’ motivation to attend and learn is low, what procedures are included in the design to deal with this issue?

Would a process evaluation prove useful in the Palm Desert case? Yes. In that situation, as it stands, we recognize that training was not successful, but we do not know why. The process that leads to the design of training might provide the answer. Another place we might find the answer is in the training implementation.


TABLE 9-1 Potential Questions for Analysis of Processes Prior to Delivery 15

Were needs diagnosed correctly?

• What data sources were used?

• Was a knowledge/skill deficiency identified?

• Were trainees assessed to determine their prerequisite KSAs?

Were needs correctly translated into training objectives?

• Were all objectives identified?

• Were the objectives written in a clear, appropriate manner?

Was an evaluation system designed to measure accomplishment of objectives?

Was the training program designed to meet all the training objectives?

• Was previous learning that might either support or inhibit learning in training identified?

• Were individual differences assessed and taken into consideration in training design?

• Was trainee motivation to learn assessed?

• What steps were taken to address trainee motivation to learn?

• Were processes built into the training to facilitate recall and transfer?

• What steps are included in the training to call attention to key learning events?

• What steps are included in the training to aid trainees in symbolic coding and cognitive organization?

• What opportunities are included in the training to provide symbolic and behavioral practice?

• What actions are included in the training to ensure transfer of learning to the job?

Are the training techniques to be used appropriate for each of the learning objectives?

If your outcome data show that you didn’t get the results you expected, then training implementation might be the reason. Was the training presented as it was designed to be? If the answer is yes, then the design must be changed. But, it is possible that the trainer or others in the organization made some ad hoc modifications. Such an analysis might prove useful in the Palm Desert case.

TABLE 9-2 Potential Questions for a Process Analysis of Training Delivery

• Were the trainer, training techniques, and learning objectives well matched?

• Were lecture portions of the training effective?

• Was involvement encouraged or solicited?

• Were questions used effectively?

• Did the trainer conduct the various training methodologies (case, role-play, etc.) appropriately?

• Were they explained well?

• Did the trainer use the allotted time for activities?

• Was enough time allotted?

• Did trainees follow instructions?

• Was there effective debriefing following the exercises?

• Did the trainer follow the training design and lesson plans?

• Was enough time given for each of the requirements?

• Was time allowed for questions?

Imagine, for example, that the Palm Desert training had required the use of behavior modeling to provide practice in the skills that were taught. The evaluation of outcomes shows that learning of the new behaviors did not occur. If no process data were gathered, the conclusion could be that the behavior modeling approach was not effective. However, what if examination of the process revealed that trainees were threatened by the behavior modeling technique, and the trainer allowed them to spend time discussing behavior modeling, which left less time for doing the modeling? As a result, it is quite plausible that there are problems with both the design and the implementation of the training. Without the process evaluation, this information would remain unknown, and the inference might be that behavior modeling was not effective.

Examples of implementation issues to examine are depicted in  Table 9-2 . Here, it is up to the evaluator to determine whether all the techniques that were designed into the program were actually implemented. It is not good enough simply to determine that the amount of time allotted was spent on the topic or skill development. It must also be determined whether trainees actually were involved in the learning activities as prescribed by the design. As in the previous behavior modeling example, the time allotted might be used for something other than behavior modeling.


Actual training is compared with the expected (as designed) training to provide an assessment of the effectiveness of the training implementation. Much of the necessary information for the expected training can be obtained from records and reports developed in the process of setting up the training program. A trainer’s manual would provide an excellent source of information about what should be covered in the training. Someone could monitor the training to determine what actually was covered. Another method is to ask trainees to complete evaluations of process issues for each module. Videotapes, instructors’ notes, and surveys or interviews with trainees can also be used. Keep in mind that when you are gathering any data, the more methods you use to gather information, the better the evaluation will be.


Table 9-3  depicts those interested in process data. Clearly, the training department is primarily concerned with this information to assess how they are doing. The customers of training (defined as anyone with a vested interest in the training department’s work) usually are more interested in outcome data than in process data.

TABLE 9-3 Who Is Interested in the Process Data 16

Training Department
  Trainer: Yes; it helps determine what works well and what does not.
  Other trainers: Yes, to the extent the process is generalizable.
  Training manager: Only if training is not successful or if a problem is present with a particular trainer.

Customers of the Training Department
  Trainees: No
  Trainees’ supervisor: No
  Upper management: No

Providing some process data is important, even if it is only the trainer’s documentation and the trainees’ reactions. The trainer can use this information to assess what seems to work and what does not. Sometimes, more detailed process data will be required, such as when training will be used many times, or when the training outcomes have a significant effect on the bottom line. If, however, it is only a half-day seminar on the new computer software, collecting process information might not be worth the cost.

Once training and trainers are evaluated several times, the value of additional process evaluations decreases. If you are conducting training that has been done numerous times before, such as training new hires to work on a piece of equipment, and the trainer is one of your most experienced, then process analysis is probably not necessary. If the trainer was fairly new or had not previously conducted this particular session, it might be beneficial to gather process data through a senior trainer’s direct observation.

To be most effective, we believe that evaluations should be process oriented and focused on providing information to improve training, not just designed to determine whether training is successful. Some disagree with this approach and suggest that process evaluation ends when the training program is launched. 17  We suggest, however, that the evaluation should always include process evaluation for the following reasons:

• It removes the connotation of pass/fail, making evaluation more likely.

• It puts the focus on improvement, a desirable goal even when training is deemed successful. 18

Outcome Data

To determine how well the training met or is meeting its goals, it is necessary to examine various outcome measures. The four outcome measures that are probably the best known are reaction, learning, behavior, and organizational results. 19  These outcomes are ordered as follows:

• Reaction outcomes come first and will influence how much can be learned.

• Learning outcomes influence how much behavior can change back on the job.

• Behavior outcomes are the changes in behavior on the job that will influence organizational results.

• Organizational results are the changes in organizational outcomes related to the reason for training in the first place, such as a high grievance rate or low productivity.

This description is a simplified version of what actually happens, and critics argue that little empirical evidence indicates that the relationships between these outcomes exist. 20  We will discuss this in more detail later.

Reaction outcomes  are measures of the trainee’s perceptions, emotions, and subjective evaluations of the training experience. They represent the first level of evaluation and are important because favorable reactions create motivation to learn. Learning may occur even if the training is boring, or, alternatively, it may not occur even if the training is interesting. 21  However, if training is boring, trainees will find it difficult to attend to what is being taught. As a result, they might not learn as much as they would if they found the training interesting and exciting. High reaction scores from trainees, therefore, indicate that attention was most likely obtained and maintained, which, as you recall from social learning theory, is the first part of learning: getting the trainees’ attention.

Learning outcomes  are measured by how well the learning objectives and purpose were achieved. The learning objectives for the training that were developed in the design phase specify the types of outcomes that will signify that training has been successful. Note the critical relationship between the needs analysis and evaluation. If the training process progressed according to the model presented in this book, the way to measure learning was determined during the training needs analysis (TNA). At that time, the employee’s KSAs were measured to determine whether they were adequate for job performance. The evaluation of learning should measure the same things in the same way as in the TNA. Thus, the needs analysis is actually the “pretest.” A similar measure at the end of training will show the “gain” in learning.
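The pretest/posttest logic just described can be sketched as follows; the trainee names and test scores below are hypothetical, and the sketch assumes the same instrument is scored the same way at both points, as the chapter recommends:

```python
# Sketch of using the TNA measure as the pretest: the same instrument is
# administered after training, and the "gain" is the difference.

def learning_gain(tna_scores, post_training_scores):
    """Per-trainee gain from TNA (pretest) to post-training (posttest)."""
    return {
        trainee: post_training_scores[trainee] - pre
        for trainee, pre in tna_scores.items()
    }

tna = {"Avila": 62, "Brooks": 55, "Chen": 70}    # pretest scores from the TNA
post = {"Avila": 80, "Brooks": 74, "Chen": 71}   # same test after training
gains = learning_gain(tna, post)
# gains == {"Avila": 18, "Brooks": 19, "Chen": 1}
```

Because the TNA already measured the relevant KSAs, no separate pretest needs to be built; the needs analysis does double duty as the baseline.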

Job behavior outcomes  are measures of the degree to which the learned behavior has transferred to the job. During the TNA, performance gaps were identified and traced to areas in which employees were behaving in a manner that was creating the gap. The methods used for measuring job behavior in the TNA should be used in measuring job behavior after the completion of training. Once again, the link between needs analysis and evaluation is evident. The degree to which job behavior improves places a cap on how much the training will improve organizational results.

Organizational results  occupy the highest level in the hierarchy. They reflect the organizational performance gap identified in the TNA. This OPG is often what triggers reactive (as opposed to proactive) training. Here are some examples:

• High levels of scrap are being produced.

• Employees are quitting in record numbers.

• Sales figures dropped over the last two quarters.

• Grievances are on the increase.

• The number of rejects from quality control is rising.

Once again, if one of these OPGs triggered the training, it can be used as the baseline for assessing improvement after training. This process of integrating the TNA and evaluation streamlines both processes, thereby making the integration more cost-effective. 22


If each level of the outcome hierarchy is evaluated, it is possible to have a better understanding of the full effects of training. 23  Let’s examine one of the items in the preceding list—a high grievance rate—as it relates to the training process and the four levels of evaluation.

The needs analysis determines that the high grievance rate is a function of supervisors not managing conflict well. Their knowledge is adequate, but their skills are deficient. From the needs analysis, data are obtained from a behavioral test that measures conflict management skills for comparison with skill levels after training has been completed. Training is provided, and then participants fill out a reaction questionnaire. This tool measures the degree to which trainees feel positive about the time and effort that they have invested in the program and each of its components. Assume that the responses are favorable. Even though the trainees feel good about the training and believe that they learned valuable things, the trainer recognizes that the intended learning might not have occurred. The same behavioral test of conflict management skills is administered, and the results are compared with pretraining data. The results show that the trainees acquired the conflict management skills and can use them appropriately, so the learning objectives were achieved. Now the concern is whether these skills transferred to the job. We compare the supervisors’ use of conflict management skills before and after training and discover that they are using the skills, so transfer to the job was successful. The next step is to examine the grievance rate. If it has declined, it is possible, with some level of confidence, to suggest that training is the cause of the decline. If it is determined that learning did not take place after training, it would not make sense to examine behavior or results, because learning is a prerequisite.
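The order of examination in this walkthrough can be sketched as a simple gating rule. The level names follow the four-level hierarchy discussed earlier; the pass/fail inputs below are hypothetical:

```python
# Sketch of working up the outcome hierarchy in order: each level is
# examined only if the one below it met its objective, since (for
# instance) learning is a prerequisite for behavior change.

LEVELS = ["reaction", "learning", "behavior", "results"]

def evaluate_hierarchy(passed):
    """Return the levels actually examined, stopping at the first failure.

    `passed` maps each level name to True/False (did it meet its objective?).
    """
    examined = []
    for level in LEVELS:
        examined.append(level)
        if not passed[level]:
            break  # no point checking higher levels
    return examined

# If learning did not occur, behavior and results are never examined.
order = evaluate_hierarchy(
    {"reaction": True, "learning": False, "behavior": True, "results": True}
)
# order == ["reaction", "learning"]
```

This is, of course, the simplified chain; as the chapter notes, critics question how strong the links between levels really are.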

Let’s examine each of these four levels of evaluation more closely.


The data collected at this level are used to determine what the trainees thought about the training. Reaction questionnaires are often criticized, not because of their lack of value, but because they are often the only type of evaluation undertaken. 24

Affective and utility are the two types of reaction questionnaire. 25  An  affective questionnaire  measures general feelings about training (“I found this training enjoyable”), whereas the  utility questionnaire  reflects beliefs about the value of training (“This training was of practical value”). While both types are useful, we believe that specific utility statements on reaction questionnaires are more valuable for making changes.
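One way to keep the two item types separate when scoring is sketched below. The item wordings echo the examples above; the ratings (on a hypothetical 1-to-5 scale) are made up for illustration:

```python
# Sketch of scoring a reaction questionnaire with affective and utility
# items tallied separately, so each type of reaction can be reported on
# its own rather than blended into one overall score.

RESPONSES = [
    ("affective", "I found this training enjoyable.", 4),
    ("utility",   "This training was of practical value.", 2),
    ("affective", "The pace of the sessions suited me.", 5),
    ("utility",   "I can apply these skills on my job.", 2),
]

def reaction_scores(responses):
    """Mean rating per item type ('affective' vs. 'utility')."""
    totals, counts = {}, {}
    for item_type, _text, rating in responses:
        totals[item_type] = totals.get(item_type, 0) + rating
        counts[item_type] = counts.get(item_type, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}

scores = reaction_scores(RESPONSES)
# A high affective score paired with a low utility score would flag a
# program that trainees enjoyed but saw little practical value in.
```

Reporting the two means separately is what makes the specific utility statements actionable: a low utility mean points directly at relevance problems that a single blended score would hide.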

Training reaction questionnaires do not assess learning but rather the trainees’ attitudes about and perceptions of the training. Categories to consider when developing a reaction questionnaire should include training relevance, training content, materials, exercises, trainer(s) behavior, and facilities.

Training Relevance

Asking trainees about the relevance (utility) of the training they experienced provides the organization with a measure of the perceived value of the training. If most participants do not see any value in it, they will experience difficulty remaining interested (much less consider applying it back on the job). Furthermore, this perceived lack of value can contaminate the program’s image. Those who do not see its value will talk to others who have not yet attended training and will perhaps suggest that it is a waste of time. The self-fulfilling prophecy proposes that if you come to training believing that it will be a waste of time, it will be. Even if the training is of great importance to the organization, participants who do not believe that it is important are not likely to work to achieve its objectives.

Once trainees’ attitudes are known, you can take steps to change the beliefs, either through a socialization process or through a change in the training itself. Think about the Palm Desert case. What do you think the trainees’ reactions to the training were? Might this source of information help explain why no change in behavior occurred?

Training Materials and Exercises

Any written materials, videos, exercises, and other instructional tools should be assessed along with an overall evaluation of the training experience. On the basis of responses from participants, you can change these to make them more relevant to participants. Making suggested modifications follows the organizational development principle of involving trainees in the process.

Reactions to the Trainer

Reaction questionnaires also help determine how the trainees evaluated the trainer’s actions. Be sure to develop statements that specifically address what the trainer did. General statements tend to reflect trainees’ feelings about how friendly or entertaining the trainer was (halo error) rather than how well the training was carried out. Simply presenting an affective statement such as “The trainer was entertaining” would likely elicit a halo response. For this reason, it is useful to identify specific aspects of trainer behavior that need to be rated. If more than one trainer is involved, then trainee reactions need to be gathered for each trainer. Asking trainees to rate the trainers as a group will mask differences among trainers in terms of their effectiveness.

Asking about a number of factors important to effective instruction causes the trainees to consider how effective the instructor was in these areas. When the final question, “Overall, how effective was the instructor?” is asked, the trainees can draw upon their responses to a number of factors related to effective instruction. This consideration will result in a more accurate response as to the overall effectiveness of the instructor. There will be less halo error. Note that the questionnaire in  Table 9-4  asks the trainee to consider several aspects of the trainer’s teaching behavior before asking a more general question regarding effectiveness.

TABLE 9-4 Reaction Questions About the Trainer

Please circle the number to the right of the following statements that reflects your degree of agreement or disagreement.

1 = Strongly disagree

2 = Disagree

3 = Neither agree nor disagree

4 = Agree

5 = Strongly agree

· 1. The trainer did a good job of stating the objectives at the beginning of training. 1 2 3 4 5
· 2. The trainer made good use of visual aids (easel, white board) when making the presentations. 1 2 3 4 5
· 3. The trainer was good at keeping everyone interested in the topics. 1 2 3 4 5
· 4. The trainer encouraged questions and participation from trainees. 1 2 3 4 5
· 5. The trainer made sure that everyone understood the concepts before moving on to the next topic. 1 2 3 4 5
· 6. The trainer summarized important concepts before moving to the next module. 1 2 3 4 5
· 7. Overall, how would you rate this trainer? (Check one)

· _____ 1. Poor—I would not recommend this trainer to others.

· _____ 2. Adequate—I would recommend this trainer only if no others were available.

· _____ 3. Average

· _____ 4. Good—I would recommend this trainer above most others.

· _____ 5. Excellent—This trainer is among the best I’ve ever worked with.
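To make the aggregation concrete, here is a minimal sketch of how responses to a questionnaire like the one in Table 9-4 might be summarized. The item labels and response data are invented for illustration; they are not from the text.

```python
# Hypothetical sketch: summarizing Likert-scale ratings of a trainer.
# Item labels and the response data below are invented for illustration.

ITEMS = [
    "Stated objectives at the start",
    "Used visual aids well",
    "Kept everyone interested",
    "Encouraged questions and participation",
    "Checked understanding before moving on",
    "Summarized concepts between modules",
]

# Each row holds one trainee's 1-5 ratings on the six items above.
responses = [
    [5, 4, 4, 5, 3, 4],
    [4, 4, 5, 5, 4, 4],
    [3, 2, 4, 4, 3, 3],
]

def item_means(rows):
    """Mean rating per item, rounded to two decimals."""
    n = len(rows)
    return [round(sum(r[i] for r in rows) / n, 2) for i in range(len(rows[0]))]

means = item_means(responses)
# The lowest-rated item points to the trainer behavior most in need of work.
weakest = ITEMS[means.index(min(means))]
```

Rating each specific behavior separately, as here, is what allows the low-scoring item to be identified; a single overall rating would mask it.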

Facilities and Procedures

The reaction questionnaire can also contain items related to the facilities and procedures to determine whether any element impeded the training process. Noise, temperature, seating arrangements, and even the freshness of the doughnuts are potential areas that can cause discontent. One way to approach these issues is to use open-ended questions, such as the following:

· ● Please describe any aspects of the facility that enhanced the training or created problems for you during training (identify the problem and the aspect of the facility).

· ● Please indicate how you felt about the following:

· ● Refreshments provided

· ● Ability to hear the trainer and other trainees clearly

· ● Number and length of breaks

Facility questions are most appropriate if the results can be used to configure training facilities in the future. The more things are working in the trainer’s favor, the more effective training is likely to be.

The data from a reaction questionnaire provide important information that can be used to make the training more relevant, the trainers more sensitive to their strengths and shortcomings, and the facilities more conducive to a positive training atmosphere. The feedback the questionnaire provides is more immediate than with the other levels of evaluation; therefore, modifications to training can be made much sooner.

Timing of Reaction Assessment

The timing and type of questions asked on a reaction questionnaire should be based on the information needed for evaluating and improving the training, the trainer(s), the processes, or the facility. Most reaction questionnaires are given to participants at the conclusion of training, while the training is still fresh and the audience is captive. However, a problem with giving them at this time is that the participant might be anxious to leave and might give incomplete or less-than-valid data. Also, trainees might not know whether the training is useful on the job until they go back to the job and try it.

An alternative is to send out a reaction questionnaire at some point after training. This delay gives the trainee time to see how training works in the actual job setting. However, the trainee might forget the specifics of the training. Also, there is no longer a captive audience, so response rate may be poor.

Another approach is to provide reaction questionnaires after segments of a training program or after each day in a multiday training session. In such situations, it might be possible to modify training that is in progress on the basis of trainees’ responses. Of course, this system is more costly and requires a quicker turnaround time for analysis and feedback of the data.

Regardless of how often reaction evaluation takes place, the trainer should specify at the beginning that trainees will be asked to evaluate the training and state when this evaluation will occur. It not only helps clarify trainee expectations about what will happen during training but also acknowledges the organization’s concern for how the trainees feel about the training. It is also important that the data gathered be used. Trainees and employees in the rest of the organization will quickly find out if the trainer is simply gathering data only to give the impression of concern about their reactions.  Table 9-5 provides a list of steps to consider when developing a reaction questionnaire.

Caution in Using Reaction Measures

A caution is in order regarding reaction questionnaires sent out to trainees sometime after training asking them about the amount of transfer of training that has occurred on the job. Trainees tend to indicate that transfer has occurred when other measures suggest it did not. 26  Therefore, reaction measures should not be the only evaluation method used to determine transfer of training.

Reaction questionnaires are not meant to measure learning or transfer to the job. They do, however, provide the trainees with the opportunity to indicate how they felt about the learning. How interesting and relevant the training is found to be will affect their level of attention and motivation. What the trainees perceive the trainer to be doing well and not so well is also useful feedback for the trainer. The reaction information can be used to make informed decisions about modifications to the training program.

TABLE 9-5 Steps to Consider in Developing a Reaction Questionnaire

· 1. Determine what needs to be measured.

· 2. Develop a written set of questions to obtain the information.

· 3. Develop a scale to quantify respondents’ data.

· 4. Make forms anonymous so that participants feel free to respond honestly.

· 5. Ask for information that might be useful in determining differences in reactions by subgroups taking the training (e.g., young vs. old; minority vs. nonminority). This could be valuable in determining effectiveness of training by different cultures, for example, which might be lost in an overall assessment. Note: Care must be taken when asking for this information. If you ask too many questions about race, gender, age, tenure, and so on, participants will begin to feel that they can be identified without their name on the questionnaire.

· 6. Allow space for additional comments to allow participants the opportunity to mention things you did not consider.

· 7. Decide the best time to give the questionnaire to get the information you want.

· a. If right after training, ask someone other than the instructor to administer and pick up the information.

· b. If some time later, develop a mechanism for obtaining a high response rate (e.g., encourage the supervisor to allow trainees to complete the questionnaire on company time).


Learning

Learning objectives are developed from the TNA. As we noted, training can focus on three types of learning outcomes: knowledge, skills, and attitudes (KSAs). The difference between the individual’s KSAs and the KSAs required for acceptable job performance defines the learning that must occur. The person analysis serves as the pretraining measure of the person’s KSAs. These results can be compared with a posttraining measure to determine whether learning has occurred and whether those changes can be attributed to training. The various ways of making such attributions are discussed later in the chapter.

Chapter 4  discussed the various ways in which KSAs can be measured. The work done in the needs analysis phase to identify what should be measured, and how, determines the measures you will use in your evaluation. This makes sense because your learning objectives should be based on the training needs you identified in that phase. Unless you were extremely insightful or lucky, you probably measured a number of things that did not end up being training needs as well as things that did. Only the KSAs that turned out to be training needs are evaluated at the end of training.

For example, suppose your needs analysis used a 50-item knowledge test assessing employees’ problem-solving knowledge, and the person analysis showed that 30 of these items were training needs. In the design phase, those 30 items would be the focus of the learning objectives you developed. A training program would then be created to address the learning objectives and, de facto, those 30 items. Your evaluation instrument should then assess whether the trainees have learned the knowledge represented by those 30 items.
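The 50-item example can be sketched in code. Everything here (the item ids, the 30-item need set, and the pre/post scores) is hypothetical; the point is simply that only the items identified as training needs enter the evaluation.

```python
# Hypothetical sketch of the 50-item test example in the text.
# Item ids, scores, and the 30-item need set are invented for illustration.

need_items = set(range(30))  # the 30 of 50 items flagged as training needs

def score_on_needs(answers, needs):
    """Proportion of training-need items answered correctly.

    `answers` maps item id -> True (correct) or False (incorrect).
    Items that were never training needs are simply ignored."""
    correct = sum(1 for item in needs if answers.get(item, False))
    return correct / len(needs)

# Pretraining: the trainee got 6 of the 30 need items right;
# posttraining: 27 of the 30 need items right.
pre = {i: i < 6 for i in range(50)}
post = {i: i < 27 for i in range(50)}

gain = score_on_needs(post, need_items) - score_on_needs(pre, need_items)
```

A positive gain on the need items is evidence that the intended learning occurred; scores on the other 20 items are irrelevant to the learning objectives.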

Timing of Assessment of Learning

Depending on the duration of training, it might be desirable to assess learning periodically to determine how trainees are progressing. Periodic assessment would allow training to be modified if learning is not progressing as expected.

Assessment should also take place at the conclusion of training. If learning is not evaluated until sometime later, it is impossible to know how much was learned and then forgotten.

In the Palm Desert case, the measures that they took six months after training created a dilemma. Was the behavior ever learned, learned but forgotten, or learned but not transferred to the job?


Job Behavior

Once it is determined that learning took place, the next step is to determine whether the training transferred to the job. Assessment at this step is certainly more complex and is often ignored because of the difficulties of measurement. However, if you did your needs analysis correctly, you have already determined what behavior to measure and how to measure it.

Several methods may have been used to assess job behavior prior to training. These methods were covered in depth in the discussion of TNA in  Chapter 4. So, the method used when conducting the needs assessment should be used to evaluate whether or not the learned behavior transferred to the job.

Scripted Situations

Some recent research indicates that scripted situations might provide a better format for evaluating transfer of training than the more traditional behavioral questionnaires. 27  Scripted situations help the rater recall actual situations and the behaviors related to them rather than attempting to recall specific behaviors without the context provided. The rater is provided with several responses that might be elicited from the script and is asked to choose the one that describes the ratee’s behavior. Research suggests that this method is useful in decreasing rating errors and improving validity. 28  An example of this method is depicted in  Table 9-6 .

Finally, the trainer who includes sit-ins as a later part of training can observe on-the-job performance of the trainee. As was discussed in  Chapter 5 , these sit-ins facilitate transfer 29  and also help the trainer determine the effectiveness of the training in facilitating the transfer of training to the job.

Transfer of Attitudes

If attitudinal change is a goal of training, then it becomes necessary to assess the success of transfer and the duration of the attitudinal change once the trainee is back on the job. Whatever method was used to determine the need for a change in attitude should be used to measure how much attitudes have changed. As discussed in the needs analysis chapter, one way to assess changes in attitudes is by observing changes in behaviors. Attitudinal change can also be assessed through attitude surveys. Remember, if respondents’ anonymity is ensured in such surveys, responses are more likely to reflect true attitudes.

TABLE 9-6 Scripted Situation Item for Evaluation of a School Superintendent

After receiving training and being back on the job for four months a school superintendent is being rated by members of the staff. The following is an example of one of the scripted scenarios used for rating.

The following is a scenario regarding a school superintendent. To rate your superintendent, read the scenario and place an X next to the behavior you believe your superintendent would follow.

The administrator receives a letter from a parent objecting to the content of the science section on reproduction. The parent strongly objects to his daughter being exposed to such materials and demands that something be done. The administrator would be most likely to: (check one)

· ____ Ask the teacher to provide handouts, materials, and curriculum content for review.

· ____ Check the science curriculum for the board-approved approach to reproduction, and compare school board guidelines with course content.

· ____ Ask the head of the science department for an opinion about the teacher’s lesson plan.

· ____ Check to see whether the parent has made similar complaints in the past.

A study of steward training provides an example of the assessment of an attitude back on the job. 30  Training was designed to make union stewards more accessible to the rank and file by teaching them listening skills and how to interact more with the rank and file. Results indicated that when factors such as tenure as a union official and age were controlled, stewards who received the training behaved in a more participative manner (changed behavior) and were more loyal to the union (attitude survey). For the union, loyalty is important because it translates into important behaviors that might not be measured directly, such as supporting the union’s political candidates and attending union functions. 31

Timing of Job Behavior Assessment

The wait time for assessing transfer of training depends on the training objectives. If the objective is to learn how to complete certain forms, simply auditing the work on the job before and after training would determine whether transfer took place. This could be done soon after training was complete. When learning objectives are more complex, such as learning how to solve problems or resolve conflict, wait time before assessment should be longer. The trainee will first need to become comfortable enough with the new behavior to exhibit it on a regular basis; then it will take more time for others to notice that the behavior has changed.

To understand this point, consider a more concrete change. Jack loses 10 pounds. First, the weight loss is gradual and often goes unnoticed. Even after Jack lost the weight, for some time people will say, “Gee, haven’t you lost weight?” or “What is it that’s different about you?” If this uncertainty about specific changes happens with a concrete visual stimulus, imagine what happens when the stimuli are less concrete and not consistent. Some types of behavioral change might take a long time to be noticed.

To help get employees to notice the change in behavior, you can ask them to assess whether certain behaviors have changed. In our example, if asked, “Did Jack lose weight?” and he had lost 10 pounds, you would more than likely notice it then, even if you did not notice it before.


Organizational Results

Training objectives, whether proactive or reactive, are developed to solve an organizational problem—perhaps an expected increase in demand for new customer services in the proactive case, or too many grievances in the reactive case. The fact that a problem was identified (too many grievances) indicates a measurement of the “organizational result.” This measurement would be used to determine any change after the training was completed. If it was initially determined that too many defective parts were being produced, the measurement of the “number of defective parts per 100 produced” would be used again after training to assess whether training was successful. This assessment is your organizational result.

It is important to assess this final level, because it is the reason for doing the training in the first place. In one sense, it is easier to measure than is job behavior. Did the grievances decrease? Did quality improve? Did customer satisfaction increase? Did attitudes in the annual survey get more positive? Did subordinates’ satisfaction with supervision improve? Such questions are relatively easily answered. The difficult question is, “Are the changes a result of training?” Perhaps the grievance rate dropped because of recent successful negotiations and the signing of a contract the union liked. Or if attitudes toward supervision improved but everyone recently received a large bonus, the improvement might be a spill-off from the bonus and not the training. These examples explain why it is so important to gather information on all levels of the evaluation.

The links among organizational results, job behavior, and trainee KSAs should be clearly articulated in the TNA. This creates a model that specifies that if certain KSAs are developed and the employees use them on the job, then certain organizational results will occur. The occurrence of these things validates the model and provides some confidence that training caused these results. Thus, the difficult task of specifying how training should affect the results of the organization is already delineated before evaluation begins. TNAs are not always as thorough as they should be; therefore, it often falls to the evaluator to clarify the relationship among training, learning, job behavior, and organizational outcomes. For this reason, it is probably best to focus on organizational results as close to the trainee’s work unit as possible. Results such as increased work unit productivity, quality, and decreased costs are more appropriate than increased organizational profitability, market share, and the like. Quantifying organizational results is not as onerous as it might seem at first glance.

Timing of Assessment of Organizational Results

Consistent tracking of the organizational performance gaps such as high scrap, number of grievances, or poor quality should take place at intervals throughout the training and beyond. At some point after the behavior is transferred to the job, it is reasonable to expect improvement. Tracking performance indices over time allows you to assess whether the training resulted in the desired changes to organizational results. You will need to also track any other organizational changes that might be affecting those results. For example, a downturn in the economy might result in the necessity for temporary layoffs. This could trigger an increase in grievances, even though the grievance training for supervisors was very effective. This is one of the difficulties of linking training to organizational results. There are a multitude of factors, other than employees’ KSAs, that determine those results.


Relationship Among the Levels of Evaluation

As suggested earlier, researchers have disagreed about the relationship among these four levels of evaluation. For example, some studies show reaction and learning outcomes to be strongly related to each other. 32  Others indicate little correlation between results of reaction questionnaires and measures of learning. 33  As noted earlier, a good response to the reaction questionnaire might mean only that the trainer had obtained the trainees’ attention. This factor is only one of many in the learning process. The findings also indicate that the more distant the outcome is from the actual training, the smaller the relationship is between higher- and lower-level outcomes.  Figure 9-1  illustrates the hierarchical nature of the outcomes and the factors that can influence these outcomes.

The research showing no relationship between the levels makes sense if we remember that organizational outcomes generally are the result of multiple causes. 34  For example, productivity is affected not only by the employees’ KSAs but also by the technology they work with, supplier reliability, interdependencies among work groups, and many other factors. Although improvements can occur in one area, declines can occur in another. When learning takes place but does not transfer to the job, the issues to be concerned with involve not learning but transfer. What structural constraints are being placed on trainees that keep them from behaving properly? Beverly Geber, special projects editor for Training magazine, describes a situation in which training in communication skills at Hutchinson Technologies, a computer component manufacturer, was not transferring to the job for some of the employees. 35  An examination of the issue (through worker focus groups) disclosed that some employees were required to work in cramped space with poor lighting. These conditions made them irritable and unhappy. Did this situation affect their ability to communicate with their customers in a pleasant and upbeat manner? “You bet,” said their human resource (HR) representative.

FIGURE 9-1 Training Outcomes and Factors Influencing Them

Despite all the reasons that a researcher might not find a relationship among the four levels of evaluation, research has begun to show the existence of these linkages. 36  More research needs to be done, but there is evidence to show that reactions affect learning outcomes, and learning outcomes affect transfer to the job. Few studies have attempted to link transfer outcomes to organizational outcomes due to the significant problems of factoring out other variables related to those outcomes.
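The kind of relationship these studies test can be illustrated with a small sketch. The trainee data below are invented, and real studies use far larger samples and control for the other influences on learning discussed above.

```python
# Illustrative only: the kind of reaction-learning correlation these
# studies report. Trainee data below are invented; real studies use
# larger samples and control for other influences on learning.

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

reaction = [4.0, 3.5, 5.0, 2.5, 4.5]  # mean reaction rating per trainee
learning = [82, 75, 90, 60, 85]       # posttraining test score per trainee

r = pearson(reaction, learning)  # close to +1 here, by construction
```

A coefficient near +1 would support a reaction-learning linkage; a coefficient near 0 would match the studies finding little correlation between the two levels.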

Evaluating the Costs and Benefits of Training

Let’s say you are able to show that your training caused a decrease in the number of grievances. You have data to show that participants are engaging in the new behaviors, and they have the desired knowledge and skills. Your examination of all four levels of evaluation provides evidence of cause and effect, and your use of appropriate designs (see  Appendix 9-1 ) enhances the level of confidence in all of these outcomes. You might think that your job was done, but many executives still might ask, “So what?” Looking at the outcomes of training is only half the battle in evaluating its effectiveness. The other half is determining whether the results were worth the cost.


Was the training cost worth the results? This question can be answered in either of the following two ways: 37

· ● Cost/benefit evaluation

· ● Cost-effectiveness evaluation

Cost/Benefit Evaluation

A  cost/benefit evaluation  of training compares the monetary cost of training with the nonmonetary benefits. It is difficult to place a value on these benefits, which include attitudes and working relationships. The labor peace brought about by the reduction in grievances is difficult to assess, but it rates high in value compared with the cost of training. The conflict resolution skills learned by supervisors provide the nonmonetary benefit of better relationships between supervisors and union officials, and this is important. However, it is also possible to assess the reduction in grievances (for example) in a way that directly answers the cost-effectiveness question.

Cost-Effectiveness Evaluation

A  cost-effectiveness evaluation  compares the monetary costs of training with the financial benefits accrued from training. There are two approaches for assessing cost-effectiveness:

· 1. Cost savings, a calculation of the actual cost savings, based on the change in “results”

· 2.  Utility analysis , an examination of value of overall improvement in the performance of the trained employees. This method is complex and seldom used, and therefore is presented in  Appendix 9-2  for those interested.

Cost Savings Analysis (Results Focus)

The common types of costs associated with training programs were presented in  Chapter 5 ,  Table 5-4 . These costs are compared with the savings that can be attributed to training. Let’s look again at  Table 5-4  on  page 151 .

Recall that the cost of training was $32,430. Now, determine how much is saved when training is completed. To perform this cost savings analysis, we must first determine the cost of the current situation (see  Table 9-7 ). The company averaged 90 grievances per year. Seventy percent (63) of these go to the third step before settlement. The average time required by management (including HR managers, operational supervisors, etc.) to deal with a grievance that goes to the third step is 10 hours. The management wages ($50 per hour on average) add $500 to the cost of each grievance ($50 × 10). In addition, union representatives spend an average of 7.5 hours at $25 per hour, for a cost of $187.50 per grievance. The reason for this is that the union representative wages are considered paid time, as stipulated in the collective bargaining agreement. The total cost of wages to the company per grievance is $687.50. The total cost for those 63 grievances that go to the third step is $43,312.50. The cost of training is $32,430.00.

TABLE 9-7 Cost Savings for Grievance Reduction Training

| Costs of Grievances | Pretraining | Posttraining |
| --- | --- | --- |
| Management time (for grievances going to the third step), 10 h per grievance | 10 h × 63 grievances = 630 h | 10 h × 8 grievances = 80 h |
| Union rep’s time (paid by management), 7.5 h per grievance | 7.5 h × 63 grievances = 472.5 h | 7.5 h × 8 grievances = 60 h |
| **Total cost** | | |
| Management time | 630 h × $50 per h = $31,500.00 | 80 h × $50 per h = $4,000.00 |
| Union rep’s time | 472.5 h × $25 per h = $11,812.50 | 60 h × $25 per h = $1,500.00 |
| Total | $43,312.50 | $5,500.00 |
| **Cost savings** | | |
| Reduction in cost of grievances going to the third step | | $43,312.50 − $5,500.00 = $37,812.50 |
| Cost of training | | −$32,430.00 |
| Cost savings for the first year | | $5,382.50 |
| **Return on investment** | | |
| ROI ratio | | $5,382.50 / $32,430.00 = 0.166 |
| Percent ROI | | 0.166 × 100 = 16.6% |

The return on investment (ROI) is the return (the total savings resulting from training) minus the investment (the cost of training).

Training ROI = ( Total Savings Resulting from Training − Cost of Training )

For this example, then, the data show a $37,812.50 return on a $32,430 investment; ROI is therefore a $5,382.50 savings in the first year.

Many organizations are interested in the ratio of the return to the investment. For example, investors in the stock market might set a 10 percent ROI as a goal, meaning that the investment returned the principal plus 10 percent. The ROI ratio is calculated by dividing the “return” by the “investment.” For training, this translates to dividing the cost savings (return) by the training cost (investment).

Training ROI Ratio = ( Total Savings − Cost of Training ) / Cost of Training

To translate that ratio to a percentage you would multiply the ratio by 100.

The percent ROI = ROI Ratio × 100

In the grievance case, dividing the cost savings (total savings of $37,812.50 − cost of training of $32,430.00 = cost savings of $5,382.50) by the investment (the $32,430.00 cost of training) produces an ROI ratio of 0.166. 38  If the ratio were 0, the training would break even. If the ratio were negative, the costs would exceed the returns to the company.

Multiplying the ratio by 100 provides the percent ROI. In this case, there is a 16.6 percent ROI for the first year. Most companies would be delighted if all their investments achieved this level of return. In addition, the nonmonetary benefits described earlier are also realized. Presenting this type of data to the corporate decision makers at budget preparation time is certainly more compelling than stating, “Thirty supervisors were given a five-day grievance reduction workshop.”
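The arithmetic in Table 9-7 and the ROI formulas can be sketched as follows; all figures come from the grievance example in the text.

```python
# Reproducing the Table 9-7 arithmetic; all figures come from the text.

MGMT_RATE, UNION_RATE = 50.0, 25.0   # wage cost per hour
MGMT_HOURS, UNION_HOURS = 10.0, 7.5  # hours per third-step grievance
TRAINING_COST = 32_430.00

def grievance_cost(n_grievances):
    """Total wage cost of n grievances that reach the third step."""
    per_grievance = MGMT_HOURS * MGMT_RATE + UNION_HOURS * UNION_RATE
    return n_grievances * per_grievance

savings = grievance_cost(63) - grievance_cost(8)  # pre vs. post training
roi = savings - TRAINING_COST                     # first-year dollar ROI
roi_ratio = roi / TRAINING_COST
roi_percent = roi_ratio * 100                     # about 16.6 percent
```

Keeping the per-grievance cost in one function makes it easy to rerun the analysis with different wage rates or grievance counts when presenting the results to decision makers.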

Many training departments are beginning to see the importance of placing a monetary value on their training, for several reasons: 39

· ● HRD budgets are more easily justified and even expanded when HR can demonstrate that it is contributing to the profit.

· ● HRD specialists are more successful in containing costs.

· ● The image of the training department is improved by showing dollar value for training.

9-2 Training in Action Reduction in Training Time: The Value of Demonstrating Value 41

This case occurred some time ago, but it still carries a valuable lesson. Alberta Bell of Edmonton, Alberta, was looking for ways to reduce the cost of its operations. Downsizing and cost cutting were necessary to meet the competition. One cost-cutting decision was to reduce the entry-level training program for its customer service representatives from two weeks to one week. This would save money by reducing the cost of training and by getting service representatives out “earning their keep” sooner.

The manager of training decided to assess the value of this decision. Using data already available, he determined that the average time necessary to complete a service call for those who attended the two-week program was 11.4 minutes. Those in the one-week program took 14.5 minutes. This difference alone represented $50,000 in lost productivity for the first six weeks of work. He further analyzed the differences in increased errors, increased collectables, and service order errors. This difference was calculated at more than $50,000. The total loss exceeded $100,000.

Obviously, when he presented this information to upper management, the two-week training program was quickly put back in place.

Recall Dave Palm from LensCrafters. Top management told him to demonstrate what they were getting in the way of “bang for the buck.” Well, he did, and the result was that his training budget was doubled. 40   Training in Action 9-2  is a similar example. Here, Alberta Bell demonstrated the value of the training, which prompted management not only to restore funding for the original program but also to consider increasing it.

TABLE 9-8 Training Investment Analysis Work Sheet

Because of the time and effort required to calculate the value of training, many small business managers simply do not do it. However, assessing the value of training is not an exact science, and it can be done more easily by means of estimates.  Table 9-8  provides a simplified approach for small businesses. 42  As can be seen in the table, cost savings translate into revenue for the company. When estimates are necessary to complete this form, it is useful to obtain them from those who will receive the report (usually top management). If you use their estimates, your final report is more likely to be credible. Of course, larger organizations can also use this method.

When and What Type of Evaluation to Use

So, do we conduct a comprehensive evaluation at all four levels, in addition to a cost/benefit analysis, for all training programs? No. To determine what evaluation should take place, ask the question, “Who is interested in these data?” The different levels of outcome evaluation are designed for different constituencies or customers. Note in  Table 9-9  that the trainer is interested in the first three levels, because they reflect most directly on the training. Other trainers might also be interested in these data if the results show some relation to their own training programs. Training managers are interested in all the information. Both reaction and learning data, when positive, can be used to evaluate the trainer and to promote the program to others. When the data are not positive, the training manager should be aware of this fact, because it provides information the trainer can use to intervene and turn the program around. The training manager’s interest in the transfer of training is in evaluating the trainer’s ability to promote transfer. Care must be taken in using this information, because many other factors may be operating to prevent transfer. Also, if transfer is favorable, the information is valuable in promoting the training program. These generalizations are also true for organizational results. If the training manager is able to demonstrate positive results affecting the financial health of the company, the training department will be seen as a worthy part of the organization.

Trainees are interested in knowing whether others felt the same as they did during training. They are also interested in feedback on what they accomplished (learning) and may be interested in how useful it is to all trainees back on the job (behavior). A trainee’s supervisor is interested in behavior and results. These are the supervisor’s main reasons for sending subordinates to training in the first place. Upper management is interested in organizational results, although in cases where the results may not be measurable, behavior may be the focus.

Does the interest in different levels of evaluation among different customers mean that you need to gather information at all levels every time? Not at all. First, a considerable amount of work is required to evaluate every program offered. As with process data, it makes sense to gather the outcome data in some situations and not in others.

TABLE 9-9 Who Is Interested in the Outcome Data 43

                          Outcome Data
                          Reaction     Learning              Behavior   Results
  Training Department
    Trainer               Yes          Yes                   Yes        No
    Other trainers        Perhaps      Perhaps               Perhaps    No
    Training manager      Yes          Yes                   Yes        Yes
  Trainees                Yes          Yes                   Yes        Perhaps
  Trainees’ supervisor    Not really   Only if no transfer   Yes        Yes
  Upper management        No           No                    Perhaps    Yes

Again, the obvious question to ask in this regard is “What customer (if any) is interested in the information?” Although one of the major arguments for gathering outcome data is to demonstrate the worth of the training department, some organizations go beyond that idea. In an examination of “companies with the best training evaluation practices,” it was noted that none of them were evaluating training primarily to justify it or to maintain a training budget. 44  They evaluated (particularly at the behavior and results levels) when requested to do so by the customer (top management or the particular department). Jack Phillips, founder of ROI Institute, a consulting firm that specializes in evaluation, suggests that organizations evaluate only 5 to 10 percent of their training at the ROI level. 45  Which ones? The ones that are high profile and/or are specifically requested by upper management. This selectivity is a function of the cost of developing such evaluations, because these types of evaluations 46

· ● need to be customized for each situation,

· ● are costly and time consuming, and

· ● require cooperation from the customer.

Motorola, for example, evaluates only at the behavioral level and not at the results level. Executives at Motorola are willing to assume that if the employee is exhibiting the appropriate behavior, the effect on the bottom line will be positive. 47   Training in Action 9-3  shows how various companies are dealing with evaluation, particularly behavior and results.

9-3 Training in Action What Companies Are Doing for Evaluation 48

After years of not evaluating its training, the U.S. Coast Guard decided to evaluate at the behavioral level, asking trainees and their supervisors three things: how well the trainees were able to perform the desired behaviors, how often they performed those behaviors, and how important those behaviors were to being an effective employee. With the information provided by the evaluations, trainers were able to remove outdated training objectives and add job aids for some of the less frequent behaviors. Furthermore, the remaining training was refined and became more relevant and more efficient. This translated into a $3 million a year savings for the Coast Guard’s training department.

Texas Instruments noted that once trainees left training, it was difficult to obtain transfer of training information from them. It was generally ignored because of the time and expense of gathering this information. Then, an automated e-mail system was developed through which trainees, after being back on the job for 90 days, were contacted and asked to complete a survey related to transfer. This system increased the use of evaluations, reduced the time necessary to gather information, and provided a standardized process. Texas Instruments noted an improvement in the quantity and quality of participant feedback. It would seem easy enough to include an e-mail to the trainees’ supervisors for the same purpose.

Century 21 decided to evaluate its sales training at the results level. After training, trainees were tracked through a sales performance system that identified the number of sales, listings, and commissions for each graduate. These results were cross-referenced with the office where each trainee worked and with the instructor. The findings were surprising. Trainees from certain offices outperformed trainees from other offices even though they had the same instructor. Examination of these results showed that the high-performing offices provided help when needed, offered access to ongoing training, and gave better support. In response, Century 21 had its trainers continue to deliver the training but also made them responsible for monitoring the environment in the offices where trainees were sent. This monitoring was meant to ensure that every trainee was in an environment similar to that of the “high-performing trainees” identified earlier.

Booz Allen Hamilton, a consulting firm, recently decided to assess the ROI of its executive coaching program, which had been up and running for three years. The result? It was determined that the program’s ROI was about $3 million per year.

Certainly, all levels of data gathering are important at different times, and the training professional must be able to conduct an evaluation at every level. So, what and when should the trainer evaluate? The answer is that it depends on the organization and the attitudes and beliefs of upper management. If they perceive the training department as an effective tool of the organization and require only behavior-level evaluation, that is the evaluation to do.

However, this level still might require vigilance at the learning and reaction levels to ensure positive results. Darryl Jinkerson, director of evaluation services at Arthur Andersen, looks at the size and impact of the training before deciding how to evaluate it. Only programs that are high profile, or for which the customer requests it, are evaluated at the results level. 49  What if training is a one-time event and no desire is indicated to assess individual competence (e.g., a workshop on managing your career)? Such a situation simply provides no reason to evaluate. 50


We have discussed in detail the types of measures you can use to help determine if training has been effective. However, it is not as simple as that, because change might have occurred for reasons not related to the training. This is where designing an appropriate evaluation becomes so important.  Appendix 9-1  provides an examination of the various concerns in evaluation, such as those related to the validity (both internal and external) of the findings. It also provides several designs that can be useful to help assure you that your results are in fact valid.



· ● Utility questionnaire

*These key terms appear only in appendices 9-1 and 9-2.

Questions for Review

1.What is the relationship among the four levels of evaluation? Would you argue for examining all four levels if your boss suggested that you should look only at the last one (results) and that if it improved, you would know that training had some effect?

2.What is the difference between cost/benefit evaluation and cost-effectiveness evaluation? When would you use each, and why?

3.What is the difference between cost-effectiveness evaluation and utility analysis? When, if ever, would you use utility rather than cost-effectiveness? Why?

4.Assume that you were the training manager in the Westcan case (in  Chapter 4 ). How would you suggest evaluating the training, assuming they were about to conduct it as suggested in the case? Be as specific as you can.

5.Of all the designs presented in  Appendix 9-1 , which one would you consider to be most effective while also being practical enough to convince an organization to adopt it? If your design involved representative sampling, how would you accomplish it?


1. Examine the reaction questionnaire that your school uses. Is it designed to rate the course content or the instructors? Does it meet the requirements of a sound reaction questionnaire? Why or why not? Explain how you would improve it (if possible).

2. Break into small groups, with each group containing at least one member who previously received some type of training in an organization. Interview that person on what the training was designed to teach and how it was evaluated. Did the evaluation cover all the levels of outcomes? How did the trainee feel about the evaluation? Devise your own methods for evaluating each of the levels based on the person’s description of the training.


Go to the role-play for active listening in the Fabrics, Inc., example. In groups of five or six, choose someone to be the initiator and someone to be the trainee. Have them go through the role-play while the rest evaluate the trainee’s response on a scale of 1 to 7 (1 being poor and 7 being excellent). Now share your scores. Were they all exactly the same? If not, how could you make the instrument more reliable? If they were all the same, why was that? Is there anything you would suggest to make the evaluation process easier?


You run Tricky Nicky’s Carpet Cleaning Co., which cleans carpets for businesses. On average, one carpet cleaner can clean six offices per eight-hour shift. Currently, 100 cleaners work for you, and they work 250 days per year. Supervisors inspect carpets when cleaners notify them that the carpet is done. Because of Nicky’s “Satisfaction Guarantee,” when a carpet does not meet the standard, it is redone immediately at no extra cost to the client. A recent analysis of the rework required found that, on average, one in every six carpets cleaned does not meet Nicky’s standards.

The profit averages $20 per cleaning. You pay your cleaners $15 per hour. When you re-clean a carpet, the work is done on overtime, and you lose, on average, $20 in labor costs, so on average your profit is gone. In addition, materials and equipment cost an average of $2.00 per office.

Your training manager conducted a needs assessment regarding this issue at your request. He reported that half the employees are not reaching the standard one in nine times, and the other half are not meeting the standard two in nine times, for an overall average of one in six [(1/9 + 2/9)/2 = 1/6]. The needs assessment also indicated that the cause was a lack of KSAs in both cases.
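The averaging in brackets is easy to verify with exact fractions:

```python
from fractions import Fraction

group_a = Fraction(1, 9)  # half the cleaners miss the standard 1 time in 9
group_b = Fraction(2, 9)  # the other half miss it 2 times in 9

average = (group_a + group_b) / 2
print(average)  # 1/6, the overall re-clean rate reported
```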

The training manager proposes a training program that he estimates will reduce everyone’s errors to 1 carpet in 12 (half the current level). The training would take four hours and could handle 20 employees per session.

The following costs reflect delivery of five training sessions of 20 employees each and assume 250 working days in a year.

Developmental Costs  
  20 days of training manager’s time for design and development at $40,000 per year $ 3,200
  Miscellaneous $  800
Direct Costs  
  4 hours per session at $40,000 per year (trainer) $  400
  Training facility and equipment $  500
  Materials $ 2,000
  Refreshments $  600
  Employee salaries at $20 per hour per employee (Nicky decides to do training on a Saturday and pay employees an extra $5 per hour as overtime) $ 8,000
  Lost profit (none because training is done on overtime) 0
Indirect Costs  
  Evaluation of training; 10 days of training manager’s time at $40,000 per year $ 1,600
  Material and equipment $  600
  Clerical support—20 hours at $10 per hour $  200
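The worksheet above can be totaled directly; the figures below are simply copied from the table and grouped the same way.

```python
# Cost figures from the worksheet, in dollars
costs = {
    "developmental": {"design_and_development": 3_200, "miscellaneous": 800},
    "direct": {"trainer": 400, "facility_and_equipment": 500,
               "materials": 2_000, "refreshments": 600,
               "employee_salaries": 8_000, "lost_profit": 0},
    "indirect": {"evaluation": 1_600, "material_and_equipment": 600,
                 "clerical_support": 200},
}

subtotals = {category: sum(items.values()) for category, items in costs.items()}
total = sum(subtotals.values())
print(subtotals)  # {'developmental': 4000, 'direct': 11500, 'indirect': 2400}
print(total)      # 17900
```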

Case Questions

1.How much does the re-cleaning cost Nicky per year? Show all mathematical calculations.

2.If everyone is trained, how much will the training cost? How much will training cost if only the group with the most errors is trained? Show costs in a spreadsheet and all mathematical calculations.

3.If everyone is trained, what is the cost savings for the first year? If only the group with the highest re-cleaning requirements is trained, what is the cost savings for the first year? Show all mathematical calculations.

4.What is your recommendation for this training based on the expected return on investment? Should just the group with the most re-cleanings be trained or should both groups be trained? Provide a rationale for your recommendation that includes both the financial as well as other factors that may be important in making this decision. Show any mathematical calculations used.

5.Let’s back up and assume that employees had the KSAs needed to clean the offices effectively. What other factors might you look at as potential causes of the re-cleaning problem?

Web Research

Conduct a search of the Internet to identify eight distinct reasons for conducting an evaluation of training. Document the source of these reasons, and compare the list with reasons cited in the chapter.


Research Methods on the WWW—Questionnaires



Appendix 9-1 Evaluation: The Validity Issues

Once it is decided to evaluate training, it is important to be reasonably sure that the findings on the effectiveness of training will be valid. After all, evaluation is both time-consuming and costly.

Let’s say that Sue is sent to a one-week training seminar on the operation of Windows. According to the needs analysis, she clearly did not know much about how to operate a computer in a Windows environment. After training, she is tested, and it is determined that she has learned a great deal. Training was effective. Perhaps—but several other factors could also result in her learning how to operate in a Windows environment. Her own interest in Windows might lead her to learn it on her own. The question is: “How certain is it that the improvement was a function of the training that you provided?” In other words, does the evaluation exhibit internal validity? Once internal validity is ensured, the next question is “Will the training be effective for other groups who go through the same training?” That is, does training show external validity? We will deal with internal and external validity separately. These “threats” are not specific to training evaluation but relate to evaluation in general. When we discuss each of the threats, we will indicate when it is not a serious threat in the training context.


Internal validity  is the confidence that the results of the evaluation are in fact correct. Even when an improvement is demonstrated after training, the concern is that perhaps the change occurred for reasons other than training. To address this problem, it is necessary to examine factors that might compromise the findings; these are called threats to internal validity.


History  refers to events other than training that take place concurrently with the training program. The argument is that those other events caused the learning to occur. Consider the example of Sue’s computer training. Sue is eager to learn about computers, so she buys some books, works extra hard at home, and attends the training. At the end of training, she demonstrates that she has learned a great deal, but is this learning a function of the training? It might just as well be that all her hard work at home caused her to learn so much.

In a half-day training seminar, is history likely to be a concern? Not really. What about a one-day seminar or a one-week seminar? The more that training is spread across time, the more likely history could be a factor in the learning that takes place.


Maturation  refers to changes that occur because of the passage of time (e.g., growing older, hungrier, fatigued, bored). If Sue’s one-week training program was so intense that she became tired, when it came time to take the posttraining test, her performance would not reflect how much she had learned. Making sure that the testing is done when trainees are fresh reduces this threat. Other maturation threats can usually be handled in a similar manner by being sure that training and testing are not so intense as to create physical or mental fatigue.


Testing also has an influence on learning. Suppose the pretest and posttest of the knowledge, skills, and attitudes (KSAs) are the same test. The questions on the pretest could sensitize trainees to pay particular attention to certain issues. Furthermore, the questions might generate interest, and the trainees might later discuss many of them and work out the answers before or during training. Thus, learning demonstrated in the posttest may be a function not of the training, but of the pretest. In Sue’s case, the needs analysis that served as the pretest for evaluation got her thinking about all the material contained in the test. Then, she focused on these issues in training. This situation presents less of a validity problem if pretests are given in every case and if they are comprehensive enough to cover all of the material taught. Comprehensive testing will also make it difficult for trainees to recall specific questions.


Instrumentation  is also a concern. The problem arises if the same test is used in the pretest and posttest, as was already noted. If a different but equivalent test is used, however, the question becomes “Is it really equivalent?” Differences in instrumentation used could cause differences in the two scores. Also, if the rating requires judgments, the differences between pre- and posttest scores could be a function of different people doing the rating.

For Sue, the posttest was more difficult than the pretest, so even though she learned a great deal in the computer training, her posttest score was actually lower than her pretest score, suggesting that she did not learn anything. If the test items for both tests were chosen randomly from a large population of items, this would not be much of a concern. For behavioral tests in which raters make subjective decisions, the discrepancy may be more of a concern, but careful criteria development can help deal with it.


Statistical regression  is the tendency for those who score either very high or very low on a test to “regress to the middle” when taking the test again. This phenomenon, known as regression to the mean, occurs because no test is perfect and differences result as a function of measurement error. Those who are going to training will, by definition, score low for the KSAs to be covered in training and so will score low on their pretest. The tendency, therefore, will be for them to regress to the mean and improve their scores, irrespective of training. In the earlier example, Sue did not know much about computers. Imagine that she got all the questions on the pretest wrong. The likelihood of that happening twice is very low, so on another test she is bound to do better.

This threat to internal validity can be controlled through various evaluation designs that we will discuss later. In addition, the use of control groups and random assignment (when possible) goes a long way toward resolving statistical regression.
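Regression to the mean is easy to demonstrate with a small simulation. The sketch below (with arbitrary numbers) gives each person a stable true skill plus random measurement error, selects the low scorers on a first test, much as a needs analysis would, and then retests them with no training at all. Their average still rises.

```python
import random

random.seed(42)  # make the simulation repeatable

# Each person has a stable "true" skill plus test-to-test measurement error
true_skill = [random.gauss(50, 10) for _ in range(10_000)]
test1 = [skill + random.gauss(0, 10) for skill in true_skill]
test2 = [skill + random.gauss(0, 10) for skill in true_skill]

# Select the low scorers on test 1 -- the people who would be sent to training
low = [i for i, score in enumerate(test1) if score < 40]
mean1 = sum(test1[i] for i in low) / len(low)
mean2 = sum(test2[i] for i in low) / len(low)

# With no training whatsoever, the low group's average improves on the retest
print(mean2 > mean1)  # True
```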


Initial group differences can also be a concern. For example, in some cases, to provide an effective evaluation, a comparison is made between the trainees and a similar group of employees who were not trained—known as the  control group . It is important that the control group be similar in every way to the training group. Otherwise, the inherent differences between the groups might be the cause of differences after the training. Suppose that those selected for training are the up-and-coming stars of the department. After training, they may in fact perform much better than those not considered up and coming, but the problem is that they were better from the start and more motivated to improve. Therefore, if Sue is one of the highly motivated trainees, as are all her cohorts in training, they would potentially perform better even without training.

This problem does not arise if everyone is to be trained. The solution is simply to mix the two types, so both the group to be trained and the control group contain both types.


Mortality  refers to the loss of group members over the course of the evaluation. In this situation, those who did poorly on the pretest are demoralized by their low score and soon drop out of training. The control group remains intact. As a result, the trained group does better on the posttest than the control group, because the poorer-scoring members left the trained group, artificially raising its average score. The opposite could occur if, for some reason, members of the control group dropped out.

This situation becomes more of a problem when the groups are made up of volunteers. In an organizational setting, those who go to training are unlikely to drop out. Also, all department members who agree to be in the control group are a captive audience and are unlikely to refuse to take the posttest. Although some transfers and terminations do occur to affect the numbers of participants, they are usually not significant.


When trainees interact with the control group in the workplace, they may share the knowledge or skill they are learning. For example, when Sue is back in the office, she shows a few of the other administrative assistants what she has learned. They are in the control group. When the posttest is given, they do as well as the trained group, because they were exposed to much of what went on in training. In this case, training would be seen as ineffective, when in fact it was effective. This would be especially true if certain quotas of trainees were selected from each department. When such sharing of information reduces differences between the groups in this way, determining the effectiveness of the training could be difficult.


When the control group and training group come from different departments, administrators might be concerned that the control group is at an unfair disadvantage. Comments such as “Why do they receive the new training?” or “We are all expected to perform the same, but they get the help” suggest that the control group feels slighted. To compensate for this inequity, the managers of the control group’s department might offer special assistance or make special arrangements to help their group. For example, consider trainees who are learning how to install telephones more efficiently. Their productivity begins to rise, but because the supervisors of the control group feel sorry for their employees, they help them get the work done, thereby increasing the control group’s productivity. The evaluation would then show no difference in productivity between the two groups after training is complete.


If the training is being given to one particular intact work group, the other intact work group might see this situation as a challenge and compete for higher productivity. Although the trained group is working smarter and improving its productivity, the control group works harder still and perhaps equals the productivity of the trainees. The result is that, although the training is effective, it will not show up in the evaluation.


The control group could believe that it was made the control group because it was not as good as the training group. Rather than rivalry, the response could be to give up and actually reduce productivity. As a result, a difference between the two groups would be identified, but it would be a function of the drop in the control group’s productivity and not of the training. Even if the training were effective, the test results would exaggerate its effect.

These threats to validity indicate the importance of tracking the process in the evaluation. Just as data are gathered about what is occurring in the training, it is also useful to gather data about what is going on with the control group.


The evaluation must be internally valid before it can be externally valid. If evaluation indicated that training was successful and threats to internal validity were minimal, you would believe that the training was successful for that particular group. The next question is, “Will the training be effective for the rest of the employees slated to attend training?”  External validity  is the confidence that these findings will generalize to others who undergo the training. A number of factors threaten external validity.


If the training is evaluated initially by means of pre- and posttests, and future training does not use the pretest, it can be difficult to conclude that future training will be as effective. Perhaps those in the initial training focused on particular material because it was highlighted in the pretest. If the pretest is not used later, other trainees will not have the same cues. The solution is simple: Pretest everyone taking the training. Remember that pretest data can be gathered during the needs analysis.


Suppose that a particular program designed to teach communication skills is highly effective with middle-level managers, but when a program with the same design is given to shop-floor workers, it does not work. Why? It might be differences in motivation or in entering KSAs, but remember that you cannot be sure that a training program that was successful with one group of trainees will be successful with all groups. Once it is successful with middle managers, it can be assumed that it will be successful with other, similar middle managers. However, if it is to be used to train entry-level accountants, you could not say with confidence that it would be successful (that it had external validity) until it was evaluated.

One of the authors was hired to assist in providing team skills to a large number of employees in a large manufacturing plant. The first few sessions with managers went reasonably well; the managers seemed to be involved and learned a great deal. After about a month, training began for the blue-collar workers, using the identical processes, which included a fair amount of theory. It soon became evident that trainees were bored, confused, and uninterested. In a discussion about the problem, the project leader commented, “I’m not surprised—this program was designed for executives.” In retrospect, it is surprising that lower-level managers received the training so well, given that it was designed for executives.


In many situations, once the training is determined to be effective, further evaluation is deemed unnecessary. Thus, some of the trainees who went through the program were evaluated and some were not. The very nature of evaluation causes more attention to be given to those who are evaluated. Recall the Hawthorne Studies, which demonstrated the power of evaluation in an intervention. The Hawthorne Effect is explained by the following: 1

· • The trainees perceived the training as a novelty;

· • The trainees felt themselves to be special because of being singled out for training;

· • The trainees received specific feedback on how they were doing;

· • The trainees knew they were being observed, so they wanted to perform to the best of their ability; and

· • The enthusiasm of the instructor inspired the trainees to perform at a high level.

Whatever the mechanism, those who receive more attention might respond better as a function of that attention. As with the other threats to external validity, when the way groups are treated is changed, the training’s external validity is jeopardized.


In clinical studies, a patient receives Dose A. It has no effect, so a month later she receives Dose B, which also has no effect, so she receives Dose C and is cured. Did Dose C cure her? Perhaps, but it could also be that the combination of A, B, and C produced the required effect. The same interference can occur in training when some component of the training is changed from one group to the next. For example, a group received one-on-one coaching and then video instruction. The members did poorly after the coaching but excelled after the video instruction, so video instruction became the method used to train future employees. It was not successful, however, because it was the combination of coaching and video instruction that produced the initial success.


It is useful to understand the preceding issues to recognize why it is difficult to suggest with certainty that training or any other intervention is the cause of any improvement. We cannot be absolutely certain about the internal or external validity when measuring things such as learning, behavior, and organizational results. Careful consideration of these issues, however, and the use of well-thought-out designs for the evaluation can improve the likelihood that training, when shown to be effective, is in fact effective (internal validity) and will be effective in the future (external validity). This information is useful for assessing training and, equally important, for assessing evaluations done by outside vendors.

Evaluation Design Issues

A number of texts provide excellent information on appropriate designs for conducting evaluations. 2  Unfortunately, many of their recommended designs are impractical in most organizational settings. Finding the time or resources to create a control group is difficult at best. Getting approval to do pretests on control groups takes away from productive time and is difficult to justify.

Scientifically valid research designs are difficult to implement, so organizations often use evaluation designs that are generally not acceptable to the scientific community. 3  However, it is still possible to have some confidence in the results with less rigorous designs. Some research designs are less than perfect, but it is possible to find ways of improving them. The two designs most often used, and most criticized by scientists, are the posttest-only and the pretest/posttest methods. 4

Basic Designs


The posttest-only method occurs when training is followed by a test of the KSAs. The posttest-only design is not appropriate in some instances. At other times, however, the method is completely acceptable. 5  The two possible goals of evaluation are to determine

· 1. whether change took place and

· 2. whether a level of competence was reached.

If the goal of the training is the latter, a posttest-only design should suffice. If, for example, legal requirements state that everyone in the company who handles hazardous waste be trained to understand what to do in an emergency, then presumably this training needs only to provide a test at the end to confirm that all trainees reached the required level of knowledge. As more companies are required to be ISO 9000 (or equivalent) certified, it will be increasingly important to prove that employees possess the required skills. As a result, certification will become the goal of employee training, and in that case the posttest-only will suffice.

We frequently mention the value in doing a needs analysis. Conducting a needs analysis provides pretest data, making the posttest-only design moot. Giving the posttest automatically applies a pretest/posttest design. Furthermore, in the absence of a TNA, archival data may serve as the pretest. Performance appraisals, measures of quality, and the like might allow for some pre/post comparison. Although such historical data may not be ideal, it could provide some information as to the effectiveness of training. Alternatively, it is possible to identify an equivalent group and provide its members with the same posttest, thereby turning the design into a posttest only with control group. Suddenly, a much more meaningful design is created.

The posttest-only design as it stands is problematic for assessing change, because a number of competing causes could be responsible for any apparent change, such as history, maturation, instrumentation, selection, and mortality. Nevertheless, we would agree with other professionals that any evaluation is better than none. 6  Gathering any pretraining information suggesting that the level of KSAs before training was lower than at posttest would help bolster the conclusion that training was effective.


The pretest/posttest design is the other method organizations frequently use. Here, a pretest is given (T1), training is provided (×), and then a posttest is given (T2). This design is expressed as T1 × T2.

This design can demonstrate that change has occurred. But even though it can be demonstrated that KSAs have changed, it is not possible to say that training is responsible for those changes. There are several threats to internal validity (history, maturation, testing, instrumentation, and possibly regression). For example, you might have been training a group of machine operators to operate new drill-press machines. Pretesting the trainees revealed that none knew how to operate the machine. After a three-day training session, a posttest showed that, on average, the trainees could operate the machine correctly 85 percent of the time. A big success? Not if the supervisor of the work group says that the ones without training can operate the machines correctly 95 percent of the time by just reading the manuals and practicing on their own. Several different reasons might explain why those who did not go to training are performing better on the job. Perhaps they already knew how to operate the machine. Perhaps a manufacturer’s representative came and provided on-the-floor training to them. Or, it could be that your training somehow slowed down the learning process. And there is still the issue of external validity, where testing, selection, and possibly reaction to evaluation are causes for concern. Therefore, it would be useful to have a control group.
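The arithmetic of this design is trivial, which is part of the point: it measures change, not cause. The following sketch (our own illustration, with hypothetical scores and standard library only) computes the average pretest-to-posttest gain and nothing more:

```python
from statistics import mean

def pre_post_gain(pre_scores, post_scores):
    """Average gain from pretest (T1) to posttest (T2) for one trained group.

    A positive gain shows that change occurred; by itself it cannot show
    that training caused the change -- history, maturation, testing, and
    instrumentation remain plausible explanations.
    """
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))

# Hypothetical drill-press test scores (percent correct) for five trainees
pre = [10, 20, 15, 5, 25]
post = [80, 90, 85, 75, 95]
print(pre_post_gain(pre, post))  # 70
```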

In many instances, using a control group is simply not an option. Does that mean that the trainer should not bother to do anything? Absolutely not! In fact, it is better to do something than nothing. We tend to focus on the negative aspects of the preexperimental designs rather than examine ways of using them most effectively when other options do not exist. 7  The pretest/posttest design with no control group at least establishes that change did take place. History can be examined through some detective work. Recall that Sue had learned a great deal about operating in a Windows environment according to the pretest/posttest. Did she do extra reading at home? Did she practice on her own irrespective of training expectations? Did she get some help from someone at the office or elsewhere? Simply asking her might indicate that none of those factors occurred, suggesting that it was in fact the training. This process may be particularly relevant for the small business, where size makes it easier to identify potential threats.


Another way of dealing with the lack of a control group is to use the  internal referencing strategy (IRS) . 8  With this method, include both relevant and nonrelevant test questions in the pre- and posttest. Here’s how it works.

Both the pretest and the posttest contain questions that deal with the training content and questions that deal with related content not covered in the training. In the pretest, trainees will do poorly on both sets of questions. In the posttest, if training is effective, improvement should appear only on the trained items. The nonrelevant items serve as a control. In their research on the IRS, Haccoun and Hamtiaux found that the results obtained from the IRS design were identical to those obtained when a control group was used. 9  The method also sidesteps several concerns that arise when a control group is used: with no control group to react in an inappropriate manner, issues such as diffusion of training, compensatory treatment, and compensatory rivalry do not apply. The only remaining threats are history, maturation, testing, statistical regression, and instrumentation.

As previously noted, history can be investigated by examining the time frame in which training occurred: any events that potentially affected the trainees can be assessed for their effect. Also, because the relevant and nonrelevant items in the IRS are similar in nature, any historical event should affect both types of items in a similar manner. Maturation issues can be dealt with by designing the training to keep trainees interested and motivated, and to prevent them from becoming tired or fatigued. The reactive effect of testing can be dealt with by using parallel tests, which cover the same content but do not use identical questions. This technique does introduce another potential problem (instrumentation), but it too can be addressed; if all trainees receive the same comprehensive pretest, instrumentation is not an issue.

Instrumentation is a concern if two different tests are used. If a large pool of items is developed from which test items can be chosen at random, the result should be equivalent tests. Once again, it is important to note that in any evaluation, we can never be 100 percent sure that training has caused the improvement. We are not suggesting that this design take the place of more stringent designs when they are practical. It is appropriate, however, when the alternative is posttest-only or nothing. Again, some control is better than none at all.

One final note: The IRS design can be used to determine improvement in KSAs, but research indicates that it tends to show that training is not effective when, in fact, it is. 10  In other words, the training must provide a substantial improvement from pretest to posttest for it to be detected by this design.
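As a concrete illustration of the bookkeeping involved, the sketch below (our own example; the item labels and scoring layout are invented) compares the pre-to-post gain on trained items against the gain on the nonrelevant control items:

```python
from statistics import mean

def irs_effect(pre, post, trained_items, control_items):
    """Internal referencing strategy (IRS): gain on trained (relevant)
    items minus gain on untrained (nonrelevant) items.

    `pre` and `post` map item id -> proportion of trainees answering
    correctly. Training looks effective only when trained items improve
    substantially more than the control items, which act as the
    within-test control.
    """
    def gain(items):
        return mean(post[i] - pre[i] for i in items)
    return gain(trained_items) - gain(control_items)

# Hypothetical item-level results: t* items were covered in training,
# c* items cover similar but untrained content
pre = {"t1": 0.30, "t2": 0.25, "c1": 0.28, "c2": 0.32}
post = {"t1": 0.85, "t2": 0.80, "c1": 0.33, "c2": 0.35}
print(round(irs_effect(pre, post, ["t1", "t2"], ["c1", "c2"]), 2))  # 0.51
```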


Two factors need to be considered when developing a sound evaluation design:

· 1. Control groups

· 2. Random assignment

The  control group  is a group of similar employees who do not receive the training. The control group is used to determine whether changes that take place in trainees also take place for those who do not receive training. If change occurs only in the trainees, it is probably a result of training. If it occurs in both trained and untrained groups, it is probably a result of some other factor.

Random assignment  is the placement of employees in either the control group or the training group by chance, to help ensure that the groups are equivalent. Random assignment is more applicable to experimental laboratories than to applied settings (such as training) for two reasons. First, with the small number of employees placed in each group, randomness is unlikely to produce equivalent groups. When we split a group of 60 employees into two groups of 30, it is quite likely that real differences will exist between the two groups. Random assignment works well when multiple groups of 30 are used, or when the total number of subjects is quite large (e.g., 500).

Second, it is unlikely that the organization can afford the luxury of randomly assigning employees to each group. The work still needs to be done, and managers would want some control over who will be in training at a specific time. For this reason, finding the best match of employees is important so that the control group contains a sample representative of employees who are in the training group.  Representative sampling  is matching employees in the control group and training group on factors such as age, tenure, and education to make the groups as equivalent as possible. The following discussion covers several designs that use control groups. We believe that assigning trainees through representative sampling is a more effective way of obtaining equivalent groups.
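A minimal sketch of representative sampling, assuming employee records with numeric age, tenure, and years-of-education fields (the greedy nearest-match logic is our own illustration, not a prescribed procedure):

```python
def build_control_group(trainees, pool, keys=("age", "tenure", "education")):
    """For each trainee, pick the most similar untrained employee from
    `pool` to build a control group matched on age, tenure, and education.

    A greedy sketch only: a real matching effort would also compare the
    two groups' overall distributions, not just one-to-one distances.
    """
    available = list(pool)
    control = []
    for t in trainees:
        best = min(available, key=lambda p: sum(abs(t[k] - p[k]) for k in keys))
        available.remove(best)
        control.append(best)
    return control

trainees = [{"age": 30, "tenure": 5, "education": 16}]
pool = [{"age": 55, "tenure": 20, "education": 12},
        {"age": 31, "tenure": 4, "education": 16}]
print(build_control_group(trainees, pool))
# [{'age': 31, 'tenure': 4, 'education': 16}]
```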


The following represents posttesting only with a control group:

Trainee Group (representative sampling)   ×   T2
Control Group (representative sampling)       T2

This design and the following one are equivalent in that they deal effectively with all internal validity issues.

If for some reason a pretest was not conducted or if the trainer did not provide a pretest to a control group at the beginning of training, the trainees can be compared with a control group using a posttest-only design. Differences in test scores noted between the groups, if trainees do better, provide evidence of the success of the training. The tendency is to downplay the effectiveness of this design, because no pretest assessed the equivalence of the groups before training. But if representative sampling has resulted in the groups being equivalent, there is no need to have a pretest. Of course, there is greater confidence regarding the equivalence of the groups if there was a pretest.


The expression for pretest/posttest with a control group is as follows:

Trainee Group (representative sampling)   T1   ×   T2
Control Group (representative sampling)   T1       T2

This design is one of the more favorable for eliminating threats to internal validity. Recall that we do not use random assignment in dividing the groups. So, how equivalent are they? A pretest can determine their level of equivalence. Equivalent pretests in both groups provide you with one more piece of evidence that the groups are equal, and post-test differences (if the trained group obtains higher scores) will suggest that training was successful.


The time series design is represented by:

Trainee Group   T1 T2 T3 T4 × T5 T6 T7 T8

This design uses a series of measurements before and after training. In this way, the likelihood of internal validity threats such as testing or regression to the mean is minimized. Also, when everyone attends training at the same time (a one-shot training program), this design can be used whether the number is large or small. In such a case it could still be argued that with no control group, there are alternative reasons for any change. But in an applied setting, the goal is to be as sure as possible about the results, given organizational constraints. If enough measures are taken pre- and posttraining to deal with fluctuations in performance, changes after training are certainly suggestive of learning. Remember that in an applied setting, there will never be absolute certainty regarding the impact of training, but taking care to use the best possible design (considering constraints) is still better than doing nothing at all.

To make this design more powerful, consider adding a control group, expressed by:

Trainee Group   T1 T2 T3 T4 × T5 T6 T7 T8
Control Group   T1 T2 T3 T4   T5 T6 T7 T8


Multiple baseline design is represented by:

Trainee Group A T1 T2 T3 × T4 T5 T6 T7 T8 T9 T10
Trainee Group B T1 T2 T3 T4 T5 × T6 T7 T8 T9 T10
Trainee Group C T1 T2 T3 T4 T5 T6 T7 × T8 T9 T10
Trainee Group D T1 T2 T3 T4 T5 T6 T7 T8 T9 × T10

In this design, multiple measures are taken much as in time series, but each group receives the training at a different time. Each untrained group serves as a control for the trained groups. This approach deals with many of the concerns when no control group is used. Here the ability to say that changes measured by the test are a result of the training is strong. If each group improves after training, it is difficult to argue that something else caused the change.
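A small sketch of how multiple baseline data might be checked (invented scores; each group's training point is the measurement after which the × falls):

```python
from statistics import mean

def improved_after_training(scores, train_point):
    """True if a group's average score after its training point exceeds
    its average before it. In a multiple baseline design, confidence
    grows when every group improves only after its own training point,
    because the still-untrained groups act as controls.
    """
    return mean(scores[train_point:]) > mean(scores[:train_point])

# Hypothetical scores at T1..T10; group A trained after T3, group B after T5
group_a = [50, 52, 51, 78, 80, 81, 79, 82, 80, 81]
group_b = [48, 50, 49, 51, 50, 77, 79, 80, 78, 81]
print(improved_after_training(group_a, 3))  # True
print(improved_after_training(group_b, 5))  # True
```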

Choosing the Design to Use

Determining the true effect of training requires an investigation into the validity of evaluation results. Several methods are available, and the more complex the design, the more valid the results. There are other considerations when you are deciding on an evaluation design. Innovation can provide good substitutes when the best is not possible. Consider the multiple baseline design. It is a powerful design and certainly is a possibility if several employees need to receive the training over time.

However, what if multiple measures are not possible? The following design would address many of the same concerns, and although it is not as elaborate, it certainly deals with many of the concerns regarding outside influences causing the change. If pretest scores are all comparable and posttest scores indicate an improvement, these results are a strong argument for showing that training was responsible.

We have already mentioned that most organizations do not evaluate all training at all levels. Furthermore, even when evaluating training, many organizations do not use pretest/posttest or control groups in a manner that would eliminate concerns about the validity of the results.

Dr. Dixon of George Washington University indicated that, of the companies she investigated in her article “New Routes to Evaluation,” only one used designs that would deal with many of the validity issues. Other companies, including IBM and Johnson Controls, follow such procedures only when asked by particular departments or higher-level management, or when they can defray some of the high cost of developing reliable and valid tests by marketing the final product to other organizations. 11  The demand for certification in some skills (primarily because of ISO and others’ requirements) created a need for these types of tests.

When you are evaluating training, if using control groups or pretesting is not possible, remember that other investigative methods can be used for assessing the likelihood that factors other than training account for any change in KSAs.

What About Small Business?


We noted earlier in this chapter that, for a small business, it is sometimes easier to infer cause and effect between training and outcomes. We also noted, however, that it is still useful at times to conduct an evaluation to ensure that training is having its effect on employee behavior. But traditional evaluation designs are very difficult to apply in a small business. So, is there an alternative? Consider the  single-case design . It is often used to evaluate the training provided to professional counselors, but managers can also use this method when the number of employees is small. 12

The single-case design uses data from one individual and makes inferences based on that information. To increase confidence in the results, use the multiple baseline approach. Suppose that two supervisors need to be trained in active listening skills. Because the business is small, both cannot attend training at the same time. Using a predetermined checklist developed for evaluating the training, count the number of active listening phrases that each of them uses while talking to you. Take several measures over three or four weeks, then send one supervisor to training. Continue monitoring the active listening after the person returns. Did the number of active listening phrases increase for the trained supervisor and not the other supervisor? Now give the second supervisor training, and afterward, continue monitoring the conversations. If both employees improved after training, it can be inferred that the training was effective. Although this approach is suggested for the small business, it is also useful in any organization when only a few employees need to be trained.

Appendix 9-2 Utility Analysis

In the example in  Table 9-7  on  page 330  using the cost-saving method of evaluation, training supervisors in grievance handling reduced the total number of grievances by 50 percent and the number going to the third step from 63 to 8. In this example, we calculated only the cost savings related to the change in third-step grievances.  Utility analysis , however, permits us to estimate the overall value to the organization of the supervisors’ changes in behavior. In other words, if those trained are better performers, on average, and better performers are worth more in dollar terms, utility analysis allows us to estimate that increased worth. A general approach to utility is as follows: 1

ΔU = (N)(T)(DT)(SDY) − C


where
ΔU = dollar value of improved performance
N = number of trainees
T = time the benefits will last
DT = difference in performance between trained and untrained groups (in standard deviation units)
SDY = dollar value of untrained group’s performance (in standard deviation units)
C = total cost of training the trained group

Some of the variables in the equation can be measured directly, whereas others must be estimated. For example, N, C, and DT can be determined objectively. However, determining how long the benefits will last is really an estimate that will be more or less accurate, depending on the estimator’s experience with training and the types of employees involved. Calculating the dollar value of the untrained group’s performance falls somewhere in between. It is relatively easy to determine the compensation costs. However, it is often more difficult to translate actual performance into dollar amounts. Recall our third-step grievance example. Even though we know what a third-step grievance costs in management labor compensation, we do not know the impact of those third-step grievances on the productivity of the work unit or the quality of the product/service. What to include in determining the dollar value of performance becomes a subjective decision. The final result will be an estimate of the value of the increased performance in dollars. Using the same example, an analysis of the possible utility is presented in  Table 9-10.

Utility analysis is complex and beyond the scope of this text; what has been presented here is just a taste of that complexity. More complex models account for even more factors that might affect the true financial value of training outcomes. 2 The purpose here is to demonstrate the difficulties of getting a true picture of the total financial benefits associated with training outcomes. However, these complexities exist for any area of the business when you try to determine the effects of change. By becoming more quantitative in the assessment and description of training outcomes, training managers can put themselves on an equal footing with other managers in the organization.

Although utility analysis has been around for quite some time, it does not seem to have caught on in industry. Lori Fairfield, editor of Training magazine, notes that in the magazine’s “Industry Reports” survey, she has yet to see utility analysis appear as a write-in where the survey asks for any evaluation method not listed as an option. Furthermore, Jack Phillips of the ROI Institute indicated that the only time he has found this method used in an organization is when a PhD student is using it for a dissertation. Dr. Phillips also noted that when he talks to executives about evaluation and mentions utility analysis as one option, they often suggest that it looks like “funny money.” This latter comment may explain why some research has concluded that using utility analysis to bolster the claimed value of a project actually decreased managerial support for the project. 3  Until it is clear why this happens, it might not be wise to use this particular type of analysis to sell a project.

TABLE 9-10 Calculation of the Utility of the Grievance Training 4

Formula: ΔU = (N)(T)(DT)(SDY) − C

N = 30
T = 1 year (an overly conservative estimate)
DT = 0.2 = (Xt − Xu)/SD(ryy), where
Xt = average job performance of the trained supervisors
Xu = average job performance of the untrained supervisors
SD = standard deviation of job performance for the untrained supervisors
ryy = reliability of the job performance measure

DT is a measure of the improvement (in standard deviation units) in performance that trained supervisors will exhibit. Although obtaining the data is time-consuming (collecting the performance appraisal data for both trained and untrained supervisors), the calculations can be done easily using a computer.

SDY = $14,000 = 0.40 × $35,000

The equation assumes an average salary of $35,000. The 0.40 comes from the 40 percent rule, which estimates SDY as 40 percent of the average salary of the trainees. This rule comes from the Schmidt and Hunter research. This and other methods for calculating SDY can be found in Cascio (1991). Based on the preceding information, the utility of the training is

ΔU = (30)(1)(0.2)($14,000) − $32,020 = $51,980
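The calculation in Table 9-10 can be reproduced in a few lines (the function is our own sketch; the figures are those given above):

```python
def utility(n, t, dt, sd_y, cost):
    """Utility analysis: dollar value of improved performance,
    delta-U = (N)(T)(DT)(SDY) - C."""
    return n * t * dt * sd_y - cost

# Figures from Table 9-10: 30 supervisors, benefits lasting 1 year,
# DT = 0.2, SDY = $14,000 (the 40 percent rule: 0.40 x $35,000),
# total training cost $32,020
delta_u = utility(n=30, t=1, dt=0.2, sd_y=14_000, cost=32_020)
print(delta_u)  # 51980.0
```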
