A Clear, Coherent, and Comprehensive View of Program Evaluation


The purpose of this session-long project is to present a clear, coherent, and comprehensive view of program evaluation. In addition, it is intended to be sensible and practical. Since evaluation work so often involves teachers, administrators, and community members in the planning, execution, or reporting of results, it is important that the general form of program evaluation be understandable, appear reasonable, and be easy to accomplish in a hypothetical situation with all of the attendant problems of a natural setting.

The formulation of a view of program evaluation that meets the above requirements is no easy task. It is not that there are no guides to help one. Over the past fifty years, a number of individuals have proposed how educational endeavors should be evaluated. It is not within the boundaries of this project to attempt to describe and compare all or even some of the models; that would probably be more confusing than enlightening.

While discussion of various evaluation models is beyond the scope of this project, some of the ways in which they differ can be noted to assist in formulating a view of program evaluation that attempts to transcend, take account of, and, in some way, accommodate these differences.


The evaluation of educational treatments mentioned in the NSF Program grant, whether they be units of instruction, courses, programs, or entire institutions, will require the collection of five major classes of information. Each class is necessary, and together the five should be sufficient. In setting forth these five classes, it is recognized that some may be of little or no interest given the program objectives. What is to be avoided is the omission of any class of information important for determining the worth of a program. Thus, the framework tolerates possible errors of commission (collecting information that will have little bearing on the determination of a program's worth) while guarding against errors of omission (failing to gather information that may be important). There are two reasons for this position. First, if a particular class of information turns out to be unimportant, it can simply be disregarded. Second, if it is known in advance that a particular class of information will have little bearing on the outcome of the evaluation, it simply need not be gathered. Neither case reflects on the framework per se, only on the inappropriateness of a part of it in a particular situation. On the other hand, failure to gather information about relevant aspects of a grant award indicates a faulty evaluation effort and should be avoided.

The first class of information relates to the initial status of learners. It is important to know two things about learners at the time they enter the program: who they are, and how proficient they are in mathematics with regard to what they are supposed to learn. The first subclass of information, who the learners are, is descriptive in nature and is usually easily obtained. Routinely, one wants to know the age, sex, previous educational background, and other status and experiential variables that might be useful in describing or characterizing the learners. Such information is useful in interpreting the results of a program and, more important, serves as a description of the learner population. If subsequent cohorts of learners are found to differ from the cohort that received the program when it was evaluated, it may be necessary to modify the program to accommodate the new groups.

The second subclass of information, how proficient the learners are with regard to what they are supposed to learn, is more central to the evaluation. Learning is generally defined as a change in behavior or proficiency. To demonstrate learning, it is necessary to gather evidence of performance at no fewer than two points in time: (1) at the beginning of a set of learning experiences and (2) at some later time. Gathering evidence about the initial proficiencies of learners furnishes the baseline needed for estimating, however crudely, the extent to which learning occurs during the period from start to finish. (Determining whether the learning occurred as a direct result of the program or was due to other factors is an issue that will not be addressed in this session-long project.) A related reason for determining the initial proficiency level of learners stems from the fact that some programs may seriously underestimate initial learner status with regard to what is to be learned. Consequently, considerable resources may be wasted in teaching learners who are already proficient. Mere end-of-program evidence gathering could lead one to an erroneous conclusion of program effectiveness when what had actually happened was that already-developed math proficiencies had been maintained.
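The pre/post logic above can be sketched in a few lines of Python. All learner names, scores, and the proficiency cutoff below are hypothetical, chosen only to illustrate the reasoning; they are not drawn from the grant.

```python
# Crude pre/post comparison of learner proficiency, as described above.
# All scores and the 75% cutoff are hypothetical, for illustration only.
MAX_SCORE = 40
CUTOFF = 0.75 * MAX_SCORE            # treat >= 75% of max as already proficient

pre_scores  = {"A": 12, "B": 30, "C": 8}    # before instruction
post_scores = {"A": 28, "B": 32, "C": 25}   # after a period of instruction

# Gain = change in performance between the two points in time.
gains = {k: post_scores[k] - pre_scores[k] for k in pre_scores}

# Flag learners who were already proficient before instruction began.
already_proficient = {k: v >= CUTOFF for k, v in pre_scores.items()}
```

In this hypothetical data, learner "B" scores well at the end of the program but shows almost no gain; the high pre-test score reveals that the proficiency was merely maintained, which is exactly the erroneous conclusion that end-of-program testing alone would invite.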

The second major class of information required in this program evaluation relates to learner proficiency and status after a period of instruction. The notion here is that educational programs are intended to bring about changes in learners. Hence, it is critical to determine whether the learners have changed in the desired ways. Changes could include increased knowledge, ability to solve various classes of problems, ability to deal with various kinds of issues in a field, proficiency in certain kinds of skills, changes in attitudes, interests, and preferences, and so on. The changes sought will depend on the nature of the program, the age and ability levels of the learners, and a host of other factors. Whatever changes a program, curriculum, or institution seeks to effect in learners must be studied to determine whether they have occurred and to what extent. The only way this can be done is through a study of learner performance.
The third class of information collected in this program evaluation centers on the educational treatment being dispensed, whether it is a course, program, curriculum, or an entire institutional setting. At the very least, one needs to know whether the program was carried out and, if so, to what extent. Did the program get started on time? Were the personnel and materials necessary for the program available from the outset or, as has been the case in a number of externally funded programs, did materials and supplies not arrive until shortly before the conclusion of the program? Questions regarding the implementation of the intended program may seem trivial but, in fact, are critical. Often it is simply assumed that the program was carried out on schedule and in the way it was intended.

The fourth major class of information is costs. Available treatments – units of instruction, courses, programs, curricula, and instructional systems – vary widely in cost. These costs need to be reckoned so that administrators and educational planners, as well as evaluation workers, can make intelligent judgments about educational treatments. Not only must direct costs, for example the cost of adoption, be reckoned, but indirect costs as well. Costs of in-service training for teachers who will use a new program, for example, must be determined if a realistic estimate of the cost of the new program is to be obtained.

The fifth class involves supplemental information about the effects of a program, curriculum, or institution and is composed of three subclasses. The first includes the reactions, opinions, and views of learners, teachers, and others associated with the program being evaluated. These others could be administrators, parents, other community members, and even prospective teachers. The purpose of gathering such information is to find out how an educational program is viewed by various groups. Such information is no substitute for more direct information about what is actually being learned, but it can play a critical role in evaluating the overall worth of a program in a larger institutional context.

The framework for program evaluation presented above sets forth the major classes of information required for a comprehensive evaluation of the granted program. Detailed procedures for the collection of each class of information are presented in subsequent sections along with information about the analysis and interpretation of evaluative information and the synthesis of results into judgments of worth.

Information about learner performance must be obtained on at least two occasions – before instruction gets seriously underway, and after a specified period of instruction. The phrase “before instruction gets seriously underway” indicates that an estimate of learner performance before a program begins is important. Specifically, educators need to know not only that learners are not already proficient in what they are expected to achieve but also how unaccomplished they are. To this end, measures of achievement – for example, objective written tests, essay tests, and the like – could be administered during the first few sessions that learners are together.

The administration of such measures at the beginning of a set of program experiences is a somewhat delicate undertaking: teachers and learners will not have developed a working relationship yet, and learners presumably will be answering questions and items covering unfamiliar material. One commonly used procedure is to include in the test directions a statement acknowledging the learners’ general lack of familiarity with the material, but requesting them to try to do their best. It can also be announced that the resulting information will be of assistance in planning specific learning experiences for the group. Often, assurance that these test performances will not influence students’ grades is given; however, this may prevent individuals from putting forth their best effort in answering various questions and items.
Information relating to program outcomes can be gathered during the beginning sessions of a course, program, or curriculum. For several reasons, however, information about nonintellectual outcomes should probably not be gathered until somewhat later, perhaps three to four weeks after a program has started. First, learners may be reluctant to respond frankly to measures dealing with attitudes, feelings, and interests. Rather than run the risk of obtaining incorrect information or no information at all, it seems wise to hold off until some relationship has been established. Second, some measures, notably those dealing with social relationships, cannot be administered until a group of learners has been together for a period of time. For example, comparison measures cannot be administered until a group has been together long enough for social relationships to develop.


The use of procedures that require each learner to respond to only a fraction of the total amount of evaluation material has several important consequences. First, the scope of the evaluation need not be unduly limited by time restrictions. In fact, it should be possible to gather full information with regard to all objectives without making excessive demands on teachers and learners or on the time allotted for instruction. Second, through judicious allocation procedures, the same set of instruments for gathering information about learner performance may be used before instruction gets underway and after a period of instruction. If the total amount of testing material has been divided at random into fourths, a learner can be asked to respond to one set of math exercises before instruction begins and a different set after a period of instruction. In this way, a single set of math material could be used several times, and no individual would be required to respond to the same material more than once.
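The allocation scheme just described can be sketched as follows. The 32-item pool, the fixed seed, and the rotation rule for assigning pre- and post-instruction sets are all illustrative assumptions; the rotation is simply one easy way to guarantee that no learner sees the same fourth twice.

```python
import random

# Divide a hypothetical pool of 32 test items at random into four equal
# sets ("fourths"); each learner answers one set before instruction and
# a different set after a period of instruction.
random.seed(0)                               # fixed seed, for reproducibility
items = list(range(1, 33))                   # illustrative item IDs 1..32
random.shuffle(items)
fourths = [items[i::4] for i in range(4)]    # four disjoint sets of 8 items

def assign(learner_index):
    """Return (pre_set, post_set) for a learner; the sets always differ."""
    pre = learner_index % 4
    post = (learner_index + 1) % 4           # rotate so pre != post
    return fourths[pre], fourths[post]
```

Because the fourths partition the item pool, every item is answered by roughly a quarter of the learners at each occasion, so full information about all objectives is still collected across the group.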

An issue in the gathering of evidence about learner performance in program evaluations centers on the use of questionnaires, exercises, tests, scales, and so on. This includes the use of published instruments as well as standardized tests. Opinion in this area varies greatly. On the one hand, some will argue that only program-produced measures, tailored to the objectives of a particular program, can be used with confidence. The use of externally developed measures can be exceptionally unfair if they measure proficiencies not emphasized in a particular program.


Further inquiry showed that the teachers were not sufficiently knowledgeable about prime numbers to feel comfortable teaching them and, consequently, omitted the topic from their classroom instruction. The remedy for this situation was simple. A single workshop on prime numbers was conducted for the teachers in the district. Not surprisingly, learner performance on problems involving prime numbers on the end-of-grade mathematics test improved markedly the next year. The learners, in fact, performed as well on that topic as they did on the others. Clearly, it is vital to know what has occurred in the teaching-learning situation in order to be able to properly interpret learner-performance information with regard to the instructional objectives.

Program evaluation involves more than the assessment of learner performance in terms of objectives. One of the additional elements in this program evaluation is an examination of the relationship between objectives, learning experiences, and appraisal procedures. Judgments need to be made about the relationships among these three elements of the educational process. For instance, are the learning experiences consistent with the stated objectives? What is the relationship between the learning experiences and the evaluation procedures? Answers to these questions require direct information about the learning experiences.


The first way to obtain information about the extent of program implementation will involve the use of observational procedures. Information about aspects of learner performance is not being sought here; the observational procedures are focused on learning activities and instructional materials. Ideally, an observer would be present at all times to observe and record what occurs in the learning situation: the school atmosphere and conduct of activities, the groupings of learners, the kinds of learning materials used and the ways they are used, the time devoted to various activities, and so on. Such information, systematically recorded, could furnish one basis for sound evaluative judgments.

Weekly, biweekly, or even monthly visits can be made to programs, and observation periods of one-half hour to several hours in duration are clearly within the realm of possibility. Exactly how much observational work is undertaken will be determined to a large extent by available resources. Given a limited amount of resources, the most productive deployment is probably a substantial number of observations, each of limited duration, rather than a small number of observations of extended duration. If resources were available for ten hours of observation in a course lasting an academic year, it would be preferable to hold forty fifteen-minute observations rather than ten one-hour observation periods spread throughout the year. Each short observation period yields fresh information; all other things being equal, the more observations the better.

The second way of gathering information about how the course, program, or curriculum is being implemented is through the use of teacher reports. These will be in the form of teacher lesson plan books, supplemented by elaborating comments or by logs and diaries that teachers submit on a regular basis. The latter are intended to serve as records of what happened in the learning situation and to preserve teachers’ comments on those events.
The principal ways the program evaluation will gather information about implementation are: (1) observations by evaluation personnel; (2) teacher reports in the form of plan books and, sometimes, logs and diaries; and (3) student reports on activities and features of the learning situation. Each type of data has its advantages and disadvantages. All three, when systematically gathered, analyzed, and properly interpreted, should provide sufficient information to describe adequately how the program was actually carried out.


1. Were the strategies used justified, and did the changes implemented in the strategies match the objectives of the program?
2. Were the appropriate staff members hired and trained?
3. Was the timeline for the grant followed effectively?
4. Was a management plan developed and followed?
5. Are the students moving towards their anticipated goals?
6. Did the community become involved in this program?
7. Were the experiences of parents and community members positive?
8. Was the project a success?
9. Did the project meet the goals it set in the beginning?
10. What parts of the program were most effective?
11. Were the results worth the project's estimated cost?
12. Can this project be replicated and used by other communities?


Of the many different types of data collection available for research, this program evaluation includes an evaluation design matrix. The matrix draws on six data-collection approaches: focus group, questionnaire, survey, performance assessment, interview, and observation. Focus groups are useful because they bring together a number of persons from the surveyed population to discuss topics relevant to the evaluation; they are considered inexpensive and are effective with an active moderator. Questionnaires typically consist of multiple-choice or true-false questions geared at ascertaining answers to specific evaluation questions. Similar to questionnaires are surveys, which are useful for obtaining information about the opinions and attitudes of subjects. Surveys are also useful for collecting personal and background information such as race or gender. Surveys differ from questionnaires in that they offer wider-ranging but less detailed data.

Question  Method                  Respondents     Timing
#1        Survey                  Teachers        End
#2        Observation             Administrators  On-going
#3        Survey                  Administrators  End
#4        Focus Group             Administrators  End
#5        Observation             Teachers        4 times
#6        Survey (phone)          Teachers        End
#7        Questionnaire           Parents         End
#8        Questionnaire           Administrators  End
#9        Interview               Teachers        End
#10       Interview               Teachers        End
#11       Performance Assessment  Administrators  End
#12       Focus Group             Teachers        End
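The design matrix above can also be encoded as structured data, which makes coverage checks (which method addresses which question, who responds, and when) straightforward. This sketch simply transcribes the rows listed above; the tally at the end is one example of such a check.

```python
from collections import Counter

# Evaluation design matrix, transcribed from the table above:
# (question number, method, respondents, timing)
design_matrix = [
    (1,  "Survey",                 "Teachers",       "End"),
    (2,  "Observation",            "Administrators", "On-going"),
    (3,  "Survey",                 "Administrators", "End"),
    (4,  "Focus Group",            "Administrators", "End"),
    (5,  "Observation",            "Teachers",       "4 times"),
    (6,  "Survey (phone)",         "Teachers",       "End"),
    (7,  "Questionnaire",          "Parents",        "End"),
    (8,  "Questionnaire",          "Administrators", "End"),
    (9,  "Interview",              "Teachers",       "End"),
    (10, "Interview",              "Teachers",       "End"),
    (11, "Performance Assessment", "Administrators", "End"),
    (12, "Focus Group",            "Teachers",       "End"),
]

# Tally how often each data-collection method is used across the twelve questions.
method_counts = Counter(method for _, method, _, _ in design_matrix)
```

A tally like this makes it easy to confirm, for example, that every question is paired with exactly one method and that no single approach dominates the design.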



Evaluation Plan
Award Title: Deepening Everyone’s Mathematics Content Knowledge; Mathematicians, Teachers, Parents, Students, & Community.
Award Number: 0227603
Program Manager: James Hamos
Sponsor: University of Rochester, 517 Hylan Bldg. Rochester, NY 14627
