19 Finding and Using Evidence

Perhaps the first question, in our search for proof, should be – “What do we want to prove?” An important area of logic is concerned with tests of truth – the criteria used to distinguish truth from error. A criterion of truth is a standard, or rule, by which to judge the accuracy of statements and opinions; thus, it is a standard of verification. The individual must decide upon the criteria that can enable him or her to distinguish what is true from what is not true. Not all criteria have equal validity.

In addition to a standard, it is important to realize that people often have erroneous ways of reasoning about facts. Material fallacies are numerous, deceptive and elusive – so elusive that a person untrained in detecting them can easily be misled into accepting them as valid. The ability to reason without error is an obvious asset for everyone, but it is particularly pertinent to evidence-based practice. The fallacies that concern us here are all properly classified as material, meaning that the error lies in the factual content of the argument rather than in its structure.

What might be the indicators of outcome or the ‘truth criterion’ for local evidence-based practice?

Functional contextualists, for example, determine the validity of ‘truth’ by looking at the purpose or function of the action. If the outcome of the action includes enough features to successfully achieve the goal, then it is deemed ‘true’. In other words, for contextualists the truth and meaning of an idea lies in its function or utility, not in how well it is said to mirror reality. The truth criterion of contextualism is thus dubbed ‘successful working’, whereby an intervention can be said to be true or valid insofar as it leads to effective action, or achievement of some goal. For the contextualist, ideas are verified by human experiences, with an idea’s ‘meaning’ essentially defined by its practical consequences, and its ‘truth’ by the degree to which those consequences reflect successful action.

Functional contextualists seek to predict and influence events using empirically based concepts and rules. This approach reflects a strong adherence to contextualism’s extremely practical truth criterion and can be likened to the enterprise of science or engineering, in which general rules and principles are used to predict and influence events. In psychology, functional contextualism has been developed explicitly as a philosophy of science, and we therefore consider it an ideal criterion for our endeavors.

In the context of the ‘dead man test’, medication must be considered to be a reasonably effective intervention for psychological problems. However, the ‘dead man’ test, first attributed to Ogden Lindsley, was designed as a means of determining whether outcome objectives are properly specified. That is, if a dead man could fulfill the criteria, the objective was considered inadequate. Thus any focus upon reduction or elimination of excess behavior would fail the dead man test, since a dead man does nothing. Medication, along with contributing some dreadful side effects, is oriented only towards reducing what the clinician or the community considers to be excess behavior. Medication does nothing to improve the social competence of the individual subject, and this is one reason why ‘social rehabilitation’ is considered to be a companion intervention [at least in theory, if not in practice]. The fact that medication inhibits the subject’s ability to participate effectively in ‘social rehabilitation’ is generally ignored.

Nonetheless, if the truth criterion is the dead man test, medication is an evidence-based intervention. Obviously Lindsley disagreed with this standard. In fact, most clinicians do [at least in theory, if not in practice]. Why we separate our theory from our practice is an interesting question, which we will not undertake here, but it raises the question of how we are going to develop the criteria that will be used to determine if an intervention is evidence based. Philosophical considerations do matter, for they set the constraints on what is acceptable action.
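Lindsley’s rule can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that an outcome objective can be written as a yes/no check on a person’s weekly behavior counts; the behavior names, thresholds, and helper functions are hypothetical and not drawn from any published instrument. A dead man is modeled as someone who emits no behavior at all, so any objective he can satisfy fails the test.

```python
from typing import Callable, Dict

# Hypothetical representation: weekly counts of observed behaviors.
Behavior = Dict[str, float]
Objective = Callable[[Behavior], bool]

# A dead man emits no behavior, so every count is zero.
DEAD_MAN: Behavior = {}


def count(behavior: Behavior, name: str) -> float:
    """Return the weekly count for a behavior, treating unrecorded behaviors as zero."""
    return behavior.get(name, 0.0)


def passes_dead_man_test(objective: Objective) -> bool:
    """An objective is adequately specified only if a dead man could NOT satisfy it."""
    return not objective(DEAD_MAN)


# Hypothetical reduction objective: no more than one outburst a week.
def fewer_outbursts(b: Behavior) -> bool:
    return count(b, "outbursts") <= 1


# Hypothetical competence objective: starts at least three conversations a week.
def starts_conversations(b: Behavior) -> bool:
    return count(b, "conversations_started") >= 3


if __name__ == "__main__":
    print(passes_dead_man_test(fewer_outbursts))       # False: doing nothing satisfies it
    print(passes_dead_man_test(starts_conversations))  # True: it requires the person to act
```

The reduction objective fails because doing nothing satisfies it; the competence objective passes because only a living, acting person can meet it.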

We start off every client engagement with some form of assessment. What we are looking for depends on whether we are using a competence model or a defect model. The DSM in its entirety, at whatever level we consider it, is designed to identify a ‘defect’, or a ‘pathology’ as the experts like to call it. This is a very different assessment from one that is oriented toward discovering what improvements the client would like to make in his or her life. When the client says that s/he is distressed because of such and such – or can’t do this or that – is that important? Does the clinical evaluation ever state what is disturbing to the client? Or does the clinical report indicate what is disturbing to those around the client? Which of these concerns should we be intervening with, and what outcome expectations should we identify? Should we focus on clinical outcomes or personal outcomes? These are significant questions that are largely ignored.

One reason these questions are important is that they determine the purpose of our actions. When using the contextualist ‘truth criterion’, we are required to determine the purpose of the intervention in order to judge the acceptability of the outcome. Is the purpose of the public system to protect the community, or is it to enable the client to improve his or her performance to his or her own standards? This purely philosophical issue has major ramifications in setting the truth criterion standards for our evidence-based interventions. If we decide that protection of the community is the primary outcome responsibility, then medication and incarceration become evidence-based practices. If, on the other hand, we decide that personal improvement in social competence is the primary outcome responsibility, medication and incarceration are failures and should be abandoned.

Another criterion that might be important to consider is that of ‘do no harm’. Is the process of helping benign or intrusive? Intrusive interventions are often beneficial not only to the community, but to the subject him/herself. Consider an infected appendix. Certainly an operation is an intrusive procedure, but it may be the only way to save a life. So we cannot rule out intrusive procedures a priori, but how are we to determine whether intrusion is necessary? A further concern is whether intrusion becomes necessary only because we have failed to take our client’s perceived preferences into account. If we were addressing the client’s own goals and preferences, would we need to be intrusive?

Enough philosophy. We still have to address the issues of what constitutes evidence and whether evidence-based practice is a good or a bad goal for public human services. We will assume, for purposes of this discussion, that the purpose of the intervention is to help the client reduce his/her own distress through a process of increasing social competence. Obviously, if you have other truth criteria, you will need to take what follows with a ‘grain of salt’.

With this perspective, we are immediately aware that we need a new method of assessment. Our emphasis will need to be primarily on what the client finds disturbing and whether s/he wants to change it. This does not imply that the community’s concerns are irrelevant, but it focuses on the only person who can change the community’s perspective. We may also need to work with the ‘community of interest’ [significant personal network], since its members may be contributing to the problem through their messages and interactions. While we won’t go into detail to define this, an obvious example would be a parent who constantly tells his or her child that s/he is ‘stupid’, contributing to the child’s distressing belief that s/he is ‘stupid’ – with the corollary belief that s/he is therefore unworthy. Obviously, changing the message to the child would be beneficial to the process of change.

The first things that we need to identify are the indicators of achievement of the goals. Since most problems in living are relational difficulties, the construct of social competence can supply some specific indicators, including, but not necessarily limited to:

Self-Awareness

Identifying emotions: Identifying and labeling one’s feelings
Recognizing strengths: Identifying and cultivating one’s strengths and positive qualities

Social Awareness

Perspective-taking: Identifying and understanding the thoughts and feelings of others
Appreciating diversity: Understanding that individual and group differences complement each other and make the world more interesting

Self-Management

Managing emotions: Monitoring and regulating feelings so they aid rather than impede the handling of situations
Goal setting: Establishing and working toward the achievement of short- and long-term pro-social goals

Responsible Decision Making

Analyzing situations: Accurately perceiving situations in which a decision is to be made and assessing factors that might influence one’s response
Assuming personal responsibility: Recognizing and understanding one’s obligation to engage in ethical, safe, and legal behaviors
Respecting others: Believing that others deserve to be treated with kindness and compassion and feeling motivated to contribute to the common good
Problem solving: Generating, implementing, and evaluating positive and informed solutions to problems

Relationship Skills

Communication: Using verbal and nonverbal skills to express oneself and promote positive and effective exchanges with others
Building relationships: Establishing and maintaining healthy and rewarding connections with individuals and groups
Negotiation: Achieving mutually satisfactory resolutions to conflict by addressing the needs of all concerned
Refusal: Effectively conveying and following through with one’s decision not to engage in unwanted, unsafe, unethical, or unlawful conduct.

Since competence can be defined as ‘capacity to expectation’, the next issue is determining the expectations for each of these elements. Questions about expectations concern the ‘community of interest’. Are their expectations consistent and achievable? If not, perhaps the expectations are unreasonable, and the change work needs to occur with the community itself rather than directly with the child. Part of the problem is that human expectations are not agreed-upon standards. We will set aside the fact that expectations are of two different types, normative and belief, and note only that my expectation for a child’s performance may be quite different from yours, while either might be considered acceptable or unacceptable to a reasonable observer. When subtle variances in expectation exist, it may be necessary to start a negotiation that brings these expectations into awareness for examination.

But does the child want to be socially competent? Probably many with severe and persistent problems in living will reject becoming socially competent in the generally accepted sense of the term, but almost all will seek acceptance from some social unit, albeit perhaps a deviant one. Human beings are social animals, and most of us want to be accepted. When the developmental process leads to isolation, this is a major factor in future problems in living. It is this secondary reaction [if you reject me, I will reject you] that drives the rejection of social competence, not the base instincts of the individual. This means that if hope for inclusion can be made manifest, the desire to become socially competent is likely to follow.

We would go further and suggest that helping the child manifest hope means helping him or her deal with the issues presently on his or her own agenda. An example of such ‘starting where the child is’ is the ‘harm reduction’ focus in addiction intervention. Rather than seeking abstinence from a child who is rejecting intervention, enabling the child to identify harm from his or her own perspective can lead to a reduction of substance abuse. This is the jujitsu of using the client’s own power to move toward socially acceptable goals.

But we digress. Social competence can be measured if the clinician, the client and the community of interest are willing to develop the proper indicators. ‘Having a friend’ may be a relationship goal, but one would need to define the activities and interactions that make someone a ‘friend’. Once these are defined, we can determine whether a given person is, in fact, a ‘friend’. Note that a ‘dead man’ has no friends.

Now to the Evidence¹

What constitutes evidence in evidence-based practice? Assuming we have a proper truth criterion, we can now attempt to measure whether an intervention meets these outcome standards. Conducting ‘outcome research’ involves trying to answer questions such as “What are you doing?” “Does it work?” and “How do you know it works?”

This evidence-based focus is a relatively new endeavor in the area of psychological work with children. Early reviews of outcome research with children and adolescents suggested that talk therapy and/or the use of medication are not effective. Although there is no indication as to what ‘truth criterion’ was used, it is apparent from our previous discussion that it was more than the ‘dead man’ test. It was discovered that such interventions worked no more effectively than no intervention at all – or simply as well as the passage of time. It was also observed that the clinicians using these methodologies most frequently intervened with children who had less difficult problems that would improve simply with time and less frequently with children whose problems were more persistent, such as conduct problems or aggression. Those with more persistent problems in living are often considered to be resistant and/or not appropriate for services – a tacit admission of an inability to help effectively.

Criteria Used to Establish Evidence-based Practice

What determines if a specific intervention is considered to be ‘evidence-based’? The following are generally accepted as necessary criteria:

The intervention has a well-defined protocol, techniques and procedures that can be taught to other clinicians,
The evaluation of the technique uses well-controlled studies, usually involving random assignment of participants to intervention conditions,
There is a clear definition of the selection criteria for participants in the study (that is, the people who will be included in the study are carefully described),
There is use of multiple outcome measures, so one does not base judgment of outcome with a person on only one measure, and
There is replication of results in multiple settings: other researchers repeat the procedures and get similar results.

If one follows the development of the evidence-based practice movement in medicine and psychology, and the criteria by which a given intervention is deemed evidence-based under the American Psychological Association’s (APA) guidelines, one discovers both strengths and weaknesses. A number of the weaknesses militate against implementing these interventions in day-to-day practice.

A Brief Sketch of the Evidence-Based Practice Movement

Clinical practice guidelines (CPG) were introduced into medicine about two decades ago. The purpose of these guidelines was to assist healthcare professionals in clinical decision-making. Not only did CPG encourage clinicians to employ valid, empirically sound interventions, but they also attempted to move toward a standardization of the decision-making process regarding case formulation and treatment planning. This emphasis on empirical research and standardization helped promote the quality of medical-surgical services and reduce variance in treatments prescribed as well as errors (e.g., dispensing the wrong medication).

A decade later, in 1991, the American Psychiatric Association (ApA) began developing CPGs of its own to assist psychiatrists in the clinical decision-making process with respect to people presenting with psychological problems. It should come as no surprise that their guidelines promoted pharmacotherapy over nonpharmacological approaches. Psychiatrists have more training and experience in prescribing psychotropic medication than in prescribing psychosocial interventions.

To counterbalance this bias, the American Psychological Association (APA) assembled a Task Force to help promote efficacious psychological interventions. The task force was composed of a panel of about a dozen experts charged with describing how efficacious [producing or sure to produce intended or appropriate effects] interventions are identified and selected.

Interventions were identified as potential candidates for the list in a number of ways – the task force:

Asked for nominations from the field via the APA Monitor, the Division 12 Clinical Psychologist, and Internet lists serving the Society for Psychotherapy Research and the Society for a Science of Clinical Psychology, and their own published reports, among other sources;
Scanned the journals publishing psychotherapy research each month; and
Conducted literature reviews on specific topics using services such as PsychLit and MedLine and checking the reference sections of papers and reviews encountered in this process

Once a potential intervention was identified through this method, a reviewer took responsibility for evaluating the literature on its efficacy. The reviewer then reported back to the group at large with a recommendation. Points of disagreement were debated and clarified until a consensus was reached or, more rarely, a vote was taken. Through this process, the Task Force divided efficacious interventions into two groups: well-established interventions and probably efficacious interventions.

Criteria for Evidence-Based Practices

The Task Force differentiated between two levels of efficacy:

Well-Established Interventions

Well-established interventions meet the following criteria (I or II, and III, IV, and V):

I. At least two good between-group design experiments demonstrating efficacy in one of two ways:

A. The intervention in question is statistically significantly superior to a placebo condition (pill or psychological) or to another treatment.

B. The intervention in question is equivalent to an already established intervention in experiments with adequate sample sizes.

II. A series of more than nine single case experiments demonstrating efficacy. These experiments must use good experimental designs and compare the intervention in question to a placebo or another treatment, as in I.A.

III. Experiments must utilize manuals.

IV. Participant characteristics must be explicitly stated.

V. Results must come from at least two different research teams.

Probably Efficacious Interventions

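Probably efficacious interventions must also use manuals (III) and explicitly state participant characteristics (IV), but they do not require results from two research teams (V). They differ from well-established interventions in requiring only one of the following: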

I. Two experiments demonstrating that the intervention in question is statistically significantly superior to a waiting-list control group.

II. One or more experiments meeting all the well-established criteria except criterion V.

III. A small series of at least three single case experiments otherwise meeting all the criteria for well-established interventions.
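The two levels of efficacy can be expressed as a short decision function. This is a simplified Python sketch, not the Task Force’s own procedure: the field names are hypothetical, each count assumes the experiments have already been judged to use good designs, and the single case rule for probably efficacious interventions is read as a small series of at least three experiments, which is our reading of the original Task Force criteria.

```python
from dataclasses import dataclass


@dataclass
class EvidenceSummary:
    """Hypothetical tally a reviewer might fill in for one intervention."""
    between_group_wins: int        # good between-group experiments meeting I.A or I.B
    single_case_experiments: int   # good single case experiments vs. placebo or another treatment
    waitlist_wins: int             # experiments significantly superior to a waiting-list control
    uses_manuals: bool             # criterion III
    participants_specified: bool   # criterion IV
    research_teams: int            # independent teams reporting the results (criterion V)


def classify(e: EvidenceSummary) -> str:
    # Manuals (III) and explicit participant characteristics (IV) are required at both levels.
    if not (e.uses_manuals and e.participants_specified):
        return "not listed"

    # Well-established: (I or II) plus results from at least two research teams (V).
    efficacy_shown = e.between_group_wins >= 2 or e.single_case_experiments > 9
    if efficacy_shown and e.research_teams >= 2:
        return "well-established"

    # Probably efficacious: any one of the three alternatives; V is not required.
    if (
        e.waitlist_wins >= 2
        or e.between_group_wins >= 1
        or e.single_case_experiments >= 3
    ):
        return "probably efficacious"

    return "not listed"
```

For instance, two good between-group experiments from a single research team would come out as "probably efficacious", because criterion V is unmet. The sketch also makes visible what the criteria leave out: nothing in the function asks whether any effect was clinically meaningful.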

Weaknesses of Evidence-Based Practices

From our perspective there are three major weaknesses of EBPs with respect to their applicability to clinical settings:

1. The studies on the list were chosen based on statistical significance, not on clinical significance
2. They were selected based on efficacy, not effectiveness
3. Selection criteria excluded many, if not most, of the heterogeneous client populations clinicians work with in day-to-day practice

Statistical Significance Versus Clinical Significance

Statistical significance refers to how unlikely it is that an observed difference between two or more means is due to chance alone. In other words, is the change observed in the dependent variable larger than random variation would plausibly produce? (Whether that change was caused by the independent variable rather than by some uncontrolled, extraneous factor, such as poorly worded items on a questionnaire or ambient temperature, is a question of experimental control.) Clinical significance, by contrast, refers to the extent to which a clinical outcome is meaningful.

For example, if research participants in the intervention group reduced the number of cigarettes smoked per day from 40 to 38, all things being equal and given a large enough sample, this would likely be a statistically significant result. However, few participants would consider this a meaningful change. A 2-cigarette-a-day reduction would not likely save them enough money to care about, or affect their health appreciably; it is simply not clinically significant. Clinical significance has been defined in various ways, but one popular definition is the extent to which the intervention takes someone who is in an undesirable category (smoker, sick, depressed) and places him or her in a desirable category (nonsmoker, healthy, non-depressed), that is, a substantive change in quality of life.
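The smoking example can be checked numerically. The Python sketch below is illustrative only: it treats the example as a comparison between an intervention group averaging 38 cigarettes a day and a control group averaging 40, assumes a standard deviation of 10 cigarettes in both groups, and uses a large-sample normal (z) approximation rather than a full t-test. The end-of-study counts passed to the clinical check are likewise invented.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence


def two_sided_p(mean_control: float, mean_treated: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in group means (z approximation, equal SDs and group sizes)."""
    standard_error = sqrt(2 * sd ** 2 / n_per_group)
    z = (mean_control - mean_treated) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_in_desirable_category(cigarettes_per_day: Sequence[float], cutoff: float = 0) -> float:
    """Clinical check: fraction of participants who end up nonsmokers (at or below the cutoff)."""
    moved = sum(1 for c in cigarettes_per_day if c <= cutoff)
    return moved / len(cigarettes_per_day)


if __name__ == "__main__":
    # The same 2-cigarette difference at two sample sizes.
    print(two_sided_p(40, 38, sd=10, n_per_group=20))   # roughly 0.53: not significant
    print(two_sided_p(40, 38, sd=10, n_per_group=500))  # roughly 0.0016: significant
    # Hypothetical end-of-study counts for five treated participants.
    print(share_in_desirable_category([38, 37, 39, 40, 36]))  # 0.0: nobody became a nonsmoker
```

The identical 2-cigarette difference becomes statistically significant simply by enlarging the sample, while the category-change check shows that no one moved from ‘smoker’ to ‘nonsmoker’.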

The Task Force did not take clinical significance into account when developing its list of empirically supported interventions. Hence, although the interventions on the list are likely to produce some kind of effect, we do not know whether those effects were trivial or meaningful.

Efficacy Versus Effectiveness

The criteria by which the Task Force deemed an intervention evidence-based pertain to efficacy, not effectiveness. Efficacy, for the Task Force, refers to the extent to which an intervention benefits clients under controlled research conditions, while effectiveness, as used here, refers to the extent to which an efficacious treatment can be exported from research to community or private practice settings. Although the Task Force acknowledges this weakness, one does not know whether the interventions on the list generalize to other contexts (e.g., everyday clinical practice) beyond highly controlled research settings. Some interventions might prove too technical, too costly, or too labor-intensive to employ.

Participant Characteristics: Heterogeneity Versus Homogeneity

Criterion IV states that participant characteristics must be explicitly stated. By and large, the studies on the EBP list employed the Diagnostic and Statistical Manual of Mental Disorders criteria for selecting participants. In many cases, participants presenting with co-morbid issues were excluded. For example, people presenting with substance abuse problems and/or mental retardation in addition to, say, anxiety and/or depression did not participate in the research. This, of course, makes sense as this exclusionary process increases internal validity. However, public clients presenting with co-morbid problems in clinical settings are more often the rule than the exception, and, as such, many studies are simply irrelevant as clinicians seldom see such ‘pure’ cases walk through the door. Clinicians require protocols to help them serve people with co-morbid problems.

The Impact of the Weaknesses

These weaknesses give clinicians an opportunity to resist the whole model and continue to support the status quo. The list is open to question, and the process impedes creative innovation in solving clinical problems. Science is clearly a positive methodology; however, it is important to realize that Yugo engineers and Lexus engineers were both science-based. But only one of the two companies also embraced quality improvement methodologies. Without a market test, Yugo would still be in business. Human services in the public systems have no market test, and it is the public systems that deal with the most significant problems in living.

In lieu of a market system to hone customer satisfaction, the next best option would be to measure outcomes against client expectations. Evidence-based interventions cannot be just idle items on a list. The overall reason for producing such a list is to identify interventions that have been shown to be of higher quality than others. This emphasis on quality distinctions is good; however, it does not go far enough.

We have found that the positives of EBP include:

A focus on outcome
An increase in integrity and quality
A reduction of quackery [practices that do not work]
Consistency of process
Advances in knowledge

At the same time, we find the weaknesses of:

Creative constraints – how do we develop innovative services when we can only use what has been researched? This is contrary to the principles of quality management in which the management steers [indicates expected outcome] and the staff rows [figures out how best to achieve the expected outcome within the philosophical constraints].
The significance of the failure to consider clinical impact.
The significance of co-existing difficulties.
The existence of cultural factors.

What is needed to extend the values of EBP is a Research & Development arm of the clinical system. Almost no business survives very long without the ability to measure the impact of its products and services and to develop new products and services as necessary. Human services, and particularly healthcare, have rejected the need for this function despite historic attempts [the Base Service Unit] to ensure that it occurs. Part of the reason for this may be that too much discussion has been oriented toward the role of research in clinical practice. This level of discussion may be too abstract, and it raises philosophical questions about what should be included in measurement as applied to human behavior and what it implies for practice. Too often clinicians resist measurement of outcome performance even at a minimal level, since such measurement can affect them directly and negatively. Practice agencies rarely, if ever, include outcome data in their annual reports, choosing instead to include only custodial data (e.g., number of people served, units of service).

It may be easier, therefore, to focus on the implementation of sound, quality improvement systems, which can refine any starting point to more common end points. In contemporary American business there are two approaches to quality improvement:

1) The ‘Theory of Bad Apples’ [Command and Control]

The ‘bad apple’ approach posits that quality is best achieved by discovering and removing defective practitioners. Quality is pursued through inspection, using procedures such as licensing, adjudication of complaints, recertification, and thresholds for acceptability, and this requires research into better tools for inspection (e.g., increasing sensitivity and specificity). Essential to this approach, often called ‘command and control’ management, is the search for outliers, for example through examination of mortality or morbidity data, as well as ‘vigilant regulation’. In this philosophy, one uses deterrence to improve quality, and punishment or the threat of punishment to control workers who do not care enough or who have problems doing the right thing or doing things right. This approach leads to a defensive and fearful workforce that tries to hide its perceived mistakes or weaknesses. Human services have relied almost exclusively on this approach.

2) The ‘Theory of Continuous Improvement’ (Outcome Management)

This approach focuses on outcome and holds that problems should be viewed as opportunities to improve quality, and that defects in quality are only rarely attributable to a particular individual’s lack of will, skill, or intention. Even when individuals are at the root of a defect, the problem is generally not one of motivation or effort, but rather of poor job design or unclear direction. According to this outlook, real improvement in quality depends on understanding and revising production processes on the basis of data about the processes themselves. Continuous improvement is sought throughout the organization through constant effort to reduce waste, errors, rework and complexity. The focus is on the average producer, not the outlier, and on learning. Two key components in managing quality are counting and reducing defects, and measuring and reducing cycle time. Both rely on the worker’s definition. However, the key principle is customer satisfaction. Thus, introducing quality into human services changes the power equation: the client becomes the ‘expert’, and his or her satisfaction becomes a major issue.
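The two measurement components named above, counting defects and measuring cycle time, can be sketched in a few lines of Python. The record layout is hypothetical: what counts as a ‘defect’ (a missed appointment, a repeated assessment, a client complaint) and when a cycle ends (here, the date the client’s own goal is met) would have to be defined locally.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional


@dataclass
class Episode:
    """Hypothetical record of one service episode."""
    referral: date
    goal_met: Optional[date]  # None while the client's goal is still unmet
    defect: bool              # True if a locally defined defect occurred


def defect_rate(episodes: List[Episode]) -> float:
    """Share of episodes in which at least one defect occurred."""
    return sum(e.defect for e in episodes) / len(episodes)


def median_cycle_days(episodes: List[Episode]) -> Optional[float]:
    """Median days from referral to goal attainment, over completed episodes only."""
    days = [(e.goal_met - e.referral).days for e in episodes if e.goal_met is not None]
    return median(days) if days else None


if __name__ == "__main__":
    log = [
        Episode(date(2024, 1, 1), date(2024, 1, 31), defect=False),
        Episode(date(2024, 1, 1), date(2024, 3, 1), defect=True),
        Episode(date(2024, 2, 1), None, defect=False),
    ]
    print(defect_rate(log))        # one defect in three episodes
    print(median_cycle_days(log))  # 45.0 (median of 30 and 60 days)
```

Both numbers describe the process as a whole rather than any individual clinician, which is the shift in focus that distinguishes continuous improvement from inspection.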

Unfortunately, human services have focused on the bad apple theory, tolerating a wide range of interventions until ethical thresholds are breached. Clinicians who sleep with their clients or kill their clients (as in the Denver rebirthing incident) are often detected and punished. However, much that falls within these ethical boundaries is tolerated and never improved.

In order to develop a methodology that addresses the strengths of EBP as well as the weaknesses, the local policy makers might:

Provide basic education in quality improvement philosophy and practices for all members of a service delivery organization
Develop technologies to better understand consumers’ expectations, needs, desires, and dissatisfactions
Understand and ‘flow-chart’ the processes in the service delivery organization that affect outcomes.
Focus on improving outcomes through profound knowledge of processes rather than on detecting ‘bad apple’ individuals.
Designate and consistently use quality indicators such as client satisfaction, substantive clinical change, safety, improved functioning, cost impact, among others.
Develop reliable, affordable, information systems that capture quality indicators – particularly outcomes.
Develop situational evidence-based practice guidelines that are continually improved.
Develop learning trials that feed back information to improve all components of the system
Provide incentive systems for meeting or exceeding quality goals as well as for rewarding suggestions that work – ‘That which gets rewarded, gets done!’
Develop and utilize benchmarks to compare to national averages, ideals, competitors, etc.
Develop transparent report cards so that purchasers among others can be educated on the quality of the services provided

There should be an ethical mandate from local authorities that all psychological practice be delivered in the context of a sound quality improvement system. By a sound quality improvement system, we mean one that meets the standards listed above.

The Learning System

When current practice is not evidence-based, there has to be latitude for practices to prove their worth. The ‘learning system’ provides that opportunity because it identifies and records information about outcomes in the normal course of events. However, any practice not already identified as evidence-based in local clinical circles should be considered ‘experimental’, and any client should be considered a ‘subject’ with all the attendant protections of a research subject. New or innovative services would be required to meet the standards of Title 45 of the Code of Federal Regulations and the ethical principles of The Belmont Report.

Quality improvement is an iterative process, continuously improving products and services in keeping with clients’ ever-changing demands and expectations. By perpetually upgrading products and services, everyone involved (the organization, clinicians, clients, and key stakeholders such as funding sources and insurance companies) is in a position to benefit greatly.

Quality Improvement programs:

Improve productivity (i.e., a more effective process produces more output units);
Utilize interesting technologies (i.e., there is an incentive for research and development of new technologies);
Improve value of services (by requiring fewer resources);
Are sensitive to unintended negative effects (by continuously measuring process variables, unintended negative consequences are readily flagged); and
Shift an agency’s focus from short-term goals to a long-term orientation (i.e., how does the company improve the product or service in accordance with the ever-changing needs and expectations of clients) – the organization becomes a learning system.

By understanding the workings of a process and its impact on the outcome we expect, we can use this feedback to identify performance deficits and cycle delays and work to reduce them. We can also ensure that an evidence-based practice is, in fact, based on evidence from the practical field in which it is used. If the intervention does not work, it is either revised or abandoned. If it does work, we have local documentation of that experience.

No one has yet been able to predict a child’s outcome from earlier events; as yet, no direct cause-effect relationships have been established. It is always easier to explain behavior than to predict it. Making predictions, and testing them against the possibility that they are completely wrong, is a central feature of the scientific approach. In contrast, explanations can appear reasonable and still reflect a thinking bias or error, because explanations are seldom tested scientifically.

¹ The material from ‘Now to the Evidence’ onward is adapted from Evidence-Based Practice in Psychology and Behavior Analysis by William O’Donohue & Kyle E. Ferguson of the University of Nevada, Reno. Their words make up most of that commentary. This is an excellent article that I want both to promote and to add to.
