NADD Bulletin Volume X Number 6 Article 3


Measuring Psychiatric Treatment Outcomes in a Dual Diagnosis Partial Hospital Setting

Karen Shedlack, M.D., Antonia Chronopoulos, Ph.D., Elena Pellegrino, M.A., Michael Stefanov, Ph.D.

Assessing treatment outcome in dual diagnosis is essential and yet impractical in many settings. Rating scales can be highly subjective and overly cumbersome. While psychiatric scales are not always appropriate for individuals with mental retardation and developmental disabilities (MR/DD), developmental scales can exclude adult psychopathology. Available instruments are discussed and examples of clinical application are presented.


Psychiatric rating scales that are designed for measuring change in mental status may not be appropriate for adults with MR/DD, as the inherent cognitive limitations associated with developmental syndromes often impede the ability to self-report internal states. In addition, individuals with MR/DD usually lack the ability to make comparisons as to the intensity and duration of symptoms. For example, questions such as, “Are the voices better or worse than when we last spoke?” and the natural follow-ups, “How much better/worse?”, “How long have they been better/worse?” and “What made them better/worse?” are virtually impossible for most people with dual diagnosis to answer accurately. Conversely, some behaviorally oriented scales that are available for use with MR/DD populations are purposely designed to disregard DSM-IV and ICD-10 psychiatric symptoms, in order to avoid the self-interpretation bias (Aman, 1991). In addition, most mental health rating scales are lengthy and can only be properly used by staff who are trained in psychiatry. These features rule out the use of most rating scales in dual diagnosis clinics and other outpatient settings in which the psychiatrist’s time is limited. The clinical benefits of routinely utilizing an appropriate rating instrument in adult dual diagnosis populations are obvious. However, in America, as third-party payers and regulatory agencies become more interested in quantifying and comparing psychiatric change occurring in various treatment settings and over various time frames for all persons, the need to identify adapted rating scales for use in dual diagnosis has become timely.


Our own facility, McLean Hospital, a free-standing psychiatric hospital and division of Partners HealthCare and Harvard Medical School, requires the use of an instrument called the BASIS-24 (Eisen, Normand, Belanger, Spiro, & Esch, 2004) for every inpatient and for most people being treated at the partial hospital level of care. This scale requires the ability to make comparisons and judge degrees of psychiatric improvement or decline and then self-rate these changes using a 5-point Likert scale. The Developmental Disabilities Partial Hospital, the Child and Adolescent Program and the Dementia Unit have all been given dispensation from using this scale for obvious reasons. However, the directors of these programs have been charged with finding a psychiatric rating scale that is appropriate for their specialty populations. Most have opted to use the Brief Psychiatric Rating Scale (BPRS) (Overall & Gorham, 1962), which has excellent reliability and validity for psychiatric populations. However, for some items, the instructions specifically state that the rater must rate only on the basis of the patient’s self-report, NOT on the basis of the rater’s observations or inferences about the patient’s internal state. Thus, the BPRS is problematic for the MR/DD dual diagnosis population.

At the McLean Hospital Developmental Disabilities Program (DDP) (Shedlack & Chapman, 2004), we have historically utilized the Aberrant Behavior Checklist (ABC) (Aman, Singh, Stewart, & Field, 1985a, 1985b) and the Global Assessment of Functioning (GAF) (American Psychiatric Association, 1994) to rate behavioral and psychiatric change over time (at admission and discharge) in our population of ambulatory, dual diagnosis adults with mild-to-moderate mental retardation or borderline I.Q. After years of experience with these scales, we have found that the ABC provides an excellent range of items and avoids the problems of patient-based ratings. However, it misses some psychotic symptoms and mood states. In addition, it is lengthy and the identified items do not always intuitively match their more formal descriptions. We find that there is much interpretation involved in the use of the ABC and have taken on the practice of performing the ABC ratings as a collaborative staff group, rather than having an individual staff member use this scale on their own. The ratings thus generated avoid interpreter bias in terms of what is actually being rated in each item but are consensus, rather than individual, ratings. The GAF, which is a required element of the medical record, is generated by the case manager who fills out the admission and discharge paperwork. The GAF is quite limited, as it has an inherent ceiling effect such that the presence of adaptive limitations among individuals with MR/DD confines our target population, by definition, to the lower ranges of this scale (Shedlack, Hennen, Magee, & Cheron, 2005).

In looking for a rating scale to measure not psychiatric diagnosis per se, but change in mental status over a short duration of focused treatment, we looked for scales that would be appropriate for both the psychiatry and the mental retardation components of dual diagnosis and for our partial hospital setting, in which people are retained in this level of care for only a matter of weeks to months. We also kept in mind the fact that the people we serve are screened for mild intellectual disability and for the presence of at least some psychiatric symptomatology as criteria for admission to the partial hospital level of care. We reviewed scales in three categories: (a) Behavior in MR/DD: the ABC and the Reiss Screen for Maladaptive Behavior (Reiss, 1997); (b) Psychiatric symptoms: the GAF, the Clinical Global Impressions (CGI) (Guy, 1976), the BPRS, and the Behavioral Observation System (BOS) (LePage & Mogge, 2001); and (c) Dual diagnosis: the Psychopathology Instrument for Mentally Retarded Adults (PIMRA) (Matson, Kadzin, & Senatore, 1984), the PAS-ADD Checklist (Psychiatric Assessment Schedule for Adults with Developmental Disabilities, Checklist) (Moss et al., 1998) and the Developmental Behavior Checklist-Adults (DBC-A) (Mohr, Tonge, & Enfield, 2005). The instruments were reviewed for rationale/target population, design, psychometric properties, ease of use, time required to complete the scale and perform the scoring procedure, pitfalls, and apparent sensitivity for measuring short-term change in mental status in our population and in our setting. We also looked at whether the ratings from multiple scales seemed to corroborate each other and whether the scales seemed to validate our clinical impressions of individuals’ progress or lack thereof during their treatment at the partial hospital.


We will review the instruments and present some sample items from each for illustrative purposes. We will review scoring and subscales for each instrument. We will also present case vignettes and review actual rating scale data from DDP patients in order to illustrate in detail the strengths and weaknesses of each of the instruments for use with our population and in our setting.

(Insert Table 1 about here.)

Table 1. Characteristics of the Instruments


| Instrument | Number of Questions, Range of Responses | Subscales | Time to Administer (approximate) | Type of Scale |
|---|---|---|---|---|
| ABC | 58 items, 4 pt. scale | Irritability, Lethargy, Stereotypy, [...] Inappropriate Speech | 25 minutes | Developmental |
| BASIS-24 | 24 items, 5 pt. scale | Depression and Functioning, [...] Sub. Abuse, [...] Self-Harm | 10 minutes | Psychiatric |
| BOS | 34 items, 5 pt. scale | Depression, Mania, Psychosis, Acting Out | 5 minutes | Psychiatric |
| BPRS | 18 or 24 items, 7 pt. scale | Thought Disorder, [...] Activity | 5 minutes | Psychiatric |
| CGI | 1 item, 7 pt. scale | Severity of Illness | 2 minutes | Psychiatric |
| DBC-A | 107 items, 3 pt. scale | Disruption, Depressive, [...] Social Relations, [...] (Psychosis) | 20 minutes | Developmental/[...] |
| GAF | 1 item, 100 pt. scale | Severity of Illness | 2 minutes | Psychiatric |
| PAS-ADD Checklist | 29 items, 4 pt. scale | Affective Disorders, Organic Conditions, [...] Conditions | 5 minutes | Developmental/[...] |
| PIMRA | 56 items, [...] | [...] Adjustment | 12 minutes | Developmental/[...] |
| REISS | 38 items, 3 pt. scale | Aggression, Autism, Paranoia, [...] Dependent P.D., Drug Abuse | 15 minutes | Developmental/[...] |



In searching for an ideal instrument to rate change in psychiatric symptoms among individuals with MR/DD during a short-term intervention at the partial hospital level of care, we found that no single instrument was a perfect fit for our purposes. Each provided advantages and disadvantages such that it might be quite useful in our setting, but not necessarily as an outcome measure for psychiatric treatment response. Some of the common features of these scales and their application to our population and our setting are discussed below.

Observation vs. self-report:  In using these various instruments, we were constantly reminded of the advantages of a purely observational scale, in that the issue of reporting bias in individuals with limited cognitive abilities is avoided entirely. The instruments that rely strictly on clinical observations are the ABC, DBC-A and the BOS. The ratings of skilled observers are very valuable in the psychiatric assessment process. However, these instruments are necessarily limited to those phenomena that can be witnessed, and, by definition, will exclude items pertaining to internal states such as moods, hallucinations, or delusions. Thus, in a psychiatric treatment setting, the exclusion of pertinent aspects of the mental status exam is problematic for determining outcomes. In general, we found that we preferred scales that contained both staff-observation and patient-report items for providing rational outcome information.


Breadth of the numeric ratings: The number of response choices provided by the particular instrument was also a very important issue. The GAF uses a 100-point scale but generates a single value which, by definition for individuals with MR/DD, will lie in the range of 30-40, thus severely limiting its utility as an outcome measure for this population. Instruments that are scored on a yes/no basis (only two possible ratings), such as the PIMRA, were also not useful in our setting, since the identified symptoms and behaviors did not change enough to entirely arise or completely drop out during the few weeks of partial hospital treatment. Thus, the yes/no response provided almost no room for subtle change over time. Observations on the Reiss Screen and the DBC-A are rated on a 3-point scale (none, some, a lot), while observations on the ABC and PAS-ADD are rated on a 4-point scale (none, mild, moderate, severe). The BOS uses a 5-point scale (none, a few times, less than half the time, more than half the time, constant). The BPRS and CGI use a 7-point scale (not present, very mild, mild, moderate, moderately severe, severe, extremely severe), with an additional "not assessed" option. We found that the instruments that were the most satisfying to use in terms of number of response choices were the instruments with 4 choices. These scales seemed to provide a sensitive measure of change over the short term and were not so cumbersome as to require a great deal of mental energy to determine each and every rating accurately.
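The point about yes/no items can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of our review: symptom severity is treated as a hypothetical 0-100 index, and each response format is assumed to divide that index into equal-width bins, which is not how any of the published scales is anchored. The function names `rate` and `rated_change` are invented for the illustration.

```python
# Illustrative only: severity is a hypothetical 0-100 index, and the
# equal-width bins are an assumption, not how any published scale is anchored.

def rate(severity_pct, n_choices):
    """Map a hypothetical severity (0-100) to a rating from 0 to n_choices - 1."""
    if not 0 <= severity_pct <= 100:
        raise ValueError("severity_pct must be between 0 and 100")
    return min(severity_pct * n_choices // 100, n_choices - 1)


def rated_change(admission_pct, discharge_pct, n_choices):
    """Return the drop in rating from admission to discharge."""
    return rate(admission_pct, n_choices) - rate(discharge_pct, n_choices)


if __name__ == "__main__":
    # A partial improvement: severity falls from 80 to 60 during treatment.
    for n_choices in (2, 3, 4, 5, 7):
        print(n_choices, "choices -> rated change:", rated_change(80, 60, n_choices))
```

Under these assumptions, a partial improvement from 80 to 60 registers no change in the two-choice (yes/no) format and a one-point drop in each of the formats with three or more choices. The sketch does not model rater burden, which was the other reason we favored four choices over longer scales.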

Breadth of the psychiatric features included: In this category, the BPRS is the gold standard as it is designed to assess the most common psychiatric symptoms. However, it does not include obsessions and compulsions, and the 18-item scale is weak on the symptoms associated with mania (Ventura, Nuechterlein, Subotnik, Gutkind, & Gilbert, 2000). In addition, the BPRS requires that the patient’s own report be taken at face value on a number of items, thus limiting its utility for our population. The dual diagnosis scales (Reiss, PIMRA, PAS-ADD and DBC-A) all have reasonable breadth in this category, but the greater the breadth, the larger the number of items, and the longer the time required to complete the instrument. Each of the dual diagnosis instruments has its individual strengths and weaknesses in terms of the range of psychiatric symptoms assessed.

Ease of rating and scoring: We find that the most cumbersome of the instruments with respect to relative ease of determining a rating for each item is the ABC. In our opinion this is due to two factors. The first is the occasional discrepancy between the wording for each item and the lengthier item descriptors provided separately. The second is the use of both positive and negative valence of the items to be rated. The easiest instruments to rate, due to clear language and the absence of complicated comparisons, were the Reiss, DBC-A and the PIMRA. The PAS-ADD, being tied to ICD-10 diagnostic criteria, requires comparisons of symptoms vs. a baseline rather than a point-in-time rating. Scoring for all of the instruments becomes lengthier as the number of items and subscales increases. The DBC-A has the most individual items at 107 while the Reiss has the most subscales at 15.
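To show why scoring effort grows with the number of items and subscales, the following Python sketch sums item ratings into a total score and subscale scores. Everything in it is hypothetical: the six-item checklist, the subscale names "Mood" and "Behavior", the item-to-subscale mapping, and the ratings are invented for illustration and do not correspond to any of the instruments reviewed or to any patient data.

```python
from collections import defaultdict

# Hypothetical six-item checklist: item number -> subscale name.
# This mapping is invented for illustration and matches no published instrument.
SUBSCALE_OF = {1: "Mood", 2: "Mood", 3: "Mood",
               4: "Behavior", 5: "Behavior", 6: "Behavior"}


def score(ratings, subscale_of=SUBSCALE_OF):
    """Return (total score, {subscale: score}) for a dict of item -> rating."""
    missing = set(subscale_of) - set(ratings)
    if missing:
        raise ValueError(f"unrated items: {sorted(missing)}")
    subscales = defaultdict(int)
    for item, rating in ratings.items():
        subscales[subscale_of[item]] += rating
    return sum(ratings.values()), dict(subscales)


if __name__ == "__main__":
    # Invented ratings on a 0-3 scale at two time points.
    admission = {1: 3, 2: 2, 3: 3, 4: 1, 5: 2, 6: 1}
    discharge = {1: 2, 2: 1, 3: 2, 4: 1, 5: 1, 6: 1}
    print("admission:", score(admission))
    print("discharge:", score(discharge))
```

Each additional item adds one rating to be entered and checked, and each additional subscale adds one more sum to be computed and interpreted, which is the burden that accumulates by hand for instruments with many items or subscales.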

Clinical utility of the scores and subscales: Finally, we found that the subscale schemas for these instruments were vastly different from each other. These differences were, of course, fundamental to the intended purpose of each instrument. Some instruments that were designed to screen for behavior are thus broken down by behavioral features and generally disregard psychiatric symptoms. Those that were designed to screen for psychiatric “caseness” are broken down by groupings of behavioral and psychiatric features, but not necessarily by psychiatric diagnoses. In addition, the “threshold” score for potential psychiatric problems of the DBC-A was so high as to not be sensitive to major psychopathology in our population. Those instruments that were specifically designed to screen for the presence of a potential ICD-10 or DSM-IV psychiatric diagnosis (PAS-ADD) are broken down by diagnostic criteria but are limited to a few diagnoses. In general, we found that the instruments that were most useful for the seeming psychiatric validity of their psychiatric subscales were the PIMRA, Reiss, and the PAS-ADD. The subscales for the BOS were confusing and the BPRS does not provide subscales.


There are advantages and disadvantages to each assessment instrument that we reviewed with respect to our dual diagnosis population and our short-term setting. While we found no ideal instrument for our specific purpose of measuring short-term psychiatric outcome, we believe that we could adapt several of the instruments for our purposes. In so doing, we realize that the validity and reliability of the instruments would be affected. However, there would likely still be room for meaningful comparisons within our setting and patient population over time. Overall, we found that instruments with four response choices seemed the easiest to use. We noted that very clearly worded item descriptions also increased the ease of use. We did not find that most subscale scores were clinically meaningful to us but found that total item scores for some of the instruments did seem to correlate with our clinical impressions of patient progress during treatment. As valid psychiatric outcome measures are being tailored for other target populations, we feel that the pursuit of a rational psychiatric outcome measure for the dual diagnosis population is a worthy goal that has not yet been met.


Aman, M. G. (1991). Assessing psychopathology and behavior problems in persons with mental retardation: A review of available instruments (DHHS Publication # ADM 91:1712). Washington, DC: U.S. Government Printing Office.

Aman, M. G., Singh, N. N., Stewart, A. W., & Field, C. J. (1985a). The Aberrant Behavior Checklist: A behavior rating scale for the assessment of treatment effects. American Journal of Mental Deficiency, 89(5), 485-491.

Aman, M. G., Singh, N. N., Stewart, A. W., & Field, C. J. (1985b). Psychometric characteristics of the Aberrant Behavior Checklist. American Journal of Mental Deficiency, 89, 492-502.

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Eisen, S. V., Normand, S. L., Belanger, A. J., Spiro, A., & Esch, D. (2004). The revised behavior and symptom identification scale (BASIS-R): Reliability and validity. Medical Care, 42, 1230-1241.

Guy, W. (1976). Clinical global impressions. In W. Guy (Ed.), ECDEU Assessment manual for psychopharmacology, revised. Rockville, MD: National Institute on Mental Health.

LePage, J. P., & Mogge, N. L. (2001). The Behavioral Observation System (BOS): A line staff assessment instrument of psychopathology. Journal of Clinical Psychology, 57, 1435-1444.

Matson, J. L., Kadzin, A. D., & Senatore, V. (1984). Psychometric properties of the Psychopathology Instrument for Mentally Retarded Adults. Applied Research in Mental Retardation, 5, 81-89.

Mohr, C., Tonge, B. J., & Enfield, S. L. (2005). The development of a new measure for the assessment of psychopathology in adults with intellectual disability. Journal of Intellectual Disability Research, 49, 469-480.

Moss, S. C., Prosser, H., Costello, H., Simson, N., Patel, P., Rowe, S., et al. (1998). Reliability and validity of the PAS-ADD Checklist for detecting psychiatric disorders in adults with intellectual disability. Journal of Intellectual Disability Research, 42, 173-183.

Overall, J. E., & Gorham, D. R. (1962). The Brief Psychiatric Rating Scale. Psychological Reports, 10, 799-812.

Reiss, S. (1997). Comments on the Reiss Screen for Maladaptive Behavior and its factor structure. Journal of Intellectual Disability Research, 41, 346-354.

Shedlack, K. J. & Chapman, R. A. (2004). Social learning interventions in a developmental disabilities partial hospital program. American Journal of Psychiatric Rehabilitation, 7, 7-25.

Shedlack, K. J., Hennen, J., Magee, C., & Cheron, D. M. (2005). A comparison of the Aberrant Behavior Checklist and the GAF among adults with mental retardation and mental illness. Psychiatric Services, 56, 484-486.

Ventura, J., Nuechterlein, K. H., Subotnik, K. L., Gutkind, D., & Gilbert, E. (2000). Symptom dimensions in recent-onset schizophrenia and mania: A principal components analysis of the 24-item Brief Psychiatric Rating Scale. Psychiatry Research, 97, 129-135.

(Acknowledgements: The authors wish to thank Jeanne France, R.N.C. and Mary Jo Iacoboni, R.N. for their assistance in performing the clinical ratings and Anne Doherty, M.F.A. for her assistance in procuring the copyrighted scales and instruction sets.)

Information for Obtaining the Instruments

ABC: Aman, M. G. & Singh, N. N. (1986). Aberrant Behavior Checklist Manual, Slosson Educational Publications, East Aurora, NY.


BOS: Sample copies of the instrument and glossary can be obtained by contacting James P. LePage, Ph.D., or Neil L. Mogge, Ph.D.


DBC-A: Enfield, S. L., Tonge, B. J., & Mohr, C. (2002). Monash University, Centre for Developmental Psychiatry & Psychology, Dept of Child and Adolescent Psychiatry, Monash Medical Centre, 246 Clayton Road, Clayton VIC 3168, Australia. Phone: 650 949 3282 x213. All proceeds support research.


PAS-ADD: Moss, S. (2002). Pavilion Publishing (Brighton) Ltd, The Ironworks, Cheapside, Brighton BN1 4GD.


PIMRA: Matson, J. L. (1988). Psychopathology Inventory for Mentally Retarded Adults, Ratings by Others Scale. IDS Publishing Corporation, Orlando Park, IL. Phone: 614 885 2323.


REISS SCREEN: Reiss, S. (1987). Reiss Screen for Maladaptive Behavior, IDS, Orlando Park, IL. Reiss, S. (1988). The Reiss Screen Test Manual, IDS Publishing Corporation, Orlando Park, IL. Phone: 614 885 2323.