“In medicine, there is only one race – the human race” (Schwartz, 2001)
How this statement ought to be interpreted currently constitutes one of the most polarizing and hotly debated topics in American medicine. Few researchers or physicians would argue that the human species can be partitioned into distinct subspecies (i.e. races). In accordance with the ideals enshrined in both the Constitution and the Hippocratic Oath, most medical practitioners accept that all people are equally entitled to the best possible health care. At the same time, however, few medical professionals would claim that all patients are the same with regard to disease susceptibility and outcome, symptom profiles, or response to treatment. Instead, most doctors and scientists acknowledge that some of this variation appears to conform to racial or ethnic divisions. What biomedical researchers and doctors are far less likely to agree upon are the underlying causes for racial patterning in clinical outcome and disease risk among patients. In particular, the medical profession is deeply divided as to how (or whether) race should be incorporated into research designs or treatment plans.
Explanations for inter-patient variation invoke one of two distinct paradigms. The first emphasizes genetic differences in susceptibility to disease, while the second focuses on cultural practices (Braun, 2002). Accordingly, arguments over the source of racial health disparities stem from fundamental differences in the way health care professionals define race: as a social construct or as a genetic proxy.
These disparate positions are not entirely grounded in an objective or scientific perspective (Risch et al., 2002). They also derive from the potent imprint that historical battles over the “meaning” of race and its proper place in medicine have left on the national psyche. Patterns in human variation, and the ways in which they are represented and interpreted, are not just “empirical question[s] that necessitate careful scientific analysis” (Risch et al., 2002) because biomedical research and treatment practices do not occur in a social vacuum. Explanations for differences among groups – in terms of treatment response and disease prevalence – have potentially far-reaching social consequences.
In this paper, I provide an overview of the history of race in American medicine and examine the adequacy of race as a heuristic device for representing human genetic variation. I also propose ways in which race (or some concept of ancestry related patterns of genetic variation) might be useful in certain research and public health contexts. Finally, I explain why scientists and doctors must be cautious in their interpretation and application of evidence for medically significant genetic differences among human populations.
The Shoals upon Which the Proud Past of American Medicine Gets Grounded
Note: Since the most extensive literature on this issue concerns the experience of African Americans, I will primarily focus on that group. However, it should be understood that the same racist ideology has had similarly adverse consequences for members of other so-called “inferior races.”
Carolus Linnaeus was the first western scientist to divide humans into four “races,” but most historians credit J.F. Blumenbach’s revered treatise on racial categories (1779) as establishing the hierarchical arrangement of groups that was blatantly abused by generations of American scientists and doctors. His use of geographic location and “physical beauty” as criteria for partitioning the human species transformed the primarily cartographic system of Linnaeus into a pyramidal arrangement of “better” and “worse”. Under Blumenbach’s classificatory scheme, Caucasians came to epitomize “racial beauty”; Americans (i.e. Native Americans) and Malaysians (southeastern Asiatic and Oceanic groups) occupied intermediate positions; and Ethiopians (i.e. all African populations) were the most degenerate (Gould, 1996; Cavalli and Sforza, 1997).
Blumenbach’s system was gladly embraced by American colonists in search of scientific justification for their enslavement of Africans, as well as their forceful encroachment upon Native American tribal lands. Not only did it confirm the inherent superiority and worth of Northern Europeans, but it also suggested that they were biologically entitled to subjugate and exploit members of more “imperfect” races. The social, economic, and political ramifications of this “scientifically” accredited racism have been well documented by historians. But the impact that racist ideologies had on the medical profession was similarly severe, and it continued to shape the practice of medicine and the delivery of health care well into the 20th century.
One of the most egregious abuses of the unequal balance of power between races was the use of African Americans in medical experiments. The best-known example is the Tuskegee syphilis project. Launched in 1932 with 400 infected African Americans and 200 controls, the study continued until 1972 even though evidence for the efficacy of penicillin as a treatment for syphilis was available as early as the 1940’s. More unacceptable still – whether judged by modern standards or the tenets of the Hippocratic Oath, which physicians have taken for hundreds of years – was the subterfuge used by researchers to conceal the purpose and nature of their work. Patients were not notified that they had syphilis. Rather, they were told they had “bad blood,” an ambiguous phrase that rural African Americans associated with a host of maladies. The scientists in charge of the Tuskegee experiment also went out of their way to ensure that treatment remained inaccessible to their patients. For instance, they colluded with draft boards during World War II to ensure none of the Tuskegee participants were drafted, for researchers feared that army doctors would correctly diagnose and treat the syphilitic males from the Tuskegee study (Byrd and Clayton, 2000).
Less well-publicized instances of exploitation litter the American historical landscape. J. Marion Sims – often credited as being the “father of gynecology” for his perfection of techniques used in the repair of vesicovaginal fistulas and gynecological surgery – performed most of his experiments on slave women. More recently, African Americans were preferentially exploited in radiation experiments – such as feeding “volunteers” radioactive breakfast cereal – which took place at MIT, Harvard, and the Oregon State Penitentiary between 1945 and 1965 (Byrd and Clayton, 2000).
The once pervasive belief that races are fundamentally different in their biological (and later, genetic) make-up also led some doctors to ascribe phenotypic differences and social inequalities to race-specific diseases. Not surprisingly, the treatments which they devised to “cure” some of these “maladies” conformed to the political, economic, and social milieu in which American doctors worked. For example, Samuel Cartwright, a physician and pro-slavery advocate, theorized that the inferiority of black people could be traced to inadequate decarbonization of blood in the lungs, a syndrome he called dysesthesia. The treatment he proposed was particularly amenable to slave owners: patients should be washed and oiled; the oil should be beaten into their skin using a “broad leather strap;” and the patient should then be put “to some kind of hard work in the open air and sunshine that will compel him to expand his lungs” (from papers Cartwright presented at an 1851 meeting of the Louisiana Medical Association; quoted in Gould, 1996; please see page 36 in the appendix for a table listing some of the other so-called “Negro Diseases” identified by Cartwright).
The emergence of Social Darwinism provided new validation for racist interpretations of health disparities, and further legitimized inequalities in the provision of medical services. For much of the latter half of the 19th century and first quarter of the 20th, the U.S. medical profession not only regarded African Americans as biologically, mentally, and morally inferior, but it also concluded that the poorer health of African Americans was a product of evolution. Accordingly, better health care would not reduce the burden of disease, higher infant mortality, and lower life expectancy which characterized the health profile of the black populace, for these were all aspects of the “evolutionary scheme” (Byrd and Clayton, 2000). Indeed, researchers’ and physicians’ presumptions about the innate biological and intellectual inferiority of African Americans helped fuel the eugenics programs and miscegenation fears (later codified as laws against interracial marriage in 42 states) that prevailed during the first quarter of the last century. The words of Harvard geneticist Edward M. East are symptomatic of the irrational views of race which pervaded American society: “Gene packets of African origin are not valuable supplements to the gene packets of European origin…it is the white germ plasm that counts” (from The Science and Politics of Racial Research, by WH Tucker; quoted in Byrd and Clayton, 2000).
Biological and genetic arguments for the inferiority of African Americans continued to be espoused by individuals such as Carleton Coon (1962) and William Shockley (1960s and 1970s) in the latter half of the 20th century. However, the political and scientific foundations upon which the American medical profession had constructed its arguments for separate treatment for “blacks” and “whites” were eroded by government-enforced desegregation of most hospitals, nursing homes, and health departments during the Civil Rights Era; the drafting and publication of statements supporting human equality (such as UNESCO’s 1950 Statement on Race); and the gradual emergence of a consensus among most scientists concerning the biological and intellectual similarity of all races.
Health disparities continue to exist among racial groups, but most epidemiologists, health care officials, and other medical practitioners now look to social or environmental factors to explain these differences. The core of the current controversy over whether racial differences might have a genetic basis is located in the historical struggle to achieve this paradigm shift. If we trace health disparities to genetic factors which vary in their prevalence among different groups, do we risk reintroducing the racist practices and ideologies that have since become a source of shame for American scientists and doctors?
What’s in a Race, Anyway?
One of the central quandaries which the American medical establishment must resolve is how race is to be defined. Do racial and ethnic labels provide a useful shorthand for culturally distinct behaviors or experiences (e.g. nutritional habits, constant internalization of the stress produced by recurrent experiences of discrimination, or lower socioeconomic status) that may be among the underlying causes of health disparities? Can the same labels also be used to approximate differences in the distribution of genetic variants that may increase risk for a particular disease? It is important that biomedical researchers and physicians arrive at a common understanding of what race “is” (and/or what it “is not”). The manner in which race is defined affects the formulation of research questions; the interpretation of evidence from studies aimed at clarifying the source(s) of inter-individual (or inter-group) variation in disease risk or treatment response; and the translation of findings into clinical practice and access to medical services (Frank, 2001; Foster and Sharp, 2002).
In a recent article published in Policy Review, Dr. Sally Satel argued that “skin color sometimes can be a surrogate for genetic differences” (ibid, 2002) and, as a result, labels such as “black” or “white” can be used in a clinical setting to assess an individual’s risk for acquiring a certain disease or adversely responding to a particular drug. Definitions of race such as that offered by Satel reflect the lingering vestiges of traditional notions of races as separate genetic entities “produced by generations of reproductive isolation” (Frank, 2001). When applied in a medical context, such reductionist models imply that health disparities between groups reflect the existence of racially distinct gene pools generated by divergent evolutionary histories. Satel’s conclusions merely extend this conceptualization of racially packaged human variation to its logical endpoint – skin color, as one easily observed component of the suite of genetic features common to a particular race, can be used to assess less phenotypically accessible features of an individual’s genetic constitution.
Few scientists today would agree with this formulation of the relationship between skin color and genotype, primarily because the evidence for races as distinct, genetically homogenous partitions of human variation is singularly lacking. For example, one common measure used by biologists to assess whether a species can be divided into races is FST. The minimum value accepted as evidence for racial partitioning of genetic variation in a species is between 0.25 and 0.3; humans, with an average value of 0.156, do not meet this requirement (Templeton, 1999; please see page 37 in the appendix). Richard Lewontin reached similar conclusions in a 1972 paper that has since become a key component of arguments made by those who deny any real patterning of human variation. He found that 85.4% of the total genetic diversity in the human species exists within populations. As for the remaining 14.6%, 8.3% exists between populations within a race, leaving only 6.3% of human genetic variation to be accounted for by the seven most common racial categories (Caucasian, Negroid, Mongoloid, Amerindian, Oceanians, Southeast Asians, and Australian Aborigines) (ibid., 1972). The salience of these findings for medical researchers or physicians who, like Satel, believe racial classifications designate genetically distinct entities is encapsulated in the American Anthropological Association’s 1999 Statement on Race:
“Human populations are not unambiguous, clearly demarcated, biologically distinct groups…[and] any attempt to establish lines of division among biological populations [is] both arbitrary and subjective” (ibid, 1999).
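To make the FST comparison concrete, the following is a minimal sketch of how the statistic can be computed for a single biallelic locus, using the textbook definition FST = (HT - HS) / HT, where HS is the average expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. The function name, the equal weighting of subpopulations, and the allele frequencies are illustrative assumptions; none of the numbers come from Templeton (1999) or Lewontin (1972).

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic locus, given the frequency of one
    allele in each subpopulation (subpopulations weighted equally)."""
    if not freqs:
        raise ValueError("need at least one subpopulation")
    n = len(freqs)
    # Mean expected heterozygosity within subpopulations (H_S).
    h_s = sum(2 * p * (1 - p) for p in freqs) / n
    # Expected heterozygosity of the pooled population (H_T).
    p_bar = sum(freqs) / n
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # the locus is monomorphic everywhere
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies, chosen only for illustration.
similar = fst_biallelic([0.45, 0.50, 0.55])    # populations barely differ
divergent = fst_biallelic([0.05, 0.50, 0.95])  # populations differ sharply
```

With these made-up inputs the first call yields a value well below the 0.25 threshold cited above and the second a value well above it, which illustrates why the choice of locus and of sampled populations matters so much when FST is averaged.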
Human genetic variation exhibits a clinal distribution. Differences among populations are not abrupt; rather, the genetic profiles of populations grade into one another in a manner that reflects “a basically continuous process of geographic isolation by distance” (Weiss, 1998)(see pages 38-40 in appendix). Previous contentions that abrupt genetic transitions mark the boundaries between the major races were primarily the product of biased sampling and interpretative frameworks. If scientists only compare allele frequencies between groups residing at geographic extremes, and then generalize from these sampled populations to all those residing on the same continent, it truly does appear as if human variation is racially patterned.
Another aspect of sampling design that determines the pattern of genetic variation observed is the particular gene or locus being examined. Individual genes have different population distributions that reflect the “unique combination of mutation, selection, and drift operating at that locus” (Weiss 1998). The implications of gene-specific histories for assessment of genetic variation among different human populations (or races, or ethnic groups) are readily apparent when one assesses clustering patterns among the same populations using genetic markers with different FST values (see pages 41 and 42 in the appendix). Markers with low FST values basically reflect the unity of the human species; by contrast, analyses that only employ high FST markers to assess the structure of human variation could easily give rise to mistaken impressions that human populations exist as biologically discrete species.
Despite the geographic gradation characteristic of global genetic variation, as well as the fact that “evolution of the human genome is best represented using a modular framework” (Shriver, personal communication), it might still be argued that races do exist in America. While the black/white paradigm espoused by physicians like Satel is obviously too simple, African Americans, European Americans, Asian Americans, and Native Americans (the ‘Big Four’ in the American pantheon) do constitute separate genetic entities based on shared ancestry from a relatively small number of populations that emigrated from distant geographic regions. As mentioned earlier, when samples are taken from geographic extremes and all the intervening populations are ignored, the resultant pattern of genetic variation closely resembles that which is expected according to traditional paradigms of racial differences. The immigration patterns which prevailed during a substantial portion of American history (e.g. most European Americans are descended from German, Irish, or English settlers, while most African slaves were taken from a relatively circumscribed area of coastal West Africa) constitute just such a “sampling of the extremes.”
And yet, the same unique historical process that might have established “true” races in America also undermined the creation of firm genetic boundaries. Juxtaposition of these “isolated” groups produced new opportunities for gene flow (which had previously been negligible due to factors such as the exigencies of travel in the pre-modern world). As a result of this process, American “races” grade into one another in complex patterns that preclude assumptions of racial homogeneity. Genetic variation is still clinally distributed in the United States, but it is predicated upon proportionate ancestry more than geography.
So what is in a ‘race,’ anyway? A lot or a little, depending upon the sampling design, the programs and models used to assess genetic structure, and the goals of the researcher. Populations can be clustered according to greater or lesser genetic similarity, but there is no objective criterion that favors one level of resolution over another (Cavalli and Sforza, 1994). Accordingly, scientists must endeavor to record their reasons for choosing a particular framework (e.g. races, ethnic groups, or more narrowly defined populations) and acknowledge any limitations of the categories they’ve chosen to employ (Foster and Sharp, 2002). When social labels are used to report the distribution of genetic variation across a particular geographic region or within a particular nation, those labels should be operationalized as carefully as any other aspect of the study design. Otherwise, we risk reifying groups such as African or European Americans to such an extent that physicians like Sally Satel feel comfortable describing genetic differences in terms of Black and White.
Race in the Biomedical Context – A Lost Cause?
If races do not exist even in places like America – where, for historical reasons, genetic variation might be expected to follow a racial distribution – then does race have any place in either a research or treatment context? In fact, race does still have its uses, though they are not nearly as extensive as the current medical literature might lead us to believe. In the following sections, I will review some of the contexts in which self-identified racial affiliation can provide a useful starting point for biomedical research aimed at elucidating the underlying genetic and/or environmental causes for variation – both within and between different social groups – in susceptibility to, and clinical outcome of, complex diseases.
Admixed Populations I: ALD Mapping
Self-identified race can facilitate the collection of samples from individuals who share a common demographic history that may prove useful in certain types of research (Foster & Sharp, 2002). One investigative strategy that would benefit from such an approach is admixture linkage disequilibrium mapping.
Before discussing the important role ALD mapping may play in biomedical research, it is necessary to define the genetic premises underlying use of this technique for identifying the location of disease gene candidates. Linkage disequilibrium describes the tendency for loci that are located close to one another on the same chromosome to assort in a non-independent manner during meiosis (Maroni, 2001) (see page 43 in the appendix). In association studies, linkage disequilibrium can be used to determine the approximate chromosomal position of an unknown gene where the disease-causing mutation has occurred. Researchers type cases and controls at a number of polymorphic markers – evenly spaced across the genome – for which the precise location on genetic and physical maps is known. If the disease-causing allele is in linkage disequilibrium with an allele at one of these markers, then the marker allele will exhibit a statistically significant association with the disease phenotype, alerting investigators to the possibility that the disease gene is located somewhere in the proximity of that marker. One of the crucial determinants for success using this approach, therefore, is the selection of a set of markers dense enough to pick up association signals.
ALD mapping represents one means by which researchers can at least partially circumvent the need to employ extremely dense arrays of markers (an important consideration given that the more markers for which study participants are typed, the more computationally burdensome and expensive the study). Admixture linkage disequilibrium refers to the phenomenon whereby gene flow between two genetically distinct populations leads to non-random association of alleles at both linked and unlinked loci (Pfaff et al., 2001). Since the admixture process generates linkage disequilibrium over long distances (often on the scale of 10 to 20 cM), one of ALD mapping’s chief appeals is the relatively small number of markers needed to perform association studies aimed at identifying the chromosomal region that contains the disease susceptibility locus (Pritchard and Przeworski, 2001).
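As a rough illustration of what it means for two loci to be in linkage disequilibrium, the sketch below computes two standard population-genetic summaries: the coefficient D (the observed frequency of a two-locus haplotype minus the frequency expected if the alleles were independent) and the squared correlation r². These definitions are textbook conventions rather than anything specified in the sources cited above, and the function name and input frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying both allele A and allele B
    p_a, p_b: frequencies of allele A and of allele B
    """
    for p in (p_a, p_b):
        if not 0 < p < 1:
            raise ValueError("both loci must be polymorphic")
    # D is zero when the alleles are statistically independent.
    d = p_ab - p_a * p_b
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies: the first pair of loci is independent,
# the second shows an excess of the A-B haplotype.
d_indep, r2_indep = linkage_disequilibrium(0.12, 0.3, 0.4)
d_linked, r2_linked = linkage_disequilibrium(0.25, 0.3, 0.4)
```

A marker allele is only useful for locating a disease allele when a statistic of this kind is appreciably greater than zero for the marker and disease loci, which is why marker density matters.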
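The long reach of admixture-generated disequilibrium can be sketched with a standard result for a single pulse of admixture, assuming no disequilibrium within either parental population: mixing creates an initial D equal to m(1 - m) times the product of the two loci's allele frequency differences between the parental populations, and recombination then erodes it by a factor of (1 - θ) each generation. The single-pulse model, the function name, and all input values are simplifying assumptions made here for illustration.

```python
def admixture_ld(m, delta_a, delta_b, theta, generations):
    """Disequilibrium between two loci some generations after one
    admixture pulse (no LD assumed within either parental population).

    m: fraction of the admixed gene pool from parental population 1
    delta_a, delta_b: allele frequency differences between the parental
        populations at locus A and at locus B
    theta: recombination fraction between the loci (0.5 if unlinked)
    """
    d0 = m * (1 - m) * delta_a * delta_b  # LD created by mixing alone
    return d0 * (1 - theta) ** generations


# Hypothetical values: 20% admixture, frequency differences of 0.5.
d_linked = admixture_ld(0.2, 0.5, 0.5, theta=0.1, generations=10)
d_unlinked = admixture_ld(0.2, 0.5, 0.5, theta=0.5, generations=10)
```

In this toy case the disequilibrium between loosely linked loci (θ = 0.1) is still hundreds of times larger after ten generations than that between unlinked loci, which is the property that allows sparser marker maps. Note also that the initial value depends on the frequency differences between the parental populations, so markers that differ little between them carry almost no admixture signal.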
Racial or ethnic labels may, therefore, be useful in the initial process of selecting individuals for inclusion in association studies where ALD mapping will be employed if those social labels refer to populations with a recent history of admixture. African Americans typify the sort of group that might become the focus of such efforts, but they also provide an instructive example of the limits of using self-reported race alone to select cases and controls.
Two key determinants of success using an ALD mapping approach are the following: first, the disease must exhibit frequency differences between the parental populations (more on this point later); and second, there must be little heterogeneity within the parental and admixed populations (Terwilliger and Weiss, 1998; Burmeister, 1999). It has been proposed that because so much phenotypic and genetic heterogeneity exists within both the parental groups (the various African and European populations that genetically contributed to African Americans) and the admixed population itself (admixture estimates range from 6-30% on average), African Americans are not good candidates for admixture mapping (Terwilliger and Weiss, 1998; Reich and Goldstein, 2001).
Several recent studies (Parra et al., 1998; Parra et al., 2001) have shown that sufficient genetic homogeneity exists among the African populations, and among the European populations, most likely to have contributed appreciably to the ancestry of African Americans. However, these same studies have also reinforced assertions that considerable genetic heterogeneity exists within the African American population itself (see pages 44 and 45 in the appendix). Race, therefore, may facilitate the selection of individuals whose ancestry is felicitous for ALD mapping, but it does not provide researchers with a genetically homogenous sample of cases and controls.
The considerable genetic structure that exists within the African American population is most likely a product of the admixture process itself. Recent work using different hypothetical models has shown that the admixture dynamics characteristic of this group’s population history are best approximated by continuous gene flow (Pfaff et al., 2001). Under such a model, the admixture process is expected to create LD between markers greater than 10 cM apart – confirmation that populations generated by this sort of interaction between groups can be assets in association studies. However, the model also predicts significant association signals will be found between unlinked loci, thereby increasing the likelihood of false positives in ALD-dependent studies (ibid, 2001).
Several methods which allow researchers to discern whether association is due to linkage or genetic structure (i.e. significant association between unlinked loci) have been proposed (McKeigue, 1998; Reich and Goldstein, 2001; Schork et al., 2001). Some have already been shown to work in the context of admixture studies using African American samples (McKeigue et al., 2000; Parra et al., 2001). Other strategies for controlling for population stratification should work for admixed populations like African Americans if slight modifications to the original approach are made. For example, Reich and Goldstein (2001) proposed that false positives can be controlled by first generating a baseline measure of the average association between unlinked markers in the case and control samples, and then assessing whether the association signal between the candidate gene and a marker is significantly greater than this “background noise.” This approach is more amenable to use in admixed populations if a few changes are made in calculations of the baseline LD statistic. Most notably, in a letter to the editor published in Genetic Epidemiology, Drs. Carrie Pfaff, Mark Shriver, and Rick Kittles suggest that the markers used should be matched to the candidate gene in terms of the δ level (a measure of the frequency difference for a particular allele that exists between parental populations), rather than by allele frequency similarities in the (admixed) study population (Pfaff et al., 2001).
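The logic of measuring a candidate signal against “background noise” can be sketched as follows. This is one simple way to realize the idea, not the published procedure: each marker receives an allelic chi-square statistic, the mean statistic across unlinked markers serves as the baseline (its expected value is 1 when there is no stratification), and the candidate's statistic is divided by that baseline. A second helper shows how unlinked markers might be selected by similarity of δ. All function names, the tolerance, and the allele counts are hypothetical.

```python
def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 d.f.) for a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    num = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    den = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
           * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    if den == 0:
        raise ValueError("table has an empty row or column")
    return num / den


def corrected_statistic(candidate, null_markers):
    """Divide the candidate's chi-square by the mean chi-square of the
    unlinked (null) markers; the baseline is floored at 1 so that the
    candidate's statistic is never inflated."""
    null_stats = [allelic_chi_square(*table) for table in null_markers]
    baseline = max(1.0, sum(null_stats) / len(null_stats))
    return allelic_chi_square(*candidate) / baseline


def match_by_delta(candidate_delta, marker_deltas, tolerance=0.05):
    """Indices of unlinked markers whose parental allele frequency
    difference is within `tolerance` of the candidate's."""
    return [i for i, d in enumerate(marker_deltas)
            if abs(d - candidate_delta) <= tolerance]


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
candidate = (60, 40, 40, 60)
stratified_nulls = [(55, 45, 45, 55), (55, 45, 45, 55)]
adjusted = corrected_statistic(candidate, stratified_nulls)
matched = match_by_delta(0.40, [0.10, 0.38, 0.43, 0.70])
```

In this toy example the unlinked markers themselves show association, so the candidate's statistic is halved after adjustment; a signal that survives such scaling is less likely to be an artifact of genetic structure in the sample.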
Admixed populations can play an important role in biomedical research contexts. Accordingly, when racial labels are socially employed to denote membership in these groups, selecting individuals for inclusion in association studies based on self-reported race or ethnicity may prove a valid sampling strategy. Nonetheless, racial labels are genetically imprecise and subsume a great amount of heterogeneity. Methods exist to correct for the false positives that such sample stratification will invariably produce, but researchers must first be aware that a racially homogenous group of cases and controls is unlikely to be a genetically homogenous one.
Admixed Populations II: Genetic or Environmental Causation?
Like all theories and investigative methods, the success of ALD mapping is contingent upon the validity of certain hypotheses, none of which is more basic or necessary than the assumption that discrepancies in disease frequency between parental populations reflect genetic, rather than environmental, differences. Determining whether genetic or environmental risk factors are primarily responsible for health disparities between individuals (or populations) is, however, also one of the most challenging problems confronting epidemiologists and biomedical researchers – especially when the focus of investigation is a complex disease. Moreover, establishing why a particular disease exhibits frequency differences across populations places scientists at the center of the maelstrom of controversy surrounding the definition of race. If the groups being compared are socially designated “races,” then any suggestion that genetic variation underlies observed health disparities may be misinterpreted as confirmation of races as “distinct” biological entities.
I have already addressed the basic fallacies underlying determinist conceptualizations of race, and in subsequent sections I will examine more fully the adverse consequences of this sort of genetic reductionalism. At the moment, I shall disregard the social issues surrounding the interpretation of population-level variation in disease susceptibility. Instead, I will focus only on the manner in which investigations using admixed populations may provide potentially beneficial insights into the sources of observable epidemiological patterns.
It has been suggested that one way to test whether differences in disease frequency between the parental populations are due to genetic or environmental factors is by looking to see if, in the admixed population, an association exists between degree of admixture and the frequency, severity, or other manifestations of a disease (Schork et. al, 2001). Numerous studies – using theoretical models and computer simulations, empirical data, or both – have demonstrated that it is possible to discern correlations between proportionate ancestry and disease frequency in admixed groups (e.g. Foster and Sharp, 2002; Schork et. al, 2001; Burmeister, 1999; Chalcraborly and Weiss, 1986). Indeed, comparisons between degree of admixture and disease risk can even reveal how correspondence between social status and genetic structure in a population may lead to erroneous assumptions about the relative importance of environmental factors – a decidedly unique twist to the more common focus upon the confounding effects that environmental risk factors exert on genetic epidemiological analyses.
One study which illustrates the insights to be gained from this sort of genetic epidemiological approach is R.Chakraborty and coworkers’ investigation of the relationship between Non-Insulin Dependant Diabetes mellitus (NIDDM) prevalence and American Indian admixture in the Mexican American population of San Antonio, Texas. At the time this investigation was conducted, a substantial body of epidemiological research had shown that the frequency of NIDDM varies among different ethnic groups living under similar environmental conditions, as well as among members of a single ethnic group living in different environments. In every case, the implicit assumption underlying these comparisons was that the ethnic groups being studied were genetically homogenous entities. Thus, scientists’ ability to discriminate between the relative importance of genetic and environmental factors was hindered by their failure to empirically assess whether genetic structure existed in the populations under investigation and, if present, how such substructure influenced the distribution and frequency of NIDDM among individuals.
Both Native American and Caucasian (primarily Spanish) populations have contributed substantially to the gene pool of San Antonio’s Mexican American population. Since the prevalence of NIDDM differs by a factor of 15-20 between the two parental groups (with diabetes far more frequent among Native Americans), Chakraborty and his co-investigators focused on the relationship between degree of Amerindian admixture and two different variables: 1) neighborhood residence (a proxy for socioeconomic status) and 2) disease status. Controlling for potentially confounding variables such as age or sex, they found that diabetics had, on average, a greater proportion of Amerindian ancestry than unaffected individuals (ibid, 1986; see page 46 in the appendix). However, the strength of the association between Amerindian ancestry and risk of NIDDM was not easily evaluated because admixture also varied across neighborhoods, with individuals living in the barrio (indicative of low income status) having a greater proportion of Native American ancestry than those living in the suburbs (reflecting a high income) (ibid, 1986). By performing nested gene diversity analyses and other tests, these researchers eventually determined that evidence for genetic risk factors existed and that differences in the frequency of these factors contributed to the variation in NIDDM between Caucasians (i.e. Spanish Europeans) and Native Americans. To some extent, this assessment rested upon the fact that genetic profiles, when classified by disease status, consistently revealed the existence of a positive relationship between disease susceptibility and greater levels of Amerindian admixture.
But it also reflected the fact that variation between the neighborhoods in terms of NIDDM prevalence (with the disease becoming progressively more frequent among those of lower socioeconomic status) was at least partially dependent upon the distribution of Amerindian ancestry among different residential units (ibid, 1986; see page 47 in the appendix).
The work of Chakraborty and coworkers demonstrates that proportionate analyses of admixed populations can be used to determine whether genetic differences contribute to variation in disease frequency between the parental populations, an important consideration when deciding whether ALD mapping might be a useful way to search for genes affecting disease risk. It also provides a cautionary example for epidemiologists analyzing the impact of environmental factors on disease. Failure to control for genetic structure in the population being examined could generate spurious associations between environmental conditions and disease prevalence or, at least, make the correlation appear stronger than it truly is.
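The confounding described above can be made concrete with a small numerical sketch. The counts below are entirely hypothetical (they are not taken from Chakraborty and coworkers' data) and are constructed so that disease risk depends only on an assumed "high" or "low" ancestry stratum, never on neighborhood. Because ancestry is unevenly distributed across neighborhoods, a crude comparison nonetheless suggests a neighborhood effect, which disappears once the comparison is made within ancestry strata.

```python
# Hypothetical counts, chosen for illustration only (NOT data from any study).
# Key: (ancestry stratum, neighborhood) -> (number of people, number of cases).
# Risk is fixed at 20% in the "high" stratum and 5% in the "low" stratum,
# regardless of neighborhood.
counts = {
    ("high", "barrio"): (800, 160),
    ("low", "barrio"): (200, 10),
    ("high", "suburb"): (200, 40),
    ("low", "suburb"): (800, 40),
}


def risk(rows):
    """Pooled disease risk across a list of (people, cases) pairs."""
    people = sum(n for n, _ in rows)
    cases = sum(c for _, c in rows)
    return cases / people


def rows_for(neighborhood):
    """All (people, cases) pairs for one neighborhood, across ancestry strata."""
    return [v for (_, hood), v in counts.items() if hood == neighborhood]


# Crude comparison that ignores ancestry: barrio risk divided by suburb risk.
crude_rr = risk(rows_for("barrio")) / risk(rows_for("suburb"))

# Stratified comparison: the same risk ratio computed within each ancestry stratum.
stratum_rr = {
    a: risk([counts[(a, "barrio")]]) / risk([counts[(a, "suburb")]])
    for a in ("high", "low")
}

print("crude risk ratio:", crude_rr)
print("risk ratio within ancestry strata:", stratum_rr)
```

In this constructed example the crude risk ratio is well above 1 while each stratum-specific ratio equals 1, so the apparent neighborhood effect is produced entirely by the uneven distribution of ancestry. Real data would rarely separate so cleanly, and a two-level ancestry variable is a deliberate simplification of a continuous admixture proportion.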
Racial or ethnic labels may, therefore, be helpful sampling tools in epidemiological studies if the population to which they refer is admixed, because analyses of the relationship between proportionate ancestry and disease status can provide clues as to whether research aimed at identifying the causes of disease should focus primarily on environmental or genetic risk factors. Nevertheless, the fact that the value of these “races” (i.e. admixed groups) derives from their inherent genetic heterogeneity also underscores the genetic inaccuracy of racial categories. And while researchers may employ racial labels to select samples that will help them assess whether there is a genetic component influencing differences in disease frequency between two populations (i.e. the “parental” groups), association does not equal causation. For contentions that disease disparities between groups are “genetically determined” to be supported, it must be experimentally verified that a candidate gene contributes to disease pathology and that significant allele frequency differences exist between populations (Risch, 2002).
Representative Sampling and Investigations
Human genes typically have hundreds or thousands of alleles, and since only the oldest mutations/alleles will be geographically widespread, it is reasonable to expect that different populations will have their own subset of “private” alleles (Weiss, 1998; Terwilliger and Weiss, 1998). Moreover, computer simulations have shown that the prevailing assumption among medical geneticists – that common (i.e. complex) diseases are caused by common alleles – may vastly oversimplify the allelic heterogeneity underlying clinical phenotypes (Foster and Sharp, 2002; Pritchard, 2001). Cystic fibrosis, a typical Mendelian disorder, underscores the potential pitfalls of the “common disease, common variant” model for complex diseases. Hundreds of mutations have been found in the CFTR gene, and dozens are associated with development of the disease (though different mutations vary in the severity of their effects).
Different alleles of a single gene (allelic heterogeneity) or different combinations of genes (multilocus heterogeneity) may cause phenotypically identical diseases in different populations. Investigators who hope to unravel the precise mechanisms through which a given mutation contributes to disease pathogenesis must, therefore, assess the range of variation at a candidate gene (or genes) and how it covaries with phenotypic variation. Human variation cannot be ignored in biomedical research. But if this statement seems intuitively obvious, the sampling strategies that scientists “should” employ to ensure that the groups included in their studies sufficiently represent the range of (disease-conferring) variation are less easily discerned.
In the absence of obvious biological criteria, the use of social labels to collect samples is one common – if controversial – technique for approximating the range of variation which exists between populations (Foster and Sharp, 2002; Risch et al., 2002). Admittedly, social identifiers (like race) are extremely loose proxies for genetic variation, often obscuring more than they inform. However, they provide scientists with useful starting points and a more representative database from which to launch more accurate analyses.
On the other hand, if the concerns fueling scientists’ use of social labels as proxies for human variation are solely ones of inclusivity and representation, then it can be argued that better sampling techniques – which don’t rely on race/ethnic/population affiliation – are available. For instance, since variation conforms to a pattern of isolation by distance, scientists could superimpose a grid over world maps and randomly select individuals from each square (Ken Weiss, personal communication). Random sampling predicated upon geography or some other variable constitutes a truly race-neutral approach.
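The grid-based strategy described above might be sketched as follows. This is a minimal illustration under stated assumptions, not a published protocol: the function name `grid_sample`, the 10-degree cell size, the quota of two individuals per cell, and the coordinates are all hypothetical. Squares defined in degrees of latitude and longitude are also not equal in area, which a real design would need to address.

```python
import math
import random
from collections import defaultdict


def grid_sample(individuals, cell_deg=10.0, per_cell=2, seed=0):
    """Randomly select up to `per_cell` individuals from each grid square.

    individuals: iterable of (identifier, latitude, longitude) tuples.
    Returns a dict mapping each occupied grid cell to its sampled identifiers.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cells = defaultdict(list)
    for ident, lat, lon in individuals:
        # Assign each person to the square containing their coordinates.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[cell].append(ident)

    sample = {}
    for cell, members in sorted(cells.items()):
        # Each square contributes the same quota, however densely populated.
        k = min(per_cell, len(members))
        sample[cell] = rng.sample(members, k)
    return sample


# Hypothetical individuals: one densely populated square and two sparse ones.
people = [("a%d" % i, 29.4 + 0.01 * i, -98.5) for i in range(50)]
people += [("b%d" % i, 45.0, 5.0) for i in range(3)]
people += [("c0", -33.9, 18.4)]

picked = grid_sample(people)
for cell, ids in picked.items():
    print(cell, ids)
```

Because every occupied square contributes the same quota, densely populated squares do not dominate the sample; selection depends only on location, never on a social label.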
In some regions, such methodological techniques may indeed obviate the need for using social labels. But the United States is not among them. As mentioned previously, variation does not conform to an “isolation by distance” model in the U.S.A. Accordingly, a “race-neutral” sampling strategy would most likely yield samples that reflect population proportions: European Americans would be over-represented, while members of minority groups would be under-represented, or possibly even overlooked (Risch, 2002). By contrast, race/ethnicity in America does exhibit concordance with ancestry – though the “boundaries” between groups are exceedingly amorphous due to historical patterns of gene flow – so social categories provide an (admittedly rough) framework for collecting samples that more accurately reflect the extent of human variation. “Racial sampling” may, therefore, help scientists to move beyond race, for it favors the creation of databases that more accurately reflect the extent of human genetic variation.
Race as a Medically Informative Social Phenomenon
Phenotypic variance, both within and between populations, is often represented as the sum of three major components: genotypic variance, environmental variance, and genotype-environment covariance (Hartl and Clark, 1997). Even in the case of simple Mendelian diseases – which are usually assumed to be strongly determined by the individual’s genotype – environmental factors mediate expression of the underlying genetic code. For quantitative traits (including complex diseases), the role of exogenous variables is considerably greater, further reducing the power of genotypes to predict phenotypes (Peltonen and McKusick, 2002).
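The three-component decomposition mentioned above can be written out explicitly. The expression below is the standard textbook form under a simplifying assumption introduced here for illustration: that an individual's phenotypic value P is the sum of a genotypic value G and an environmental deviation E, with no separate genotype-by-environment interaction term.

```latex
% Assumes the additive model P = G + E (no G x E interaction term).
\begin{align}
  P   &= G + E \\
  V_P &= \operatorname{Var}(G + E) \\
      &= V_G + V_E + 2\,\operatorname{Cov}(G, E)
\end{align}
```

When genotypes are distributed independently of environments, the covariance term vanishes and phenotypic variance reduces to V_G + V_E. When they are not, as when ancestry and socioeconomic conditions are correlated, the covariance term means the variance cannot be cleanly apportioned between genetic and environmental causes.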
In the current rush to identify genetic factors which influence susceptibility to common diseases, researchers must remain cognizant of the limited use of such data without information on the environmental context in which genetic risk translates into adverse health