The Role of Population Data Systems in Human Rights Abuses

William Seltzer


PRINCES, kings, and emperors have collected information about their populations in some form for millennia. Whether it is the biblical references to census taking, the Domesday Book, the Florentine Catasto (Herlihy, 1985), or the population counts of Chinese emperors (Spence, 1990), it is not hard to find examples of efforts by premodern rulers to determine the extent of their realms for the purpose of taxation and military conscription or to estimate economic capacity. Yet the creation of a population data system, namely a systematic collection of uniform, periodic information about a nation’s population and its constituent elements, is largely an innovation of the modern state, since it required the development of modern administrative bureaucracy, technology, and professional expertise (Headrick, 2000).

There is a growing literature on the historical development of census taking and statistics, and on the development of the social sciences generally (Alonso and Starr, 1986; Beaud and Prevost, 2000; Desrosieres, 1998; Patriarca, 1996). The functions of such population data systems include the traditional premodern functions of taxation, military and economic planning, and analysis. But they also include new state functions, such as the allocation of representation in democratic assemblies according to population (M. Anderson, 1988), the provision of public health and the prevention and control of epidemic disease (Szreter, 1996), and the more general provision of “welfare” to local populations. Population data systems, in other words, whether based on administrative-reporting systems of one kind or another or on direct inquiries such as household sample surveys or population censuses, have come to serve critically important social, political, and humanitarian functions.

Yet such functions do not exhaust the uses of population data systems. As many commentators have indicated, particularly in the literature on the efforts of European colonialists to control populations in their far-flung empires (B. Anderson, 1991; Scott, 1998), there is a darker side to the development of these systems. Population data systems also permit the identification of vulnerable subpopulations within the larger population, or even the definition of entire populations as “outcasts” and a threat to the overall health of the state.

Most elementary demographic or applied statistics textbooks provide information on the range of general uses that population data systems serve. These texts, or at least the more thorough ones among them, also provide a balanced discussion of the wide range of errors associated with these population-data collection efforts. The harmful results sometimes associated with these systems are more rarely mentioned, and when they are treated, the discussion is usually limited to threats to the integrity of the statistical system and its outputs, including minor challenges to confidentiality. Greater harm, however, can ensue and has ensued. Moreover, the possibility of such misuses and of the ensuing human rights abuses is often ignored in work on improving national statistical systems.

The purpose of this paper is to review the use of population data systems in human rights abuses and potential abuses and to open a discussion about the character of such abuses and their implications for the field of population analysis. Particular stress is laid on the ethical responsibilities of professionals involved in the development, preservation, and dissemination of such data.

The term “population data system” is used to cover: (a) one-time comprehensive data-gathering operations, such as regular population censuses or special censuses; (b) one-time or periodic inquiries carried out on a sample basis; or (c) comprehensive administrative-reporting systems, with or without a major statistical component, such as national population registration systems, that attempt to maintain a continuous record, including current address, for each member of the population or for well-defined population subgroups. It is sometimes argued that population registration systems, as purely administrative activities, have little to do with statistics. This position ignores the closeness that commonly exists between a national population registration system and a national statistical system.

In many countries with population registers, the national statistical office led the way in establishing the registration system or at least modernizing it. Even in those countries where the statistical office is no longer directly involved in the management of the system, the statistical office is often a major user and, almost inevitably, the principal locus within the government of expertise on the operations and content of large-scale population data systems. The term “vulnerable person(s)” as used in this paper is identical to the “potentially censurable or vulnerable entity” described by Begeer, de Vries, and Dukker (1986) as requiring special protection by the national statistical service.

We focus on the human rights abuses of forced migration, internment, genocide, and crimes against humanity. We first summarize available evidence on the misuse, or attempted misuse, of population data systems to bring harm to vulnerable populations. Second, we review the types of safeguards that may help to deter future misuses. Third, we present a summary of needed further research, focused on an enumeration of other hitherto unstudied cases where misuses may have occurred and on safeguards. We conclude with a reminder that population data systems have also been important sources of evidence in the prosecution of human rights abuses.

Documented Instances of Misuse or Attempted Misuse

A cursory review of the history of episodes of genocide and forced migration of the past two centuries yields some obvious instances to start thinking about population data systems and human rights abuses. These include the Nazi Holocaust during World War II; the internment of Japanese Americans during the same period; the forced removal of American Indians from their territorial lands in the United States in the nineteenth century; the forced migration of minority populations in the Soviet Union in the 1920s and 1930s; and the Rwanda genocide of 1994. In each case, a military or police apparatus sought to manage, implement, and justify the genocide or removal of entire populations from their prior civil life. The question at hand is the role of population data systems in these processes.

We distinguish four types of use of population data and expertise (Seltzer and Anderson, 2000): (a) macrodata (the use of compiled statistical results in terms of large aggregates and geographic units); (b) mesodata (the use of compiled statistical results for very small geographic units); (c) unprotected microdata (the use of information, such as identification number or name and address, along with related information that identifies an individual as a member of a vulnerable group); and (d) the use of other material, staff, services, and expertise provided by the population data system, particularly the national statistical service.

Most commentators would readily admit that statistical authorities should not grant access to confidential microdata to military or political authorities, and that they should certainly try to protect these data from military invaders. But how should statistical authorities respond, and how did they respond, to requests for macrodata, mesodata, and professional expertise for policies aimed at human rights abuses? This is a much more difficult question. Since macrodata are likely to be published routinely and are thus publicly available from the statistical system, professionals have generally disclaimed responsibility for harms connected to their use. Typically macrodata refer to statistical tabulations, or related graphics, for countries, provinces or states, counties or gouvernments, and moderate to large urban agglomerations. Such tabulations, while important for policymaking, propaganda, and general administrative purposes, usually have limited value in planning or carrying out the operational aspects of a major human rights abuse.

We define mesodata as statistical results presented at such a fine level of geographic disaggregation, whether in tabular or graphic form, that the results may be used in conducting field operations at the local level. Thus the line between macrodata and mesodata will depend in part on the size of the geographic units, the distribution of the target population among these units, and the intended operational uses. For example, census aggregates showing the number of people in a target population for an individual small village may be operationally useful, while similar data for a large city would need to be further broken down by tract, ward, or even block to be operationally useful.

Table 1 lists 10 cases where misuses of population data systems have been documented as associated with major human rights abuses, or where documentation exists that statistical authorities attempted to further develop such data systems in circumstances where such a misuse was highly likely. Since papers are available that describe each case listed (see table 1 for citations), no attempt will be made here to cover each case in detail. Instead, a few features of this body of experience will be highlighted.



Table 1

Place | Time Period | Intended Victims | Data Systems Involved | Type of Data | Human Rights Abuse | Source
Germany | 1933-1945 | Jews, Gypsies, and others | Numerous | Macro, | Genocide | Seltzer (1998)
Poland | 1939-1943 | Jews | Primarily special | Macro, micro | Genocide, Crimes against humanity | Seltzer (1998)
France | 1940-1944 | Jews | Population registration, special censuses | Macro, micro | Genocide, Crimes against humanity | Remond (1996); Seltzer (1998)
Netherlands | 1940-1944 | Jews and Gypsies | Population registration system | Macro, meso, | Genocide, Crimes against humanity | Seltzer (1998)
Norway | 1942-1944 | Jews | Special census and proposed | Macro, micro | Genocide, Crimes against humanity | Seltzer (1998); Sobye (1998)
Romania | 1941-1943 | Jews and Gypsies | 1941 Population Census | Macro, micro | Genocide, Crimes against humanity | Black (2001)
United States | 19th Century | Native Americans | Special censuses, population registers | Macro, micro | Forced migration, other serious crimes | Seltzer (1999)
United States | 1941-1945 | Japanese Americans | 1940 Census | Macro, meso | Forced migration, internment, and loss of property | Seltzer and Anderson (2000)
Soviet Union | 1919-1939 | Minority populations | Various population censuses | Macro, micro | Forced migration, other serious crimes | Blum (2000)
Rwanda | 1994 | Tutsi | Population registration | Micro | Genocide, Crimes against humanity | Des Forges (1999)

Note: The time periods and intended victims specified refer only to those times and victims studied in the sources cited.

The Holocaust

Six of the ten cases shown in table 1 are associated with the Holocaust. Although all six were Nazi-inspired crimes, in only two of them, Germany itself and Poland, could the misuse of the data systems be attributed solely to Nazi initiatives. In France, Henri Buhle and Rene Carmille, and in Norway, Gunnar Jahn, the heads of the statistical agencies, took advantage of the political climate of German occupation or influence and exposed vulnerable target populations to further risks by proposing major new data-gathering efforts to serve both statistical and administrative purposes (Remond, 1996; Sobye, 1998).

In the Netherlands, the effort to establish a comprehensive population registration system for administrative and statistical purposes was completed even before the Nazi occupation (Methorst, 1936; Thomas, 1937). In 1938 H. W. Methorst, who was then the director-general of the Dutch Central Bureau of Statistics and formerly also head of the Dutch office of population registration, reported on the rapid progress being made in the Netherlands in implementing a new comprehensive system of population registration that would follow each person “from cradle to grave” and open “wide perspectives for simplification of municipal administration and at the same time social research” (1938: 713-714). By early 1941 Methorst’s successor as head of the population registration office, J. L. Lentz, had quickly adapted this general “cradle to grave” system to create special registration systems covering the Jewish and Gypsy populations of the Netherlands. These registration systems and the related identity cards played an important role in the apprehension of Dutch Jews and Gypsies prior to their eventual deportation to the death camps. Dutch Jews had the highest death rate (73 percent) of Jews residing in any occupied western European country, far higher than the death rates among the Jewish populations of Belgium (40 percent) and France (25 percent), for example. At the same time, Jewish refugees from Germany and other countries living in the Netherlands during the Nazi occupation experienced an overall death rate lower than that of Dutch Jews. The best explanation for this unusual phenomenon is that these refugees, unlike most Dutch Jews, avoided registration. The experience of the Gypsies in the Netherlands was, if anything, worse than that of the Jews. The critical role of the registration system in the overall process has been stressed by such diverse observers as the German Generalkommissar for administration and justice in the Netherlands in September 1941 (Presser, 1969: 38) and the British historian Bob Moore (1997).

American Indians in the United States

The example focusing on Native Americans in the nineteenth century involves the use of two quite different types of data systems: the decennial population census and a series of special censuses and registration systems primarily focused on individual Indian nations and tribes. Prior to the 1870 United States decennial census, neither the census law nor the census forms made any provision for the enumeration of Native Americans, although in earlier censuses some Native Americans were covered. With the end of the Civil War, the search for a “solution to the Indian problem” emerged as an important policy issue and the 1870 Census marked the start of the influence of Francis A. Walker on United States population censuses. Walker served as superintendent of the 1870 and 1880 decennial censuses and was responsible for many organizational and technical advances in census taking. Moreover, after the completion of the fieldwork of the 1870 Census and while the compilation activities were under way, Walker also served as commissioner of Indian affairs.

Walker’s rationale for including all Native Americans within the scope of the decennial population census (U.S. Census Office, 1872: xvi-xvii) began by distinguishing between the “constitutional” and the “true” population of the country. He then argued that the Constitutional phrase

excluding Indians not taxed … seems to have been adopted by the framers
of the census law as a matter of course. Now the fact that the Constitution
excludes from the basis of representation “Indians not taxed” affords no
possible reason why, in a census which is on its face taken with equal
reference to statistical as to political interests, such persons should be
excluded from the population of the country…. An Indian not taxed should,
to put it on the lowest possible ground, be reported in the census just as
truly as the vagabond or pauper of the white or colored race. The fact that
he sustains a vague political relation is no reason why he should not be
recognized as a human being in a census which counts even the cattle and
the horses of the country.

Walker’s words clearly place Native Americans in the common family of humanity. It should be understood, however, that Walker’s view of the humanity of Native Americans was within the context of an explicit racism that ran through most of his writings, whether on Native Americans or on the new immigrants to the United States. He characterized the early settlers’ conquests of the American Indian in terms of beating “the savages with their own weapon, as men of the higher race will always do….” (Walker, 1873: 331). He also considered that what he termed “the Indian question” boiled down to two quite separate issues: “What shall be done with the Indian as an obstacle to the national progress? What shall be done with him when, and so far as, he ceases to oppose or obstruct the railways and the settlements?” (337). Walker’s answer to the first question was to push the less aggressive tribes onto reservations, but using force when required. His answer to the second question was essentially a semipermanent system of apartheid based on one or two “grand reservations” west of the Mississippi (364-375).

With regard to special censuses and registrations, numerous treaties concluded between individual Indian nations and tribes and the United States between 1817 and 1868 provided for various population censuses to be carried out (Seltzer, 1999). These censuses ranged from one-time enumerations to annual administrative censuses. With few exceptions, the treaties provided for the censuses to be carried out by Indian agents working for the Bureau of Indian Affairs, or its predecessor organizations, with little or no involvement by Native Americans. After 1868, most enumerations were no longer treaty-based but carried out by military or civilian authorities solely on the basis of federal laws or administrative decisions.

The treaties specified or implied a wide range of uses for the data to be gathered: the equitable apportionment of land and per capita-based annuities and other benefits; the determination of the number of seats individual tribes and bands of Indians were to be allocated on Tribal Councils; and the provision of population data needed for routine planning purposes as well as to assist Indian removal and resettlement programs. Equally diverse were the concepts and classification variables specified in the treaties. In addition to age, sex, marital status, and household relationship, individual treaties provided for gathering data on such topics as intentions to emigrate or apply for United States citizenship, competency to manage one’s own affairs, orphanhood, idiocy, insanity, and loyalty during the recently concluded Civil War.

Although they seem to have been largely ignored in the demographic literature, a number of historians have referred to the results of these special censuses. Most relevant for the concerns of this paper are three nontreaty censuses carried out by special agents working under the auspices of the United States War Department in connection with the forced expulsion of Native American populations from their lands east of the Mississippi River pursuant to the Indian Removal Act of 1830. These three nontreaty censuses were: the 1835 census of the Eastern Cherokee (Foreman, 1953 [1934]: 250), the Choctaw census of 1830 (47-48), and the Creek census of 1833 (111).

The extent to which special censuses were sometimes used as instruments of control is clear from a 1901 memoir describing a census carried out in the San Carlos Indian Reservation in what is now Arizona:

In 1884 a complete census had been made, the tribes being enumerated under
their head chiefs and each camp of Indians of the same tribe under its head
man. Brass tags of different shapes with one shape for each tribe had been
provided. The band or subdivision of a tribe was designated by a letter of
the alphabet, and each [mem]ber of a band had his number, stamped by the
provost officer on the tag of the proper shape and given to each Indian
whose name was recorded in books kept for the purpose. Each man was
required to wear his tag at all times and to produce it when called
upon…. Any failure to comply with these regulations was severely
punished, and in a short time the system worked to the perfection I found
it on my arrival (Elliot, 1948: 98).

Elliot also made clear one motive for and use of the census and tag link: “Any American who would attempt to burden himself or his memory with a number of Indian names would soon be hopelessly lost, but tag numbers and the records made it very simple to locate a special individual” (1948: 98). Describing the successful use of the brass tags in wrapping up an investigation of some off-reservation Indian deer poachers, Elliot noted “the officer … called the band of Indians together and walking down the line without a word, only looking at their tags, selected the men he wanted…. The chief and all his band were astonished but promptly complied and their [sic] culprits were duly punished” (100-101). (Hochschild [1998: 163] recounts an almost identical system that required rubber workers in the Belgian Congo in the early 1900s to wear numbered metal tags so that it could be determined if each person’s daily production quota was made.)

Internment of the Japanese Americans during World War II

The history of the internment of Japanese Americans during World War II is well known (see, for example, Daniels, 1981; Irons, 1983; U.S. Commission, 1997; Weglyn, 1996 [1976]). Yet only recently (Seltzer and Anderson, 2000) have the activities of the United States Census Bureau in the internment been examined systematically by those with a technical understanding of how population data systems operate. The Census Bureau tabulated, published, and widely disseminated a series of special releases, the first of which was released two days after the Japanese attack on Pearl Harbor, which provided extensive data on people of Japanese ancestry based on the race item in the 1940 Census. The Census Bureau also gave direct assistance to the military authorities on the West Coast by: (1) providing tract-level tabulations of Japanese Americans from the 1940 Census in January 1942; (2) posting, beginning in late February 1942, one of the most senior members of the bureau’s technical staff, Calvert Dedrick, to San Francisco to assist in the evacuation and internment effort; (3) making available census-block maps showing the number of Japanese Americans enumerated as residing in each block; and (4) strongly supporting, through the efforts of Dedrick and the bureau’s director, J. C. Capt, an ultimately unsuccessful proposal to establish a national population-registration system for military and statistical uses in the early months of the war. Although it has been alleged that microdata were released (Toland, 1982), the bureau has denied the charge and no definitive evidence to the contrary has emerged. Nevertheless, there now seems to be general agreement that the provision of mesodata and professional expertise violated the spirit, if not the letter, of the promises embodied in the census confidentiality laws (Prewitt, 2000).

Soviet Union

The study by Blum (2000) of the former Soviet Union during Stalin’s dominance reconfirmed several instances when census microdata were used to target minority population groups for forced migration and other human rights violations. It also found that by the time of the 1937 Census, Stalin apparently was relying on other data systems for microdata and that the census was primarily used as a source of macrodata to evaluate policies, including those related to forced migration and other programs with human rights consequences.


Rwanda

A comprehensive population registration system had been a tool of colonial administration in Rwanda over most of the twentieth century. In the 1930s this registration system was used to help fix the identity of the population in terms of the hitherto somewhat amorphous categories “Hutu” and “Tutsi,” primarily to assist a pro-Tutsi policy by the Belgian colonial administration based on pseudoscientific racial grounds. The registration system continued, along with related identity cards, when the Belgians switched their support from the Tutsis to the Hutus in 1959, and after Rwanda gained its independence in 1962 (Des Forges, 1999). This same registration system was operating throughout Rwanda up to the start of the genocide in April 1994. The system was based in local government offices in each Commune and generated, inter alia, monthly statistical reports providing the number and basic demographic characteristics of the population, classified by ethnicity, living in each local administrative area. These monthly statistical reports, along with monthly lists of individual births, deaths, marriages, and those moving into and out of the area, as well as annual lists of the population, all classified by ethnicity, were transmitted up to the Prefecture and to the capital, Kigali. Information from this registration system was used to plan and assist in the implementation of the killing operations.

Possible Factors Leading to Misuses or Attempted Misuses

It is possible to hypothesize a number of motivations leading to the misuse or attempted misuse of a population data system that might contribute to a major human rights abuse. Such a list of motivations might include ideology, including racism; patriotism; obedience or fear; bureaucratic opportunism; or professional zeal. From the twentieth-century cases studied so far (which are not necessarily representative of all cases), ideology, patriotism, and fear seemed less decisive in determining complicity than bureaucratic opportunism and professional zeal. This finding is similar to the observation by David, Fleischhacker, and Hohn (1988: 89) that the willingness of German medical scientists, “even if they did not fully embrace Nazi racism,” to teach anti-Semitic racial hygiene could be attributed to the fact that they “welcomed the opportunity of translating their theoretical research into government policy.”

Bureaucratic opportunism and professional zeal certainly seemed paramount in the proposals of Buhle and Carmille in France (Remond, 1996) and Jahn in occupied Norway (Sobye, 1998) during World War II, the actions of Lentz in the Netherlands and Richard Korherr (Himmler’s statistical specialist) in Germany in furthering the Holocaust (Seltzer, 1998), and the efforts of Capt and Dedrick in their proactive assistance in 1941 and 1942 in the internment of Japanese Americans (Seltzer and Anderson, 2000). (By contrast, ideology was certainly involved in Walker’s decisions about the coverage of Indians.) Capt, Carmille, and Jahn were well-regarded heads of national statistical agencies, each making many positive contributions to the development of his country’s statistical services over his career. Similarly, Dedrick (in statistical methods and organization), Korherr (in demographic statistics), and Lentz (in population registration) were each highly regarded senior technicians with extensive experience and responsibilities in their respective fields. While the grouping of Capt, Carmille, Dedrick, Jahn, Korherr, and Lentz for the purposes of studying motivation is analytically justified, there are important differences among them in terms of the level of human rights abuse with which their actions or proposals may be linked.

Potential Safeguards: An Introduction

A variety of safeguards are available that may help to deter the use of population data systems in assisting in the planning or carrying out of major human rights abuses. While in most circumstances few of these safeguards are absolute, they each can help to discourage a contemplated misuse by raising the cost of such a misuse, either in financial or political terms. Moreover, even if one or more safeguards successfully discourages the use of a population data system in assisting in a human rights abuse, the underlying human rights abuse may still take place. However, without the assistance of the data sought, the efficiency of the perpetrators is likely to be reduced. As a result, lives may be saved and the extent and duration of other harms reduced. (On the other hand, most, but not all, safeguards have the unfortunate side effect of also reducing the analytical usefulness of the resulting data.) Five different safeguards may be distinguished.

Substantive Safeguards

The ultimate safeguard is not to gather or save data that permit associating an individual with a potentially vulnerable group. This safeguard, while often perceived as reducing the analytical or policy usefulness of the data system involved, has been deliberately employed in several countries that had histories of misuses associated with major abuses. For example, shortly after World War II, the Dutch authorities removed the item on religion from both the census and the population registration system (Berger, 1998). In France, “it is forbidden to stock in computer memory … any personal data relating directly or indirectly to the racial origins, political, philosophical and religious opinions, and trade union membership of individuals” (Article 31 of the Law of 6 January 1978, quoted by Leridon, 1999: 189).

Two major objections to this approach are often advanced. First, it is not always possible to know in advance which variables will turn out to be of relevance to potential perpetrators of human rights abuses. Although the topics that have usually been used to define target populations in national data systems come from a well-defined list (race, ethnicity, religion, country of birth or ancestry, mother tongue), in special circumstances victims have sometimes been defined in very broad terms (for example, urban residents, those with advanced education, or even the literate population). Second, and of more practical relevance, the elimination of variables may impose a considerable price in terms of a reduction of the descriptive and analytical usefulness of the data system. For example, in recent years, both in the United States and the United Kingdom, advocates for some but not all potentially vulnerable groups have been among the strongest supporters of continued or more detailed coverage. Such detailed identification is seen as an important element in providing the statistical basis for various programs enacted to redress past discrimination as well as in recognizing the groups’ presence within the overall body politic. Since such data have also been used to abuse vulnerable target populations either in the official statistical reports of individual countries or in studies by other users, the trade-offs involved deserve to be further examined.

Methodological and Technological Safeguards

Even if items and classifications that define one or more target populations are included in a national data system, a range of methodological and technological procedures can be used to reduce the potential negative impact of such inclusion. For example, if the data system is based on sample rather than full-count data gathering, the resulting information is of little help in providing microdata that can be used to provide operational lists of the members of a target population. Depending on the size and type of sample, the results may also be of limited usefulness in providing operationally relevant mesodata. Typically, even relatively large-scale national sample surveys based on multistage samples of clustered households would be of limited usefulness in this regard. On the other hand, essentially unclustered systematic samples of census enumeration records or population registers might provide operationally useful mesodata if results were shown for small geographic areas.

Another broad technological approach is the deliberate introduction of errors into the data set. These include systematically swapping responses for individual items between records, introducing perturbations in specific items, top (or bottom) coding of quantitative items so that unduly large (or small) responses are grouped together to protect the identity of respondents, coding categorical data in broad response categories, or using only large areal units for similar purposes. As discussed earlier with respect to the elimination of variables, reducing the level of substantive or geographic detail available also reduces the usefulness of the data for certain users and uses.(1)
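The procedures listed above can be made concrete with a minimal Python sketch. It is an illustration only, not a description of any statistical agency's actual disclosure-control practice: the function names, the field name "income", the age-band width, and the swap fraction are all hypothetical choices made for the example.

```python
import random
from typing import Dict, List

Record = Dict[str, object]


def top_bottom_code(values: List[int], lower: int, upper: int) -> List[int]:
    """Group unduly small or large responses at the chosen bounds."""
    return [min(max(v, lower), upper) for v in values]


def coarsen_age(age: int, width: int = 10) -> str:
    """Report an exact age only as a broad band (hypothetical 10-year bands)."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


def perturb(values: List[int], max_shift: int, seed: int) -> List[int]:
    """Add a small random shift, bounded by max_shift, to each value."""
    rng = random.Random(seed)
    return [v + rng.randint(-max_shift, max_shift) for v in values]


def swap_item(records: List[Record], key: str, fraction: float,
              seed: int) -> List[Record]:
    """Swap the value of one item between randomly chosen pairs of records.

    Roughly `fraction` of the records are affected. The input is not modified.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]           # work on copies
    n_pairs = int(len(out) * fraction / 2)
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for a, b in zip(chosen[::2], chosen[1::2]):
        out[a][key], out[b][key] = out[b][key], out[a][key]
    return out


if __name__ == "__main__":
    print(top_bottom_code([5, 40, 250], lower=18, upper=90))  # [18, 40, 90]
    print(coarsen_age(37))                                    # 30-39
```

Swapping leaves the overall distribution of the swapped item unchanged while weakening the link between that item and any one record, which is why the aggregate tabulations survive even though individual records become less reliable.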

Organizational and Operational Safeguards

Suitable organizational and operational arrangements have also been used to help protect against the misuse of population data systems, although to date these arrangements have not been systematically discussed. For example, in the Netherlands the population registration system is deliberately kept decentralized (Begeer, 1998); in the United States decisions at the Census Bureau related to the release of data that may pose confidentiality issues are made by a committee that is independent of both the concerned substantive and processing divisions; and in several countries the machine-readable census-data files are stripped of most or all individual personal or exact address identifiers. More complex procedures have been used in some sample surveys that collect sensitive data. In one case three files were established: an anonymous data file, an identifier file, and a “bridge” file that provided the link between the other two files, with the bridge file kept in a foreign country immune from domestic court orders. As with other safeguards, the degree of protection afforded by such operational and organizational arrangements is rarely absolute, particularly with respect to threats posed by the misuse of mesodata. Nevertheless, the use of such safeguards–jointly with other approaches–can make misuse more difficult and thus deserves more careful attention.
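The three-file arrangement described above can be sketched as follows. This is only an illustration under stated assumptions: the field names, the random-key scheme, and the helper functions `split_records` and `relink` are hypothetical and are not drawn from the survey in question.

```python
import random
import secrets


def split_records(records, id_fields):
    """Split survey records into an anonymous data file, an identifier
    file, and a bridge file that alone links the other two."""
    data_file, identifier_file, bridge_file = [], [], []
    for rec in records:
        data_key = secrets.token_hex(8)   # random key, unrelated to the person
        ident_key = secrets.token_hex(8)  # a second, independent random key
        substantive = {k: v for k, v in rec.items() if k not in id_fields}
        identifiers = {k: rec[k] for k in id_fields}
        data_file.append({"data_key": data_key, **substantive})
        identifier_file.append({"ident_key": ident_key, **identifiers})
        bridge_file.append({"data_key": data_key, "ident_key": ident_key})
    # Shuffle so that row position cannot be used to re-link the two files.
    random.SystemRandom().shuffle(identifier_file)
    random.SystemRandom().shuffle(bridge_file)
    return data_file, identifier_file, bridge_file


def relink(data_file, identifier_file, bridge_file):
    """Reassemble full records; possible only for a holder of the bridge file."""
    data_by_key = {r["data_key"]: r for r in data_file}
    ident_by_key = {r["ident_key"]: r for r in identifier_file}
    linked = []
    for link in bridge_file:
        merged = {**ident_by_key[link["ident_key"]], **data_by_key[link["data_key"]]}
        merged.pop("ident_key")
        merged.pop("data_key")
        linked.append(merged)
    return linked
```

In this sketch the data file and the identifier file share no common key, so possession of both is not enough to attach names to responses; the protection rests entirely on where, and under whose jurisdiction, the bridge file is held.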

Legal Safeguards

Legal provisions designed to protect the confidentiality of many kinds of information reported to statistical agencies are a standard feature of any modern national statistical system. The content, status, and effectiveness of these provisions vary from country to country. In terms of scope, these legal safeguards usually focus on identifiable microdata provided by the responding public. In some countries the enforcement is assigned to an independent privacy commission. As already indicated, in a few countries these laws and regulations even extend to barring the collection or storage of data on sensitive topics. In other countries, whatever external oversight exists is only advisory.

In normal times these laws appear to work well. However, in times of perceived national crisis legal safeguards can be set aside by legislative action or decree or are simply ignored. In the United States, both in World War I and II, the provisions of the census act that called for the confidentiality of data collected by the Census Bureau were set aside by the War Powers Acts adopted after the United States entered each war (Okamura, 1981). Yet even before the Second War Powers Act was introduced in Congress early in 1942, the director of the Census Bureau indicated his willingness to ignore the law so as to provide defense authorities a way of checking confidential census information on individual Japanese Americans (Seltzer and Anderson, 2000). The situation is even more precarious if a country is occupied or controlled by a foreign power.

Ethical Safeguards

Ethical safeguards in terms of agreed-on normative standards serve two quite different functions. First, they can remind statisticians and others responsible for statistical and related data systems that there are underlying professional norms that need to be considered in our daily work. Awareness of these norms helps to keep us from inadvertently crossing the boundary into socially harmful activities as we become caught up in the pursuit of our scientific, career, bureaucratic, or other professional interests. Second, pre-existing statements of ethical norms can also serve as a rallying point when legal and other safeguards appear to be overwhelmed in times of national hysteria.

Currently, a solid body of national and international statements of ethical norms exists to which statisticians and data users can refer. They include, at the international level, the “Fundamental Principles of Official Statistics” adopted by the United Nations Statistical Commission (United Nations Economic and Social Council, 1994) and the “Declaration of Professional Ethics” adopted by the International Statistical Institute (ISI) (1986). At the national level, examples from France (Association des Administrateurs de l’INSEE, 1985), the United Kingdom (Royal Statistical Society, 1993), and the United States (American Statistical Association, 1999) may be cited. The auspices of these national norms include national statistical societies, government statistical agencies or their staffs, and individual private-sector statisticians. The bibliography attached to the ISI declaration provides a useful, if dated, review of national ethical norms in the field of statistics. Analogous statements of ethical norms also exist for many of the social sciences, although none has been specifically developed for demography or the population field more generally.

As important as the existence of agreed-on statements of ethical norms may be, continuing discussions of ethical issues among statisticians and major users of statistics may be even more important in promoting sensitivity to ethical issues in ongoing work. Indeed, Roger Jowell, chair of the committee that developed the ISI declaration, considered the educational role of a statement on ethics to be its paramount function; one of the reasons for adopting or revising a statement or code of ethical principles is the opportunity it provides for education and discussion (Jowell, 1981). It is also important to be alert to opportunities that foster wider discussion of ethical issues. One aspect of such discussions is the examination and review of possible past ethical lapses in official statistics. Forthrightness about the past is an important element in both strengthening ethical awareness and in building a network of other defenses against possible future misuses.


The line of research related to the misuse of population data systems reviewed in this paper suggests several distinct areas of further work that may be grouped under two headings: research and development, and exposition and dissemination.

Research and Development

Most of the studies of the misuses or attempted misuses summarized in table 1 also specified areas where further research was needed to clarify issues related to the case under study (Blum, 2000; Seltzer, 1998, 1999; Seltzer and Anderson, 2000). Based on the work by Sobye (1998) and Seltzer and Anderson (2000), archival research can be a critical component of such further research. Moreover, although the use of the population register in the 1994 Rwanda genocide has been established, its role has yet to be spelled out in detail. In addition, the further development of many of the technological and methodological safeguards noted will require further research on statistical and information theory, while the development of improved legal safeguards will require further legal research.

Two major statistical and national policy issues also need to be addressed: first, the continued routine use of official population data systems to collect data on the classification variables (i.e., race, ethnicity, religion, country of birth or ancestry, mother tongue) traditionally used to identify vulnerable populations; and second, the expanded use of comprehensive population registration systems. The use of ethnicity and similar classification variables is long established in most countries and the use of population registers is often proposed to countries without such a system (van Bochove, 1996; Phillips, MacLeod, and Pence, 2000). Nevertheless, given the recent research summarized here, it would seem only prudent to reexamine both issues as suggested in Seltzer (1998) and Seltzer and Anderson (2000).

Another important area for further research is other instances of human rights abuses that may have been assisted by the work of statisticians, national statistical services, or population data systems. Such a listing is far more problematic than the cases listed in table 1, since in these instances no clear link has been established and the level of the associated harms varies even more widely. A partial listing of instances where there is at least some measure of suggestive evidence that population data systems were associated with substantial human rights abuse includes:

Colonial Africa, late nineteenth century-1950s. Although most regular censuses during the colonial period excluded the “aboriginal” population, evidence from Rwanda and the Belgian Congo suggests that in some areas special censuses or registration systems did cover the African population;

South Africa, 1930s-1993. Although de Klerk (1998: 74) identified the Population Registration Act as the “cornerstone” of apartheid, a full review of activities related to all population data systems, including the population registration system and regular and special censuses, seems in order given that so many parts of the state administration were used to further systematic abuses directed at the nonwhite population and the long gestation period of the apartheid system;

Namibia, especially the 1960s-1980s. In view of the great sensitivity and secretiveness manifested by the local authorities about the operations of their population censuses when they were still under South African control, a careful review of all population data systems again seems in order;

Korea and Taiwan, 1890s-1990s. Both Korea and Taiwan were occupied by Japan in the 1890s, and in each case the Japanese established one or more population registration systems. These registration systems continued after the Japanese occupation ended at the conclusion of World War II and in both Taiwan and Korea the systems played an important role in aiding data-gathering operations used to document the effectiveness of modern family-planning methods and programs in bringing about fertility reduction in the developing world. However, until the 1990s Taiwan and South Korea had repressive authoritarian governments (in North Korea the authoritarian government remains). Thus, for Taiwan and Korea, the questions to be investigated are whether the registration systems were used as systems of repression and control, both during the long period of Japanese occupation and subsequently, and whether the statistical authorities became involved in or contributed to these control activities;

Countries of Eastern Europe, 1948-1990s. Because these countries were under the domination of the Soviet Union during this period and because the Soviet Union used population data systems to pursue minority and other population groups seen as hostile, a careful review of the possible misuse of population censuses and other data systems in Bulgaria, Czechoslovakia, Estonia, the German Democratic Republic, Hungary, Latvia, Lithuania, Poland, and Romania would seem prudent;

Many European countries since 1900. The treatment of Gypsies (Roma) in many countries, particularly in Europe, has been a long-standing and continuing human rights concern. The use of population data systems to assist in the rounding up of Gypsies during the Holocaust has already been mentioned. The extent to which population censuses and other population data systems may have been used, in one country or another, to target Gypsies at other times deserves review.

In addition to these potential cases, a number of other examples may be cited where circumstances suggest that some further research would be prudent, if only to eliminate the possibility definitively. These include: the roundup of people of Japanese ancestry in several Latin American countries in the period after Pearl Harbor; the forced migration of minority populations and other actions aimed at those seen as hostile in China in recent decades; and the treatment of Lapps in Finland, persons of Finnish ancestry in Sweden and Norway, Aborigines in Australia, suspected Irish “terrorists” in the United Kingdom, Arab “terrorists” in Israel, and Israeli “terrorists” in a number of Islamic countries.

Although human rights abuses have occurred in each of these additional instances, the evidence that any population data system may have been involved is purely circumstantial at this point. It is quite likely that, after investigation, population data systems and statistical personnel will be found to have had little or no involvement in the human rights violations mentioned. Nevertheless, given the range of examples cited in table 1 where links have been established, possible involvement cannot be dismissed without a knowledgeable and independent inquiry. (Excluded from table 1 and these listings of possible cases are human rights violations that seem to be linked with other types of data systems. For example, in Sri Lanka in 1983, anti-Tamil rioters were apparently guided to the homes and businesses of their victims by lists generated from official voting and tax records (de Alwis, 1983; Sanmugathasan, 1984: 66-67).)

Until recently, research into human rights abuses has not involved population professionals with experience in data methods. Indications of the involvement of population data systems in human rights abuses tended to be mentioned, usually just in passing, as part of a larger history of a specific human rights abuse, such as from witnesses to or commentators on the forced migration of Native Americans in the nineteenth century (Foreman, 1953 [1934]); the forced migration and internment of Japanese Americans in the western United States during World War II (Daniels, 1982; Okamura, 1981; Weglyn, 1996 [1976]); the Holocaust (Czerniakow, 1979 [1968]; De Jong, 1965); and the 1994 genocide in Rwanda (des Forges, 1999). These scholars and modern human rights workers seem largely unaware of the potential usefulness of research findings that establish how population data systems were used to assist in planning or executing major human rights abuses. Future research in this area would greatly benefit from the active collaboration between those with training and experience in historical research and those equally knowledgeable about population data systems.

The lack of attention by demographers and statisticians to the issue may also have another source. It is difficult for professionals to examine situations where the work they do, often motivated by a mixture of social idealism and a belief in the purity of science, has become entangled in several of the major human rights abuses of the last 200 years. While there is increasing interest in the professional literature on the protection of confidentiality (see, for example, Duncan, Jabine, and de Wolf, 1993), there is also a need to examine critically the major misuses of population data systems in order to aid in assessing the strength of protections against confidentiality violations.

Exposition and Dissemination

One of the biggest obstacles to work in this area is the widespread silence that has followed the misuse of population data systems. The silence seems to be a mixture of both genuine ignorance and a reluctance by government statisticians to address the matter due to fear that forthright discussion of past problems may adversely affect future response. However, the evidence seems to indicate that such reticence only fosters respondent mistrust (Seltzer and Anderson, 2000). Fortunately, in recent years, statistical agencies in a number of countries have begun to address the role they and their former leaders may have played in past abuses. For example, the French and German statistical authorities separately decided to commission studies to thoroughly examine the possible role of their national statistical systems and methods in furthering the Holocaust. The results of these two studies were released in Azema, Levy-Bruhl, and Touchelay (1998) and Wietog (2001), and Statistics Norway published Sobye’s study in 1998. After reexamining the evidence related to the role of the United States Census Bureau in the surveillance and internment of Japanese Americans in 1941 and 1942, including the findings of Seltzer and Anderson (2000), the director characterized the bureau’s role with respect to the Japanese Americans in this period as “proactive.” By contrast, the Dutch statistical office, Statistics Netherlands, has itself yet to take any action. Preliminary results from a review by a member of the Statistics Netherlands staff, the only study so far undertaken by a Dutch statistician (Nobel, 2000), concluded that the staff and leadership of the Dutch central statistics office were blameless.

Exposition and dissemination in other areas are also valuable, including those involving different types of safeguards. For example, compilations and comparative studies of national laws relating to confidentiality and data protection, the organizational and operational measures used in different countries, and the relevant ethical standards and guidelines can be of great assistance, as can training materials that cover these matters. Of particular importance is the translation of a number of basic documents in the field. By their nature, studies of the misuse of population data systems are often highly technical and are rarely covered by more general translation efforts. For example, for English speakers, Aly and Roth (1984), Azema, Levy-Bruhl, and Touchelay (1998), Remond (1996), Sobye (1998), and Wietog (2001) are a few of the texts that ought to be carefully translated.

The Role of Population Data and Population Data Systems in Documenting, Investigating, and Prosecuting Human Rights Abuses

This paper has focused on a dark side of numbers: how data and data systems have been used to assist in planning and carrying out a wide range of serious human rights abuses throughout the world. Fortunately, this is only one side of the story. Our introduction briefly referred to the general social utility of population data systems in a variety of fields. In addition to these benefits, population data systems and the results they produce have often directly aided efforts to document human rights abuses and prosecute perpetrators of these abuses. For example, statistics have been used to document and mobilize opposition to the lynching of African Americans (Wells-Barnett, 1991 [1895]) and the genocidal policies of King Leopold in the Belgian Congo (Hochschild, 1999), and in the trials of Nazi war criminals (Seltzer, 1998). These applications were often carried out by those with little real training or experience in population data. Moreover, some human rights workers and social scientists have placed special stress on what they consider to be the questionable usefulness of quantitative data in studying human rights abuses (for example, Goldstein, 1986).

In the last decade, the potential value of quantitative data and data systems as a tool by which human rights abuses can be described, investigated, and prosecuted has received attention from a growing number of statisticians, social scientists, and human rights workers. Jabine and Claude (1992), Spirer and Spirer (1994), and Ball (1996) provided some basic guidance for human rights workers. Subsequently, quantitative data and analysis have been used as evidence at the international criminal tribunals established for the former Yugoslavia (Brunborg, Urdal, and Lyngstad, 2001) and for Rwanda, and in documenting and studying these and other human rights abuses by truth commissions and in other fora. Published reports describe the data and methods used to document the state killings in Guatemala over a 30-year period (Ball, Kobrak, and Spirer, 1999), the political killings in Kosovo in the early part of 1999 (American Bar Association and the American Association for the Advancement of Science, 2000), and the forced migration from Kosovo during the same period (Ball, 2000). Ball, Spirer, and Spirer (2000) provide further information on the methods used to estimate the number of killings in Guatemala and include reports describing research in connection with human rights violations in Haiti, El Salvador, and South Africa.

In these current human rights investigations, little attention has been given to the possibility that data and related data systems may have been an element of the processes used to plan and carry out the human rights abuses under investigation. In many large-scale human rights abuses, statistical outputs, systems, and methods are a necessary part of the effort to define, find, and attack an initially dispersed target population. The discovery that such data and systems were so used is one means to establish that an abuse was not simply the result of a spontaneous and uncoordinated popular outburst of hate against members of the victim population. In addition, such findings, by providing evidence of the systematic actions of the perpetrators, go a long way toward establishing one critical element of “crimes against humanity” under international criminal law (DeGuzman, 2000: 375-376).

We conclude with a simple point: numbers and the systems that produce them are morally neutral. It is what we–or others–do with them that counts.

(*) An earlier version of this paper was presented by William Seltzer at the International Association of Official Statistics 2000 Conference, “Statistics, Development and Human Rights,” Montreux, Switzerland, 4-8 September 2000.


(1) For more detailed information on a broad range of these and other methodological and technological safeguards, see the documentation on the website of the Federal Committee on Statistical Methodology. For a more general discussion, see Duncan, Jabine, and de Wolf (1993).


