Business Economics


The statistics corner: the Statistics of Income Program of the Internal Revenue Service

Tom Petska

FEDERAL TAX RETURN information is an integral part of the statistical infrastructure enabling analysis of the U.S. economy. Most of this information is compiled by a relatively obscure organization, the Statistics of Income (SOI) Division of the Internal Revenue Service (IRS). In spite of this obscurity, SOI data are part of the bedrock of the U.S. statistical system and central to the understanding of the economy as a whole.

This article is the first of two that provide a brief overview of the SOI program — its history, products, and services. In this first article, background information on the statutory origins and statistical processing of tax return information is provided. The major SOI programs and their principal customers are then summarized. In a later article, ongoing innovations in the functional structure and technologies in SOI are described. Finally, issues of access to confidential tax return information and services to users are discussed.


The compilation of economic and financial information from tax returns came into being after the adoption of the Sixteenth Amendment to the Constitution and the subsequent enactment of the first modern U.S. income tax law, the Revenue Act of 1916. This Act specifically called for the annual publication of statistics. In spite of many revisions to the tax law, this function remains in the current Internal Revenue Code, which is based on the Revenue Act of 1986 and includes the requirement to “… prepare and publish not less than annually statistics reasonably available with respect to the operations of the internal revenue laws, including classifications of taxpayers and of income, the amounts claimed or allowed as deductions, exemptions, and credits.”(1)

Like other Federal statistical agencies, the SOI Division’s mission is to collect and process data so that they become meaningful information. The mission of SOI differs from that of many other Federal statistical agencies in two respects:

1. Unlike many statistical agencies that collect data through surveys, SOI collects data from the administrative records created from processing tax and information returns.

2. Although the IRS is a user of SOI data, the primary uses for SOI data are outside of IRS, in policy analyses on the effects of new or proposed tax laws and for evaluating the functioning of the U.S. economy.

To accomplish its statutory responsibilities, the SOI program presently requires an annual budget of about $25 million. If revenues from reimbursable projects are also considered, this total amounts to nearly $28 million. The SOI national office staff in Washington comprises about 200 people (mostly economists, computer specialists, and statisticians) and accounts for about 40 percent of total SOI staffing. This staff, working closely with customers, determines the content of each project, designs the samples used, and develops field processing instructions, then works with its field processing staff to carry out the program.

Data capture operations in SOI are largely conducted by paraprofessionals at five of the ten IRS service centers located throughout the country. Programming is done mainly by staffs of computer specialists at two “hub” service centers. Together these centers account for about 50 percent of SOI staffing. The remainder of SOI staff is based at the Detroit Computing Center where activities such as final data file creation are performed.

SOI’s statistical processing of tax return data has historically been separate from the mainline processing of tax returns for administrative purposes. SOI operations begin by sampling from tax or information returns in the basic tax administration (or Master File) system. The Master File offers a sampling frame that enables efficient and sophisticated sample designs to be used. After the returns are sampled, data elements already captured for administrative purposes are used as a starting point in statistical processing. These data are substantially augmented with other items from the tax returns. All data are then tested for consistency, and identifiable errors or inconsistencies are resolved.

In comparison to IRS administrative processing, which includes 100 percent of the tax returns but only limited item content, the SOI programs collectively represent a small volume, albeit mainly large and highly complex returns, with extensive item content.(2,3)
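The sampling approach described above, in which larger returns are selected at higher rates (see note 2), can be illustrated with a minimal sketch. The strata boundaries, the rates, and the field names below are hypothetical values chosen for illustration; they are not SOI's actual design. Each sampled return carries a weight equal to the inverse of its selection rate, so that weighted totals estimate totals for the full population.

```python
import random

# Hypothetical strata: (lower income bound, sampling rate). Illustrative only.
STRATA = [(0, 0.001), (100_000, 0.01), (1_000_000, 0.25), (10_000_000, 1.0)]


def sampling_rate(income):
    """Return the rate of the highest stratum whose lower bound the income meets."""
    rate = STRATA[0][1]  # incomes below every bound fall in the first stratum
    for lower_bound, stratum_rate in STRATA:
        if income >= lower_bound:
            rate = stratum_rate
    return rate


def draw_sample(returns, seed=0):
    """Select each return independently at its stratum rate; weight = 1 / rate."""
    rng = random.Random(seed)
    sample = []
    for ret in returns:
        rate = sampling_rate(ret["income"])
        if rng.random() < rate:
            sample.append({**ret, "weight": 1.0 / rate})
    return sample


def estimate_total(sample, field):
    """Weighted sum of a field across the sampled returns."""
    return sum(ret[field] * ret["weight"] for ret in sample)
```

With rates like these, the largest returns are all retained (rate 1.0, weight 1.0), while a sampled small return stands in for a thousand similar ones. That is one way a sample of modest size can still consist mainly of large and complex returns.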


Through its publications, including the quarterly SOI Bulletin, tax return information from SOI reaches many thousands of outside tax experts and the general public. Users of SOI data represent a broad spectrum of researchers, tax practitioners, and the public at large. As noted in Table 1, SOI user inquiries come from a very wide range of interests. The detailed income and expenditure data provided on tax and information returns are highly regarded: because penalties for misreporting exist, they are more reliable than similar survey data.

The major statistical programs in SOI include studies of individual taxation, business statistical programs, international studies, and other special studies. For most of its nearly eighty-year history, the main emphasis of the SOI program has been individual and corporation income tax data. However, the SOI program now includes other data as well, e.g., on unincorporated businesses, tax-exempt organizations, and estates.

In 1980, the SOI program consisted of twenty-six projects; now, in 1992, this number has more than doubled to nearly sixty. This two-fold growth in programs was accompanied by a parallel four-fold increase in the amount of data extracted from the various tax and information returns, all at virtually no increase in inflation-adjusted costs.

In addition to studies based on various types of returns, schedules provided with returns have also been the subject of separate SOI programs, such as those for sole proprietorships, sales of capital assets, and the foreign tax credit. Studies also have been based on matching or linking various tax returns and schedules, such as studies of businesses and their payroll and employment and of partnerships and their partners.

Individual SOI Program

Income and tax statistics from individual income tax returns have been published annually by the IRS beginning with tax year 1916.(4) The content of the program is largely determined by the Office of Tax Analysis (OTA) of the Treasury Department for use in tax policy research and in estimating future tax revenues. The needs of other researchers for individual income tax data are addressed on a cost-reimbursable basis.

While the individual program has historically been based on an annual cross-sectional sample of individual tax returns, a major redesign of the program is currently underway.(5) From detailed discussions with OTA, it became apparent that the individual program needed to be refocused in three respects:

1. Because the annual cross-sectional samples were not conducive to multiyear economic modeling (for such events as sales of capital assets), the sample has been redesigned to create a large panel of individuals embedded within the annual cross-sectional samples.

2. Because family “economic units” reflecting households rather than individuals are more desirable as the focus of tax analysis, social security numbers of the dependents now reported on the tax returns of parents are used to obtain dependents’ returns and are combined with the parents’ returns to form such units.

3. Sampling stratifiers and selection rates have been restructured to enhance the samples of returns with greater policy interest, such as by including more persons with very high or low incomes and the aged.

Other studies closely related to individual taxation issues include the linkage of information documents (for example, W-2s for wages and 1099s for interest and dividends). The data from these information documents are being linked on a record-by-record basis using social security numbers. This provides a file for studying both filers and nonfilers as well as the combined income of taxpayers and their dependents. Once these are brought together, this information document database will cover about 98 percent of the U.S. population.
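The record-by-record linkage of information documents can be sketched as a simple grouping step. In this hedged illustration, the field names ("id", "form", "amount") and the function itself are hypothetical, and a generic identifier stands in for the social security number; the sketch shows only the idea of combining documents for one person and checking whether that person also filed a return.

```python
from collections import defaultdict


def link_documents(info_docs, filed_ids):
    """Group information documents by identifier and flag nonfilers.

    info_docs: iterable of dicts with hypothetical keys "id", "form", "amount".
    filed_ids: set of identifiers that appear on a filed tax return.
    """
    linked = defaultdict(lambda: {"total": 0.0, "forms": [], "filed": False})
    for doc in info_docs:
        record = linked[doc["id"]]
        record["total"] += doc["amount"]      # combined reported income
        record["forms"].append(doc["form"])   # which documents were linked
        record["filed"] = doc["id"] in filed_ids
    return dict(linked)
```

A person who appears in the linked file with "filed" set to False has reported income on information documents but no matching return, which is the kind of record a study of nonfilers would start from.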

Public-use microdata files of individual tax returns have been produced annually beginning with data for 1960. These files constitute a major break with past policies of not releasing information relating to specific tax returns to private researchers. However, steps have been taken to safeguard the confidentiality of persons in these files.(6)

SOI Business Programs

Although businesses can be legally organized in a variety of ways (including farmers’ cooperatives, tax-exempt organizations, and estates and trusts), most U.S. business activity is conducted by corporations, partnerships, or sole proprietorships. These three annual SOI programs are thus often referred to as the SOI business studies:

1. SOI corporate data have been published annually beginning with tax returns for 1916. These data are the only publicly available source of financial information on all corporations, because other sources include only large or publicly held corporations or those in certain industries. This also makes the program the basic source of data used by the Commerce Department’s Bureau of Economic Analysis (BEA) in estimating corporate profits for the National Income and Product Accounts.

The corporation program is rich in item content; complete income statement, balance sheet, and tax computation information have been mainstays of the program almost since its inception. As with the individual program, this program is being restructured to meet the needs of OTA and BEA more effectively. Increased longitudinality is being designed into future studies, and, through the financial support of BEA, the release of corporation statistics is being greatly accelerated beginning this year.

2. The annual SOI partnership program is vital to the National Income and Product Accounts because it is the only source of data on these businesses. Although partnerships are not taxed directly, they are required to file annual information returns, including an income statement, balance sheet, and schedules showing the shares of income or losses and other items distributed or allocated to partners. Partners are required to report the distributions or allocations from partnerships on their own income tax returns.

For many years, partnerships commanded only modest interest because they were not taxed directly and thus had no direct effect on Federal revenues. However, the proliferation of partnerships in tax shelter activities has substantially increased interest; for example, curbing deductions of partnership losses by individual partners was a key provision of the 1986 Tax Reform Act.(7) Unlike the data for individuals and corporations, which are published in separate SOI reports, these data are published annually in the SOI Bulletin.

3. Information about nonfarm sole proprietorship business activities is reported on Schedule C of the individual tax return, Form 1040. Profits from these activities are combined with income from other sources in order to compute individual “adjusted gross income.” Data on proprietorships provide the other half of information on unincorporated businesses for the National Income and Product Accounts. Here, again, the tax return is the only annual source of financial information about these businesses. Information on sole proprietorships is published annually in the SOI Bulletin.

SOI International Studies

International studies are conducted biennially or periodically in two broadly defined areas: foreign investment and activity abroad by U.S. “persons,” and investment and activity in the U.S. by foreign “persons.” Studies of foreign investment and activity abroad by U.S. persons include: corporate foreign tax credit, controlled foreign corporations of U.S. corporations, foreign sales corporations, and individual income earned abroad, among others.(8) SOI statistical compilations of investment and activity in the United States by foreign “persons” include: foreign owned U.S. corporations, foreign corporations with income derived from U.S. sources, U.S. partnership income of foreign partners, and sales of U.S. real property interests by foreign “persons.”(9)

Other SOI Studies

SOI also conducts major statistical programs, on an annual, biennial, or periodic basis, on tax-exempt organizations, certain tax-exempt obligations, estates, and excise taxation. Studies of tax-exempt organizations include those on information returns filed by private foundations, nonprofit charitable and other organizations that are exempt from taxation under Internal Revenue Code section 501(c), exempt organizations with “unrelated business income,” and tax-exempt private activity bonds.(10)

Estate tax studies are conducted annually based on year of death. Studies are also periodically undertaken using estate tax returns and mortality rates to estimate the wealth of top (living) wealthholders. A long-term research project is also underway based on estate tax filings from 1916 to the present to examine intergenerational transfers of wealth through inheritance.(11)
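The idea of using estate tax returns and mortality rates to estimate the wealth of the living can be illustrated with a simple calculation. In this sketch, each decedent is treated as a sample drawn from the living population at the mortality rate for the decedent's group, so each estate is weighted by the inverse of that rate. The function name, the field names, and the rates in the usage note are hypothetical and are not actual mortality data; an actual study would presumably need further refinements.

```python
def estimate_living_wealth(estates, mortality_rates):
    """Weight each estate by the inverse mortality rate of the decedent's group.

    estates: iterable of dicts with hypothetical keys "age_group", "net_worth".
    mortality_rates: dict mapping age group to an annual death rate (0 to 1).
    """
    total = 0.0
    for estate in estates:
        rate = mortality_rates[estate["age_group"]]
        total += estate["net_worth"] / rate  # one decedent represents 1/rate living persons
    return total
```

For example, under an assumed mortality rate of 2 percent, a single $2 million estate would represent about fifty living persons with similar wealth, or roughly $100 million in total.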

Excise tax studies have been selective and are regularly published in the SOI Bulletin. They include (or have included) returns of the quarterly crude oil windfall profit tax and the environmental excise tax on certain hazardous substances (i.e., the so-called “Superfund Tax”).


1 This is from the Internal Revenue Code of 1986, Section 6108(a), “Statistical Publications and Studies, Publication or Other Disclosure of Statistics of Income.”

2 Because SOI sampling rates generally increase with relative increases in the size of financial amounts, e.g., income or asset size, the returns in the SOI samples are, on average, substantially larger and more complex than those in the administrative population files of all taxpayers.

3 In all, over 200 million tax returns are administratively processed each year by IRS, of which only about 500,000 are selected for statistical purposes.

4 The report for 1916 also included some data for the period 1913-1915.

5 See Czajka, John and Walker, Bonnye, “Combining Panel and Cross-sectional Selection in an Annual Sample of Tax Returns,” 1989 Proceedings of the American Statistical Association, Section on Survey Research Methods, 1990, and O’Conor, Karen, Atrostic, B.K., and Gillette, Robert, “Moving from Descriptive Statistics to Inference,” Proceedings of Symposium 90: Measurement and Improvement of Data Quality, Statistics Canada, 1991.

6 Various measures are employed to make public-use files available while protecting taxpayer confidentiality, including purging names and other unique identifiers, rounding data items to make it much harder to identify individuals, and averaging the financial data of “similar” returns. For a discussion of these techniques, see Spruill, Nancy, “The Confidentiality and Analytic Usefulness of Masked Business Microdata,” 1983 Proceedings of the American Statistical Association, Section on Survey Research Methods, 1984.

7 As a result of the 1986 Tax Reform Act, losses that were defined as “passive” could only be deducted to the extent that there were offsetting passive gains. See, for example, Nelson, Susan, and Petska, Tom, “Partnerships, Passive Losses, and Tax Reform,” SOI Bulletin, Winter 1989-90, Volume 9, Number 3, April 1990, and 1989 Proceedings of the American Statistical Association, Section on Survey Research Methods, 1990; and Petska, Tom, “Partnerships, Partners, and Tax Shelters: Three Years after Tax Reform,” SOI Bulletin, Volume 11, Number 4, April 1992.

8 Many of these studies are used for Treasury Department reports to Congress that are mandated by law.

9 A compendium of SOI studies of international income and taxes has just become available, Statistics of Income — Compendium of Studies of International Income and Taxes, 1984-1988, Publication 1216, September 1991.

10 SOI studies of tax-exempt organizations have recently been released in an historical compendium, Statistics of Income – Compendium of Studies of Tax-Exempt Organizations, 1974-1987, Publication 1416, July 1991.

11 A compilation of SOI research in this area is planned for late this year.


Ahmed, Yahia, and Scheuren, Fritz, “The U.S. Statistics of Income Program: Issues and Challenges,” Bulletin of the International Statistical Institute, Number 48, 1991.

Ballenger, Louella, “Sole Proprietorship Returns, 1989,” SOI Bulletin, Volume 11, Number 1, Statistics of Income Division, Internal Revenue Service, August 1991.

National Research Council, Committee on National Statistics, Improving Information for Social Policy Decisions: The Uses of Microsimulation Modeling, 1991.

Pazulski, Victoria, “General Description of the 1980 Sole Proprietorship (Schedule C) Public-Use File,” Statistics of Income Division, Internal Revenue Service, 1983 (unpublished).

Statistics of Income Division, Internal Revenue Service, Statistics of Income — 1988, Corporation Income Tax Returns, Publication 16, November 1991.

Statistics of Income Division, Internal Revenue Service, Statistics of Income — 1988, Individual Income Tax Returns, Publication 1304, September 1991.

Statistics of Income Division, Internal Revenue Service, Source Book, Corporation Income Tax Returns, 1988, Publication 1053, June 1991.

Statistics of Income Division, “Office of Management and Budget, Application Package — Quality Improvement Prototype Award,” August, 1990.

Statistics of Income Division, Internal Revenue Service, “General Description Booklet for the 1988 Individual Public-Use File,” 1990.

Statistics of Income Division, Internal Revenue Service, “75th Anniversary, 1913-1988,” SOI Bulletin, Volume 8, Number 2, Statistics of Income Division, Internal Revenue Service, December 1988.

U.S. Department of Commerce, Bureau of the Census, “Quarterly Financial Report for Manufacturing, Mining, and Trade Corporations,” 1991.

U.S. Department of Commerce, Bureau of Economic Analysis, Survey of Current Business, various issues.

Zempel, Alan, “Partnership Returns, 1989,” SOI Bulletin, Volume 11, Number 2, Statistics of Income Division, Internal Revenue Service, November 1991.

* Tom Petska, Fritz Scheuren, and Bob Wilson are with the Statistics of Income Division, Internal Revenue Service, Washington, DC. The authors would like to acknowledge the assistance of many persons in the Statistics of Income Division who contributed to this article, including but not limited to Wendy Alvey, Beth Kilss, Gene Otto, and Dan Skelly.

In past issues of Business Economics we have from time to time published articles describing the statistical programs of various agencies of the federal government. In this issue we present the first of two articles describing the work of the Statistics of Income Division of the Internal Revenue Service.

— Joseph W. Duncan, Editor, The Statistics Corner

Table 1

Statistics of Income User Inquiries by Type, 1991

Inquirer                     Telephone (%)   Letter (%)

Total                            100.0          100.0

Consultant/researcher             17.2            9.1
Accounting firm                    4.2           13.0
Association                        7.0            7.4
Law firm                           2.3            8.5
Other private business             8.4           12.8
College                            7.1           10.6
Public library                     0.5            0.6
Private citizen                    8.3           22.5
State/local Government             6.9            4.6
Internal Revenue Service          15.7            1.3
Congressional                      7.0            2.4
Other Federal Government           8.0            1.5
Foreign                            1.1            1.0
Media                              4.8            3.5
Student                            1.5            1.3
Other                              0.1            0.0

COPYRIGHT 1992 The National Association for Business Economists
