Saturday, December 11, 2010

American Community Survey: Methodology

The American Community Survey (ACS) methodology includes the set of methods and procedures that are used to select the survey’s sample, collect and process survey interviews, and produce survey-based estimates. The Census Bureau conducts the ACS as part of the decennial census program under the authorization contained in Title 13 of the United States Code. As with the decennial count, Title 13 requires households to participate and requires the Census Bureau to keep all information confidential. The ACS surveys the country’s housing units and group quarters and the populations residing in them. Group quarters, in contrast to housing units, include such places as college residence halls, nursing homes, correctional facilities, and military barracks. The methods used to select and interview these two universes differ and are described separately below. The sample selection and data collection methods for the Puerto Rico Community Survey (PRCS) are similar to those used in the ACS.

All surveys require a frame from which a sample can be selected with known probability. The frame for the ACS is the Master Address File (MAF). The MAF is the Census Bureau’s official inventory of known living quarters in the United States and Puerto Rico and is maintained throughout the decade by a series of automated, clerical, and field operations designed to reflect the changing housing inventory. The ACS samples of addresses are selected from the MAF to represent both the housing unit and group quarters populations. When the resulting interviews are combined, the survey data provide estimates of the characteristics of the total residential population.

Sample Selection and Data Collection for Housing Units

The ACS employs a two-phase sample design for housing units. In the first phase of sampling, an annual sample of about 2.9 million housing unit addresses is selected. To ensure the production of reliable estimates for the smallest geographic areas, the initial sampling rates are based on the estimated number of occupied housing units in each census block. The first-phase sampling rates for 2008 ranged from a low of 1.5 percent to a high of 10 percent of the addresses in each block. The sample is then divided into 12 panels of about 240,000 addresses each. Each month a new panel of sample addresses is introduced, and three sequential modes of data collection are used to collect the survey data for the panel over a three-month time period. Table 1 summarizes this mixed-mode design. The mail mode is first. Each month the Census Bureau mails questionnaires to the new panel of housing unit addresses. The bureau accepts mail responses over the panel’s full three-month data collection period. If the bureau does not receive a completed questionnaire from a household, it follows up first by telephone and then by selecting a subsample of the remaining nonresponding addresses to visit in person. This overlapping panel design results in efficient continuous data collection within each mode throughout the entire year.
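The overlapping panel design can be sketched in a few lines of Python. This is only an illustration: it assumes that each mode occupies exactly one month per panel, which simplifies the actual schedule (mail returns, for instance, are accepted for the full three months), and the function and constant names are hypothetical.

```python
# Illustrative sketch of the overlapping monthly panel design.
# Assumption: each mode occupies exactly one month per panel. In practice
# mail returns are accepted during all three months.
MODES = ("mail-out", "telephone follow-up", "personal visit follow-up")

ANNUAL_SAMPLE = 2_900_000  # approximate first-phase sample cited in the text
PANELS_PER_YEAR = 12


def panel_size(annual_sample=ANNUAL_SAMPLE, panels=PANELS_PER_YEAR):
    """Approximate number of addresses in one monthly panel."""
    return round(annual_sample / panels)


def panel_schedule(panel_month):
    """Map each mode to the running month in which a given panel reaches it.

    Months are counted from 1; the panel introduced in month m is panel m.
    """
    return {mode: panel_month + offset for offset, mode in enumerate(MODES)}


def panels_in_the_field(month):
    """Map each mode to the panel it is working in a given running month."""
    return {
        mode: month - offset
        for offset, mode in enumerate(MODES)
        if month - offset >= 1
    }


print(panel_size())            # 241667, i.e. "about 240,000"
print(panel_schedule(1))       # panel 1: mail in month 1, phone in 2, visit in 3
print(panels_in_the_field(3))  # month 3: all three modes busy on different panels
```

From the third month onward every mode is active on a different panel, which is what keeps the workload in each mode continuous through the year.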

Table 1. American Community Survey Data Collection Schedule

About 95 percent of the housing unit sample addresses are complete enough for mailing. The ACS uses four mailings to obtain as many mail responses as possible. An advance letter is sent a few days before the ACS questionnaire is delivered, and a reminder postcard follows the questionnaire. A second questionnaire is sent to all nonresponding addresses about three weeks after the initial questionnaire. An instruction booklet accompanies the ACS questionnaire, and toll-free telephone assistance is provided to help respondents complete the form. The ACS questionnaire is mailed in English, with Spanish questionnaires available upon request. Once the Census Bureau receives a questionnaire from the household in the mail, it scans the form and conducts an automated review to determine if the answers are sufficiently complete. If the form lacks sufficient information, the bureau contacts the household by telephone to collect the missing data.

About five weeks after the initial mailout of questionnaires, the bureau identifies nonresponding addresses for follow-up by telephone. The bureau obtains telephone numbers from commercial vendors, and the nonresponding cases with an available telephone number are sent to the Census Bureau’s telephone call centers. Interviewers at the call centers contact these cases and conduct computer-assisted telephone interviews. The call centers recruit bilingual staff to conduct interviews in languages other than English. Telephone follow-up for each month’s panel lasts about four weeks.

For 2008 and 2009, after the mail and telephone contacts, the Census Bureau received responses from about 51 percent of the initial sample. To reach the remaining households, the bureau initiates a second phase of sampling. The nonresponding addresses are subsampled to create a list of addresses that will have a personal visit from a census interviewer. Cases that could not be mailed to or contacted by telephone because the addresses were incomplete are also included in the universe that is subsampled. To account for the expected differences among geographic areas in levels of response by mail and telephone, there are four second-phase sampling rates, which range from a high of two-in-three to a low of one-in-three.

The Census Bureau’s 12 regional offices manage the personal visit follow-up. The interviewers use computer-assisted methods. As with the call centers, the regional offices recruit staff with specialized language skills. Interpreters are also hired to assist with data collection. Personal visit follow-up for each sample panel lasts about four weeks.

The combination of these three modes of data collection is very successful in obtaining completed interviews from the housing unit sample. The survey response rate, which is weighted to reflect the probabilities of selection of both the initial sample and the personal visit subsample, is about 97 percent, which means that interviews are not obtained for only about 3 percent of the eligible housing unit sample. This does not mean that 97 percent of the first-phase sample is interviewed. Because of nonresponse, ineligible sample addresses, and the second-phase subsampling, only about 67 percent of the addresses selected in the first-phase sample result in a completed interview.
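The gap between a weighted response rate near 97 percent and a completed-interview share near 67 percent follows from the subsampling weights. The sketch below works through the arithmetic with made-up counts for 1,000 first-phase addresses. It assumes equal first-phase weights and a single hypothetical subsampling rate of two-in-five, neither of which matches the actual design, so the figures are illustrative only.

```python
from fractions import Fraction


def response_summary(first_phase, early_interviews, subsample_rate,
                     followup_eligible, followup_interviews):
    """Contrast the weighted response rate with the raw interviewed share.

    Assumptions (illustrative, not the actual ACS procedure):
      * every first-phase address has the same base weight of 1;
      * mail and telephone respondents keep weight 1;
      * each personal visit case stands in for 1 / subsample_rate
        nonresponding addresses.
    """
    followup_weight = 1 / Fraction(subsample_rate)
    weighted_interviews = early_interviews + followup_weight * followup_interviews
    weighted_eligible = early_interviews + followup_weight * followup_eligible
    return {
        "weighted_response_rate": weighted_interviews / weighted_eligible,
        "share_of_first_phase_interviewed": Fraction(
            early_interviews + followup_interviews, first_phase
        ),
    }


# Hypothetical counts: 510 of 1,000 addresses respond by mail or telephone.
# The other 490 are subsampled at 2-in-5 (196 selected); 16 of those turn
# out to be ineligible, leaving 180 eligible cases and 168 interviews.
summary = response_summary(
    first_phase=1000,
    early_interviews=510,
    subsample_rate=Fraction(2, 5),
    followup_eligible=180,
    followup_interviews=168,
)
print(float(summary["weighted_response_rate"]))            # 0.96875
print(float(summary["share_of_first_phase_interviewed"]))  # 0.678
```

Each personal visit interview carries the weight of the nonresponding addresses that were not selected for follow-up, so the weighted rate stays high even though roughly a third of the first-phase addresses never yield an interview.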

Text box: Who is Interviewed in the American Community Survey

Sample Selection and Data Collection for Group Quarters

Group quarters (GQs) and the people staying in them are sampled each year from the MAF. The smallest GQs are sampled in a similar manner to housing unit addresses, and all residents of the facilities selected for the sample are eligible to be interviewed. For large GQ facilities, sets of ten people are identified, and the final number to be interviewed is determined by the total number of residents in the facility. The annual GQ sample of about 20,000 facilities is divided into 12 monthly panels and results in a sample of approximately 195,000 persons for the year.

Data collection in GQs takes place in two phases. First, interviews are conducted with the contact person or administrator for each selected facility, and arrangements are made to conduct the interviews with residents. Second, a sample of residents to be interviewed is identified, which varies depending on the size of the facility. The collection of information from GQs is a very labor-intensive process, involving field representatives at each stage, from the initial visit to verify the status of the facility to the actual interviews, which can involve multiple layers of approvals and different types of data collection. At some facilities, administrators may accept and distribute forms that are delivered and then picked up by field representatives because of difficulties conducting personal interviews, for example, health issues among patients in nursing homes or other skilled nursing facilities. In other types of facilities, such as federal correctional institutions, approvals to conduct interviews need to be received ahead of time from the Federal Bureau of Prisons, and interviews may be restricted to only certain months of the year. In addition, some interviews may take place over the telephone. The interview for GQs consists largely of the same questions asked of the general population, minus the housing information. Data collection for each monthly panel of GQs lasts about six weeks and does not include a formal follow-up operation for nonresponding residents.

These methods result in a high proportion of completed interviews from the GQ sample. The survey response rate, which is weighted to reflect the probabilities of selection, is about 98 percent, which means that interviews are not completed for about two percent of the eligible sampled group quarters population. However, given the difficulties associated with interviewing persons in institutional and noninstitutional group living arrangements, levels of missing data for questionnaire items are generally much higher for residents of GQs than for households.

Data Processing

Once the interviews are completed, the collected ACS data go through several steps before they are ready to be released as survey-based estimates. Some processes occur every month, but most happen only once a year. Electronic records of interviews conducted by telephone and in person arrive daily at the Census Bureau from the telephone call centers and regional offices, while paper questionnaires arrive daily by mail at the Census Bureau’s National Processing Center. The information entered on the mailed-back forms is captured and converted to electronic computer records compatible with the telephone and personal visit records. The forms are scanned, creating a digital image of each, and the marked check box responses are read and interpreted from the image using optical mark recognition software especially designed to read entries on the ACS questionnaires. Many ACS responses on mail returns take the form of alphabetic write-ins, which are keyed from their digital images.

The data collected from mail, telephone, and personal visit interviews are accumulated each month, and certain responses are sent to coding operations. These operations convert race, Hispanic origin, ancestry, language, industry, occupation, place of birth, migration, and place of work responses into numeric codes, which are then added to their data records in preparation for the final processing that occurs at the end of the year. All data records undergo several checks to determine if they will be considered interviews and used to produce ACS estimates. The completeness of the records can range from all required information collected to all information missing. Records that do not include the required minimum amount of data are treated as noninterviews in the estimation process. Inconsistencies can also occur within interviews in which a response to one question contradicts a response to another. Content edits are applied to the ACS interview records that correct for both missing and inconsistent information.

Content edits are run once a year on the set of interviews conducted during the entire calendar year. The edits are designed by subject matter experts and specified by survey item. The purpose of the edits is twofold: to apply rules that modify internally contradictory responses in a consistent way that helps maintain the quality of the data, and to deal with missing responses. Some edit rules are quite simple, as when inconsistencies are identified and fixed: a person under the age of 15 cannot be married, and a person who reports their sex as male cannot respond to the fertility question as having given birth. In each case, flags identify these inconsistencies and rules are established to correct them. Other edits are very complicated and require comparisons of responses to several items. The consistency edit for housing value, for example, involves a joint examination of value, property taxes, and insurance. When the combination of variables is improbable, several variables may be modified to give a plausible combination with values as close as possible to the original.

Edits are also used to provide a value when a response is missing, which is done in two different ways: by assignment and by allocation. A value is considered an assignment when a response provided for a specific person or housing unit is used to generate the answer to a missing item for that same person or unit, as when a person’s first name is used to assign a value for sex when it is missing. Certain values are more accurate when they are provided from another housing unit or person with similar characteristics. When a response is provided from the data record of another person or unit, the value is considered an allocation. For example, if a person fails to provide a response for the occupation question, that person’s reported characteristics of age, sex, education, hours, and weeks worked are used as a basis for allocating an occupation from another person with similar characteristics. The Census Bureau monitors the levels of item allocation. Allocation rates are a key measure of data quality and are released each year concurrent with the survey-based estimates.
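The two kinds of edits described here, consistency checks and allocation from a similar record, can be illustrated with a minimal sketch. The field names, the matching variables, and the first-match donor rule are all assumptions made for the example; the actual edit specifications are written by subject matter experts and are considerably more detailed.

```python
def flag_inconsistencies(person):
    """Return flags for the two simple inconsistencies named in the text.

    The field names ('age', 'marital_status', 'sex', 'gave_birth') are
    hypothetical; this only flags, it does not decide how to correct.
    """
    flags = []
    age = person.get("age")
    if age is not None and age < 15 and person.get("marital_status") == "married":
        flags.append("married_under_15")
    if person.get("sex") == "male" and person.get("gave_birth") is True:
        flags.append("male_gave_birth")
    return flags


def allocate_occupation(person, donors,
                        match_keys=("age_group", "sex", "education")):
    """Fill a missing occupation from the first donor with matching traits.

    Returns (record, allocated_flag). Taking the first matching donor is an
    assumption for illustration, not the production donor-selection rule.
    """
    if person.get("occupation") is not None:
        return person, False
    for donor in donors:
        same_traits = all(donor.get(k) == person.get(k) for k in match_keys)
        if donor.get("occupation") is not None and same_traits:
            return dict(person, occupation=donor["occupation"]), True
    return person, False


child = {"age": 12, "marital_status": "married", "sex": "female"}
print(flag_inconsistencies(child))  # ['married_under_15']

respondent = {"age_group": "25-34", "sex": "female", "education": "BA",
              "occupation": None}
donor_pool = [
    {"age_group": "45-54", "sex": "male", "education": "HS",
     "occupation": "electrician"},
    {"age_group": "25-34", "sex": "female", "education": "BA",
     "occupation": "accountant"},
]
filled, allocated = allocate_occupation(respondent, donor_pool)
print(filled["occupation"], allocated)  # accountant True
```

Returning the allocation flag alongside the record mirrors the point that allocation rates are tracked and published as a measure of data quality.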

The final step in preparing the estimates to be published is called “disclosure avoidance.” Disclosure avoidance measures are applied to the final data records to protect the confidentiality of survey respondents. The procedure used is called “swapping,” in which a small percentage of household records are moved from one geographic area to another. The selection of households to be swapped targets the records with the highest risk of disclosure. All released data are created from the swapped data files.
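The general idea of swapping can be shown with a toy example. The sketch below is not the Census Bureau's procedure, whose details are not described here: the risk score, the pairing rule (exchange geography with the first unused record from a different area), and the field names are all hypothetical.

```python
def swap_high_risk(records, n_pairs):
    """Exchange the area codes of high-risk records with partner records.

    Toy illustration only. Each record is a dict with hypothetical keys
    'id', 'area', and 'risk'. The highest-risk records are processed first,
    and each is paired with the first unused record from a different area.
    The input list is left unchanged.
    """
    result = [dict(r) for r in records]  # work on copies
    by_risk = sorted(result, key=lambda r: r["risk"], reverse=True)
    used = set()
    pairs_made = 0
    for rec in by_risk:
        if pairs_made >= n_pairs:
            break
        if rec["id"] in used:
            continue
        partner = next(
            (p for p in result
             if p["id"] not in used
             and p["id"] != rec["id"]
             and p["area"] != rec["area"]),
            None,
        )
        if partner is None:
            continue
        rec["area"], partner["area"] = partner["area"], rec["area"]
        used.update({rec["id"], partner["id"]})
        pairs_made += 1
    return result


households = [
    {"id": 1, "area": "A", "risk": 0.9},
    {"id": 2, "area": "A", "risk": 0.1},
    {"id": 3, "area": "B", "risk": 0.2},
    {"id": 4, "area": "B", "risk": 0.3},
]
swapped = swap_high_risk(households, n_pairs=1)
print({h["id"]: h["area"] for h in swapped})  # {1: 'B', 2: 'A', 3: 'A', 4: 'B'}
```

Because the two records trade places, the number of households in each area is unchanged while the highest-risk record no longer appears in its true location.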

Estimation

The ACS estimation process is based on pooling interview data collected over one year, three years, and five years, and results in three distinct sets of ACS estimates. Each year the ACS actually collects data from about 1.9 million housing units and about 145,000 people living in group quarters facilities. This sample size is sufficient to support one-year estimates for the largest geographic areas, those with populations of 65,000 or more. Three years of data collection are needed to produce estimates for areas with populations of 20,000 or more. Estimates for the full set of geographies, including areas as small as census tracts and block groups, require five years of data, or 60 months of interviews. Multi-year estimates are released every year based on new aggregations that exclude the data collected in the earliest year and include the data collected in the latest year. Table 2 illustrates the release schedule and the sets of interview data included in each year’s release.
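The population thresholds and the rolling multi-year windows described above reduce to a few lines of code. The sketch below uses hypothetical function names and covers only the data-collection years being pooled, not the publication dates shown in Table 2.

```python
def available_estimates(population):
    """List the estimate series an area qualifies for, by population size."""
    products = ["5-year"]              # produced for all areas
    if population >= 20_000:
        products.insert(0, "3-year")
    if population >= 65_000:
        products.insert(0, "1-year")
    return products


def pooled_years(latest_year, span):
    """Data-collection years pooled for a span ending in latest_year."""
    return list(range(latest_year - span + 1, latest_year + 1))


print(available_estimates(70_000))  # ['1-year', '3-year', '5-year']
print(available_estimates(30_000))  # ['3-year', '5-year']
print(available_estimates(4_000))   # ['5-year']

print(pooled_years(2009, 5))  # [2005, 2006, 2007, 2008, 2009]
print(pooled_years(2010, 5))  # [2006, 2007, 2008, 2009, 2010]
```

Moving the window forward by one year drops the earliest year and adds the latest, so consecutive five-year estimates share four of their five years of interviews.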

Table 2. Summary of American Community Survey Estimates and Release Schedule

All housing unit and group quarters interviews completed from January 1 through December 31 of a specific year are used to produce the one-year estimates. The collected data are weighted to account for the ACS sample design and for survey coverage and nonresponse. The basic estimation approach is a raking ratio estimation procedure that results in the assignment of two sets of weights: a weight for each sample person record (both persons in housing units and in GQs) and a weight for each sample housing unit record. Ratio estimation is a method that uses auxiliary information to increase the precision of the estimates as well as correct for differential coverage of geographic areas and population groups. The ACS takes advantage of the availability of independent estimates of total housing units and the population by sex, age, race, and Hispanic origin that are produced by the Census Bureau’s Population Estimates Program. This methodology results in ACS estimates that are consistent with these independent estimates for specified areas of geography.
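Raking ratio estimation can be illustrated with a small iterative proportional fitting routine. This is a textbook simplification rather than the ACS weighting procedure, which involves more dimensions and additional adjustment steps; the record layout and control totals are invented for the example.

```python
def rake(records, controls, iterations=50):
    """Adjust weights so weighted totals match control totals per dimension.

    records:  list of dicts, each with a 'weight' and one category per dimension
    controls: {dimension: {category: independent control total}}
    Returns the list of adjusted weights, in record order.
    Simplified sketch: fixed iteration count, no convergence test, and every
    category in the data must have a control total.
    """
    weights = [float(r["weight"]) for r in records]
    for _ in range(iterations):
        for dim, targets in controls.items():
            # Current weighted total for each category of this dimension.
            totals = {}
            for w, r in zip(weights, records):
                totals[r[dim]] = totals.get(r[dim], 0.0) + w
            # Ratio adjustment: scale each weight by control / current total.
            weights = [
                w * targets[r[dim]] / totals[r[dim]]
                for w, r in zip(weights, records)
            ]
    return weights


sample = [
    {"sex": "M", "age": "under 40", "weight": 10},
    {"sex": "M", "age": "40 plus", "weight": 10},
    {"sex": "F", "age": "under 40", "weight": 10},
    {"sex": "F", "age": "40 plus", "weight": 10},
]
# Hypothetical independent population controls.
controls = {
    "sex": {"M": 48, "F": 52},
    "age": {"under 40": 60, "40 plus": 40},
}
raked = rake(sample, controls)
print([round(w, 1) for w in raked])  # [28.8, 19.2, 31.2, 20.8]
```

After raking, the weighted counts by sex and by age both agree with the controls, which is the sense in which the estimates are made consistent with the independent population estimates.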

The production of multi-year estimates follows a similar process. All interviews conducted from January 1 of the first year through December 31 of the last year of the three-year or five-year time period are pooled. New weights and noninterview adjustments are applied so that each year of data contributes proportionately to the multi-year estimates. Since these estimates cover a three- or five-year time period, the final adjustment is to a simple average of the independent housing and population estimates for the three- or five-year time period.

ACS estimates are based on samples and are therefore subject to sampling error. Estimates of the sampling error associated with all ACS estimates are calculated and appear alongside published ACS estimates as margins of error.
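As a brief illustration of how a margin of error relates to a standard error, the sketch below assumes the 90 percent confidence level that the Census Bureau uses for published ACS margins of error, which corresponds to a multiplier of 1.645. The estimate and standard error in the example are invented.

```python
Z_90 = 1.645  # multiplier for a 90 percent confidence level


def margin_of_error(standard_error, z=Z_90):
    """Margin of error implied by a standard error at the given level."""
    return z * standard_error


def standard_error_from_moe(moe, z=Z_90):
    """Recover the standard error from a published margin of error."""
    return moe / z


def confidence_interval(estimate, moe):
    """Lower and upper bounds: estimate minus and plus the margin of error."""
    return (estimate - moe, estimate + moe)


# Hypothetical estimate of 12,000 with a standard error of 400.
moe = margin_of_error(400)
low, high = confidence_interval(12_000, moe)
print(round(moe), round(low), round(high))  # 658 11342 12658
```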

Deborah H. Griffin

Susan P. Love

See also Long Form; Sampling for Content; Sampling in the Census