The administrative data collected by IMPAQ for the "Workforce Investment Act Non-Experimental Net Impact Evaluation" project were received from state agencies in three segments: annual Workforce Investment Act Standardized Record Data (WIASRD) or closely related files, Unemployment Insurance data, and Unemployment Insurance Wage Record data. The analyses were conducted for twelve states; however, under the data sharing agreements, the Public Use Data (PUD) set includes data for only nine states. Our agreement for use of these data required that the identity of those states not be revealed. As a result, all geographical identifiers were removed to preserve the states' anonymity.
The PUD set is provided in three ASCII files with SAS and STATA data definition statements:
- PUD_WIA.DAT with PUD_WIA.SAS, PUD_WIA.DO, and PUD_WIA.DCT
- PUD_UI.DAT with PUD_UI.SAS, PUD_UI.DO, and PUD_UI.DCT
- PUD_WAGES.DAT with PUD_WAGES.SAS, PUD_WAGES.DO, and PUD_WAGES.DCT
- Download ZIP file.
Before proceeding with the data cleaning, SSNs and other IDs were first replaced with random identification numbers to ensure that each individual has one unique ID across all components of the PUD set. Please note that invalid SSNs in WIA and UI data were kept, while invalid SSNs in wages data were dropped. As a result, the PUD set includes information for 2,735,007 respondents.
| |WIA|UI|Wages|All files|
|---|---|---|---|---|
|# of records|147,159|3,674,685|44,372,589|NA|
|# of respondents|133,145|2,664,008|2,703,531|2,735,007|
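The ID replacement described above can be illustrated with a minimal sketch. It is not the procedure actually used for the PUD set: the field names (`ssn`, `id`), the toy records, and the nine-digit ID width are assumptions chosen for illustration. The idea is to build one lookup table from all files together, so the same person receives the same random ID in every component.

```python
import secrets


def build_id_map(ssns, id_digits=9):
    """Assign each distinct SSN one random ID that no other SSN shares."""
    id_map = {}
    used = set()
    for ssn in ssns:
        if ssn in id_map:
            continue  # this person already has an ID
        new_id = secrets.randbelow(10 ** id_digits)
        while new_id in used:  # redraw on the rare collision
            new_id = secrets.randbelow(10 ** id_digits)
        used.add(new_id)
        id_map[ssn] = new_id
    return id_map


def anonymize(records, id_map):
    """Replace the 'ssn' field with the random 'id' in every record."""
    out = []
    for rec in records:
        clean = {k: v for k, v in rec.items() if k != "ssn"}
        clean["id"] = id_map[rec["ssn"]]
        out.append(clean)
    return out


# Hypothetical toy records standing in for the three files.
wia = [{"ssn": "111-22-3333"}, {"ssn": "444-55-6666"}]
ui = [{"ssn": "111-22-3333"}]
wages = [{"ssn": "444-55-6666"}, {"ssn": "111-22-3333"}]

# One map built across all files keeps IDs consistent between them.
id_map = build_id_map(r["ssn"] for f in (wia, ui, wages) for r in f)
wia_out = anonymize(wia, id_map)
ui_out = anonymize(ui, id_map)
wages_out = anonymize(wages, id_map)
```

Because the map is built once and shared, a respondent can be linked across the WIA, UI, and wages files without the original SSN appearing anywhere in the output.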
During data processing, variable names, variable labels, and value labels were standardized across all states. Additionally, exact duplicate records were removed and undocumented codes were recoded as missing.
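As a rough illustration of these two cleaning steps, the sketch below drops records that repeat an earlier record in every field and sets any value outside a documented code list to missing. The record layout and the `codebook` structure are hypothetical; the race codes 1 to 4 are the ones documented for this data set.

```python
def drop_exact_duplicates(records):
    """Keep the first copy of each record; drop later identical copies."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # order-independent fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def recode_undocumented(records, codebook):
    """Set values that are not in the documented code list to None."""
    cleaned = []
    for rec in records:
        new = dict(rec)
        for var, valid_codes in codebook.items():
            if new.get(var) not in valid_codes:
                new[var] = None  # None stands for missing
        cleaned.append(new)
    return cleaned


# Hypothetical input: one exact duplicate and one undocumented race code.
raw = [
    {"id": 1, "race": 1},
    {"id": 1, "race": 1},
    {"id": 2, "race": 9},
]
codebook = {"race": {1, 2, 3, 4}}
cleaned = recode_undocumented(drop_exact_duplicates(raw), codebook)
```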
Education was captured in years of attained schooling. In several of the states, a large portion of individuals were coded as having zero years of education. Where a value of zero was improbable, it was recoded as missing. Values above 24 years were also recoded to missing, and values above 20 but below 24 were recoded to 20. In some states, education was captured as a descriptive variable and the following recoding was performed:
LESS THAN HIGH SCHOOL GRADUATE = 10 years of education
HIGH SCHOOL GRADUATE OR EQUIVALENT = 12 years of education
TECHNICAL OR ASSOCIATES DEGREE = 14 years of education
SOME COLLEGE/NO DEGREE = 15 years of education
BACHELORS DEGREE = 16 years of education
GRADUATE SCHOOL/NO DEGREE = 17 years of education
GRADUATE AND/OR PROFESSIONAL DEGREE = 18 years of education
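The education rules above can be sketched as two small functions. This is an illustration only: the exact label spellings found in the state files are assumptions, the decision that a zero is "improbable" is reduced to a flag the caller must set, and a value of exactly 24 is left unchanged because the rules above do not cover it.

```python
# Descriptive categories mapped to years of education (labels are assumed
# to match after upper-casing and trimming).
EDUCATION_YEARS = {
    "LESS THAN HIGH SCHOOL GRADUATE": 10,
    "HIGH SCHOOL GRADUATE OR EQUIVALENT": 12,
    "TECHNICAL OR ASSOCIATES DEGREE": 14,
    "SOME COLLEGE/NO DEGREE": 15,
    "BACHELORS DEGREE": 16,
    "GRADUATE SCHOOL/NO DEGREE": 17,
    "GRADUATE AND/OR PROFESSIONAL DEGREE": 18,
}


def recode_education_label(label):
    """Translate a descriptive category to years; unknown labels are missing."""
    if label is None:
        return None
    return EDUCATION_YEARS.get(label.strip().upper())


def recode_education_years(years, zero_is_improbable=True):
    """Apply the numeric rules; None stands for missing."""
    if years is None:
        return None
    if years == 0 and zero_is_improbable:
        return None  # improbable zero
    if years > 24:
        return None  # out of range
    if 20 < years < 24:
        return 20  # top-coded
    return years  # includes exactly 24, which the stated rules leave open
```

For example, `recode_education_years(22)` gives 20, and `recode_education_label("Bachelors Degree")` gives 16.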
The PUD set includes several date variables, all captured in YYYYMM format. Original dates recorded as 00000000 or 99999999, or in any other form that does not correspond to a valid date, were recoded as missing.
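A minimal sketch of this date cleaning follows. It assumes the original dates arrive as eight-digit YYYYMMDD strings, which is suggested by the 00000000 and 99999999 placeholders but is not stated explicitly; the function name is hypothetical.

```python
from datetime import datetime


def to_yyyymm(raw):
    """Return the date as a YYYYMM string, or None if it is not a valid date."""
    text = "" if raw is None else str(raw).strip()
    # Require exactly eight digits before parsing (assumed YYYYMMDD input).
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        parsed = datetime.strptime(text, "%Y%m%d")
    except ValueError:
        # Covers 00000000, 99999999, and impossible dates such as 20050230.
        return None
    return f"{parsed.year:04d}{parsed.month:02d}"
```

With this sketch, `to_yyyymm("20050315")` yields `"200503"`, while the placeholder values and malformed strings yield `None`.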
Race was categorized into four groups:
- 1=White only
- 2=Black (including black and any other racial category)
- 3=Other (any other specified racial category)
- 4=No racial category identified
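One way to read these four groups as a decision rule is sketched below. The input representation (a collection of race labels reported for one person) is hypothetical, and the treatment of combinations such as white plus another non-black category as group 3 is an interpretation of the definitions above, not a documented rule.

```python
def race_group(races):
    """Map the set of reported race labels to the four PUD groups."""
    selected = {r.strip().lower() for r in races if r and r.strip()}
    if not selected:
        return 4  # no racial category identified
    if "black" in selected:
        return 2  # black, alone or with any other category
    if selected == {"white"}:
        return 1  # white only
    return 3  # any other specified category (interpretation)
```

For instance, `race_group(["White", "Black"])` returns 2 and `race_group([])` returns 4.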
The WIA data set was restricted to respondents who participated in Adult Local or Dislocated Local programs.
AGE at event < 18 was recoded to missing.
The wages data were usually provided by states with multiple records per ID and year-quarter, with wages from different employers reported on separate records. During data cleaning, wages for the same ID, year, and quarter were summed, and the information about the employer with the highest wages was kept.
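The wage aggregation can be sketched as a single pass over the records. The field names (`id`, `year`, `quarter`, `wages`, `employer`) are hypothetical, each input record is assumed to hold one employer's wages for one quarter, and ties for the highest wages are resolved by keeping the first record seen, which is an assumption rather than a documented rule.

```python
from collections import defaultdict


def aggregate_wages(records):
    """Collapse to one record per ID and year-quarter."""
    totals = defaultdict(int)  # summed wages per (id, year, quarter)
    top = {}                   # highest-paying record per (id, year, quarter)
    for rec in records:
        key = (rec["id"], rec["year"], rec["quarter"])
        totals[key] += rec["wages"]
        if key not in top or rec["wages"] > top[key]["wages"]:
            top[key] = rec
    return [
        {
            "id": key[0],
            "year": key[1],
            "quarter": key[2],
            "wages": totals[key],
            "employer": top[key]["employer"],
        }
        for key in sorted(totals)
    ]


# Hypothetical input: one person with two employers in the same quarter.
raw_wages = [
    {"id": 1, "year": 2005, "quarter": 1, "wages": 1000, "employer": "A"},
    {"id": 1, "year": 2005, "quarter": 1, "wages": 2500, "employer": "B"},
    {"id": 1, "year": 2005, "quarter": 2, "wages": 1200, "employer": "A"},
]
quarterly = aggregate_wages(raw_wages)
```

In this toy input the first quarter collapses to total wages of 3500 with employer "B" retained.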
In order to preserve states' anonymity, we restricted each data file to quarters of data available across all states:
Note that not all states provided the same information, and variables not available in a particular state are coded as missing.
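The restriction to quarters available in every state amounts to a set intersection, sketched below. The state keys and the quarter labels are hypothetical placeholders and do not reflect the actual coverage of any state.

```python
def common_quarters(quarters_by_state):
    """Return the quarters present in every state's data."""
    quarter_sets = [set(q) for q in quarters_by_state.values()]
    if not quarter_sets:
        return set()
    return set.intersection(*quarter_sets)


def restrict_to_quarters(records, keep):
    """Keep only records whose quarter is in the common set."""
    return [rec for rec in records if rec["quarter"] in keep]


# Hypothetical coverage for three anonymous states.
coverage = {
    "state_a": ["2004Q4", "2005Q1", "2005Q2"],
    "state_b": ["2005Q1", "2005Q2", "2005Q3"],
    "state_c": ["2005Q1", "2005Q2"],
}
keep = common_quarters(coverage)
sample = [{"id": 1, "quarter": "2004Q4"}, {"id": 1, "quarter": "2005Q1"}]
restricted = restrict_to_quarters(sample, keep)
```

Restricting every file to the shared quarters means the span of available data cannot be used to tell one state from another.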
For more details on the variables, please see the SAS or STATA data definition statements corresponding to each data set.