HDCI Report Appendix: Sample Design

The statistical goal of the project was to measure year-to-year change in the extent to which health plans subject to Part 7 of Title I of ERISA comply with various provisions of that Part. For purposes of this project, the universe of private-sector health plans was divided into three segments: multiemployer plans, single-employer plans sponsored by large firms, and single-employer plans sponsored by small firms. 


Firms with 100 or more employees were considered large. A separate compliance measurement effort was conducted for each of the three segments of the health plan universe. The same statistical goal applied to each of the three segments: to measure year-to-year changes in violation rates to within 10 percentage points with probabilities of type I and type II error of 5 percent or less. The caps on the two types of error guard against erroneous conclusions that EBSA could draw after completing this project and a follow-up project in some future year. Type I error would arise if the true universe violation rate had not changed at all and EBSA falsely concluded that it had changed. Type II error would arise if the true universe violation rate changed by 10 percentage points and EBSA falsely concluded that it had not significantly changed. 

The sample size calculations were implemented using two sample size calculation tools: a routine based on the two-sample t-test, which computes the required sample size directly, and a chi-square routine, which computes statistical power for a specified range of sample sizes.

In applying the sample size calculation based on the two-sample t-test, the first sample is the base year (2001) sample and the second is the sample from whatever future year the project is repeated. The second tool does not compute sample size directly. It computes power^{(2)} for a specified range of sample sizes. Each run of the program reports the statistical power that results from 10 to 20 sample sizes evenly spaced across a specified interval. By running this program three or four times and adjusting the specified interval as necessary, it is a simple matter to zoom in on the minimum sample size that produces a power of 95 percent or more. Compared to the second tool, the first has the advantage of computing sample size in a single run rather than through a sequence of runs. It has the disadvantage of requiring that the standard error of an estimated percentage p be approximated as the square root of p(1-p). As discussed below, the approximation turns out to be quite good, so both tools were used. 
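The zoom-in procedure just described can be sketched in Python. This is an illustrative stand-in for the actual tool, not the tool itself: it uses a normal approximation to the two-sample test, with the standard deviation approximated as the square root of p(1-p) for p = 25 percent (the ceiling discussed below) and a change of 10 percentage points:

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power(n, p=0.25, delta=0.10, alpha=0.05):
    """Approximate power to detect a change of `delta` between two waves
    of n plans each, with variance p*(1-p) assumed in both waves."""
    se = sqrt(2 * p * (1 - p) / n)                # s.e. of the difference
    return Z.cdf(delta / se - Z.inv_cdf(1 - alpha / 2))

# "Zoom in" on the minimum sample size with power of at least 95 percent.
n_min = next(n for n in range(400, 600) if power(n) >= 0.95)
print(n_min)   # 488
```

With these inputs the scan stops at 488, close to the sample sizes reported below.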

Both sample size calculation tools assume that the universe is infinite. The sample size computed using these tools was therefore adjusted downward to account for the actual sizes of the three universes using the standard formula for finite population correction.^{(3)} 

Multiemployer Sample: The sample size can be calculated to achieve the target variance provided the estimated violation rate does not exceed a specified level. For surveys where no ceiling on the percentages to be estimated can be provided, a variance-maximizing estimate of 50 percent can be used. The problem with this approach is that the sample size will be larger than necessary if the estimated percentages turn out to be much lower than 50 percent.^{(4)} For the multiemployer sample, the violation rate ceiling used was 25 percent, based on an earlier EBSA project that estimated violation rates for Part 7 of ERISA. 

Using these assumptions, a sample size of 488 was computed based on the chi-square sample size routine. The two-sample t-test requires the standard deviation, which was approximated as the square root of .25(1-.25), which is .433. The routine based on the two-sample t-test also requires specification of null and alternate hypotheses. The null hypothesis is that the base year mean is 25 percent. The alternate hypothesis is that the initial rate of 25 percent changes by 10 percentage points. The sample size produced using the two-sample t-test procedure is 489, which is nearly identical to the chi-square sample size, whether the alternate hypothesis is specified as 15 percent or 35 percent. 

The sampling frame for multiemployer plans was the 1997 5500 file maintained by EBSA’s Office of Information Management for purposes of the Freedom of Information Act. It includes all types of employee benefit plans. Health plans were identified based on a question on the Form 5500 that indicates all of the types of welfare benefits that the plan provides. A code of ‘A’ flags health benefits. Plans entering an ‘A’ were classified as health plans regardless of what other codes were entered. Other codes that can be entered in this field identify dental and vision plans. Plans indicating that they provide dental or vision benefits were not included unless they also indicated provision of health benefits. 

Multiemployer plans were identified based on an entry of 'C' (multiemployer plan) or 'D' (multiple-employer collectively bargained plan) in the type of plan entity field. Plans entering plan entity code 'F' (group insurance arrangement) were also classified as multiemployer plans if they also indicated that they were collectively bargained. It was clear from the sponsor names that many of the plans identifying themselves as multiemployer plans did so incorrectly. The list was therefore manually reviewed to eliminate all obvious single-employer plans. 

The edited sample frame that resulted from this process numbered 2,169 multiemployer plans. Correcting the infinite population sample size of 489 (the more conservative of the two estimates) for this universe size results in a multiemployer sample size of 399. 
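Expressed as a quick sketch, one common form of the standard finite population correction reproduces this figure from the numbers above:

```python
def fpc(n_infinite, universe_size):
    """Finite population correction: n' = n / (1 + (n - 1) / N)."""
    return round(n_infinite / (1 + (n_infinite - 1) / universe_size))

print(fpc(489, 2169))       # 399 multiemployer plans
print(fpc(448, 4957773))    # 448: a huge universe leaves n essentially unchanged
```

The second call illustrates the small firm case discussed later, where the universe is so large that the correction has no effect after rounding.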

Concepts for Large and Small Firm Sample Design: Sampling single-employer health plans is not as easy as sampling multiemployer plans because there is no satisfactory sampling frame for these plans. The series 5500 reports do not constitute a satisfactory frame because most health plans are exempt from filing under ERISA. To our knowledge, no firm or government agency maintains a comprehensive national list of health plans. It might be possible to construct a list of insured plans by obtaining lists of insurers from the states and lists of health plans from insurers; a sampling frame could then be constructed by combining this list with a list of self-insured plans based on 5500 filings. That process was considered time-consuming, expensive, and uncertain. It was therefore decided to sample single-employer health plans via the firms that sponsor them. 

In the parlance of sampling theory, firms serve as the "primary sampling units" because it is firms that are directly selected for the sample. Because the analysis is conducted at the plan level, plans are the elementary units. This divergence between the primary sampling units and the elementary units makes the sample a cluster sample rather than a simple random sample. If each firm had no more than one plan, plan characteristics could be regarded as firm characteristics and the sample as a simple random one. Because some firms sponsor more than one health plan, the large and small firm samples are properly regarded as cluster samples. Because a large majority of firms sponsor only one health plan, this cluster sample is close to being a simple random sample. 

Three alternative rules could have been used to associate health plans with sample firms:


All of these alternatives were considered statistically viable. The first introduces a statistical weighting issue but was selected because it was considered the most consistent with procedures normally followed in EBSA investigations. Under this approach, plans covering workers of a parent firm and at least one subsidiary would require investigation if the parent or any of the participating subsidiaries fell into the sample. The probability of selection for each plan therefore depends on the probabilities of selection for the subsidiaries that participate in the plan. The probability of selection for each subsidiary depends only on whether it is large or small (with 100 employees being the dividing line). To accurately compute statistical weights, national office coordinators were asked to determine, and investigators to verify, the number of large and small subsidiaries participating in each plan. 

To compute sizes of the large and small firm samples using cluster sampling theory would require three kinds of information about the health plan universe in addition to that required for a simple random sample:

1. the distribution of firms by the number of health plans they sponsor (cluster size);
2. the degree of within-firm homogeneity in violation rates; and
3. the distribution of plans by their probability of selection.

All three of these data requirements pose serious problems. 

Data to meet the first requirement initially appeared to be available. The Bureau of Labor Statistics once published an article in the Monthly Labor Review that reported a distribution of firms by number of health plans based on its Employee Benefit Surveys. Foster Higgins/Mercer and KPMG each report distributions of health plans per firm based on annual surveys conducted by each of those firms. For two reasons, each of these sources significantly overestimates the number of ERISA plans that EBSA would have to investigate. 

First, these surveys included plans for workers at all locations of multi-location firms. Given the chosen strategy for associating plans with sample firm locations, the fact that each of a firm's subsidiaries sponsors its own plan has no bearing on cluster size. Whether the parent or one of the subsidiaries it covers falls into the sample, investigators would find only one plan covering workers of that firm. Thus data from any of these surveys would overestimate cluster size for this project. 

The second reason that these surveys would overestimate cluster size arises from the ambiguity concerning the word “plan.” In response to surveys such as those above, many companies that offer health insurance from multiple carriers would count each carrier’s offering as a separate plan. The entire set of health insurance offerings may be regarded as one plan under ERISA, however. Based on the ERISA definition, EBSA would recognize one plan and would open only one case that examines health insurance offered to the plan by any of the carriers. 

Employer identification numbers (EINs) on the series 5500 data could also be used to count the number of health plans per firm. In addition to being subject to the multiple-location problem mentioned above, however, these data miss the small plans that large firms may sponsor, most of which are exempt from filing. Thus the 5500 data are also unable to provide usable estimates of the number of health plans offered at individual firm locations. 

The second data requirement (within-firm homogeneity in violation rates) is highly problematic. Not only does it require knowledge of the quantities EBSA is attempting to measure (violation rates) before they are measured, but it requires knowledge of the extent to which those quantities vary from plan to plan within firms having more than one plan. It seems reasonable to speculate that plans within the same firm would tend strongly to be uniform in their compliance status. It does not seem reasonable to quantify that speculation in the absence of any supporting data.^{(5)} 

The third data requirement is to estimate the distribution of plans by their probability of selection. Within the large firm sample, the probability of selection for firms is designed to be uniform. Probabilities of selection for plans will not be uniform, however. As explained above, plans covering workers at multiple subsidiaries will be investigated if the parent or any of the subsidiaries it covers is selected for the sample. We are aware of no data that permit an estimate of the distribution of plans by the number of subsidiaries they cover, so this data requirement also remains unfulfilled. 

The lack of data with which to credibly estimate any of these quantities led to acceptance of a simple random design as the only feasible approach. To the extent that firms offer only one health benefits package at each location or consider the variety of benefits packages offered to be a single ERISA plan, the approximation is accurate. In the small firm sample, the approximation is undoubtedly very accurate. For the large firm universe, the available data can provide only a (possibly substantial) overestimate of the extent to which firms have multiple ERISA health plans covering workers at individual locations. 

Textbook formulas for computing the size of cluster samples cover only the simplest cluster sample designs where either cluster size is constant or the sampling fraction within each cluster is uniform. Cluster size for this project (number of ERISA health plans per location of a firm) is clearly not constant. Uniform sampling fractions are problematic when many clusters are of size one, because any sampling fraction less than one will cause entire clusters to drop out of the sample. Fortunately there are software packages that can be used to estimate variance for more complex cluster sample designs. 

Despite large gaps in the data required to implement a cluster sample design for this project, cluster design tools offer the only approach to answering one fundamental design question: the number of plans to investigate at firms with multiple plans. Simple random samples do not involve subsampling, so the associated theory offers no guidance on this subject. The question is not trivial because most cluster samples present a tradeoff between some number of clusters subsampled at one rate and a higher number of clusters subsampled at a lower rate, where both designs achieve the target variance, and thus precision. The choice between the alternative designs is normally made on the basis of cost. 

To answer the question of the optimal number of plans to investigate per firm, the Office of Policy and Research used a software package capable of estimating variance from complex surveys, version 8.0 of the SAS/STAT software, which includes a variance estimation procedure called PROC SURVEYMEANS. The analysis using this procedure required estimates of the three factors mentioned above as necessary for estimating the size of cluster samples. Guesses regarding these factors were used, and the sensitivity of the conclusion to these guesses was examined. The SAS program simulates the consequences of alternative ceilings or caps on the number of plans investigated per firm. A cap of three, for example, would mean that all plans of firms with three or fewer plans would be investigated. At firms with more than three plans, three plans would be randomly selected for investigation. 

The simulations showed that estimated variance varied considerably between simulations with the same assumptions due solely to chance, and that the distribution of plans per sample firm was an important determinant of the variance. To assure that the target variance would be met with a high degree of assurance, the program therefore computes the 95th percentile of the variance, and for each set of assumptions the sample size was selected to achieve the target variance in 95 percent of the simulations. The figure shows how the number of large firms to be sampled and the number of plans to be investigated vary with the cap. Because the numerical assumptions underlying these estimates are mere guesses, the sample size estimates are not usable. The usable conclusion is that investigating all plans of sample firms minimizes not only the number of firms to be visited but also the number of plans to be investigated. Fortunately, this conclusion proved insensitive to reasonable changes in the three determining factors.^{(6)} For this reason, it was decided to investigate all health plans covering workers at the selected location of each sample firm. 
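A toy version of that simulation can be sketched in Python. The plan-count distribution, within-firm homogeneity, and violation rate below are illustrative guesses, not the values used in the project, and the estimator is deliberately simplified:

```python
import random
from statistics import pvariance

def one_estimate(rng, n_firms, cap, base_rate=0.25, homogeneity=0.8):
    """Weighted violation-rate estimate from one simulated firm sample,
    investigating at most `cap` plans per firm."""
    num = den = 0.0
    for _ in range(n_firms):
        # Hypothetical cluster sizes: most firms sponsor a single plan.
        n_plans = rng.choices([1, 2, 3, 5], weights=[85, 9, 4, 2])[0]
        firm_violates = rng.random() < base_rate      # firm-level tendency
        plans = [firm_violates if rng.random() < homogeneity
                 else rng.random() < base_rate
                 for _ in range(n_plans)]
        k = min(cap, n_plans)                         # plans to investigate
        w = n_plans / k                               # subsampling weight
        num += w * sum(rng.sample(plans, k))
        den += w * k
    return num / den

def variance_for_cap(cap, reps=300, n_firms=400, seed=7):
    rng = random.Random(seed)
    return pvariance([one_estimate(rng, n_firms, cap) for _ in range(reps)])

for cap in (1, 3, 99):
    print(cap, variance_for_cap(cap))
```

Repeating such runs across caps and sample sizes, as the SAS program did, shows how the variance of the estimate and the number of plans investigated trade off against the cap.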

Two of the ERISA Part 7 statutes^{(7)} apply only to plans having at least two participants who are current employees. To reduce the chances that sampled plans would be exempt from these statutes, it was decided to limit the universe to firms having at least three employees. 

A comprehensive database of U.S. companies maintained by Dun and Bradstreet (D & B) was selected as the sampling frame. This database includes records for branch locations. According to the D & B definition, branches are locations of a company with no separate legal responsibility for their debts. For this reason, branch locations were believed to lack the authority to sponsor their own health plans. Although it is possible that a small number of firms sponsor separate health plans for one or more of their branches, including branches in the samples would have complicated the investigation of health plans for branches in the far more common situation where branch workers are covered under a headquarters plan. Experienced EBSA investigators judged the existence of separate plans for branches to be too rare to justify the added investigatory complexity. Branches were therefore excluded from the sample. 

The universe for the study was restricted in two other ways intended to simplify investigations and reduce their cost without significantly compromising the findings. First, sponsor firms were limited to those located in the District of Columbia or one of the 50 states. Second, firms were limited to those having at least three employees. Although some firms with fewer than three employees sponsor ERISA health plans, most firms that small do not sponsor health plans, and many of the plans they do sponsor are not ERISA plans. The effort to screen large numbers of such tiny firms for ERISA health plans was judged too great to justify the small expansion in the scope of the study. 

At the request of EBSA, D & B drew two separate simple random samples from their database: 1,604 private-sector firms having 3 to 99 employees, and 622 private-sector firms with 100 or more employees. These numbers of firms were calculated so that the number of in-scope firms with health plans would at least equal the target sample sizes. 

The D & B database has no flag to distinguish private-sector from public-sector organizations. It does have an eight-digit Standard Industrial Classification (SIC) code. A list of 17 D & B SIC codes (or ranges of codes) was used to exclude from the sampling frame organizations, such as public secondary schools, that were clearly public-sector organizations, as well as organizations whose plans were judged likely to qualify for the ERISA church plan exemption. (See Attachment 1.) 

Calculation of Large and Small Firm Sample Sizes: In the EBSA project that was the source of the estimated 25 percent violation rate ceiling, plans were selected for investigation through EBSA's normal targeting methods rather than through random sampling. Violation rates in randomly selected cases will undoubtedly be lower than in targeted cases, but the magnitude of the difference is unclear. The sample size calculation for the large and small firms was based on a 22 percent violation rate ceiling. This ceiling resulted from the judgment that three percentage points is the smallest conceivable amount by which single-employer violation rates in targeted cases could exceed those in random cases. 

Just as in the multiemployer sample, the infinite population sample size was computed using both of the available tools. The sample size computed using the t-test procedure was 448; the chi-square procedure estimated 446. The larger, and thus more conservative, sample size of 448 was corrected for the actual finite populations. After adjustment for a population size^{(8)} of 134,016, the large firm sample size became 444. The small firm population size of 4,957,773 was sufficiently large that the finite population correction left the infinite population sample size of 448 unchanged after rounding. These estimates of the sizes of the large and small firm universes were provided by D & B at the time of sample selection. 
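A closed-form normal-approximation sketch (an illustration, not the actual routines) comes close to these figures when the 22 percent ceiling is plugged in:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf

def two_wave_n(p, delta=0.10, alpha=0.05, beta=0.05):
    """Infinite-population sample size per wave for detecting a change of
    `delta` from rate p, with the variance approximated as p*(1-p)."""
    return ceil(2 * p * (1 - p) * (z(1 - alpha / 2) + z(1 - beta)) ** 2
                / delta ** 2)

print(two_wave_n(0.22))   # 446, matching the chi-square figure
print(two_wave_n(0.25))   # 488, close to the multiemployer figures
```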

Strategy for Contacting Firms and Multiemployer Plans: Achieving the target number of investigations of small firm plans, large firm plans, and multiemployer plans required contacting more than the target number of sample units because some firms and plans were out-of-scope, unreachable, or the subject of a non-project investigation in the past 12 months.^{(9)} (The most common reason that firms, especially small firms, were out-of-scope was that they did not sponsor health plans.)^{(10)} The number of firms and plans to contact was therefore unknown at the start of the project. An approximation of that number could have been calculated from estimates of the rates at which contacts would yield in-scope health plans, but a more accurate method was chosen. 

A longer-than-needed list of sample units was prepared for each of the three samples and sorted into random order. The first round involved contacting firms and plans from the top of the randomly ordered list, up to the target number of investigations. Based on experience from this round, the size of the second round of contacts was estimated. The target number of investigations for each sample was thus approached incrementally. 
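The incremental approach can be sketched as follows. The frame size and in-scope rate here are hypothetical; the point is only that each round's size is based on the yield observed so far:

```python
import random

def incremental_rounds(frame_size, target, p_inscope, seed=3):
    """Contact randomly ordered sample units in rounds until the target
    number of investigations is reached."""
    rng = random.Random(seed)
    order = list(range(frame_size))
    rng.shuffle(order)                 # longer-than-needed list, random order
    contacted = investigated = 0
    batch = target                     # round 1: contact the target number
    while investigated < target and contacted < frame_size:
        round_units = order[contacted:contacted + batch]
        hits = sum(rng.random() < p_inscope for _ in round_units)
        contacted += len(round_units)
        investigated += hits
        observed_yield = max(investigated / contacted, 0.05)
        batch = round((target - investigated) / observed_yield) + 1
    return contacted, investigated

# Hypothetical frame of 5,000 units with a 30 percent in-scope rate.
contacted, investigated = incremental_rounds(5000, 399, 0.30)
```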

Calculating Sample Weights: The sample weights are the ratio of the universe size to the sample size. For purposes of the weighting calculation, the sample size is the number of attempted contacts, as opposed to the number of plans investigated. Weights computed in this manner support estimates of the results that would have been found had the project's screening and investigation methodology been applied to the entire universe of private-sector health plans. Attempted contacts that did not lead to investigations because the sample unit was out-of-scope, unreachable, or ineligible due to a recent prior investigation (see Table 1) thus represent corresponding segments of the health plan universe that would not have led to investigation had the project targeted the entire universe. Among unreachable sample units (multiemployer plans or firms), there were an unknown number of in-scope plans. No attempt has been made to impute the number of such plans or their violation rates. The inability to represent this portion of the universe results in some degree of underestimation of health plans in Table 2. Violation rates could be biased for the same reason in either direction, depending on whether violation rates among plans of unreachable firms were higher or lower than rates among reachable plans. 
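As a minimal sketch of the weighting rule (the contact count here is hypothetical; the actual counts appear in Table 1):

```python
def sample_weight(universe_size, attempted_contacts):
    """Each attempted contact represents universe_size / attempted_contacts
    units of the corresponding universe segment."""
    return universe_size / attempted_contacts

# Hypothetical: the 2,169-plan multiemployer universe reached through
# 500 attempted contacts.
print(sample_weight(2169, 500))   # 4.338
```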

Due to the incremental contact strategy, the number of attempted contacts was not known until near the end of the project. The final counts (including plans investigated under recent, non-project Part 7 investigations) are shown in the sample size column below and in Table 1. 

The probabilities of selection are therefore: 



For multiemployer plans and for plans of large and small firms that cover no subsidiaries, the statistical weights are simply the reciprocals of the probabilities of selection, as shown in the last column. The probability of selection, P_{i}, for a plan i that covers L_{i} large subsidiaries^{(11)} and S_{i} small subsidiaries is:

P_{i} = 1 - (1 - P_{S})^{Si} (1 - P_{L})^{1+Li}

P_{S} and P_{L} are the probabilities of selection for small and large firms (or subsidiaries). This formula, which is derived in Attachment 2, was applied solely in the large firm sample because no plans covering subsidiaries were identified through the small firm sample. 

The weight for plan i is the reciprocal of P_{i}. Some of the weights that result from applying this formula using the large and small firm probabilities of selection shown above are: 



Reliability of Estimates 

EBSA attempted to minimize all types of error in this project. Nevertheless, violation rates estimated from this survey may differ from the true universe violation rates for a number of reasons:

1. sampling error;
2. response bias;
3. error in identification of firms with in-scope plans;
4. sampling frame non-coverage; and
5. investigator error.

Sampling Error: This term refers to the risk that the true violation rate among sample plans and firms differed from the true violation rate among all plans and firms simply because the random sample did not perfectly represent the corresponding universe. This is the error that sampling theory attempts to control and that statistical theory attempts to measure with tools such as confidence intervals. 

Tables 3 and 4 provide lower and upper 95 percent confidence limits for violation rates for each sample and statute. The first row of Table 3, for example, shows a lower confidence limit of 41 percent and an upper confidence limit of 50 percent for the 45 percent point estimate of the overall Part 7 violation rate for all plans. The confidence limits indicate that there is a 95 percent chance that the interval from 41 percent to 50 percent brackets the true overall Part 7 violation rate. 
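The limits have the familiar normal-approximation form. In the sketch below the sample size of 470 is hypothetical, chosen only to show how an interval like 41 to 50 percent arises around a 45 percent estimate; the project's actual limits reflect the weighted survey design:

```python
from math import sqrt
from statistics import NormalDist

def conf_limits(p_hat, n, level=0.95):
    """Normal-approximation confidence limits for an estimated proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lo, hi = conf_limits(0.45, 470)   # n = 470 is hypothetical
print(round(lo, 3), round(hi, 3))
```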

Response Bias: If the sample units from whom data cannot be collected differ meaningfully from the sample units from whom data can be collected, the resulting response bias is a source of measurement error. Although response bias generally cannot be directly measured, a response rate is often computed to assess the potential for response bias. In this project, the response rate concept can be applied to phase 1, to phase 2, and to the project as a whole. For the large and small firm samples, the first phase involved calls by national office coordinators to sample firms provided by D & B. Coordinators were unable to contact 342 firms, 87 percent of which were in the small firm sample (Table 1). For this phase of the effort, the response rate was 87.3 percent. Table A shows the derivation of this percentage and the considerable variation in these response rates across samples. 

The second phase of the project was the investigation of plans determined in the first phase to be in-scope. EBSA has authority to investigate all in-scope health plans and consistently invoked this authority to achieve a 100 percent response rate for the second phase. 

Computing the response rate for phases 1 and 2 combined is more difficult because there is no way of knowing what percentage of unreachable firms sponsored in-scope health plans, so the denominator of the overall response rate is not known. Because 70 percent of small firms that could be contacted were out-of-scope, it seems likely that among unreachable firms the percentage out-of-scope would be at least that high. That assumption underlies the estimates that appear in the bottom row of Table A. 

Because the actual percentage of unreachable firms that were out-of-scope could be as low as 0 percent or as high as 100 percent, combined phase 1 and phase 2 response rates were also computed under these bounding assumptions. The result is a range of possible overall response rates from a low of 78 percent (if all unreachable firms were in-scope) to a high of 98 percent (if all unreachable firms were out-of-scope). The response rate derived from the assumption that unreachable firms were in-scope to the same extent as reachable firms is 86 percent, and it seems reasonable to expect that the true rate is at least that high. 

Error in Identification of Firms with In-Scope Plans: National office coordinators contacted sample firms to determine whether they sponsored health plans. Sample firms determined to have in-scope health plans were referred to the field for investigation. In some cases, the investigators found that the initial determination by the national office was wrong and that the firm did not, in fact, sponsor an in-scope plan. There was no comparable check for firms determined by the national office not to have health plans. Thus it is likely that national office coordinators failed to identify some firms that had health plans. 

Coordinators began their contacts by identifying themselves as employees of the Employee Benefits Security Administration because less direct approaches were regarded as unethical. One reason that in-scope health plans may have been missed is that firms falsely claimed not to have a health plan because they knew they were speaking to a representative of the agency that investigates health plans. Violation rates among plans that were not identified likely differed from the violation rates measured, especially if deliberate evasion occurred. 

Sampling Frame Non-Coverage: EBSA relied on the Form 5500 filings as the sampling frame for multiemployer plans, and on D & B for firm data. It is possible that multiemployer plans or firms with plans were missing from these frames. The potential for error from this source is probably small, however. Plans as large as most multiemployer plans are very unlikely to avoid filing, in part because EBSA has a Division of Reporting Compliance that identifies non-filers. Maintenance of the D & B database is a high priority for that company because it is the foundation for a number of the company's products, and it is frequently used as a sampling frame for surveys of firms. 

Investigator Error: As described in the body of the report, EBSA devoted considerable resources to training investigators for Part 7 investigations. Nevertheless, human error in the identification or reporting of violations may have occurred. 

Table A 



Attachment 1 



Attachment 2: Computation of Plan Probability of Selection 

A plan i is in the sample if the sponsoring firm, or any of its subsidiaries having employees covered under plan i, is in the sample. Assume that firms with subsidiaries have 100 or more employees and therefore fall into the large category. 

Let L_{i} be the number of large subsidiaries having employees covered under plan i. 

Let S_{i} be the number of small subsidiaries having employees covered under plan i. 

Let P_{L} be the probability of selection for large firms. 

Let P_{S} be the probability of selection for small firms. 

Let P_{i} be the probability of selection for plan i. 

1+L_{i} is the number of large firms (the parent plus its L_{i} large subsidiaries) having employees covered under plan i. 

1 - P_{L} is the probability that one large subsidiary or parent is not in the sample. 

(1 - P_{L})^{1+Li} is the probability that none of the 1+L_{i} large firms or subsidiaries falls in the sample. 

1 - P_{S} is the probability that one small subsidiary is not in the sample. 

(1 - P_{S})^{Si} is the probability that none of the S_{i} small subsidiaries falls in the sample. 

1 - P_{i} = P(plan i is not in the sample), which, assuming sample units are selected independently, is the product of the two probabilities derived above. 

Substituting the two expressions derived above, we have: 

1 - P_{i} = (1 - P_{S})^{Si} (1 - P_{L})^{1+Li} 

Solving for P_{i} yields: 

P_{i} = 1 - (1 - P_{S})^{Si} (1 - P_{L})^{1+Li} 
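A worked example of the result (the probabilities of selection below are hypothetical):

```python
def plan_selection_prob(p_large, p_small, n_large_subs, n_small_subs):
    """P_i = 1 - (1 - P_S)^Si * (1 - P_L)^(1+Li): plan i is selected unless
    the parent and every participating subsidiary are all missed."""
    return 1 - (1 - p_small) ** n_small_subs * (1 - p_large) ** (1 + n_large_subs)

# Hypothetical: P_L = 0.004, P_S = 0.0001, two large and three small subsidiaries.
p_i = plan_selection_prob(0.004, 0.0001, 2, 3)
weight = 1 / p_i                       # the plan's statistical weight
# With no subsidiaries the formula reduces to P_i = P_L, as expected.
```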

