One of the perks of my job is that I get to poke at a lot of datasets from customers and prospective customers: PCM/WAT, wafer sort, final test, other production datasets, and all sorts of engineering datasets (gauge R&R, repeatability experiments, and so on). It’s no surprise that a recurring theme is our customers’ concern that their test solutions be accurate and consistent. In recent weeks, the datasets I’ve looked at have included a part run 30 times in a repeatability experiment, multi-site wafer sort data with a distinctive site-to-site failure pattern, production ramp results for thousands of parts run at hot and room temperatures to check consistency of results across temperature, and gauge R&R data (from a prospective customer) for ~50 parts run two times each (repeatability) across program revisions (reproducibility).

An interesting aspect of the gauge R&R data was that we essentially ended up having to work out for ourselves the actual method that our prospective customer (let’s call them “PC”) currently uses for its gauge R&R studies. For example, when we asked about a particular aspect of the measurement study, PC replied that they were using a t-test to compare results across program revisions (and they sent us the formula for a standard t-test from the help system of their current solution). It took some doing, but the first thing I noticed was that their results were based not on a standard t-test but on a paired t-test (which is what you would expect, since they ran the same parts across program revisions). A key notion to consider here is that a statistically significant difference in a paired t-test does not necessarily translate to a practical difference. (A subtle but consistent difference between two program revisions may prove to be statistically significant, but even if it is also practically significant, it may actually be the result of calibration differences or other factors.)
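To make the distinction concrete, here is a minimal sketch comparing the two statistics on the same numbers. The readings are made up for illustration (they are not PC’s data), and the function names `two_sample_t` and `paired_t` are my own; the formulas are the textbook pooled-variance and paired t statistics.

```python
from math import sqrt
from statistics import mean, stdev


def two_sample_t(a, b):
    """Standard (pooled-variance) two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(a, b):
    """Paired t statistic: a one-sample t on the per-part differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical readings: the same five parts measured under two program revisions.
# Part-to-part spread is large; the revision-to-revision shift is small but consistent.
rev_a = [10.0, 12.0, 14.0, 16.0, 18.0]
rev_b = [10.1, 12.2, 14.1, 16.2, 18.1]

t_standard = two_sample_t(rev_a, rev_b)
t_paired = paired_t(rev_a, rev_b)
print(f"standard t = {t_standard:.2f}, paired t = {t_paired:.2f}")
```

With these made-up values, the standard statistic is close to zero because the part-to-part spread swamps the shift, while the paired statistic is well beyond the two-sided 5% critical value of about 2.78 for 4 degrees of freedom. Whether an average shift of roughly 0.14 units matters in practice is a separate question that depends on the spec limits.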

In their summary table, PC had a column labeled “R&R” that was really the precision to tolerance ratio (PTR). It’s easier said than done, but we recommend using standard terms (well, as standard as can be) rather than developing your own. Given that gauge R&R studies are designed to estimate precision, once you recognize that ‘tolerance’ is just another way of saying ‘width of spec limits’, the term *precision to tolerance ratio* makes a lot of sense. Further, there was no indication in the summary table that PC was using 5.15 as the sigma multiplier and not 6.0. As we’ve noted before, a sure-fire way of “improving” your PTR metric is to use 5.15 instead of 6: the PTR will be lower, but that doesn’t mean your measurements are any better. (Your workplace may dictate 5.15, but if you share your results with a broader audience you might want to note your choice of 5.15.) Note that when you use 6 as the multiplier, the ratio of %GRR (measurement variation as a fraction of total observed variation) to PTR is the formula for the process capability ratio Cp.
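A quick numerical check of both points, as a sketch under stated assumptions: I take PTR as k times the measurement sigma divided by the spec width, %GRR as the measurement sigma divided by the total observed sigma, and Cp as the spec width divided by six total sigmas. The sigma values and spec limits below are arbitrary illustration numbers.

```python
def ptr(sigma_ms, lsl, usl, k=6.0):
    """Precision to tolerance ratio: k * measurement sigma / spec width."""
    return k * sigma_ms / (usl - lsl)


def pct_grr(sigma_ms, sigma_total):
    """Measurement sigma as a fraction of total observed sigma (assumed definition)."""
    return sigma_ms / sigma_total


def cp(sigma_total, lsl, usl):
    """Process capability ratio computed from the total observed sigma."""
    return (usl - lsl) / (6.0 * sigma_total)


# Arbitrary illustration values, not taken from any real study.
sigma_ms, sigma_total = 0.5, 1.0
lsl, usl = 0.0, 10.0

ptr_6 = ptr(sigma_ms, lsl, usl, k=6.0)
ptr_515 = ptr(sigma_ms, lsl, usl, k=5.15)
ratio = pct_grr(sigma_ms, sigma_total) / ptr_6

print(f"PTR with k=6:    {ptr_6:.4f}")
print(f"PTR with k=5.15: {ptr_515:.4f}")
print(f"%GRR / PTR = {ratio:.4f}, Cp = {cp(sigma_total, lsl, usl):.4f}")
```

The same measurement sigma gives a PTR about 14% lower with 5.15 than with 6, and the measurement sigma cancels out of %GRR / PTR, which is why the ratio lands on Cp.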

A much more interesting aspect of PC’s current method is that it does not include an interaction component, while the dataConductor method (standard gauge R&R using analysis of variance) does include interaction. While this isn’t always critical, in one case we noted a test with a strong interaction component and a PTR of 30% that our prospect considered to have a PTR of 14%. If there were no process variation at all, so that all observed variation came from the measurement system, then with the 5.15 multiplier the measured Cp would be 5.15 / (6 × PTR): a PTR of 14% suggests a measured Cp greater than 6, while a PTR of 30% suggests a Cp less than 3. The point here is that there really is process variation, so the actual measured Cp will be lower still, and a PTR of 30% is usually considered on the high side. By using a method that essentially looked at one factor at a time and ignored interaction, PC had set itself up for some hiccups during the production ramp.
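To show how much an interaction term can matter, here is a minimal sketch of the textbook crossed two-way ANOVA variance components (parts by program revisions, with repeats). It is not dataConductor’s implementation, and the "without interaction" figure simply drops the interaction component, which is my assumption about the net effect of a one-factor-at-a-time approach rather than PC’s actual formula. The data are contrived so that revisions agree on average but disagree part by part.

```python
from math import sqrt
from statistics import mean


def gauge_rr_components(data):
    """data[i][j] is the list of repeat readings for part i under revision j."""
    p, o, r = len(data), len(data[0]), len(data[0][0])
    grand = mean(y for part in data for cell in part for y in cell)
    part_means = [mean(y for cell in part for y in cell) for part in data]
    rev_means = [mean(y for part in data for y in part[j]) for j in range(o)]
    cell_means = [[mean(cell) for cell in part] for part in data]

    # Sums of squares for the crossed two-way layout.
    ss_rev = p * r * sum((m - grand) ** 2 for m in rev_means)
    ss_int = r * sum(
        (cell_means[i][j] - part_means[i] - rev_means[j] + grand) ** 2
        for i in range(p) for j in range(o)
    )
    ss_err = sum(
        (y - cell_means[i][j]) ** 2
        for i in range(p) for j in range(o) for y in data[i][j]
    )

    # Mean squares, then variance components (negative estimates set to zero).
    ms_rev = ss_rev / (o - 1)
    ms_int = ss_int / ((p - 1) * (o - 1))
    ms_err = ss_err / (p * o * (r - 1))
    return {
        "repeatability": ms_err,
        "interaction": max(0.0, (ms_int - ms_err) / r),
        "revision": max(0.0, (ms_rev - ms_int) / (p * r)),
    }


# Contrived data: 4 parts x 2 revisions x 2 repeats. Revision B reads some
# parts high and others low, so the revision averages match exactly.
data = [
    [[10.0, 10.2], [10.6, 10.8]],
    [[12.0, 12.2], [11.4, 11.6]],
    [[14.0, 14.2], [14.6, 14.8]],
    [[16.0, 16.2], [15.4, 15.6]],
]
tolerance = 10.0  # hypothetical spec width (USL - LSL)

vc = gauge_rr_components(data)
var_with = vc["repeatability"] + vc["revision"] + vc["interaction"]
var_without = vc["repeatability"] + vc["revision"]
ptr_with = 6.0 * sqrt(var_with) / tolerance
ptr_without = 6.0 * sqrt(var_without) / tolerance
print(f"PTR with interaction:    {ptr_with:.1%}")
print(f"PTR without interaction: {ptr_without:.1%}")
```

In this contrived case the revision main effect is zero, so a comparison of revision averages sees nothing, yet the interaction component dominates the measurement variance and the PTR that includes it is several times larger than the one that leaves it out.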

Design of Experiments guru Dr. Douglas Montgomery of Arizona State University likes to say that all experiments are designed experiments: some are designed well, and some are designed poorly. The basic design of the gauge R&R experiment that we’ve been discussing was fairly sound, but the analysis was lacking. While it’s hard to extract useful conclusions from a poorly designed experiment, even a well-designed experiment still depends on a proper analysis.