Monday, 26 April 2021

Design Of Experiments (DOE) in Minitab

Hi all....!!
Hope everyone is safe and healthy.

Previously, I shared a post on DOE in Minitab and how to perform it, but it was fairly brief and didn't include much explanation.

Today's post is going to be a full-length one, covering most aspects of DOE.


This post might feel heavy for those with no previous experience; to gain some, visit: Design of Experiments Using Minitab

In Six Sigma, we have two implementation approaches, chosen depending on the scenario, i.e., DMAIC & DMADV.


Let's take an example: a process designed in a laboratory and scaled up in the plant to meet commercial orders. The theoretical yield that can be obtained is 750 Kg for a 500 Kg input, and on completion of the design at lab scale the yield is standardized as 600 Kg with an acceptable variation of ~25 Kg on either side. The remaining yield, i.e., ~150 Kg (theoretical yield - standardized yield), can be considered as losses, which might be attributed to limited conversion during the reaction, losses of product due to partial distribution into the spent work-up layers, and slight solubility of the product in the solvent used for isolation.

Post completion of scale-up, the commercial yield is at a level of 550 Kg, i.e., 50 Kg less than the lab standard, with a variation of about ~50 Kg on either side, which is attributed to common cause variation.

Now, the actual scenario begins,

Our supply chain team has forecast an order that could become a future requirement. Based on the requirement and the product time-cycles, we came to the conclusion that to meet the supply demand we need an average output of ~590 Kg/batch.

In this scenario, as per the lab design the standard yield (at lab) is 600 Kg at most (with some allowable variation) and the target to meet the requirement is 590 Kg/batch, so there is scope for improvement within the existing design. As a regular practice we could implement some kaizens / PDCA, but to be more precise and to show a stronger commitment towards meeting the requirement we can take up a Green Belt Six Sigma project (i.e., the DMAIC approach) to improve the output.

In an alternative scenario, let's say we need an output of ~650 Kg/batch to meet the future market demand. We could still propose some kaizens to improve process performance, but the process is designed for a standard yield of ~600 Kg and that is a limitation; to increase the output further, a design change is required (i.e., the DMADV approach). That is where we mostly implement DOE, to further refine/optimize or extend the design to meet the demand.

Many people hold the view that DMAIC applies only to an existing process and DMADV only to new development; the above scenario is an exception to that view.

Let's get into our topic, i.e., performing DOE, but before that we need to understand the basics like factorial designs, levels, factors, response, etc. I'll explain these in manufacturing terms, which will make it easy for our pharma folks to understand.


What is a factor ?

A factor is a parameter which may or may not have a significant impact on the output that is in our scope of study.

What is response ?

The name itself is self-explanatory: the response is nothing but the output of the run / experiment.

What is a level ?

Levels are the number of values of a factor that we need to study. Simply put, let's say we have to study the impact of temperature at 0 °C and 5 °C; then the number of levels is 2. If we need to study it at 0 °C, 2.5 °C and 5 °C, then the number of levels is 3.

What is Factorial design ? 

Factorial design is a tool which helps in studying the effect of factors, and of the interactions between factors, on the output.

How to calculate the no. of experiments required for study ?

The total no. of experiments required for a full factorial study is calculated as the number of levels raised to the power of the number of factors.
Let's say we have 3 factors (A, B, C) and 2 levels (1, 2),
then the no. of experiments shall be 2^3 = 8.
The experiments would be:
A B C
1 2 2
1 2 1
1 1 2
2 1 1
2 2 1
1 1 1
2 2 2
2 1 2
So, this is the full factorial design.
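If you would like to cross-check this run count outside Minitab, below is a minimal Python sketch (the factor names A, B, C and the levels 1, 2 are simply the ones from the example above):

from itertools import product

# 3 factors at 2 levels each
factors = {"A": [1, 2], "B": [1, 2], "C": [1, 2]}

# Full factorial = every combination of the levels: 2^3 = 8 runs
runs = list(product(*factors.values()))
print(len(runs))                              # 8
for run in runs:
    print(dict(zip(factors.keys(), run)))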

What is half factorial design ?

A half factorial design focuses on the impact of the main factors and gives up information on some of the interactions between factors (those interactions get confounded with other terms). The number of experiments for a half factorial (1/2 fraction) design is calculated as the number of levels (L) raised to the power of the number of factors (F) minus 1, i.e., L ^ (F - 1).

Similarly, for the 1/4 fraction, the no. of experiments is L ^ (F - 2),
for the 1/8 fraction, the no. of experiments is L ^ (F - 3),
& for the 1/16 fraction, the no. of experiments is L ^ (F - 4).

The 1/4, 1/8, 1/16, 1/32, ... fraction designs are simply called Fractional Factorial Designs.
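Just to illustrate how a half fraction is typically constructed (this sketch uses coded units -1/+1 and the common generator C = A×B; that generator is my assumption for illustration, not something taken from Minitab's output here):

from itertools import product

# Full 2^2 design for A and B; the third factor C is set by the generator C = A*B
half_fraction = [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]
print(len(half_fraction))                     # 2^(3-1) = 4 runs
for a, b, c in half_fraction:
    print(f"A={a:+d}  B={b:+d}  C={c:+d}")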


What is resolution ?

Resolution indicates the degree of confounding in a fractional factorial design. In Minitab DOE, the number of levels is taken as 2 by default, unless you proceed with "General Full Factorial Design". Below is the screen depicting the resolutions:
Let's say, for a 2-factor experiment the total no. of experiments is 2^2 = 4.
For a 3-factor design, the total no. of experiments is 2^3 = 8, and if we reduce it to a half factorial design, i.e., the 1/2 fraction, it becomes 2^(3-1) = 4 runs, which is called a resolution III design.

Similarly, as the number of factors increases, more fractional designs (and hence more resolutions) become available; choosing a smaller fraction at a lower resolution means we are reducing the study of the interactions.

The lower the resolution, the higher the risk, because we are not studying all the interactions; a low-resolution design is appropriate only when we are highly confident about which interactions can be ignored.

What are Replicates ?

The word itself indicates what they are: replicates are repeat runs of experiments previously performed at the same factor levels. So a doubt might strike your mind: "Why replicate, and what is the necessity of replication?".

Why to replicate and what's the necessity of replication?

Replicates help us in estimating the variability and in drawing sound conclusions.

What are Blocks in DOE ?

Blocking is a technique which helps in reducing the effect (i.e., bias and variance) of nuisance factors, by separating the runs into groups (blocks) so that the nuisance variation is kept apart from the factors of interest.

What are nuisance factors ?

Nuisance factors are those which have an impact on the response but are not of primary interest.

What are Center Points ?

The term itself indicates that a center point takes the mid-point value of the levels. Let's say I've taken two factors, Dose & pH, where the levels of both factors run between 2 and 10; that means the center point would be 6 and 6.

But please note that the no. of replicates applies only to the levels mentioned and not to the center points, unless we have selected more than 1 block.

I did a full-time project during my B.Tech final semester. Since I'm presenting it here I have written "I've done it", but actually we were a team of 5 members, and the project was "Industrial waste treatment (the parameters we took were turbidity and COD) using Response Surface Methodology (Minitab)".

As a team member I dealt with the lab experiments along with the other members, and our team leader (Battula Amritha) took the responsibility of performing the DOE in Minitab. To be frank, I was quite uncomfortable with DOE during the project, but after starting work with Dr. Reddy's Labs I understood its significance, and I'm grateful to our guides / mentors "Ms. Kalyani Gaddam & Dr. Shisir Kumar Behera".

Study Example

Lets start our show,

Let's jump into the topic, i.e., creating a factorial design for identifying the best set of parameters to get the optimum output. We'll begin the design for a reaction where the factors are reagent mole equivalents, temperature and dosing time.

As per the proof-of-concept (POC) studies, a DOE study is proposed to further fine-tune the process, and the levels of the factors are as below:

Reagent mole Eq.: 1 to 5,
Temperature : 20 to 80 ℃,
Dosing time: 2 to 10 hours.


So, now there are three factors and the levels are 2 for each factor.
The number of experiments shall be 2 ^ 3 = 8 experiments.

Let's have one center point for these factors to evaluate the performance in more depth, i.e., without a center point the performance of the factors at the mid-level can't be understood.

So the no. of experiments would be = 8 (full factorial) + 1 center point = 9 experiments.

And to have a better understanding of the bias / variability, let's have replicates; I would prefer 3 replicates each. So the number of experiments would be = 8 x 3 + 1 = 25.
[Please note that the replicates are not applicable to the center point here.]
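Before building the design in Minitab, here is a rough Python sketch of what the worksheet will contain for this example; the run order is randomized, the single center point sits at the factor mid-points (3 Eq., 50 °C, 6 h), and the column names are my own shorthand, not Minitab's exact headers:

import random
from itertools import product

low_high = {"Eq": (1, 5), "Temp_C": (20, 80), "Time_h": (2, 10)}
replicates = 3

# 2^3 factorial corners, each replicated 3 times
runs = [dict(zip(low_high, corner))
        for corner in product(*low_high.values())
        for _ in range(replicates)]

# one center point at the mid level of every factor
runs.append({name: (lo + hi) / 2 for name, (lo, hi) in low_high.items()})

random.shuffle(runs)              # Minitab also randomizes the run order
print(len(runs))                  # 8 x 3 + 1 = 25 runs
for i, run in enumerate(runs, start=1):
    print(i, run)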

Step - 1: Create a factorial design
[Approach: Stat --> DOE --> Factorial --> Create Factorial Design]



Then the session window will display the factors, runs, blocks, replicates and center points we have selected, and the worksheet will contain the StdOrder, RunOrder, CenterPt and Blocks columns along with the factors we have selected.

The worksheet is depicted below:


Step - 2: Now it's time to perform the experiments as per the randomized runs provided by Minitab in the worksheet & include the results of the experiments against the runs as shown below:
[I've assumed the conversion values just for completing the case study]


Step - 3: Analysing the Factorial design
[Approach: Stat --> DOE --> Factorial --> Analyze Factorial Design]


Below is the output after analyzing [check the session window]:



Interpretation:
From the ANOVA (Analysis of Variance) table, it can be observed that the p-value for all of the factors and the interactions between factors is reported as less than 0.05, which indicates that those terms have a significant impact on the conversion (i.e., the output response).
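If you want to reproduce this analysis outside Minitab, a rough sketch with statsmodels would look like the one below. The worksheet is assumed to have been exported to a CSV with columns Eq, Temp, Time and Conversion (the file and column names are my assumption), and the model here fits only the factorial terms, so it will not match Minitab's output exactly when center points are included:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("doe_runs.csv")              # hypothetical export of the worksheet

# Full factorial model: Eq * Temp * Time expands to all main effects and interactions
model = smf.ols("Conversion ~ Eq * Temp * Time", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))        # ANOVA table with p-values
print(model.params)                           # regression coefficients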

The response can be predicted based on the regression equation,

Conversion (%) = 51.62 - 0.097 Eq. + 0.0135 Temp. - 0.424 Time + 0.05174 Eq.*Temp.
- 0.0069 Eq.*Time + 0.00712 Temp.*Time + 0.00191 Eq.*Temp.*Time
+ 31.04 Ct Pt

** Ct Pt is the center-point term, which appears in the equation only when center points are included in the design; it captures the curvature, i.e., how far the center-point runs deviate from what the factorial corners alone would predict.
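As a quick sanity check, the regression equation above can be wrapped into a small prediction helper (a sketch only; ct_pt is the center-point indicator, which I'm assuming is 1 for the center-point run and 0 for the factorial corners):

def predict_conversion(eq, temp, time, ct_pt=0):
    """Predicted conversion (%) from the regression equation above."""
    return (51.62
            - 0.097 * eq
            + 0.0135 * temp
            - 0.424 * time
            + 0.05174 * eq * temp
            - 0.0069 * eq * time
            + 0.00712 * temp * time
            + 0.00191 * eq * temp * time
            + 31.04 * ct_pt)

print(predict_conversion(eq=5, temp=80, time=10))   # one corner of the design space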

Rationale for P - value & 0.05:
The rationale will be explained in my next post, which will be about hypothesis testing.

Below are the graphs (Normal & Pareto)
Normal Plot:
Interpretation:
Based on the normal plot, it can be concluded that the interactions have a significant impact on the output response.

Pareto Chart:
Interpretation:
From the above Pareto chart, it can be concluded that the factors and their interactions have a significant impact on the response, except the interaction ABC, which is below the dotted line.

Step - 4: Interpretation using Cube plot
[Approach: Stat --> DOE --> Factorial --> Cube plot]

Cube Plot:

Interpretation:
From the above cube plot, the interactions can be viewed in 3D and the way in which the response varies across the factor combinations can be seen.

Step - 5: Analyzing through response optimizer
[Approach: Stat --> DOE --> Factorial --> Response optimizer]

Enter the target value for conversion as shown below:

Interpretation: 
From the above response optimizer, we can conclude that the target of 95% conversion is predicted with a desirability of 0.4000, i.e., 40%.

Now I've changed the target from 95 to 90; check the interpretation:

Interpretation: From the above response, it can be concluded that 90% conversion can be achieved at Eq. = 5.0, Temperature = 80 °C & Time = 10 hours.


Now let's check by maximizing the response to the possible extent with a high level of desirability.

Interpretation:
From the above response optimizer, it can be concluded that the maximum conversion that can be achieved is
92%, at inputs of Eq. = 3, Temperature = 50 °C and Time = 6 hours.

Note: As the desirability increases, the probability of achieving the predicted response will be higher.




Monday, 19 April 2021

Process Capability (Cp, Cpk) & Process Performance (Pp, Ppk)

Hi Everyone......!! Took a long break to focus on my job and studies..!!

Hope everyone is safe from Covid-19; if not, I wish you a quick recovery.

Today I would like to give a short demo on using Minitab to identify Process Capability and Process Performance, which are used to identify whether a process is in control and able to deliver the current / future business requirements.

In this post I might use some terms which may be new to some readers, but if you are familiar with Lean & Six Sigma topics it will be easy for you.


Anyhow, I'll cover those common terms first to make the post easier for everyone to follow.

What is VOC & VOP ?

VOC refers to Voice Of Customer & VOP refers to Voice Of Process.

What is CTQ ?

CTQ refers to Critical To Quality.

What do Cp, Cpk represent ?

In the context of this post, Cp doesn't mean specific heat. Cp and Cpk are process capability indices.

Cp - Potential Process Capability;
Cpk - Actual Process Capability.

What do Pp, Ppk represent ?

Pp, Ppk are Process performance indices.

What is the difference between Cp / Cpk and Pp / Ppk ?

I think many of you may be confused by this; in simple terms, Cp/Cpk indicate the short-term capability and Pp/Ppk indicate the long-term capability of a process.

What are the formulae for calculating Cp, Cpk, Pp, Ppk ?

Cp and Pp have similar formulae but, as a reminder, they are not the same:
Cp = (USL - LSL) / (6 x 𝝈w); Pp = (USL - LSL) / (6 x 𝝈o);

USL - Upper Specification Limit; LSL - Lower Specification Limit;
𝝈w - Standard Deviation (within); 𝝈o - Standard Deviation (overall)

Cpk = min (Cpu, Cpl); i.e., the minimum of Cpu and Cpl is taken as Cpk;
Ppk = min (Ppu, Ppl); i.e., the minimum of Ppu and Ppl is taken as Ppk;

Cpu = (USL - Xb) / (3 x 𝝈w); Cpl = (Xb - LSL) / (3 x 𝝈w)
Ppu = (USL - Xb) / (3 x 𝝈o); Ppl = (Xb - LSL) / (3 x 𝝈o)

Xb - the average of the data (simply the mean)
𝝈w - is calculated by splitting the data into sub-groups,
𝝈o - is calculated from the total population, i.e., from the whole data set.
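For readers who want to see these formulae in action outside Minitab, here is a rough sketch. The within standard deviation is estimated from the average moving range divided by d2 = 1.128 (the usual I-MR convention); Minitab's own estimate may differ slightly depending on the options selected:

import numpy as np

def capability_indices(data, usl, lsl):
    # Overall sigma: plain sample standard deviation of the whole data set
    x = np.asarray(data, dtype=float)
    xbar = x.mean()
    sigma_o = x.std(ddof=1)

    # Within sigma: average moving range / d2, with d2 = 1.128 for subgroup size 2
    sigma_w = np.abs(np.diff(x)).mean() / 1.128

    cp  = (usl - lsl) / (6 * sigma_w)
    pp  = (usl - lsl) / (6 * sigma_o)
    cpk = min((usl - xbar) / (3 * sigma_w), (xbar - lsl) / (3 * sigma_w))
    ppk = min((usl - xbar) / (3 * sigma_o), (xbar - lsl) / (3 * sigma_o))
    return cp, cpk, pp, ppk

# Example with the yield specification used later in this post (500 ± 20 Kg):
# print(capability_indices(yields, usl=520, lsl=480))   # yields = the batch data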

What are UCL & LCL ?

UCL represents the Upper Control Limit & LCL represents the Lower Control Limit.

What are the types of variations ?

Variations are of two types, i.e., common cause variation and special cause variation.

How to identify Common cause and special cause variations ?

Special cause variations can be identified easily, as the results affected by them show up as outliers. Common cause variations can be the result of variation in operating parameters that are within their specified ranges.

How are the UCL & LCL values derived ?

UCL is calculated as the mean of the given data + 3 times the standard deviation;
LCL is calculated as the mean of the given data - 3 times the standard deviation.

How to select the capability analysis for different data types ?

Usually data is classified into two types, i.e., discrete and continuous.
If you can confirm that the data is continuous, then we need to check normality; if the data is normal, then Capability Analysis (Normal) shall be selected.
If the data distribution is non-normal, then we have to proceed with a Weibull / exponential analysis.

Alternatively, in Minitab we have an option to transform non-normal data into normal using the Johnson transformation / Box-Cox transformation.

If the data is discrete, then there can be a further classification, such as binary (defectives) data or count (defects) data.
If it is binary data then proceed with the Binomial analysis; if it is count data then proceed with the Poisson analysis.
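If the continuous data turns out to be non-normal, the transformation route can also be tried outside Minitab. SciPy offers a Box-Cox transform (it needs strictly positive data, and it does not provide the Johnson transformation), so the sketch below covers only the Box-Cox route, with randomly generated stand-in data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.lognormal(mean=6.2, sigma=0.05, size=50)   # stand-in for skewed yield data

transformed, lam = stats.boxcox(data)                 # transformed data and fitted lambda
print("lambda:", lam)
print("Shapiro-Wilk p-value after transform:", stats.shapiro(transformed).pvalue)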

What does Z represent ?

Z represents the Sigma level of the process. As the Sigma level increases, the variability / defects per unit number of batches / parts will reduce.

How to calculate Z ?

The Sigma level is calculated as (USL - Xb)/s or (Xb - LSL)/s.

Now lets perform the capability analysis for the available data.

CAPABILITY ANALYSIS DEMO

To demonstrate the process of capability analysis, let me assume some data:
[The data considered here is a pure assumption, not taken from anywhere]

I've assumed data for ~50 batches for which the yield range would be 500 ± 20 Kg; below is the data:


Step - 1: Copy the data into a Minitab worksheet and plot a control chart (I-MR chart).
[Approach for plotting I-MR chart: Stat --> Control charts --> Variable charts for individuals --> IMR..]
The plot for the above data would look like below:

Below the control chart, there will be text showing where the test fails:


For Individuals chart:
TEST 1. One point more than 3.00 standard deviations from center line.
Test Failed at points:  12, 37.

For Moving range chart:
TEST 1. One point more than 3.00 standard deviations from center line.
Test Failed at points:  12, 13, 38.

These points are called outliers; they do not lie inside the control limits, i.e., between the LCL and the UCL.
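The same Test 1 check can be reproduced by hand; below is a rough sketch. It uses the standard I-MR constants (sigma within = mean moving range / 1.128, and D4 = 3.267 for the moving range chart), so the flagged points should broadly match Minitab's, though not necessarily exactly:

import numpy as np

def imr_test1(yields):
    x = np.asarray(yields, dtype=float)
    mr = np.abs(np.diff(x))                 # moving ranges of consecutive batches
    mr_bar = mr.mean()

    # Individuals chart: centre line +/- 3 * sigma(within), sigma(within) = MRbar / 1.128
    sigma_w = mr_bar / 1.128
    i_fail = [i + 1 for i, v in enumerate(x)
              if abs(v - x.mean()) > 3 * sigma_w]

    # Moving range chart: upper limit = D4 * MRbar with D4 = 3.267 (lower limit is 0)
    mr_fail = [i + 2 for i, v in enumerate(mr) if v > 3.267 * mr_bar]
    return i_fail, mr_fail

# print(imr_test1(yields))   # yields = the ~50 batch values from the worksheet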

Inference from the above control chart: The outliers indicate that the process is not in control; hence the batches which produced the outliers shall be investigated for the variation in yields, and the variation shall be addressed by implementing the appropriate corrective and preventive actions (CAPA). Post implementation of the CAPA, the outliers shall be eliminated from the data set and the control charts re-plotted.

Re-plotted Control chart:


For Moving range chart:
TEST 1. One point more than 3.00 standard deviations from center line.
Test Failed at points:  2, 3, 38.

Inference from the above control chart: The outliers in the moving range chart indicate that the range between consecutive points lies outside the control limits, which might be attributed to some special cause variation. Again, the variation needs to be addressed through investigation and implementation of CAPA.

By eliminating the outliers, the control chart shall be re-plotted:


For Individuals chart:
TEST 1. One point more than 3.00 standard deviations from center line.
Test Failed at points:  27

For Moving range chart:
TEST 1. One point more than 3.00 standard deviations from center line.
Test Failed at points:  2

Inference from the above control chart: The outliers in the moving range chart indicate that the range between consecutive points lies outside the control limits, which might be attributed to some special cause variation. Again, the variation needs to be addressed through investigation and implementation of CAPA.

By eliminating the outliers, the control chart shall be re-plotted:


For Moving range chart:
TEST 1. One point more than 3.00 standard deviations from center line.
Test Failed at points:  16

Inference from the above control chart: The outliers in the moving range chart indicate that the range between consecutive points lies outside the control limits, which might be attributed to some special cause variation. Again, the variation needs to be addressed through investigation and implementation of CAPA.

By eliminating the outliers, the control chart shall be re-plotted:



Inference from the above control chart: The process is free from outliers and we can conclude that the process is now in control.

Now, we can proceed to step - 2.

Step - 2: Check the normality of data:
[Approach for checking the data normality: Stat --> Basic Statistics --> Normality test]
It would appear like below:




Insert Yield (kg) in Variable and click OK [ensure the test for normality selected is Anderson-Darling].

Inference: From the above normality test, the obtained p-value is 0.422, which is greater than 0.05; hence the data can be considered normal.

The story behind the "greater than" or "less than" of p-values:
The approach depends on hypothesis testing; there are two claims, one is the null hypothesis and the other is the alternate hypothesis.
By default, the null hypothesis states that the data is normal, and hence the opposite claim, the alternate hypothesis, is that the data doesn't follow a normal distribution.

Ho: Data is normal,
Ha: Data is non-normal.

We have to compare the p-value with the level of significance (𝞪); by default the level of significance considered by Minitab is 0.05.

Hence, in this case the p-value is 0.422, which is greater than 0.05 (𝞪), so the data can be said to be normally distributed.

[If the p-value is less than 𝞪, then the conclusion would be that the data is not normal and the approach for estimating the process capability would be different; I'll explain that at the end]
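The same normality check can be scripted. SciPy's Anderson-Darling routine reports the test statistic against critical values rather than a p-value, so the sketch below also runs a Shapiro-Wilk test, which does give a p-value to compare with 0.05 (the yields.txt file name is just an assumed export of the data):

import numpy as np
from scipy import stats

yields = np.loadtxt("yields.txt")                   # hypothetical file with the batch yields

ad = stats.anderson(yields, dist="norm")
print("AD statistic:", ad.statistic)
print("5% critical value:", ad.critical_values[2])  # reject normality if the statistic exceeds it

stat, p = stats.shapiro(yields)
print("Shapiro-Wilk p-value:", p, "-> normal" if p > 0.05 else "-> not normal")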

Now, let's go to Step - 3.

Step - 3: Performing Capability analysis:
[Approach for performing capability analysis would be Stat --> Quality tools --> Capability analysis --> Normal]



Inference: From the above capability report, we can conclude that the process is capable of meeting the specification limits, but the distribution is skewed towards the left due to the non-centering of the mean. A Cpk of less than 1.00 indicates that the short-term capability is poor. As the short-term capability itself is poor, there is no need to check the overall capability, i.e., Ppk.

Hence the specification shall be re-centred on the process median, i.e., ~504, so the specification limits become 504 ± 20 Kg. Now let's check the process capability with the revised specification limits:


Inference: From the above capability chart, we can conclude that the process capability has improved compared with the previous one (i.e., from 0.95 to 1.22), which is due to the shift of the specification centre towards the process median. Based on the overall process performance index (i.e., Ppk), it can be concluded that the process is marginal and there is scope for further improvement.

Below is the classification for the Ppk values (a small helper that applies this classification is sketched after the list):

If Ppk < 1.00, the performance is bad;
If Ppk is between 1.00 and 1.33, the performance is typical;
If Ppk is between 1.33 and 1.67, the performance is average;
If Ppk is between 1.67 and 2.00, the performance is good;
If Ppk is above 2.00, the performance is world class.
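Here is the small helper mentioned above; it simply maps a Ppk value to the qualitative rating in the list (a sketch, using the cut-offs exactly as written):

def classify_ppk(ppk):
    if ppk < 1.00:
        return "bad"
    if ppk < 1.33:
        return "typical"
    if ppk < 1.67:
        return "average"
    if ppk <= 2.00:
        return "good"
    return "world class"

print(classify_ppk(1.22))   # e.g. a Ppk of 1.22 -> "typical"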

Now, let's go back and say that our data is not normally distributed; then we have to transform the data using the Johnson transformation / Box-Cox transformation.

[Approach: Stat --> quality tools --> Capability analysis -->Normal (Transform: Johnson transform)].


Hope you have understood how to perform a capability analysis for your data.

So, most readers would be satisfied by this point, but some of you may still have a doubt like 'we have investigated and eliminated all the outliers in the process by addressing them through appropriate CAPA, but the process performance is still only marginal; what can we do to make it good / world class?'.

The solution to this query is that we can make the process excellent by identifying some other patterns; some more investigation is required to make the process more robust. The tests are:
Note: While identifying the outliers we used only the 1st test, but to make the process more robust the first 4 tests should be performed; if any patterns are highlighted, there can be some special cause variation behind them, which needs to be identified during investigation and addressed through appropriate CAPA. If no patterns are observed, then it can be concluded that the variation in the yields is largely attributable to common cause variations, which need to be identified and addressed.

About The Author


Hi! I am Ajay Kumar Kalva, currently serving as the CEO of this site, a tech geek by passion, and a chemical process engineer by profession. I'm interested in writing articles regarding technology, hacking and pharma technology.
Follow Me on Twitter AjaySpectator & Computer Innovations