Statistical Optimization of Pharmaceutical Formulations

Because a scientist must consider so many formulation and process variables when developing a formulation, statistical experimental design and analysis make it possible to study them both efficiently and effectively. This article offers several concise recommendations on the use of statistical design, based both on my own experience and on reports in the literature. Indeed, the literature is replete with examples of the successful use of this approach over a long period, a few of which are cited at the end.
Statistically designed experiments have several advantages, and the comparison with other test methods is striking. For example, one-at-a-time experimentation is 18% less costly but 190% less accurate; intuitive experimentation is 76% more costly and 55% less accurate; and Bureau of Standards experimentation is 59% more costly but 15% less accurate. In comparison, statistically designed experimentation is actually 15% less costly and just 10% less accurate than traditional methods.
The statistical design method has many other advantages. A chief one is that regulatory agencies strongly favor it, because it justifies the choice of ranges and finds a robust (optimum) region. It also lets the researcher study interactions between factors; studying one factor at a time does not, and its results are not scalable to production. The statistical design method often makes more economical use of resources, especially when many factors are involved, and gives a greater chance of finding optimum conditions. Finally, it allows predictions to be made about future experiments.
There are several types of statistical design for pharmaceutical formulations, including:
Factorial Designs (both full and fractional factorials);
Sequential Simplex Techniques;
Response Surface Methodology;
D-Optimal Techniques and
I-Optimal Techniques (16).
Statistical optimization allows the formulator to study a wide range of independent and dependent variables. Independent variables include formulation factors, such as granulating solvent, lubricant, disintegrant and diluent concentrations, and process factors, such as tablet compaction pressure, mixer speed and lubrication time. Dependent variables, i.e., measurable responses, include tablet dissolution, disintegration time, hardness and friability.
Create a Better Experiment
Below are some suggestions for running experiments:

Choose factors based on experience or preliminary experiments;
Run centerpoint replicates to estimate experimental error and the significance of effects;
Use tradeoff analysis to find the optimum combination, e.g., accepting a softer tablet to get higher dissolution;
Normalize (code) the factor levels orthogonally (-1/0/+1) to ease interpretation;
Contour plots are the most useful depiction of the data;
Space factor levels equally for a simpler design, and study the extremes;
Run experiments in random order to eliminate the influence of extraneous variables that cause "noise" in the data;
Analyze the data using Yates' algorithm to determine significant effects. An effect is considered significant if it is greater than twice the standard error of the dependent variable for the control batches;
The control batch uses the mid-point values of the independent variables and represents the current process.
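Several of these suggestions are mechanical enough to sketch in code. The Python below is a minimal illustration with hypothetical function names and illustrative numbers (not data from the author's studies) covering three of them: coding an actual factor setting onto the -1/0/+1 scale, estimating the effects of a two-level factorial with Yates' algorithm (which requires the responses to be listed in standard order), and applying the twice-the-standard-error rule. The source does not give a formula for the standard error of the control batches, so the sketch assumes the sample standard deviation of the replicate control batches divided by the square root of their number.

```python
from math import sqrt
from statistics import stdev


def code_level(value, low, high):
    """Map an actual factor setting onto the coded scale (low=-1, mid=0, high=+1)."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


def effect_labels(k):
    """Effect names in Yates (standard) order: A, B, AB, C, AC, BC, ABC, ..."""
    names = "ABCDEFGHIJ"[:k]
    return ["".join(names[j] for j in range(k) if (i >> j) & 1)
            for i in range(1, 2 ** k)]


def yates_effects(responses):
    """Yates' algorithm for a 2^k factorial whose responses are in standard order.

    Each of the k passes replaces the column with the pairwise sums followed
    by the pairwise differences (second minus first). The final column holds
    the grand total and then one contrast per effect; dividing a contrast by
    2^(k-1) gives the effect estimate.
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n < 2 or n != 2 ** k:
        raise ValueError("number of runs must be a power of two")
    column = list(responses)
    for _ in range(k):
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
    return dict(zip(effect_labels(k), (c / (n // 2) for c in column[1:])))


def significant_effects(effects, control_replicates):
    """Keep effects larger than twice the standard error.

    Assumption: standard error = s / sqrt(m), where s is the sample standard
    deviation of m replicate control (mid-point) batches.
    """
    se = stdev(control_replicates) / sqrt(len(control_replicates))
    return {name: e for name, e in effects.items() if abs(e) > 2 * se}


# Hypothetical data: a factor studied between 40 and 60 (arbitrary units) ...
print(code_level(55, 40, 60))                       # 0.5
# ... and a 2^3 design, responses in standard order (1), a, b, ab, c, ac, bc, abc
effects = yates_effects([60, 72, 54, 68, 52, 83, 45, 80])
print(effects)  # {'A': 23.0, 'B': -5.0, 'AB': 1.5, 'C': 1.5, 'AC': 10.0, 'BC': 0.0, 'ABC': 0.5}
print(significant_effects(effects, [62, 66, 65, 63]))  # {'A': 23.0, 'B': -5.0, 'AC': 10.0}
```

With these illustrative control batches, twice the standard error is about 1.83, so only A, B and the AC interaction are retained.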
Factorial Design
Factorial design of experiments can be divided into two classifications: full and fractional factorial design. The full factorial design method is characterized by:

$2^3$ factorial design: 3 factors at 2 levels (high = +1; low = -1), giving 8 (2 × 2 × 2) trials;
Graphically represented by a cube;
Coordinates of the vertices represent individual trials;
Area bounded by the cube is studied.
However, a full factorial design with n factors at two levels requires $t = 2^n$ trials (n = 2: t = 4; n = 3: t = 8; n = 4: t = 16; n = 5: t = 32; n = 6: t = 64; n = 7: t = 128). Because of this large number of trials, industry often uses partial, or fractional, factorial designs instead. One example is a five-factor, orthogonal, central composite, second-order design, described below:

Half-factorial: $2^n \times \frac{1}{2}$ with n = 5, so 16 experiments are conducted at the +1 and -1 levels. Two additional (extreme) levels at +1.547 and -1.547 for each of the five factors add 10 experiments (the ±1.547 levels support the quadratic terms that capture curvature), and one more experiment at the zero level (the center, midway between the +1 and -1 levels) brings the total to 27 experiments.
Orthogonal: each regression coefficient is estimated, and its significance assessed, independently of the others, which guarantees that the effects of different Xs on a given Y can be estimated independently. Central: the design points are equidistant from the center. Composite: the model contains linear, interaction and quadratic terms (X = independent variable; Y = dependent variable).
The second-order "predictor" polynomial equation has 21 terms: the overall mean $a_0$, 5 linear terms ($X$), 5 quadratic terms ($X^2$) and 10 interaction terms ($XX$):
$$Y = a_0 + a_1X_1 + \cdots + a_5X_5 + a_{11}X_1^2 + \cdots + a_{55}X_5^2 + a_{12}X_1X_2 + \cdots + a_{45}X_4X_5$$
where Y is the level of the dependent variable and each a is a regression coefficient: a slope indicating whether the corresponding independent variable (X) exerts a large or small, positive or negative effect on the dependent variable. One such equation is generated for each Y, relating it to the set of five Xs; the number of experiments must be at least equal to the number of coefficients in the chosen model.
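The five-factor design can be generated and checked in a few lines. This sketch assumes the half fraction is built with the generator E = ABCD (the source does not state which fraction was used), places axial points at ±α on each axis and adds one center point. With F = 16 factorial runs and N = 27 runs in total, requiring the centered quadratic columns to be mutually orthogonal gives $\alpha^2 = (\sqrt{F N} - F)/2$, i.e., α ≈ 1.547, which matches the extreme levels quoted above. The function names are hypothetical; the code builds the 21 columns of the second-order model (quadratic columns centered, the usual convention when checking a central composite design for orthogonality) and confirms that every pair of columns has a zero inner product.

```python
from itertools import combinations, product
from math import sqrt


def full_factorial(k):
    """All 2^k runs of a two-level design in coded units (-1/+1)."""
    return [list(run) for run in product((-1.0, 1.0), repeat=k)]


def half_fraction_5():
    """2^(5-1) half fraction: A-D at all 16 combinations, E = ABCD (assumed generator)."""
    return [run + [run[0] * run[1] * run[2] * run[3]] for run in full_factorial(4)]


def orthogonal_ccd_5(n_center=1):
    """Half fraction + 10 axial points at +/-alpha + center point(s)."""
    factorial = half_fraction_5()
    f = len(factorial)                       # 16 factorial runs
    n = f + 2 * 5 + n_center                 # 27 runs with one center point
    alpha = sqrt((sqrt(f * n) - f) / 2)      # makes the quadratic columns orthogonal
    axial = []
    for i in range(5):
        for sign in (-1.0, 1.0):
            point = [0.0] * 5
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * 5 for _ in range(n_center)]
    return factorial + axial + center, alpha


def model_columns(design):
    """a0, 5 linear, 5 (centered) quadratic and 10 interaction columns."""
    k = len(design[0])
    cols = {"a0": [1.0] * len(design)}
    for i in range(k):
        cols[f"X{i + 1}"] = [run[i] for run in design]
    for i in range(k):
        squares = [run[i] ** 2 for run in design]
        mean_sq = sum(squares) / len(squares)
        cols[f"X{i + 1}^2"] = [v - mean_sq for v in squares]
    for i, j in combinations(range(k), 2):
        cols[f"X{i + 1}X{j + 1}"] = [run[i] * run[j] for run in design]
    return cols


def max_off_diagonal(cols):
    """Largest |inner product| between two different columns (0 = orthogonal)."""
    return max(abs(sum(a * b for a, b in zip(u, v)))
               for u, v in combinations(cols.values(), 2))


print([len(full_factorial(k)) for k in range(2, 8)])  # [4, 8, 16, 32, 64, 128]
design, alpha = orthogonal_ccd_5()
cols = model_columns(design)
print(len(design), round(alpha, 3), len(cols))        # 27 1.547 21
print(max_off_diagonal(cols) < 1e-9)                  # True
```

The first line of output reproduces the trial counts for full factorials of 2 to 7 factors; the second confirms that 27 runs are enough to estimate the 21 coefficients.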
Experimentation
The experiments are carried out according to Yates' algorithm, an example of which is shown in Table I; the experimental design, derived from the author's experience (7), is shown in Table II.
When conducting the experiments, keep in mind that orthogonal coding of the Xs (-1, +1, etc.) allows the magnitudes of the regression coefficients to be compared directly. Apply the "F" statistic to each regression coefficient to evaluate its significance. Be sure to perform the "0", or base, experiment at the beginning, middle and end of the experimental runs. Perform the 27 experiments in random order and measure the responses of the resulting tablets (e.g., hardness, dissolution). Carry out the statistical analysis to obtain mean values for each of the dependent variables. Finally, carry out a computerized regression analysis on the data to determine the fit to the second-order model.
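A randomized run sheet that keeps the base experiment at the beginning, middle and end can be produced as follows. This is a hypothetical sketch: `run_sheet` is an illustrative name, it assumes the base run is performed three times in addition to the runs being randomized, and it uses a fixed seed only so that the order can be reproduced and recorded.

```python
import random


def run_sheet(design_runs, base_run, seed=None):
    """Return a run order with the design runs shuffled and the base run
    pinned at the beginning, middle and end.

    Assumption: design_runs excludes the pinned base runs; the base run is
    repeated three times as a check on drift during the study.
    """
    order = list(design_runs)
    random.Random(seed).shuffle(order)   # random order guards against drift and noise
    middle = len(order) // 2
    return [base_run] + order[:middle] + [base_run] + order[middle:] + [base_run]


# Hypothetical usage: 26 non-center runs numbered in standard order.
sheet = run_sheet(list(range(1, 27)), "base", seed=2024)
print(len(sheet), sheet[0], sheet[14], sheet[-1])   # 29 base base base
```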
Statistical Analysis
An important part of the planning stage is estimating the experimental error, a measure of the variability inherent in the study. Large variability makes it difficult to obtain a suitable mathematical model. To estimate this error, complete experiments need to be replicated. Predictions will only be as good as the fit of the data to the generated equations; the index of determination (the R-square value) should be greater than 90%. A low value indicates that the dependent variable in question does not follow a second-order model. If the number of parameters to be estimated in the equation (p) approaches the number of observations (n), the R-square value may be misleading; in that case, the adjusted R-square is recommended:
$$R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p}$$
The Model F value tests whether all the included regression coefficients other than the intercept are zero. A larger F value (and hence a smaller P value, meaning the fit is less likely to be due to random chance) indicates a more significant regression equation. "s" is the estimated standard deviation of the observations about the fitted model, computed with df = n - p degrees of freedom (observations minus parameters); more degrees of freedom give a better estimate of s, and the smaller s is, the stronger the "predictor equation." Under the Model Reduction-Hierarchy Principle, if the absolute value of a coefficient is smaller than twice its standard error, the coefficient is not statistically different from zero and is dropped from the model. In Cook's D test, a large value denotes an "influential" observation; the model must then be fitted with and without that observation to assess its effect. To obtain the best "predictor" equation by the stepwise regression (hierarchical) method, start with an equation containing all factors and then sequentially eliminate the less meaningful terms. Be sure to repeat this at different levels of significance.
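The statistics above can all be computed from an ordinary least-squares fit. The sketch below is a minimal pure-Python illustration, not the software used in the author's studies, and its function names are hypothetical. It solves the normal equations by Gauss-Jordan inversion (adequate for small, well-conditioned designs like those discussed here) and reports the coefficients, their standard errors, a partial F value for each coefficient (the squared t ratio), s, R², adjusted R², the model F value and the twice-the-standard-error retention flag. The data are a hypothetical $2^2$ factorial with three center points, fitted with a linear-plus-interaction model.

```python
from math import sqrt


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small models)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the model is not estimable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_and_summarize(X, y):
    """OLS fit of y on model matrix X (first column all ones for a0).

    Requires n > p and a nonzero residual error.
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    inv = invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[r] - sum(X[r][i] * coef[i] for i in range(p)) for r in range(n)]
    y_bar = sum(y) / n
    sse = sum(e * e for e in resid)
    sst = sum((v - y_bar) ** 2 for v in y)
    df = n - p                                  # observations - parameters
    s = sqrt(sse / df)                          # residual standard deviation
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)
    model_f = (r2 / (p - 1)) / ((1 - r2) / df)  # H0: all non-intercept terms are 0
    se = [s * sqrt(inv[i][i]) for i in range(p)]
    return {
        "coef": coef,
        "se": se,
        "coef_F": [(b / e) ** 2 for b, e in zip(coef, se)],  # partial F = t^2
        "keep": [abs(b) >= 2 * e for b, e in zip(coef, se)],  # 2 x SE rule
        "s": s, "df": df, "R2": r2, "R2_adj": r2_adj, "F": model_f,
    }


# Hypothetical 2^2 factorial plus three center points; columns a0, X1, X2, X1X2.
X = [[1, -1, -1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, 1, 1, 1],
     [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
y = [40, 60, 44, 65, 53, 55, 54]
summary = fit_and_summarize(X, y)
print([round(b, 2) for b in summary["coef"]])                # [53.0, 10.25, 2.25, 0.25]
print(summary["keep"])                                       # [True, True, True, False]
print(round(summary["R2"], 4), round(summary["R2_adj"], 4))  # 0.9838 0.9676
```

Here the interaction coefficient (0.25) is smaller than twice its standard error, so the 2 × SE rule would drop it and the model would be refitted without that term.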
Dimension Reduction
Dimension reduction techniques focus on the critical Xs and Ys and therefore keep the fewest terms in the model, which simplifies the regression equation. The first technique, the Spearman correlation matrix, can determine whether any pair of dependent variables (Ys) has a correlation close to ±1, which indicates a strong positive or negative association. If such a correlation exists, measure only one of the two Ys. A Y that is unrelated to all the other dependent variables should be measured. The Spearman correlation matrix examines two variables at a time. The second technique, principal component analysis, helps identify the key dependent variables that best distinguish among the effectively unlimited number of candidate formulations in a computer optimization. The key variable should be the criterion on which a formulation is selected (e.g., dissolution rather than friability), and constraining this key variable alone allows an optimum formulation to be selected faster. Some variables (e.g., tablet weight, thickness and friability) may contribute nothing to the overall variability and hence would not help to distinguish between formulations. Principal component analysis examines all variables simultaneously, not just two at a time.
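A Spearman correlation is the Pearson correlation of the ranked data, with tied values given their average rank. The sketch below computes it for every pair of responses and flags pairs whose absolute correlation is at or above a cut-off. The function names, the 0.9 cut-off and the six-batch data are hypothetical; only the Spearman step is sketched here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_matrix(responses):
    """Spearman rho for every pair of responses (dict: name -> measurements)."""
    ranked = {name: average_ranks(vals) for name, vals in responses.items()}
    return {(a, b): pearson(ranked[a], ranked[b]) for a in ranked for b in ranked}


def redundant_pairs(responses, cutoff=0.9):
    """Pairs of Ys so strongly associated that measuring one may suffice."""
    rho = spearman_matrix(responses)
    names = list(responses)
    return [(a, b, round(rho[(a, b)], 3))
            for i, a in enumerate(names) for b in names[i + 1:]
            if abs(rho[(a, b)]) >= cutoff]


# Hypothetical measurements on six batches
batches = {
    "hardness": [5, 6, 7, 8, 9, 10],
    "friability": [0.9, 0.8, 0.6, 0.5, 0.4, 0.3],
    "dissolution": [70, 85, 80, 90, 75, 95],
}
print(redundant_pairs(batches))   # [('hardness', 'friability', -1.0)]
```

In this made-up example hardness and friability are perfectly negatively associated, so only one of them would need to be measured, while dissolution (rho = 0.6 with hardness) carries separate information.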
Contour Plots
Finally, contour plots (topographical plots akin to maps) are drawn by computer and represent a three-dimensional situation in two dimensions. A contour plot shows the contributions of the linear (X), interaction (XX) and quadratic ($X^2$, the "curvature") terms to Y.
Figure 1 is a contour plot of four responses (tablet and capsule dissolution at 10 minutes, hardness and ejection force) plotted as a function of polyvinylpyrrolidone and magnesium stearate, with the granulation solution held constant at 23.175 mg and croscarmellose sodium at 8 mg (author's work, reference 10). The symbol OPTIMUM marks the predicted response at the recommended formulation. The plot shows that decreasing magnesium stearate from this predicted optimum increases ejection force, while increasing it decreases hardness and both tablet and capsule dissolution at 10 minutes, which justifies the selection of the optimum formulation.
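A contour plot such as Figure 1 is drawn from a fitted second-order equation evaluated over a grid of two factors while the remaining factors are held constant. The sketch below does that for two hypothetical response models in coded factors X1 and X2 and then performs the kind of tradeoff analysis recommended earlier: it searches the grid for the highest predicted dissolution among points whose predicted hardness stays at or above a limit. The coefficients, the 7.9 hardness limit, the grid spacing and the function names are all invented for illustration and are not the fitted models behind Figure 1; the grid returned by `response_grid` is the kind of input a plotting routine such as matplotlib's `contour` function takes.

```python
def quadratic_2d(a, x1, x2):
    """Second-order model in two coded factors, a = (a0, a1, a2, a11, a22, a12)."""
    a0, a1, a2, a11, a22, a12 = a
    return a0 + a1 * x1 + a2 * x2 + a11 * x1 ** 2 + a22 * x2 ** 2 + a12 * x1 * x2


def coded_grid(lo=-1.0, hi=1.0, steps=21):
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def response_grid(a, steps=21):
    """Predicted response at every grid point: the data behind one contour plot."""
    axis = coded_grid(steps=steps)
    return [[quadratic_2d(a, x1, x2) for x1 in axis] for x2 in axis]


def constrained_optimum(target, constraint, limit, steps=21):
    """Grid point with the highest predicted target whose constraint >= limit."""
    best = None
    for x1 in coded_grid(steps=steps):
        for x2 in coded_grid(steps=steps):
            if quadratic_2d(constraint, x1, x2) < limit:
                continue                      # e.g. tablet predicted to be too soft
            value = quadratic_2d(target, x1, x2)
            if best is None or value > best[0]:
                best = (value, x1, x2)
    return best


# Hypothetical coefficients (not the fitted models behind Figure 1)
dissolution = (80, 5, -6, -2, -1, 0)
hardness = (8, 1.5, 2, -0.5, 0, 0)
print(constrained_optimum(dissolution, hardness, limit=7.9))   # (85.75, 1.0, -0.5)
```

Without the hardness constraint the grid maximum for dissolution would sit at X2 = -1; the constraint pulls the optimum to X2 = -0.5, accepting slightly lower dissolution in exchange for an adequately hard tablet.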
