# Electrical Submersible Pump Survival Analysis

Petroleum Engineer, Quarter Corp. & Masters Degree Candidate

Advisor: Dr. Jianhua Huang

With help from PhD Candidate Sophia Chen

Department of Statistics, Texas A&M, College Station

MARCH 2011

ABSTRACT

A common metric in Petroleum Engineering is "Mean Time Between Failures" or "Average Run Life". It is used to characterize wells and artificial lift types, as a metric to compare production conditions, and as a measure of the performance of a given surveillance & monitoring program.

Though survival curve analysis has been in existence for many years, the more rigorous analyses are relatively new in the area of Petroleum Engineering.

This paper describes the basic theory behind survival analysis and the application of those techniques to the specific problem of Electrical Submersible Pump (ESP) Run Life. In addition to the general application of these techniques to an ESP data set, this paper also attempts to answer: Is there a significant difference between the survival curves of an ESP system with and without emulsion present in the well?

Although survival curve analysis has been in existence for many years, the more rigorous analyses are relatively new in the area of Petroleum Engineering. As an example of the growth of these analysis techniques in the petroleum industry, Electrical Submersible Pump (ESP) survival analysis has been sparsely documented in technical journals over the past 20 years:

- First papers on the fitting of Weibull & Exponential curves to ESP run life data in 1990 (Upchurch) & 1993 (Patterson)
- Papers discussing the inclusion of censored data in 1996 (Brookbank) & 1999 (Sawaryn)
- Paper discussing the use of Cox Regression in 2005 (Bailey)

Unfortunately, the papers applying these techniques did very little to transfer the knowledge to practicing Petroleum Engineers. They shared the technical concepts and equations, but not the practical knowledge of how to apply them to real-life problems or why these analyses improved on the "take the average of the run life of failed wells" approach most commonly used.

THEORY OF SURVIVAL ANALYSIS

Survival analysis models the time it takes for events to occur and focuses on the distribution of the survival times.

It can be used in many fields of study, where survival time can indicate anything from time to death (medical studies) to time to equipment failure (reliability metrics). This paper will present 3 methodologies for estimating survival distributions as well as a technique for modeling the relationship between the survival distribution and one or more predictor variables (both covariates and factors). Appendix A has a list of important definitions relevant to survival analysis.

KAPLAN MEIER (NON-PARAMETRIC)

Non-parametric survival analysis characterizes survival functions without assuming an underlying distribution.

The analysis is limited to reliability estimates for the failure times included in the data set (not prediction outside the range of data values) and comparison of survival curves one factor at a time (not multiple explanatory variables). A common non-parametric analysis is Kaplan Meier (KM). KM is characterized by a decreasing step function with jumps at the observed event times. The size of the jump depends on the number of events at that time t and the number of survivors prior to time t. The KM estimator provides the ability to estimate survival functions for right censored data:

$$\hat{S}(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}$$

Here t_i is a time at which a "death" occurs and d_i is the number of deaths that occur at time t_i. When there is no censoring, n_i is the number of survivors just before time t_i. With censoring, n_i is the number of survivors minus the number of censored units. The resulting curve, as noted, is a decreasing step function with jumps at the times of "death" t_i. The MTBF is the area under the resulting curve; the P50 (median) time to failure is the time t at which $\hat{S}(t) = 0.5$. Upper and lower confidence intervals can be calculated for the KM curve using statistical software. A back-of-the-envelope calculation for the confidence interval is the KM estimator ± 2 standard deviations.
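As a rough illustration of the estimator described above, the sketch below computes the KM step function, a standard error from Greenwood's variance formula, and the P50 time. The function names and the toy run-life values are assumptions made for this example; they are not taken from the ESP data set or from the software used in this paper.

```python
import math


def kaplan_meier(times, events):
    """Return [(t_i, S(t_i), std_err)] at each distinct failure time.

    times  : run lives in any consistent unit
    events : 1 if the unit failed at that time, 0 if it was right censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every unit that shares the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= (n_at_risk - deaths) / n_at_risk
            if n_at_risk > deaths:
                # Greenwood term d_i / (n_i * (n_i - d_i))
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            curve.append((t, surv, surv * math.sqrt(greenwood_sum)))
        # Failed and censored units both leave the risk set after time t.
        n_at_risk -= removed
    return curve


def km_median(curve):
    """First failure time at which the estimated survival drops to 0.5 or below."""
    for t, surv, _ in curve:
        if surv <= 0.5:
            return t
    return None  # the curve never reaches 0.5 (heavy censoring)


# Hypothetical toy data: run life in days and failure flags.
toy_times = [5, 8, 12, 12, 20, 25]
toy_events = [1, 0, 1, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = km_median(toy_curve)
```

The back-of-the-envelope interval at each failure time is then the estimate plus or minus two of the returned standard errors, clipped to the range 0 to 1.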

Greenwood's formula can be used to estimate the variance for non-parametric data (Cran. R-project):

$$\widehat{\mathrm{Var}}\big(\hat{S}(t)\big) = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$$

Figure 1: Example Kaplan Meier survival curve showing estimate, 95% confidence interval, and censored data points

When comparing two survival curves differing by a factor, a visual inspection of the null hypothesis H0: survival curves are the same, can be conducted by plotting the two survival curves and their confidence intervals. If the confidence intervals do not overlap, there is significant evidence that the survival curves differ (with alpha < 0.05).

COX PROPORTIONAL HAZARD (SEMI-PARAMETRIC)

Semi-parametric analysis enables more insight than the non-parametric method. It can estimate the survival curve from a set of data and account for right censoring, but it also conducts regression based on multiple factors/covariates and judges the contribution of a given factor/covariate to a survival curve. CPH is not as efficient as a parametric model (Weibull or Exponential), but the proportional hazards assumption is less restrictive than the parametric assumptions (Fox). Instead of assuming a distribution, the proportional hazards model assumes that the failure rate (hazard rate) of a unit is the product of:

- a baseline failure rate h0(t) (which doesn't need to be specified and is only a function of time) and
- a positive function which incorporates the effects of factors & covariates x_i1, ..., x_ik (independent of time)

This model is called semi-parametric because while the baseline hazard can take any form, the covariates enter the model linearly. Given two observations i & i' with the same baseline failure rate function, but that differ in their x values (i.e. two wells with different operating parameters x_k), the hazard ratio for these two observations is independent of time:

$$\frac{h_i(t)}{h_{i'}(t)} = \frac{h_0(t)\, e^{\eta_i}}{h_0(t)\, e^{\eta_{i'}}} = e^{\eta_i - \eta_{i'}}, \qquad \eta_i = \beta_1 x_{i1} + \dots + \beta_k x_{ik}$$
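The time independence of the hazard ratio can be checked numerically. The sketch below assumes the usual exponential link, h(t | x) = h0(t) exp(β·x); the baseline function, the coefficients, and the two "wells" are hypothetical values chosen only for illustration.

```python
import math


def hazard(t, x, beta, baseline):
    """Proportional-hazards failure rate h(t | x) = h0(t) * exp(beta . x)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline(t) * math.exp(linear_predictor)


def hazard_ratio(x_i, x_j, beta):
    """Ratio h_i(t) / h_j(t); the baseline cancels, so no t is needed."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_i, x_j)))


def example_baseline(t):
    """Hypothetical baseline hazard; any positive function of time works."""
    return 0.001 * (1.0 + 0.002 * t)


beta = [0.9, -0.3]      # hypothetical coefficients
well_a = [1.0, 2.0]     # hypothetical covariate values for one well
well_b = [0.0, 1.0]     # hypothetical covariate values for another well

# The ratio evaluated at several times matches the closed form at each one.
ratios_over_time = [
    hazard(t, well_a, beta, example_baseline) / hazard(t, well_b, beta, example_baseline)
    for t in (10.0, 500.0, 3000.0)
]
closed_form = hazard_ratio(well_a, well_b, beta)
```

Swapping in a different baseline function changes each individual hazard but leaves every entry of `ratios_over_time` equal to `closed_form`.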

This time-independent hazard ratio is why the Cox model is a proportional-hazards model; even though the baseline failure rate h0(t) is unspecified, the β parameters in the Cox model can still be estimated by the method of partial likelihood. After fitting the Cox model, it is possible to get an estimate of the baseline failure rate and survival function (Fox). A result of the regression is an estimate for the various β coefficients and an R-square value describing the amount of variability explained in the hazard function by fitting this model. Relative contributions of factors/covariates can be interpreted as:

- β > 0: the covariate decreases the survival time as its value increases (the hazard is multiplied by exp(β) for each unit increase)
- β < 0: the covariate increases the survival time as its value increases (the hazard is multiplied by exp(β) for each unit increase)

In the parametric Weibull model, the parameters are the scale, λ > 0, and the shape, k > 0, and the median is $\lambda (\ln 2)^{1/k}$. The Weibull shape parameter, k, is also known as the Weibull slope. Values of k less than 1 indicate that the failure rate is decreasing with time (infant failures). Values of k equal to 1 indicate a failure rate that does not vary over time (random failures). Values of k greater than 1 indicate that the failure rate is increasing with time (mechanical wear out) (Weibull). A change in the scale parameter, λ, has the same effect on the distribution as a change of the X axis scale.

Increasing the value of the scale parameter, while holding the shape parameter constant, has the effect of stretching out the PDF and survival curve (Weibull).

Figure 2: Example Weibull curves with varying shape & scale parameters

The Weibull regression model is the same as the Cox regression model with the Weibull distribution as the baseline hazard. The proportional hazards assumption used by the CPH method, when applied to a survival curve with a Weibull function baseline hazard, only holds if two survival curves vary by a difference in the scale parameter (λ), not by a difference in the shape parameter (k).
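The effect of the two Weibull parameters can be sketched with the standard two-parameter forms S(t) = exp(-(t/λ)^k) and h(t) = (k/λ)(t/λ)^(k-1). The scale and shape values below are hypothetical and chosen only to show that the hazard ratio is constant when the curves differ in scale alone, and varies with time when they differ in shape.

```python
import math


def weibull_survival(t, scale, shape):
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, scale, shape):
    """h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_median(scale, shape):
    """P50 time to failure: scale * ln(2) ** (1 / shape)."""
    return scale * math.log(2) ** (1.0 / shape)


check_times = (100.0, 1000.0, 3000.0)

# Same shape, different scale: the hazard ratio is the same at every time,
# so the proportional hazards assumption holds.
same_shape_ratios = [
    weibull_hazard(t, 1000.0, 1.5) / weibull_hazard(t, 2000.0, 1.5)
    for t in check_times
]

# Different shape: the hazard ratio drifts with time,
# so the proportional hazards assumption fails.
diff_shape_ratios = [
    weibull_hazard(t, 1000.0, 0.8) / weibull_hazard(t, 1000.0, 1.5)
    for t in check_times
]
```

With a shape of exactly 1 the hazard reduces to the constant 1/scale, which is the exponential (random failure) case described above.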

If goodness of fit to the Weibull distribution can be achieved, a confidence interval can be calculated for the curve, the median value and its confidence interval can be calculated, and a comparison of the differences in two survival curves can be conducted. Goodness of fit can be tested in R using an Anderson Darling calculation and verified with a Weibull probability plot. Poor fit in the tails of the Weibull distribution is a common occurrence for reliability data due to infant mortality and longer than expected wear out time.

STEPWISE COX & WEIBULL REGRESSION

Given a large number of explanatory variables and the larger number of potential interactions, not all of those variables may be necessary to develop a model that characterizes the survival curve. One way of determining a model is by using Stepwise model selection through minimization of AIC (Akaike Information Criterion). This model selection technique allows variables to enter/exit the model using their impact on the AIC calculated at that step. AIC is an improvement over maximizing the R-Square in that it’s a criterion that rewards goodness of fit while penalizing for model complexity.
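The selection idea can be sketched as a greedy search on AIC = 2p - 2 ln(L), where p is the number of fitted parameters and L the maximized likelihood. The sketch below is a simplification: it only adds variables (forward selection), whereas the stepwise procedure described above also lets variables exit. The `fit_loglik` callback, the variable names, and the log-likelihood gains are all hypothetical stand-ins for a real Cox or Weibull fit.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: rewards fit, penalizes complexity."""
    return 2 * n_params - 2 * log_likelihood


def forward_stepwise(candidates, fit_loglik):
    """Greedy forward selection by AIC.

    fit_loglik(variables) must return (log_likelihood, n_params) for a model
    fitted with the given list of variables (a user-supplied callback).
    """
    selected = []
    best_aic = aic(*fit_loglik(selected))
    while True:
        best_var = None
        for var in candidates:
            if var in selected:
                continue
            score = aic(*fit_loglik(selected + [var]))
            if score < best_aic:
                best_aic, best_var = score, var
        if best_var is None:
            return selected, best_aic  # no remaining variable lowers the AIC
        selected.append(best_var)


# Hypothetical stand-in for a model fit: each variable adds a fixed amount
# of log-likelihood and costs one parameter.
toy_gain = {"x1": 40.0, "x2": 12.0, "x3": 0.4}


def toy_fit(variables):
    return -1000.0 + sum(toy_gain[v] for v in variables), len(variables)


chosen, chosen_aic = forward_stepwise(list(toy_gain), toy_fit)
```

In this toy run the third variable is left out because its small gain in log-likelihood does not offset the penalty of 2 for the extra parameter.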

APPLICATION TO AN ESP DATA SET

As stated previously, these survival analysis techniques can be applied to many types of data in many industries, ranging from survival data for people in a medical study to survival data for equipment in a reliability study. These methodologies have many uses in the petroleum industry, from surface equipment system and component reliability used by facility and reliability engineers, to well and downhole system and component reliability used by petroleum and production engineers.

As an example, this paper illustrates the use of these techniques on the run life of Electrical Submersible Pumps (ESP). ESPs are a type of artificial lift for bringing produced liquids to the surface from within a wellbore. Appendix B includes a diagram of an ESP. For this paper, the run life will refer to the run life of an ESP system, not the individual components within the ESP system. While this paper focuses on ESP systems, these same techniques could be applied to other areas of Petroleum Engineer interests including run life of individual ESP components, other types of artificial lift, entire well systems, etc.

DATA DESCRIPTION

ESP-RIFTS JIP (Electrical Submersible Pump Reliability Information and Failure Tracking System Joint Industry Project) is a group of 14 international oilfield operators who have joined efforts to gain a better understanding of circumstances that lead to a success or failure in a specific ESP application. The JIP includes access to a data set of 566 oil fields, 27861 wells, 89232 ESP installations, and 182 explanatory factors/covariates related to either the description of the ESP application or the description of the ESP failure.

For the analysis described in this paper, a subset of the data has been used, restricted to:

- observations related to Chevron operated fields
- observations with no conflicting information (as defined by the JIP's data validation techniques)
- factors that were related to the description of the ESP application (excluded 27)
- factors not confounded with or multiples of other factors (excluded 30)
- factors with a large number (>90%) of non-missing data points (excluded 78)
- factors that were not free-form comment fields (excluded 27)

Appendix C has a list of the original 182 variables with comments on why they were removed from the analyzed data set. Below is a table of the 20 remaining explanatory variables included in this analysis, together with the response and the censor flag.

SUMMARY TABLE OF DATA INCLUDED IN THE CPH/REGRESSION ANALYSIS

OBSERVATIONS: 1588

| VARIABLE | COVARIATE/FACTOR & # OF LEVELS | DESCRIPTION |
|---|---|---|
| RunLife | Response | Time between date put on production and date stopped or censored |
| Censor | Censor Flag (0, 1) | 1 if ESP failure; 0 if still running or stopped for a different reason |
| Country | Factor (7 levels) | Country & Field in which the ESP is operated |
| Offshore | Factor (2 levels) | Indication of whether the ESP was an onshore or offshore installation |
| Oil | Covariate | Estimated average surface oil rate (m3/day) |
| Water | Covariate | Estimated average surface water rate (m3/day) |
| Gas | Covariate | Estimated average surface gas rate (1000 m3/day) |
| Scale | Factor (5 levels) | Qualitative level of scaling present in the well |
| CO2 | Covariate | % of CO2 present in the well |
| Emulsion | Factor (3 levels) | Qualitative level of emulsion present in the well |
| CtrlPanelType | Factor (2 levels) | Type of surface control panel used |
| NoPumpHouse | Covariate | Number of pump housings |
| PumpVendor | Factor (2 levels) | Pump Vendor |
| NoPumpStage | Covariate | Number of pump stages |
| NoSealHouse | Covariate | Number of seal housings |
| NoMotorHouse | Covariate | Number of motor housings |
| MotorPowerRating | Covariate | Motor rated power at 60 Hz (HP) |
| NoIntakes | Covariate | Number of intakes |
| NoCableSys | Covariate | Number of cable systems |
| CableSize | Covariate | Size of cable |
| DHMonitorInstalled | Factor (2 levels) | Flag for installation of a downhole monitor |
| DeployMethod | Factor (2 levels) | Method of ESP deployment into the well |

FINDING THE P50 TIME TO FAILURE FOR A DATASET

Example 1: Using the entire data set, what is the P50 estimate for the runtime of a Chevron ESP? The answers differ considerably for the 4 calculation types:

| METHODOLOGY | INCLUDES CENSORED? | P50 ESTIMATE (DAYS) | ASSUMPTION | ASSUMPTIONS MET? |
|---|---|---|---|---|
| Mean or Median | No | Mean: 563; Median: 439 | None | N/A |
| Kaplan Meier Median | Yes | 1044 | None | N/A |
| CPH Median | Yes | 1043 | None (as no comparison of levels/covariates, essentially same results as KM) | N/A |
| Weibull Median | Yes | 1067 | Anderson Darling GOF for Weibull Distribution | No (rejected the null hypothesis of good fit, due to poor fit in the tails) |

In this example, the biggest impact on the difference between the methods is the inclusion of censored data. A large number of the ESPs in this data set have been running for >3000 days without a failure and were excluded in the often used calculation of the average run life of all failed ESPs. Given that the Weibull distribution did not pass the Anderson Darling goodness of fit test, the most appropriate calculation would have been the KM or CPH.
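The direction of the bias from dropping censored units can be reproduced on a very small synthetic example. The run lives below are made up for illustration and are not drawn from the ESP-RIFTS data; `km_median` is a compact helper written for this sketch.

```python
from statistics import median


def km_median(times, events):
    """Kaplan Meier P50: first failure time where estimated survival <= 0.5."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every unit that shares the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= (at_risk - deaths) / at_risk
            if surv <= 0.5:
                return t
        at_risk -= removed
    return None  # more than half the units are still running


# Synthetic run lives (days): four failures and three units still running.
run_life = [100, 200, 300, 400, 3000, 3000, 3000]
failed = [1, 1, 1, 1, 0, 0, 0]

# "Median of failed units only" ignores the long-running survivors.
naive_p50 = median([t for t, e in zip(run_life, failed) if e == 1])

# KM keeps the survivors in the risk set until they are censored.
km_p50 = km_median(run_life, failed)
```

Here the failed-only median is 250 days while the KM estimate is 400 days, the same direction of difference as in the table above, where ignoring long-running ESPs understates the P50 run life.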

Appendix E has the output from the various methodologies. The interpretation of these results is that the P50 estimate of run life for an ESP installation in Chevron is ~1044 days. Additionally, output from the KM analysis sets the 95% confidence interval at 952 to 1113 days.

Figure 3: Comparison of estimation methods for full data survival curve. Note the deviation of the Weibull in the tails of the data.

COMPARING TWO SURVIVAL CURVES DIFFERING BY A FACTOR

Example 2: Using the 2 level factor emulsion, does the presence of emulsion in the well make a significant difference in the P50 run life of an ESP system?

| METHODOLOGY | INCLUDES CENSORED? | EMULSION P50 (DAYS) | NO EMULSION P50 (DAYS) | SIGNIFICANT DIFFERENCE? | INTERPRETATION | ASSUMPTIONS MET? |
|---|---|---|---|---|---|---|
| Mean or Median | No | Mean: 600; Median: 458 | Mean: 536; Median: 424 | Don't know | Well performance is about the same | |
| Kaplan Meier Median | Yes | 606 | 1508 | Yes (visual inspection of CI) | Wells without emulsion perform much better | |
| CPH Median | Yes | 533 | 1408 | Yes, with a likelihood ratio test and a p-value of 0, reject that the β's are the same | Wells without emulsion survive longer. Exp(β) indicates 2.5 times increased survival time for no emulsion | No (reject null hypothesis of proportional hazards with a p-value of 0.01) |
| Weibull Median | Yes | 531 | 1463 | Yes, with a z test statistic and a p-value of 0, reject that the scale values are the same | Wells without emulsion survive longer. Scale parameter value indicates 2.75 times increased survival time for no emulsion | No (reject null hypothesis of goodness of fit due to poor fit in the tails) |

The more complex the methodology used, the more information is available to interpret the results. Again, the addition of censored data resulted in a very different interpretation of the data than just using the mean/median value of all failed ESPs; not just in the order of magnitude of the results, but also in the determination of which condition resulted in a longer run life. The results of both the CPH & Weibull methodologies are suspect due to their failure to meet the prerequisite assumptions. Looking at the plots, it is apparent that the fit is poor in the tails.

Appendix F has the output from the various methodologies. The interpretation of these results is that wells without emulsion have more than a 2x increase in their P50 run life compared with wells with emulsion. It should be noted that, given the other factors that differ in the operation of these ESPs, this difference cannot be fully attributed only to the difference in emulsion, but this interpretation should certainly lead to further investigation.

Figure 4: KM estimated survival curves for ESPs with and without emulsion with confidence interval

Figure 5: Comparison of estimation methods (KM, CPH, Weibull) for ESPs with and without emulsion

CHOOSING THE VARIABLES THAT CHARACTERIZE A SURVIVAL CURVE

Example 3: Of the variables collected by the JIP, which most describe the survival function? Do the variables collected in the dataset capture the variation in the survival function? As stated previously, both Weibull & Cox regression fit a model using explanatory variables. The addition of Stepwise variable selection to that regression allows the preferential fitting of the model by minimizing the AIC. As Weibull regression is a special case of Cox regression with a Weibull baseline hazard function, and as Cox regression has less restrictive assumptions than parametric regression, this example will focus solely on Cox regression using Stepwise
