Project Abstract

Behavioral analysis of data in process systems is the ultimate application of my accumulated experience in statistics,
mathematics, and physics. The processes themselves are derived from mathematical and physical models, and are streamlined
using optimization methods that are at the heart of calculus. As the theoretically formulated processes are put into production,
statistical analysis is used to isolate and refine different aspects of the procedures to ensure the efficiency of the overall
system. Research in process engineering is essential to improving industrial practices. In the case of plastic injection molding,
inefficiencies in the production process disrupt the uniformity of the product, waste material, and risk other
problems caused by inconsistent measurements and machine calibration. The use of data-driven methods to analyze
the entirety of the production process assists with the perpetual task of improving and optimizing the outputs while reducing errors
and eliminating opportunities for failure.

Wednesday, March 8, 2017

Background Research: Part 1

Hey everyone!


I know it's been a while. I have been travelling, and even though I have kept up with my research, I haven't had stable internet access, so I haven't been able to write a post.

I began my background research with Mr. Clark's materials from his Master Black Belt Six Sigma Training and Certification Course at the ASU School of Engineering. As a short summary, Six Sigma is a set of techniques, methods, and standards used in process development in industry and manufacturing, as well as in business processes. The name itself comes from the standard deviation, σ, a concept I learned in AP Stats. For normally distributed data, about 99.7% of values fall within ±3σ of the mean (3 standard deviations above or below it), a total range of 6 standard deviations. Six Sigma pushes further: the goal is a process so consistent that the nearest specification limit sits six standard deviations from the process mean, which by the usual Six Sigma convention works out to about 3.4 defects per million opportunities.

Six Sigma training is widely recognized and used across many industries, and its principles and methods can be applied to resolve a broad range of issues. The Master Black Belt is the highest level of certification; other levels, such as the Green Belt and Black Belt, also exist. The course itself can be very expensive, so I am privileged to have access to all of the materials from Mr. Clark's certification courses. I am studying all the sections related to the application of statistical practices, such as multivariate regression, logistic regression, and categorical data analysis.
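As a quick check on the ±3σ figure mentioned above, here is a minimal Python sketch that computes how much of a normal distribution falls within k standard deviations of the mean. The function name `coverage_within` is my own, and the calculation assumes the data really is normally distributed.

```python
import math


def coverage_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal variable Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k}σ: {coverage_within(k):.2%}")
```

For k = 3 this comes out to roughly 99.7%, which is where that number comes from.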

Multiple regression analysis is used when there are multiple inputs but still one output. In simple linear regression, we deal with one input and one output, usually referred to as X and Y. The model for simple linear regression is y = β0 + β1X + ε, where β0 and β1 are constants and ε represents random error. In multiple regression, there are multiple inputs, labelled X1, X2, X3, ... The equation for this is y = β0 + β1X1 + β2X2 + β3X3 + ⋯ + βkXk + ε for a model with k regression factors. Anyone who has taken calculus will recognize that this looks a lot like a Taylor series expansion, and the connection is real: the linear model can be viewed as a first-order Taylor approximation of the true relationship between the inputs and the output. The remainder is ε, and each independent variable (X) has a coefficient (β).

This model can also account for interactions between variables, adjusting the model to y = β0 + β1X1 + β2X2 + β12X1X2 + ε. By letting X3 = X1X2 and β3 = β12, the model with interactions can be written in the same form as the original, y = β0 + β1X1 + β2X2 + β3X3 + ε.

When using multiple regression, the goal is to create a fitted regression line from the above model that explains the behavior of the data and gives good predictions for the response while using as few independent variables as possible. The fitted regression line takes the form ŷ = b0 + b1X1 + b2X2 + b3X3 + ⋯ + bkXk. After the fitted regression line is developed, it is subjected to statistical testing to determine whether or not the independent variables are significantly related to the response variable. It is also subjected to residual analysis and other model-checking methods to make sure the line fits the data as closely as possible, with the smallest residuals (the differences between the actual data points and the values predicted by the line). Once the line passes all of these statistical examinations, it can be used to predict and model the behavior of a certain output by manipulating the various inputs.
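To make the regression model above concrete, here is a small Python sketch that fits a two-input model with an interaction term using ordinary least squares. Everything in it is an assumption made for illustration: the data is simulated rather than taken from the Six Sigma materials or from any real process, the "true" coefficients are made up, and NumPy's `lstsq` stands in for whatever statistics software the course actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two inputs (X1, X2) and one output (y),
# simulated from made-up coefficients plus random error.
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
true_beta = np.array([2.0, 1.5, -0.8, 0.3])  # β0, β1, β2, β12 (invented values)
eps = rng.normal(0, 0.2, n)                  # random error ε
y = true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2 + true_beta[3] * x1 * x2 + eps

# Design matrix: a column of ones for b0, then X1, X2, and the
# interaction treated as a third input, X3 = X1 * X2.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

# Least-squares estimates b0, b1, b2, b3 of the coefficients.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values and residuals (actual minus predicted).
y_hat = X @ b
residuals = y - y_hat

# R^2: the share of the variation in y explained by the fitted model.
r_squared = 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

print("estimated coefficients:", np.round(b, 3))
print("R^2:", round(float(r_squared), 4))
```

Because the interaction is just another column in the design matrix, the same fitting step works whether or not the model includes it. The significance tests and residual plots described above would come after this step and are not shown here.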

I will continue talking about what I have learned from the Six Sigma materials, as well as the other materials I borrowed from Mr. Clark, in another post soon.

4 comments:

  1. I think it's neat that you're able to relate real-world research to AP math classes! Sort of a running joke in Diff Eqs this year was how what we were learning could be useful in the real world, but it looks like you're answering that every day. Are you finding that taking AP stats gave you a jumpstart into this project?

  2. Interesting and complicated stuff for sure, glad this is helping you put the knowledge you gained previously to use. Also, it's pretty great you have access to all these resources!
