About a year after leaving graduate school, I was sitting in a face-to-face meeting with a customer who was reviewing their progress on integrating my employer’s technology into their organization. During the meeting, their analytically savvy technology lead spoke about how they had used a numerical analysis method, “PCA”, to help determine the root cause of a problem they’d been having. Being the scientific analytical lead in my organization, and coming from a well-known graduate institution, I was caught off guard: why hadn’t I heard of this problem-solving method? After the meeting, I began looking into PCA. I learned the acronym stood for principal component analysis, but my searches quickly led me into uncharted territory, with results involving matrices, singular value decomposition, and dimensionality reduction. By this point, I felt daunted and knew I had a great deal to learn before I could even get close to understanding what was going on with PCA. This realization marked the beginning of my descent into statistics and data science.
Before embarking on my journey, I needed a statistics package. There were many high-quality packages to choose from, including Minitab, Spotfire, and JMP, to name a few. Working at a startup, though, I didn’t want to get too cozy with a tool costing hundreds if not thousands of dollars per license per year. What if the firm closed its doors and the next employer didn’t want to invest in the package I had spent so much time learning? I wanted the freedom to work with a tool at my own pace. It needed to be operating system independent, and it needed to be something I could use anywhere if needed. Tall order… essentially free, capable, and versatile. This ultimately brought me to R. However, after installing the software and anxiously double-clicking the R icon, a few things became readily apparent. There were no snazzy splash screens and no spreadsheet-style data-entry forms. There was just an unassuming terminal window claiming to be the “RGui”, with a blinking cursor following the character “>”. Free (check), versatile (so they said), easy to use (at first glance, I had my doubts).
So now I had two problems: PCA and figuring out R. What was I getting myself into?
Lastly, I needed documentation and learning materials. For R, I had the internet, the wonderful posts on Stack Exchange, and countless other reference sites. For stats, I moved away from the typical college textbook and used an industry-oriented book: Statistics for Experimenters (George E. P. Box et al., 2nd ed.). I worked through this book for the better part of a year, augmented what I learned with online research, and dragged myself through MIT’s OpenCourseWare Linear Algebra course. A year later, I had a pretty good sense of PCA and many other numerical methods, including ANOVA and DOE. As my knowledge grew, the R language was there to support me and often inspired me to dig more deeply into a numerical method’s details.
A year later, I was sitting in a meeting listening to the same technical lead. After the meeting, we spoke briefly. I was curious to learn in detail how he had applied PCA to the problem he was trying to solve the year before. He looked at me with some confusion and said, “PCA? I’m not sure what you’re talking about. Perhaps I said something like DOE and you misheard me.” I smiled inwardly. We spoke a while longer. I thanked him for his time, and we parted ways. Thankfully, George E. P. Box had already taught me about DOE.
Some five years later, I continue to learn R and stats on a daily basis, applying what I learn to experimentation, process control, measurement systems analysis, image analysis, and automation. These days, I’m curious about concise and efficient code, mixed models, and machine learning. It’s a lot of ground to cover, and I can honestly admit I forget things and have to go back sometimes, but that is what practice is for. Living in the information age, there are plenty of datasets to keep the brain’s neural network trained. Teaching and sharing help too. I hope this blog helps you in your journey.
Welcome to R-bar,
-KG