Questions to Statistics

For a few years I have been thinking about some questions about Statistics and asking different people for their answers.

Image credit: Mei Dong

As a student in Statistics, I have found this subject more and more interesting over the past 7 years. However, I also have more and more questions, about the foundation of Statistics, the future of Statistics, and so on. If you want a basic knowledge of the history of Statistics, I recommend the book “The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century”. It is a very interesting history, and it mostly plays out among a few famous statisticians. If you check your academic genealogy on the Mathematics Genealogy Project, you will find someone really famous. For me, Blackwell is 3 generations back, Fisher is 6 and Poisson is 10. This field is very small, so it’s very easy to find connections between people.

I asked some of my professors these questions and also tried to answer them from my own limited experience. Some questions are easy to answer, and I got similar answers from everyone. Some are too hard to answer given where the field currently stands. I would also like to ask the readers these questions, and you may come up with similar questions of your own.

1. Is “are you a Bayesian or frequentist” a reasonable question? What does it mean to be a Bayesian statistician? Will they be unified in the future?

Nowadays, people use methods from both schools to solve problems. It seems that people are no longer interested in the philosophical debate between frequentists and Bayesians, as long as they can do their jobs. So some people call themselves pragmatic Bayesians, instead of just Bayesians. As with many conflicts between two definitions in history, we now just learn and use both, without knowing that there was once a big debate between them. Does it matter? I don’t know, but I think it is good to at least know what they mean and represent. When I graduated from UC Irvine, I thought I was a Bayesian, since I had chosen Bayesian methods as one of my research interests. So when Jock asked “is there any Bayesian in our class?” I answered without hesitation. But after several classes, I found I was too immature to call myself a Bayesian. There are still too many ideas I haven’t heard of or fully understood. As a result, I’m now very careful about answering such questions. For the last question, I’m thinking of something like a Theory of Everything, not just simply combining frequentist and Bayesian methods. As we know, when we use an uninformative prior and the sample size is large enough, the posterior distribution gives us estimates similar to the MLE. So it may be possible for all such similar methods to be unified into one general method.
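The point about an uninformative prior and a large sample can be checked in a small example. Below is a minimal sketch, not a general proof. It assumes a binomial model with a uniform Beta(1, 1) prior, which is only one common choice of "uninformative" prior, and the function names and the success rate of 0.3 are made up for illustration. In this setting the posterior mean is (k + 1) / (n + 2), while the MLE is k / n.

```python
def mle(successes, trials):
    """Maximum likelihood estimate of a binomial proportion."""
    return successes / trials


def posterior_mean(successes, trials, a=1.0, b=1.0):
    """Posterior mean under a Beta(a, b) prior.

    The default a = b = 1 is the uniform prior, an assumption made
    here for illustration only.
    """
    return (successes + a) / (trials + a + b)


def gap(successes, trials):
    """Absolute difference between the posterior mean and the MLE."""
    return abs(posterior_mean(successes, trials) - mle(successes, trials))


if __name__ == "__main__":
    rate = 0.3  # hypothetical observed success rate, held fixed as n grows
    for n in (10, 100, 1000, 10000):
        k = round(rate * n)
        print(n, mle(k, n), posterior_mean(k, n), gap(k, n))
```

With the observed rate held fixed, the gap shrinks roughly like 1/n, so the two estimates become practically indistinguishable for large samples. Under this particular prior the posterior mode equals the MLE exactly, so the difference above comes only from summarizing the posterior by its mean.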

2. What is your definition of Statistics, and maybe data science? What do you think Statistics will be in the future?

With the borderline between Statistics and other subjects (e.g. data science and, at some level, CS) becoming more and more unclear, it is harder and harder to define these subjects precisely. There are many papers, articles and talks about what data science is, and different people have different definitions. Some people say data science is an interdisciplinary field, and some say Statistics is part of data science, since Statistics deals with data. In China, some departments of Statistics have been replaced by, or absorbed into, a school of data science or a school of big data. The reason, which I think is obvious, is that data science or big data is a very hot trend and the title can bring in more funding and projects. Like Paul, I won’t be surprised if Statistics disappears or becomes something else in the future. In my opinion, Statistics itself has a really weak theoretical foundation, since it’s an empirical subject. And what I have learned in these 7 years is that nothing is impossible. Sometimes I wonder whether this just means that we have learned nothing, which is pretty sad.

3. Though we are trained to have the knowledge and skills as professionals, how much value does capital think we have? What is our advantage in the industry?

Many companies have the position “data scientist”, but many of these companies also have no idea what these “data scientists” should do. At the beginning of the pandemic, I heard that some companies dissolved their data science departments. Maybe this is a good time for them to really think about this problem. On the other hand, CS students who know how to use R/Python packages and have some Statistics knowledge can also be data scientists, and may be better at coding than stats students. Jock said “Statistics must be one of the most widely and worst taught subjects. Tell someone that you are a statistician and almost surely you will get the answer that Statistics was the worst and most useless course they had ever taken.” As a result, the general impression of Statistics is that it is useless. Also, in this era, people focus on speed first and quality second. If a package can handle everything, why do I need a statistician?

4. When did you think you had enough knowledge to work on a project independently, and when did you find you had enough knowledge to think about the fundamental questions about Statistics?

These two questions are not as important or as deep as the previous ones. I just don’t know if this is a good time to think about these questions, or what it will be like after the PhD. What level should a PhD graduate be at? How far am I from that level?

I have always agreed that it’s good to think about general and fundamental questions. Thinking about the deepest foundations of things helps you understand the world and form your worldview. I hope that after a few years I will be able to answer some of these questions and come up with some new ones.

Yiran Wang
Postdoctoral Associate

My current research areas are Bayesian methods and causal mediation analysis.
