I recently gave a short talk to a mixed group of scientists and engineers (including a small body of statisticians). The purpose of the talk was to briefly describe a rather large simulation study that I had completed. The study sought to understand how various departures from our model assumptions affect quantities that we were interested in calculating.
So suppose that our model is a very simple linear model:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
Many of you will recognize this as a straight line. For this model I will assume the typical distributional properties for $\epsilon_i$, i.e., iid and normally distributed, let's say standard normal for simplicity. We also assume that the model parameters $\beta_0$ and $\beta_1$ (my apologies for the misaligned math symbols, I am figuring out the best way to input math symbols into this blog) are unknown. Assume this is the model that I was describing in my talk.
After describing the model above, I discussed how, when we calculate our various statistics, we simply assume that we know the values of the model parameters and plug them into our carefully derived statistical formulas. In the simulation study, we simulated data from the model form above, calculated estimates of the unknown parameters from the simulated data, and then assessed the variability in the test statistics (I was being a proper frequentist). Here is where my talk was criticized.
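To make the simulation loop concrete, here is a minimal sketch in Python with NumPy. It is not the study from my talk: the function name `simulate_estimates`, the "true" values 1.0 and 2.0, the design points, and the number of replications are all placeholders chosen for illustration. It looks only at the variability of the least-squares estimates, not at any particular test statistic.

```python
import numpy as np

def simulate_estimates(beta0, beta1, x, n_reps=2000, seed=0):
    """Simulate y = beta0 + beta1 * x + eps with eps iid N(0, 1), refitting each time.

    Returns an (n_reps, 2) array of least-squares estimates (intercept, slope).
    All argument values are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    # Design matrix with a column of ones (intercept) and the covariate (slope).
    X = np.column_stack([np.ones_like(x), x])
    estimates = np.empty((n_reps, 2))
    for r in range(n_reps):
        # Draw one data set from the assumed model.
        y = beta0 + beta1 * x + rng.standard_normal(x.size)
        # Ordinary least-squares fit to the simulated data.
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[r] = coef
    return estimates

# Hypothetical setup: 30 equally spaced design points on [0, 10].
x = np.linspace(0.0, 10.0, 30)
est = simulate_estimates(beta0=1.0, beta1=2.0, x=x)

print("mean of estimates:", est.mean(axis=0))
print("std of estimates: ", est.std(axis=0, ddof=1))
```

Notice that to simulate anything at all, I had to pick numbers for the intercept and slope, even though the fitting step treats them as unknown.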
The main criticism came from the "head" statistician in the room, who claimed that my linear model as given above was not a model. She then offered her own definition: a model is not only the linear form and the distributional assumptions, but must also specify values for the unknown model parameters. My definition of the model, as presented, was the distributional assumptions and the linear form, whereas hers would also have included values of the model parameters. Do you see the slight difference?
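One way to put the difference in symbols, using notation that is my own shorthand here rather than anything from the talk: write the line as $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with $\epsilon_i$ iid standard normal. Under my usage, the "model" is roughly the whole family of distributions indexed by the unknown parameters:

$$\mathcal{M} = \left\{\, y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i,\ 1) \;:\; (\beta_0, \beta_1) \in \mathbb{R}^2 \,\right\}$$

Under hers, as I understood it, the "model" is a single member of that family, with specific values $(\beta_0^*, \beta_1^*)$ written down:

$$y_i \mid x_i \sim N(\beta_0^* + \beta_1^* x_i,\ 1)$$

This is only a sketch of the two positions, and she might well phrase her side differently.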
Unfortunately, when this criticism was thrown at me, I stood there without a response. In all honesty, I felt that I did not have a good response because I realized that, as an early-career statistician, I have not developed my own philosophy/definition of what I think a model is. Or, to put it another way, at what point has our mathematical object been specified enough that we may declare it a model?
Currently I do not have a good answer to my own question, "At what point have we listed enough things that we may declare our list a model?" But I am curious: what do you think a list of objects must include in order for us to have a proper statistical model?