Robustness and Reliability in the Analysis of Count Data

– Hello, I'm Deepak Somaya. I'm a professor in the Business School at the University of Illinois at Urbana-Champaign, and I'm going to talk to you a little bit about count data models. Roughly, the agenda for my talk today is to begin with an introduction to count data models, just to describe what they are, and then to talk about two sets of issues with count data models. One set of issues has to do with mean specification, and within that we'll also look a little at curvilinear relationships and interactions: how you specify them, and how you test them. The other has to do with distributional assumptions: something called over-dispersion, which a lot of people who've worked with count models will be familiar with, and also some other special considerations that fall under this broader category of distributional assumptions.

So let me dive straight in and start by introducing you to count data
models and what they are. So essentially count
data models are models where the dependent variable, the Y, is distributed as count, so
it's zero, one, two, three, etc. There are lots of examples, especially in the social sciences and in my field, which is strategic management: lots of phenomena that we're interested in that are essentially counts. An example I've worked with quite a bit is the number of patents that a firm might be granted, but you could also think about the number of alliances, IPOs, lawsuits, product introductions, market entries, start-ups, and so on and so forth; the list is quite long. And one consequence of the fact that a lot of phenomena in our social science research are manifested as counts is that in our fields, we have seen a large increase in the number
of count data model papers. And within strategic
management, for example, I have listed the set of
journals I'm looking at at the bottom here. From the mid-1990s into the mid-2010s, the share roughly tripled or quadrupled, so today about six or seven percent of the articles in our major journals use count models. If you look only at empirical papers, that number is even higher: something like 10 percent of papers are count-model-based, meaning at least one of the models in the paper is a count model (some papers use different types of models within the same paper). So what I'm highlighting here is that these models are important for our research, and it's imperative that
we actually understand how they work and understand them well.

So typically when we have a count model, and this is an example using a Poisson model, we have a discrete distribution to model the count variable. The count variable can only take on discrete non-negative integer values, so we model the probability of each possible outcome Y_i = 0, 1, 2, 3, and so on in terms of this parameter lambda, which is the mean of the distribution. For the Poisson distribution, lambda is the expected value of Y, the mean of the distribution, and what you usually do is specify that mean in some way as a function of the X's, the independent variables. Now, notice one thing we need here: because all the Y's in our distribution have to be zero or greater, lambda, the expected value of Y, has to be larger than zero. The way we force that is by using an exponent: we run a regular regression inside the bracket, so you have X-beta inside the bracket, but then we take the exponent of that to ensure that the mean of the count distribution is non-negative.
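As a quick sanity check of this setup, here is a minimal sketch in Python. The coefficient and covariate values are made up purely for illustration; the point is that the exponential link keeps lambda positive for any beta, and that lambda really is the mean of the Poisson distribution.

```python
import math

# Hypothetical coefficients and covariate value (illustration only).
beta0, beta1 = 0.2, 0.5
x_i = 1.0

# Exponential link: lambda = exp(X*beta) is positive no matter what beta is.
lam = math.exp(beta0 + beta1 * x_i)

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

# Probabilities over the (effectively complete) support 0..49.
probs = [poisson_pmf(y, lam) for y in range(50)]
mean = sum(y * p for y, p in enumerate(probs))
# The probabilities sum to ~1 and the distribution's mean recovers lambda.
```

A handy by-product of writing it out this way: changing beta0 or beta1 moves lambda around, but lambda can never go negative, which is exactly the constraint the exponential link is there to enforce.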
Now, the choice of what kind of probability distribution we use for the count variable is what we might broadly put under the category of distributional issues, and the choice of how we specify the mean function is what we might think of as mean-specification issues. It's important to understand this distinction, because that's how the rest of my presentation is organized: I'm going to focus first on mean-specification issues and then come to the question of what kind of distribution to use. Within mean specification, I'm going to talk about the choice between Log-Lin models and Log-Log models, and then the implications for modeling curvilinear relationships and interactions. Then we move to distributional assumptions, where we'll talk about something called over-dispersion, and then the consistency of estimates, which is really an important aspect of how to think about distributional assumptions.

So let's begin by looking at
mean specification issues. Let's start with one of the big choices in count models: Log-Lin versus Log-Log. If you remember the specification of the mean from the previous slide, the expected value of Y conditional on the X's is typically modeled as the exponent of a typical regression equation, the X-beta linear terms added together. This exponential form is the default form of count models; it comes packaged into any count-model routine in the statistical packages, whether you're using Stata, SAS, or any of the others. What that does is automatically yield a Log-Lin model, because Y is being modeled as the exponent of X: if you take logs on both sides, you get the log of Y modeled as the sum of a set of linear terms in X. Of course, the value here is that the mean is constrained to be non-negative; that's the reason for the exponential, because we know the mean cannot be negative. But it's an implicit feature of the model, and it's not obvious to a lot of users: when you use a package that has a count model in it, you automatically end up with the Log-Lin version of the model. Now there's an alternative
that one could use, plausibly a better one, as we'll discuss in a bit, where the X's are all logged before being added together in the equation. What happens then is that if you take logs on both sides, you get the log of Y modeled as a function of a series of logs of X's. That yields a Log-Log functional form for count data models, and the question, of course, is which one to use.

Let's take a look at how these models behave in linear space; the axes here are just regular, linearly increasing axes. What you see is that the Log-Lin model has an exponentially increasing shape. By contrast, the Log-Log model can take a variety of shapes, but in the ranges we care about they tend to be more or less linear, so in some ways the Log-Log model is a little more familiar and friendly to those of us who think about phenomena in linear rather than exponential terms. I would also go so far as to say that among social science phenomena, it's somewhat hard to find something with this kind of exponential shape, where linear increases in X yield exponential increases in Y. It's much more likely to
find phenomena of this kind, where the relationship is more or less linear, or at least a slightly increasing curve, not the strong exponential curve you get with the Log-Lin model.

It's also a matter of interpretation. In the Log-Lin model, beta is essentially (dY/Y)/dX. What this means is that there's a constant ratio of the relative change in Y (percentage changes in Y) to the absolute change in X, with X changing in linear terms: X goes up from one to two and you have, let's say, a doubling in Y; it goes up from two to three and you have another doubling in Y. That would be somewhat unnatural, and it's hard to think of examples that would actually fit it. And yet, in about 90% of the count-model papers in my field, strategic management, the Y's and the X's get modeled in exactly this way, in the default Log-Lin approach. An alternative is the Log-Log model, where the betas can be interpreted as (dY/Y)/(dX/X), which implies that relative changes in Y and X have a constant ratio, or simply, that there is a constant elasticity of Y with respect to X, given by the coefficient you estimate. I think this is plausible in a lot of circumstances. Essentially you're saying that percentage changes in X result in percentage changes in Y, and that seems fairly reasonable in potentially a lot of different applications.
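A small numeric illustration of the two interpretations, with an arbitrary, made-up coefficient beta = 0.4:

```python
import math

beta = 0.4  # hypothetical coefficient, purely illustrative

# Log-Lin: E[Y] = exp(a + beta*X). A one-UNIT increase in X multiplies Y
# by exp(beta), no matter where X starts: exponential growth in X.
def y_loglin(x, a=0.0):
    return math.exp(a + beta * x)

# Log-Log: E[Y] = exp(a + beta*ln(X)) = exp(a) * X**beta. A one-PERCENT
# increase in X raises Y by roughly beta percent (constant elasticity).
def y_loglog(x, a=0.0):
    return math.exp(a + beta * math.log(x))

ratio_loglin = y_loglin(2) / y_loglin(1)      # exp(beta); same going 9 -> 10
ratio_loglog = y_loglog(1.01) / y_loglog(1)   # approx. 1 + beta/100
```

Running this, `ratio_loglin` is the same whether X moves from 1 to 2 or from 9 to 10, which is the "each unit step multiplies Y" property, while `ratio_loglog` shows the constant-elasticity interpretation: a 1% change in X yields about a 0.4% change in Y.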
Now it turns out that this choice between Log-Lin and Log-Log is not a purely academic one; it has practical implications for the choice of count models. There is a very large set of count models, which we'll come back to when we talk about distributional assumptions, that yield consistent coefficient estimates, but only if the mean function is correctly specified. So if we don't specify the mean function correctly, if you're using, say, a Log-Log model where the true relationship is Log-Lin, or more likely, a Log-Lin model where the true relationship is Log-Log, then you end up with coefficient estimates that are likely to be inconsistent. So we've got two issues that can result from not getting the specification right: potential inconsistency, and the question of how you're interpreting the relationship. So my recommendation very
generally in this category is that it's important to get the mean specification right: explicitly consider the mean specification rather than just using the default form that the statistical packages provide, and then justify the functional form you're using. Especially if you're using Log-Lin, there is all the more reason to justify why that might be right. One simple thing we can always do is conduct specification tests, for example a Bayesian information criterion (BIC) test to evaluate fit: compare the Log-Lin and Log-Log specifications and see which one fits the data better. I will tell you that I have done this many, many times, and almost never do I find that the Log-Lin model fits the data better than the Log-Log model. So in general I would favor the Log-Log specification. With a Log-Log specification there are, of course, implications for how you deal with independent variables
that might be difficult to log. For example, what do you do with dummy variables? One thing that's pretty clear is that dummy variables you should just keep as dummy variables. Then you might have variables that contain zeros, variables bounded at the lower end by zero and taking values zero, one, two, three, and so on. One option is to add a one and then take the log; that does transform the variable a little bit. One thing I should emphasize here is that you should not add a very small number and then take logs, because the log of a very small number, say 0.0001, is a very big negative number, so you would be artificially adding outliers to the data, and that is not something you should do. Adding a one does change the data structure a little, but not too badly. Then there are bigger issues with data that contain negative values. If some of your X's do contain negative values, you might want to ask how important those negative values are: can you think of a natural origin that would make all the values positive, and then take logs? That might be one solution. This is really the hardest case, and it requires a lot of contextual understanding of the research question to resolve.
Last but not least, I think it's really important, no matter what kind of model you use, to interpret the coefficients accurately based on the model. If it's a Log-Log specification, the coefficients are essentially elasticities; if it's a Log-Lin model, you have to interpret them accordingly, as a Log-Lin model. The second part of what I want to emphasize about mean specification
is that mean specification has implications for testing some of the things we have a lot of interest in in social science research, such as curvilinear relationships between an X and the Y, or interactions between two X terms. Specifying the mean function accurately in the first stage is pretty important for these further analyses. Let me highlight why. What I've done here is to assume that the Log-Log model is accurate (we'll talk about that assumption in a little bit) and plot a set of Log-Log models: there's one X, Y is on the other axis, and there are three different Log-Log models. I've then reproduced those same Log-Log models in Log-Lin specs. The curves are the same models, but the axes are scaled differently: a linear axis in one plot and a log axis in the other. And what you'll notice is that these same models look curved in this space. So if you use a Log-Lin model when your true data fit a Log-Log model, you're going to get these artificial curvilinear relationships. In fact, the typical quadratic curvilinear term that we tend to fit is one I've tried to overlay on this particular curve: if you have a set of data distributed in the middle of the screen, this line would proxy that data pretty well. So you might end up getting these artificial curvilinear relationships if you don't have the
mean specification right in the first place.

Now, one of the things you can actually do, which I did as a fun exercise, is to see whether there's any evidence that this is happening. You can use a Taylor-series expansion to work out the coefficients you would expect to estimate under this kind of misspecification. If the data are actually Log-Log and you model them as Log-Lin with a quadratic, what would the coefficient of X and the coefficient of X-squared turn out to be? From the Taylor-series expansion you get the two coefficients, for the linear term X and the squared term X-squared, where beta is the coefficient of X in the Log-Log model and A is the mean of the variable X. You get beta-one equal to two-beta over A, and beta-two equal to minus beta over two-A-squared, and from this you can derive a relationship between what beta-one and beta-two should be, along with the mean of X as part of that relationship.
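This algebra can be verified numerically. A second-order Taylor expansion of beta*ln(x) around A gives a constant of beta*ln(A) - 1.5*beta, a linear coefficient of 2*beta/A, and a quadratic coefficient of -beta/(2*A^2). The values of beta and A below are arbitrary:

```python
import math

# Made-up values: beta is the Log-Log slope, A the mean of X.
beta, A = 0.5, 10.0

b1 = 2 * beta / A             # implied coefficient on x
b2 = -beta / (2 * A ** 2)     # implied coefficient on x**2
const = beta * math.log(A) - 1.5 * beta

# Near A, the quadratic in x closely reproduces beta*ln(x):
errs = [abs(const + b1 * xv + b2 * xv * xv - beta * math.log(xv))
        for xv in (8.0, 10.0, 12.0)]
```

At x = A the approximation is exact, and within 20% of A the error stays tiny, which is why a quadratic fit over the bulk of the data can mimic a purely Log-Log relationship so well.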
What we did in the paper was to look at all the research in strategic management that uses squared terms to model curvilinear relationships in count models, and to see whether the coefficient of X and the coefficient of X-squared actually have this hypothesized relationship, which also involves the mean of the variable X. What we found is that, on average, this is exactly what seems to be happening in our field: most curvilinear relationships are essentially Log-Log relationships that are being modeled in Log-Lin space, and then, lo and behold, you get a curvilinear relationship. In fact, less than 20% of the results appear to be true curvilinear effects, so we seem to be modeling a lot of otherwise linear relationships as curvilinear effects through the mean-misspecification issue that I highlighted
earlier as curvilinear effects.

So the recommendation for testing curvilinear relationships is actually quite simple. Recognize, first of all, that the Log-Lin specification has major implications for testing curvilinear relationships: linear Log-Log relationships are going to manifest as curvilinear in Log-Lin space, so it's important to choose the right mean specification, and all the things I said earlier about mean specification apply. Then, in addition, if you really want to model curvilinear relationships, and you've found that the Log-Log model is the right model for your data, use the squared term of log X, which is the square of log X as a whole. Remember, you shouldn't use the log of X-squared, because that's simply two times log X. Take the square of the log term as a whole, and interpret the results with some care, because of all the issues involved here.
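A tiny sketch of the distinction (the value x = 7 is arbitrary):

```python
import math

x = 7.0  # any positive regressor value

right = math.log(x) ** 2   # (ln x)**2: the square of the log as a whole
wrong = math.log(x ** 2)   # ln(x**2) = 2*ln(x): still linear in ln(x)

# The "wrong" version is perfectly collinear with ln(x) itself, so it
# adds no curvature; the quadratic term has to be (ln x)**2.
```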
Let me now turn to modeling interactions. The first question I really have to put out there is: what is an interaction, really? If you think about it carefully, what we mean by an interaction is the cross-partial derivative, d²Y/dX1 dX2. So if you had linear
terms in a regression, this is quite simply the coefficient of the interaction term, if you include one, because the cross-partials of all the other linear terms are zero. You have a series of coefficient-times-X terms, and the moment you take the cross-partial with respect to X1 and X2, any term that doesn't contain both of them (coefficient times X1, coefficient times X2, coefficient times X3, and so on) evaluates to zero. Only the interaction term's coefficient survives as the cross-partial derivative, and that's why, in the usual application of interactions, we focus only on the interaction term.

Something different happens when you have nonlinear models, by which I mean that the model specification is no longer the simple sum of linear terms we have in a regression. Here the cross-partial, the true interaction, is often not simply the coefficient of the interaction term; in general, the cross-partial can depend on all the other X variables and their coefficients. This is a well-recognized issue, and some of you might be familiar with it in the context of other nonlinear models, such as choice models. Count models are nonlinear in their functional form, so you might expect the
same phenomenon to manifest in count models, and it actually does. Both Log-Lin and Log-Log models contain an implicit interaction even without the introduction of an explicit interaction term. Think about the Log-Log case: you've got a set of additional variables, but the focal variables we're interested in are the two independent variables X1 and X2, and log Y is specified in the Log-Log functional form. Then the true interaction, d²Y/dX1 dX2, evaluates to the expected value of Y divided by X1 times X2, multiplied by the product of the coefficients of X1 and X2. So without any interaction term, there is already an interaction buried in this functional form.
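Written out (in my notation, which may differ from the slide's): with a Log-Log mean E[Y] = exp(b0 + b1 ln X1 + b2 ln X2) = e^{b0} X1^{b1} X2^{b2} and no interaction term, differentiating twice gives

```latex
\frac{\partial E[Y]}{\partial X_1} = \beta_1 \,\frac{E[Y]}{X_1},
\qquad
\frac{\partial^2 E[Y]}{\partial X_1\,\partial X_2}
  = \beta_1 \beta_2 \,\frac{E[Y]}{X_1 X_2} \;\neq\; 0,
```

so the implicit interaction is nonzero whenever both coefficients are.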
If you then add an explicit interaction term to this Log-Log model, that term adds more complexity to what the true interaction is. The true interaction, the cross-partial, is no longer simply the term's coefficient; it becomes a much more complicated expression, as you can see here, in which a whole bunch of other variables play a role in various ways. So what do we see in practice about how interactions are modeled and tested in count models? Again, I'm thinking about my field, strategic management,
and the papers I've seen within that field. We rarely see true interactions actually discussed or interpreted in these models. In fact, the majority of studies simply use and interpret the interaction term, often with the default Log-Lin functional form, so there are a couple of issues layered on top of each other; and in about 50% of the cases, our estimates suggest that there is no true interaction, that is, not an interaction that would withstand the scrutiny of a cross-partial derivative analysis. Last but not least, we could attempt to interpret the moderation or interaction effect graphically: look at it over the range of the sample, use the functional form of the count model we're specifying to evaluate the predicted value of Y over different ranges of X, and interpret it graphically in that manner. Hoetker, as well as Wiersema and Bowen, discuss this kind of graphical interpretation in the context of other nonlinear models, as an approach that can easily be extended to count models as well. And now let's turn to
distributional assumptions, the last topic of my little mini-lecture here. When thinking about distributional assumptions, what we're really focusing on is the idea that the count dependent variable can only take on non-negative integer values, which means we have to use a discrete distribution to describe it. One thing I would point out is that with very large count values (my rough metric is an average count in the 25-to-30 range) and not a lot of zeroes, the estimates from OLS are probably quite a robust alternative to a count model. So I think one should strongly consider OLS in that case: there are lots of advantages to an OLS model, lots of other tools and robustness tests that can be brought to bear, and it's a much more stable and reliable model in many ways. On the other hand, if the count dependent variable has a small mean, or there's a large proportion of low counts, including zeroes, then OLS might be inappropriate; it could easily lead to biased estimates, and one should use discrete distributions instead. Poisson and negative binomial are the two most commonly used distributions in this context. Invariably, when you use one of these distributions, the model gets labeled by the distribution: when you use a Poisson distribution, we say it's a Poisson model, and when you use a negative binomial distribution, we say it's a negative binomial model. Let's start by looking at
the Poisson distribution and this idea of over-dispersion. One of the features of the Poisson distribution is that its mean is equal to its variance: the lambda we saw earlier, which I described as the mean of the Poisson distribution, is also equal to its variance. This property is called equi-dispersion. It turns out that equi-dispersion is something we are assuming, and a lot of real-world data don't actually contain equi-dispersed counts. What we very commonly uncover in real-world data is the phenomenon of over-dispersion, where the variance of the data is greater than the mean; it's much more common to see over-dispersion than under-dispersion.
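As a minimal illustration, here is a quick dispersion check on a made-up vector of counts:

```python
# A made-up vector of counts (illustration only).
counts = [0, 0, 0, 1, 1, 1, 2, 2, 3, 5, 8, 13]

n = len(counts)
mean = sum(counts) / n
var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance

# Roughly 1 under equi-dispersion (Poisson); well above 1 signals
# over-dispersion, below 1 under-dispersion.
dispersion = var / mean
```

For this toy vector the variance is several times the mean, so a plain Poisson likelihood would understate the spread in the data.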
One solution, suggested by Long and others, and very often used in actual empirical work to deal with this problem, is the negative binomial model. I'll talk about negative binomial models in just a second; what you'll find is that the negative binomial model is a little more flexible, in that it allows us to have means and
variances that are different. But what I'd like to bring our focus to is not simply the question of whether there is over-dispersion or equi-dispersion, but what that means for the consistency of the estimates we get from such a model. It turns out that the Poisson model, as well as the commonly used negative binomial model (the NB-2 model; I'll say more about that in a little bit), are both from the linear exponential family, and coefficient estimates from this family are consistent so long as the mean is specified correctly; refer back to my earlier discussion of correct mean specification. However, the problem with over-dispersion, where the variance of the data is not in fact equal to the mean, is that the standard-error estimates may be inconsistent. Essentially, if the distribution is not Poisson (or, in the negative binomial case, not negative binomial), then the standard-error estimates may be inconsistent. In particular, in the Poisson case with over-dispersion, where the variance is larger than what Poisson allows, you may end up estimating standard errors that are too small, which creates a bias toward finding significance in your results even when the significance is not truly there. So the key issue we need to focus on is the consistency of estimates, and not just some vague idea of realism about whether a particular count model maps onto the true data or not. I think this idea can be
underscored a little bit more by talking about the negative binomial distribution and, ultimately, what we should do about distributional assumptions. Negative binomial distributions are essentially a modified version of the Poisson distribution in which the variance is allowed to be unequal to the mean. In general, the way we deal with this is by giving the variance a functional form: the variance is a function of the mean, mu-i, but also of some other parameters. There's a parameter alpha, which can be estimated from the data, and there's also a parameter P, which we specify in advance. If P is specified to be one, that yields the NB-1 model, the negative binomial 1 model; if it's two, it yields the NB-2 model, the negative binomial 2 model; and so on.
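To make the variance function concrete, here is one standard way of writing it; this parameterization is my addition, a common textbook form, not necessarily the exact notation on the slide:

```latex
% Negative binomial variance as a function of the mean \mu_i;
% \alpha is estimated from the data, p is fixed in advance.
\operatorname{Var}(Y_i \mid X_i) \;=\; \mu_i + \alpha\,\mu_i^{\,p}
% p = 1 gives the NB-1 model, p = 2 the usual NB-2 model;
% \alpha = 0 recovers the equi-dispersed Poisson case.
```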
Now, this NB-2 model is what most people go around calling the negative binomial model; it's the common version. It turns out that the NB-2 distribution also belongs to the linear exponential family, just like the Poisson, and so it yields consistent coefficient estimates. Just as in the Poisson case, the standard errors of the NB-2 model may be inconsistent if the data aren't truly distributed as NB-2, but this concern is mitigated to some degree by the alpha parameter: because alpha is estimated from the data, the model accommodates a fair degree of over-dispersion. So you might believe that the negative binomial model can conform to the data more closely than the Poisson model does, and that the concern about inconsistent standard errors is therefore lower in the negative binomial case.

Now I want to recommend a couple of approaches: one that takes the negative binomial seriously, but also an alternative. The first thing I recommend is that the mean function needs to be specified correctly. All the work we do on distributional assumptions matters very little if the mean function isn't specified correctly. Of course, the true mean function may never really be known, but at the very least, one thing we can do as researchers is to try and compare
alternative mean functions and see which one fits the data better; and as I pointed out, in almost every case that I've seen in my own data, it's the Log-Log model that tends to fit better.

Starting from that, a step that a lot of people skip, and which I think is very valuable, is to estimate the plain Poisson regression model. The standard errors in this model might be inconsistent, but at least it generates a set of consistent coefficient estimates, so you get some idea of what the coefficient estimates are. The Poisson model is relatively more robust than some of the other models, and it's a simple model, so it puts fewer constraints on the data and gives you a relatively good first take on what the coefficients look like.
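To make this recommendation concrete, here is a self-contained sketch of a Poisson regression fit by Newton-Raphson, which also computes, anticipating the discussion of standard errors below, both model-based and Huber-White (sandwich) standard errors. The data are made up, and this is a bare-bones illustration, not a substitute for the packaged routines in Stata or SAS:

```python
import math

# Made-up data: one covariate x and a count outcome y (illustration only).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 2, 2, 3, 4, 6, 8, 11, 15]
n = len(x)

# Newton-Raphson for the Poisson log-likelihood with mean exp(b0 + b1*x).
b0, b1 = math.log(sum(y) / n), 0.0   # start from the intercept-only fit
for _ in range(100):
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    # Score vector: sum of (y_i - mu_i) * (1, x_i)
    g0 = sum(yi - mi for yi, mi in zip(y, mu))
    g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
    # Information matrix A = sum of mu_i * (1, x_i)(1, x_i)'
    a00 = sum(mu)
    a01 = sum(mi * xi for mi, xi in zip(mu, x))
    a11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
    det = a00 * a11 - a01 * a01
    d0 = (a11 * g0 - a01 * g1) / det   # Newton step = A^{-1} * score
    d1 = (a00 * g1 - a01 * g0) / det
    b0, b1 = b0 + d0, b1 + d1
    if abs(d0) + abs(d1) < 1e-12:
        break

mu = [math.exp(b0 + b1 * xi) for xi in x]

# Model-based variance is A^{-1}; the Huber-White sandwich is
# A^{-1} B A^{-1}, where B sums squared score contributions.
a00 = sum(mu)
a01 = sum(mi * xi for mi, xi in zip(mu, x))
a11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
det = a00 * a11 - a01 * a01
i00, i01, i11 = a11 / det, -a01 / det, a00 / det   # A^{-1}
r = [(yi - mi) ** 2 for yi, mi in zip(y, mu)]
B00 = sum(r)
B01 = sum(ri * xi for ri, xi in zip(r, x))
B11 = sum(ri * xi * xi for ri, xi in zip(r, x))
# Sandwich S = A^{-1} B A^{-1}, written out for the 2x2 case.
s00 = i00 * (i00 * B00 + i01 * B01) + i01 * (i00 * B01 + i01 * B11)
s11 = i01 * (i01 * B00 + i11 * B01) + i11 * (i01 * B01 + i11 * B11)
se_model = (math.sqrt(i00), math.sqrt(i11))
se_robust = (math.sqrt(s00), math.sqrt(s11))
```

With an intercept in the model, the score equations force the fitted means to sum to the observed counts, which is a handy check that the optimizer has actually converged. When the data are over-dispersed, `se_robust` will typically exceed `se_model`, which is exactly the too-small-standard-errors problem being corrected.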
You might also compute and report measures of over- or under-dispersion in the data, and really try to get your hands around how much of a problem over-dispersion is in your particular data. And of course, if you have over-dispersed data, you need to do something about it. There are two approaches I'd recommend. One, of course, is to use the NB-2 model, the commonly used negative binomial model, if the data are over-dispersed. The other approach, which I'm quite a fan of, is to continue to use the Poisson model, the relatively simple and robust model I described earlier, but to deal with the standard errors separately: to ensure their consistency by using something like Huber-White standard errors, which are generally considered to be quite robust, and therefore consistent. Now there are a couple
of specialized models to think about in the context of count models, and one involves count models with panel data. One of the neat features of count models with panel data is that very often, when one controls for unit-specific effects in the model, which is typically what we do in panel data analysis, the residual over-dispersion might be quite small. In fact, some theories of why we see over-dispersion attribute it to exactly these unit-level differences: there are unobserved differences between units, and that's why we end up with over-dispersion. So once you control for those unit-level differences by controlling for the unit itself, you might have much less over-dispersion. But the second thing, and this is really almost a warning, is that there is a fixed-effects negative binomial model that has been derived
and is used very commonly in a lot of different settings. But there's a real problem with that fixed-effects negative binomial model. In order to condition out the fixed effects, the scholars who derived it made a simplifying assumption, and what that simplifying assumption does is model the fixed effects in the variance term. Most of us, when we think about a fixed-effects model, almost never think about fixed effects in the variance term; it's a very unusual specification. Usually, we think about fixed effects in the mean term, in the average. So I would strongly advise against that particular fixed-effects model: you would have to have very unusual assumptions about what you're looking for in the data to justify that fixed-effects specification. Instead, there is another specification that does something similar to what we discussed earlier, computing robust standard errors to eliminate issues relating to inconsistency, and it is effectively a fixed-effects Poisson model. It's called the quasi-maximum likelihood estimator, the fixed-effects quasi-maximum likelihood estimator, implemented in Stata as xtpqml; if you find that particular Stata command, it will explain what this estimator is all about. I very strongly advocate using it if you're looking for a fixed-effects panel model for count data. And last but not least, I
want to also give a shout-out to zero-inflated models. Very often with count data, we have relatively large frequencies of zeroes compared to the rest of the distribution, so the counts are not distributed the way a typical Poisson or negative binomial distribution would imply. Using zero-inflated models, either the Poisson version or the negative binomial version, helps deal with these excess zeroes; they're really valuable when you have lots of zeroes that are disproportionate to the rest of the counts in your data.

So, just to quickly summarize the issues I've raised in this short video lecture: First of all, pay attention
to mean specification, especially the functional form. That's the number one thing I think about when I think about doing count models well, particularly the Log-Log versus Log-Lin specification choice: the default ends up being Log-Lin, and it might not fit most of the phenomena we're looking at. Then, based on that, appropriately model curvilinear relationships as well as interactions. And finally, pay attention to distributional assumptions. Typically we think about this as the Poisson versus negative binomial choice. Both of these distributions yield consistent coefficient estimates; it's important to recognize that. However, there's a challenge with the estimated standard errors, especially under over-dispersion: the Poisson standard errors might be too low. We can address this in a number of ways: one, of course, is to use the negative binomial model, but we can also address it by using Huber-White standard errors, for example. And last but not least, I want to highlight again the issue with panel models: for fixed-effects panel models, you don't want to use the negative binomial version; instead, use the fixed-effects quasi-maximum likelihood version, which is a very robust and very nice model. Thank you, I hope this was helpful. I look forward to seeing you sometime.
