– Hello, I’m Deepak Somaya. I’m a professor in the Business School at the University of

Illinois at Urbana-Champaign. And I’m gonna talk to you a little bit about count data models. So, roughly, the agenda for my talk today is to begin with an introduction

to count data models, just to describe what they are. And then talk about two sets of issues having to do with count data models. One set of issues having to

do with mean-specification, and within that we’re also gonna look a little bit at curvilinear relationships, interactions, and how

do you specify those, how do you test those. And the second set has to do with distributional assumptions: something called over-dispersion

that a lot of people who’ve worked with count

models will be familiar with. And also other special considerations that might fall under

this broader category of distributional assumptions. So let me dive straight in to start by introducing you to count data

models and what they are. So essentially count

data models are models where the dependent variable, the Y, is distributed as count, so

it’s zero, one, two, three, etc. Lots of examples, especially

in the social sciences and my field, which is

strategic management. Lots of examples of phenomena that we’re interested in,

that are essentially counts. So an example that I’ve

worked with quite a bit is the number of patents

that a firm might be granted. But also you could think

about number of alliances, number of IPOs, number of lawsuits, number of product introductions, or market entrants, or start-ups,

and so on and so forth. The list is quite long. And one of the effects of the fact that there are actually a lot of phenomena in our social science research

that are manifested as counts is that in our fields, we have seen a lot of increase in the number

of count data model papers. And within strategic

management, for example, I have listed the set of

journals I’m looking at at the bottom here. We’ve seen about a three or four-fold increase from the mid-1990s into the mid-2010s. So, today about six or seven

percent of our articles in our major journals

consist of count models. And if you look only at empirical papers then that number is even higher, so you’re talking about

something like 10 percent of papers being count model-based papers. At least one of the models in

the paper is a count model. Sometimes there are

papers that use different types of models within the same paper. So what I’m highlighting here is the idea that these models are

important for our research and it’s imperative that

we actually understand how they work and understand them well. So typically when we have a count model, and this is an example

using a Poisson model, what we tend to have is

a discrete distribution to model the count variables, right? So the count variables can only take on these discrete non-negative

integer values, and so we essentially

model the probability of any one of these Y’s, so this Y-I could be zero,

one, two, three, etc. So we express the probability of a particular Y-I in terms of this lambda, which is sort of the mean term here, the average of this distribution. So for the Poisson distribution, lambda is the expected value of Y, the mean of the distribution, and usually what you do is

that you essentially then specify that mean in

some way as a function of the X’s, which is the

independent variables. Now, notice one thing that we need here. So because the Y, all the

Y’s in our distribution, have to be zero or greater, it means that lambda, the expected value of Y, essentially has to be larger than zero. And the way that we force that is by using an exponent

here, so we essentially run a regular regression

inside the bracket here, so you have X beta inside the bracket, but then we take the

exponent of that to ensure that the mean of the distribution

count is non-negative. Now, the choice of what kind
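Written out, the Poisson model being described on the slide is the standard Poisson probability with an exponential mean function:

```latex
P(Y_i = y \mid x_i) \;=\; \frac{e^{-\lambda_i}\,\lambda_i^{\,y}}{y!},
\qquad y = 0, 1, 2, \ldots
\qquad \text{with} \qquad
\lambda_i \;=\; E[\,Y_i \mid x_i\,] \;=\; \exp(x_i'\beta) \;>\; 0 .
```

The exponent is what guarantees that the mean is strictly positive no matter what values the X’s and betas take.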

of probability distribution we use for the count variable is what we might broadly

put under this category of distributional issues. And then the choice of how we

specify this mean function, that’s what we might think of

as mean specification issues. And it’s important to

understand this distinction, because that’s how the

rest of my presentation’s gonna be organized. I’m gonna focus first on these

mean specification issues and then come to the distribution, in terms of what kind of

distribution we’re gonna use. So that’s the overall structure. Within mean specification,

I’m gonna talk about the choice between Log-Lin models versus Log-Log models

and then implications for modeling curvilinear

relationships and interactions. And then we move to

distributional assumptions. We’ll talk about something

called over-dispersion and then the consistency of estimates, which is really an important aspect of how you think about distributional assumptions. So let’s begin by looking at

mean specification issues. Let’s start with one of the big issues with count models, which is this choice between Log-Lin and Log-Log models. So if you remember my

specification of the mean in the previous slide, right, the expected value of Y

conditional on the X’s, typically is modeled as this exponent of a typical regression equation, right, the X betas, sort of

linear terms added together. Now, the exponential form is the default form of count models. This comes packaged in with

any count model package that you might see in

the statistical packages, whether you’re using Stata or SAS

or any of the others. So what that does is it automatically yields a Log-Lin model,

because what you have here is Y being modeled as the exponent of X. So if you take logs on both sides, you get the log of Y

essentially being modeled as the sum of the set

of linear terms in X. Of course, the value here

is that it’s constrained to be non-negative. That’s the reason we have this, because we know that the

mean cannot be negative, so we’re constraining

it to be non-negative. And it’s an implicit feature of the model, and it’s not obvious to a lot of users, but when you use a package that has a count model in it, you automatically end up using a Log-Lin version of the model. Now there’s an alternative

that one could use, plausibly better, and

we’ll talk about that in just a little bit, where the X’s are all logged before we add them together in this equation. So what ends up happening, then, is that if you

take logs on both sides, you get the log of Y

modeled as essentially a function of a series of logs of X’s. So that would yield a

Log-Log functional form of count data models, and the question, of

course, is which one to use. So let’s take a look at

how these models look in linear space, so the axes

here are just regular axes, right, linearly increasing. And what you see is that the Log-Lin model essentially has this

exponentially increasing sort of shape here, right. By contrast the Log-Log model can have a set of different types of shapes, but really in the ranges

that we care about, they tend to be sort

of more or less linear, and so in some ways the Log-Log model is kind of a little bit

more familiar and friendly to those of us who are

thinking about phenomena in linear terms as opposed

to exponential terms. And I would also go so far as to say that in social science phenomena, it’s somewhat hard to

find something that has this kind of exponential shape, where linear increases in X yield exponential increases in Y. It’s much more likely to

find this kind of phenomena, where it’s more or less linear or at least slightly increasing curves, not the strong exponential curve that you have with the Log-Lin model. Now it’s also a matter

of interpretation, right? So in the Log-Lin model, essentially the beta is D Y by Y, divided by D X. What this means is that

there’s a constant ratio of the relative change in Y, so percentage changes in Y, for example, to the absolute change in X, so X is changing in linear terms. X goes up from one to two, you have, let’s say, a doubling in Y. If it goes up from two to three, you have another doubling in Y. Now that would be somewhat unnatural, and it’s kind of hard to think of examples that would actually fit that. And yet, nonetheless, in about 90% of the papers in my field,

in strategic management, what you see is essentially

the Y’s and the X’s get modeled in this way, in

the default Log-Lin approach. Now an alternative might

be a Log-Log model, and the Log-Log model, the

betas can be interpreted as D Y by Y divided by D X by X, which essentially implies

that relative changes in both Y and X have a constant ratio, or simply, that there

is an elasticity of Y with respect to X which is constant, which is given by the

coefficient that you estimate. I think this might be plausible

in a lot of circumstances. Essentially what you’re saying is that percentage changes in X result in percentage changes in Y and that seems fairly reasonable in potentially a lot of

different applications. Now it turns out that this choice between Log-Lin and Log-Log is not a purely academic choice,

it’s actually something that has practical implications for the choice of count models. It turns out that there’s a very large set of count models, which we’ll talk about when we talk about

distributional assumptions, that yield consistent

coefficient estimates. So the coefficient

estimates can be presumed to be consistent, but

only if the mean function is correctly specified. So if we don’t actually

specify the mean function correctly, if you’re using, let’s say, a Log-Log model where the

true relationship is Log-Lin, or more likely perhaps,

if you’re using a Log-Lin, where the true relationship is Log-Log, then you end up with coefficient estimates that are likely to be inconsistent. So we’ve got these two issues, potential inconsistency, and also, the question of interpretation, or how you’re interpreting

the relationship. That can result from not

getting the specification right. So my recommendation very

generally in this category would be that it’s important to actually get mean specification right, actually explicitly

consider mean specification, to not just use the default form that the statistical

packages sort of have. And then to actually

justify the functional form that we are using, especially if you’re

using Log-Lin, I think, all the more reason why we need to justify why that might be right. One of the simple things

that we can always do is to conduct some specification tests. An example might be a Bayesian

information criterion sort of test to evaluate fit. Compare the Log-Lin and

the Log-Log specification and see which one fits the data better. I will tell you that I have

done this many, many times, and almost never do I find

that the Log-Lin model actually fits the data better

than the Log-Log model. So in general I would say I would favor the Log-Log specification, and then there are, of

course, implications with a Log-Log specification

for how do you deal with independent variables

that might be difficult to log. For example, what do you

do with dummy variables? So one of the things that’s pretty clear about this one is that for dummy variables you should just keep

them as dummy variables. And then you might have variables

that might contain zeros, and what do you do with that? They might be bounded at the

lower end by zeroes, so variables that contain

zero, one, two, three, etc. One option here might be to add a one and then take the log. That does transform the

variable a little bit. One of the things I should emphasize here is that you should not add a very small number and then take logs,

’cause if you take the log of a very small number, let’s say 0.0001, the log of that number is going to be a very big negative number, and so you’re artificially

adding outliers to the data, and that would not be

something you should do. But adding a one does

change the data structure a little bit, but not too badly. Then there are bigger issues about data that contain negative values. If your data do contain negative values, you might wanna ask yourself how important the negative values are. Can you somehow think of a natural origin that would make all the values positive? And then take logs, so

that might be one solution. Really, this one’s the hardest one if you have negative values

for some of your X’s, and really requires a lot

of contextual understanding of the research question to be able to say how you’re going to resolve that. And last but not least, I

think it’s really important, no matter what kind of model you use, to interpret the coefficients

accurately based on the model. So if there’s a Log-Log specification, the coefficients are essentially

going to be elasticities, but if it’s a Log-Lin model, you’re gonna have to

interpret them accordingly as a Log-Lin model. The second part of what I wanna emphasize about mean specification

is that mean specification actually has implications for the testing of some kinds of things that

we have a lot of interest in in social science research, such as curvilinear relationships between an X and the Y, or interactions between two X terms. So specifying the mean

function, the first stage, accurately actually is pretty important for these other

considerations that we usually get into in research. Let me highlight why. So what I’ve done here

is to essentially assume that the Log-Log model is accurate. We’ll talk about that

assumption just a little bit, and so here are a set of Log-Log models. There’s only one X, and

there’s a Y on the other axis, and three different Log-Log models. And I’ve reproduced

those same Log-Log models in Log-Lin specs. So what you have here, these curves, are essentially the same models, but the axes are scaled differently. So you have a linear axis

here and a log axis here. And then what you’ll notice here, is that these same models in

this space will look curved. So if you were to use a Log-Lin model and your true data would

fit a Log-Log model, then what you’re gonna get is these artificial

curvilinear relationships, and in fact, the typical type

of quadratic curvilinear term that we sort of tend to model is one that I’ve tried to sort of add on to this particular curve, so you can actually see

that this particular line sort of proxies it, and if

you have a set of data distributed here in the

middle of the screen, this line would proxy

that data pretty well. So you might end up getting these sort of artificial curvilinear relationships if you don’t have the

mean specification right in the first place. Now, one of the things

you can actually do, which I did as a fun exercise, was to actually see if

there’s any evidence that this is happening. So what you can do is, you can use Taylor series expansion to

examine the expected coefficients under this kind of misspecification. So if the data’s actually Log-Log and then you’re modeling as Log-Lin, then you could say, okay, what would be the coefficient of X and what would be the coefficient of X squared

that you might actually end up estimating as a result of this. So from this Taylor series expansion, essentially you get two

coefficients for the linear term X and the square term X squared if you were actually modeling

this in Log-Lin space and you were trying to do

a curvilinear relationship. And essentially the beta

here, in the Log-Log model,

is the coefficient on log X, and the A is the mean

of the variable X, okay. So you get this

relationship where beta one is two beta by A, and

beta two is minus beta by two A squared, and from

this what you can really do is get a relationship

between what this beta one is gonna be and what this

beta two is gonna be. Along with the mean of X as

part of that relationship. And what we did in the
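Written out, the expansion being described is a second-order Taylor expansion of log x around the sample mean a of X:

```latex
\log x \;\approx\; \log a + \frac{x-a}{a} - \frac{(x-a)^2}{2a^2}
\;=\; \text{const} + \frac{2}{a}\,x - \frac{1}{2a^2}\,x^2 ,
```

so that

```latex
\beta \log x \;\approx\; \text{const} + \beta_1 x + \beta_2 x^2,
\qquad \beta_1 = \frac{2\beta}{a}, \qquad \beta_2 = -\frac{\beta}{2a^2},
\qquad \Rightarrow \quad \beta_1 = -4a\,\beta_2 .
```

That last identity is the testable relationship between the linear coefficient, the squared-term coefficient, and the mean of X.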

paper was to really look at all the research in strategic management that uses square terms to

model curvilinear relationships in count models, and sort of see if those square terms, the

coefficient of the X and the coefficient of the X squared, if they actually had this relationship, this hypothesized kind of relationship, which also involves the

mean of the variable X. And what we found is that on average, it turns out that in our field, this is exactly what

seems to be happening. That most curvilinear

relationships are essentially Log-Log relationships

that are now being modeled in Log-Lin space and

then you, lo and behold, get a curvilinear relationship. And in fact, less than 20% of the results turn out to be true, or appear to be true, curvilinear effects, so

we seem to be modeling a lot of otherwise linear relationships through this sort of mean misspecification issue that I highlighted

earlier as curvilinear effects. So the recommendation

in terms of what we do about testing curvilinear relationships, therefore, is actually quite simple. Recognize first of all that this Log-Lin specification

has major implications for testing curvilinear relationships, and linear Log-Log relationships are gonna manifest as

curvilinear in Log-Lin space, and so it’s important to choose the right mean specification, so all the things I said earlier about mean specification apply. But then in addition to that, if you really want to model

curvilinear relationships, and you’ve actually found

that the Log-Log model is the right model for this set of data, then use the squared term of log X, which is the square of log X as a whole. Remember, you shouldn’t use the log of X squared, because that’s simply two times log X. But you essentially take the square of the log term as a whole, and you wanna interpret

these results with some care, because of all the issues

that are involved here. Let me now turn to modeling interactions. And the first question that I

really have to put out there is what is really an interaction? And if you truly think about this, what we mean by an interaction is the cross-partial

derivative, all right? So essentially it’s D squared Y by D X one, D X two, so

what we’re really asking is what happens to Y when both X one and X

two simultaneously change. So if you had linear

terms in a regression, this is quite simply the

coefficient of the interaction term if you put it in there. Because the cross-partial of

all the other linear terms basically become zero. So you have a series of

some coefficient times X, and then the moment you

have D X one, D X two, either X one or X two is not gonna be in that

particular term, right? It’s going to be coefficient times X one or coefficient times X two or coefficient times X

three, and so automatically all of those are going to evaluate to zero and it’s only the interaction term whose coefficient is gonna manifest as the cross-partial

derivative, and that’s why, in our usual sort of

application of interactions, we only focus on the interaction term. Now, there’s something

different that happens when you have nonlinear models. And what I mean by nonlinear models is that the model

specification is no longer sort of this sum of the linear terms as we have in a regression. So here, the cross-partial,

the true interaction is often not simply the coefficient

of the interaction term, and in fact, in general you could say that the cross-partial might depend on all the other X variables

and their coefficients. And this is a very well-recognized issue and some of you might be familiar with this issue in the context

of other nonlinear models, such as choice models, for example. And count models are essentially nonlinear in their functional form. So you might expect the

same phenomenon to manifest in count models, and it actually does. So both Log-Lin and Log-Log models contain an implicit interaction even without the introduction of an

explicit interaction term. So if you think about the Log-Log case, and you’ve got sort of a

set of additional variables, but the focal variables

we’re interested in are X one and X two, these

two independent variables, and we say log Y is in

this functional form, which is the Log-Log functional form. Then the true interaction, the D squared Y by D X one, D X two,

evaluates to this value. So essentially it’s

the expected value of Y divided by X one, X two,

times the coefficient of X one times the coefficient of X two, so

without any interaction term, there already is an

interaction sort of buried in this functional form. Now into this Log-Log model, if you actually add an
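This implicit interaction can be checked numerically. Here is a small sketch with hypothetical coefficient values, comparing the analytic cross-partial b1·b2·Y/(X1·X2) against a finite-difference approximation:

```python
import numpy as np

# Log-Log mean function: Y = exp(b0 + b1*log(X1) + b2*log(X2)) = C * X1^b1 * X2^b2
b0, b1, b2 = 0.5, 0.8, 0.6            # hypothetical coefficients

def mean_y(x1, x2):
    return np.exp(b0 + b1 * np.log(x1) + b2 * np.log(x2))

x1, x2, h = 2.0, 3.0, 1e-4

# Analytic cross-partial: d2Y / dX1 dX2 = b1 * b2 * Y / (X1 * X2),
# nonzero even though no interaction term was entered into the model.
analytic = b1 * b2 * mean_y(x1, x2) / (x1 * x2)

# Central finite-difference approximation of the same cross-partial.
numeric = (mean_y(x1 + h, x2 + h) - mean_y(x1 + h, x2 - h)
           - mean_y(x1 - h, x2 + h) + mean_y(x1 - h, x2 - h)) / (4 * h * h)

print(f"analytic: {analytic:.6f}  numeric: {numeric:.6f}")
```

The two numbers agree to several decimal places, confirming that the curvature of the exponential mean function alone generates a nonzero cross-partial.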

additional interaction term, that interaction term adds more complexity to what the true interaction is. So the true interaction,

essentially the cross-partial, is not simply that. Now it’s a much more

complicated expression as you can see down here, right? A whole bunch of other

variables interacting in various ways sort of play a role in specifying this true interaction. So what do we see in practice about how interactions are modeled and tested in count models? And again, I’m thinking about my field, strategic management,

and the papers I’ve seen within that field. So rarely do we see true interactions actually be discussed or

interpreted in these models. In fact, the majority of studies simply use and interpret

the interaction term, often with the default

Log-Lin functional form, so there’s sort of a

couple of issues layered onto each other here, and in about 50% of the cases,

our estimates suggest that there is actually

no true interaction. That this is not an

interaction that would actually withstand the scrutiny of a cross-partial derivative analysis. And last but not least, we could attempt to actually graphically interpret this moderation or interaction effect, and look at it over the

range of the sample, and actually use the functional form of the count model that we’re specifying to evaluate what the

predicted value of Y would be for different ranges of X, and actually look at it graphically and sort of try to interpret

it graphically in that manner. Hoetker, as well as Wiersema and Bowen, sort of talk about this kind

of graphical interpretation, in the context of other nonlinear models, as an approach that can be easily extended to count models as well. And now let’s turn to

distributional assumptions, the last topic of my

little mini-lecture here. So when thinking about

distributional assumptions, what we’re really sort of focusing on is this idea that the

count dependent variable can only take on

non-negative integer values, which means that we then have to use a discrete distribution to describe them. Now one thing that I would point out is that with very large count values, and for me, sort of the rough metric is if the average count

is in the 25 to 30 range, and there are not a lot of zeroes, then the estimates from an OLS are probably quite a robust alternative to using a count model. So I think that one should

very strongly consider OLS because there are lots of advantages to using an OLS model. Lots of other kind of tools and robustness tests and

so on that can be brought to bear, and OLS is a

much more stable and much more reliable model in many ways. So one could certainly consider using OLS. On the other hand, if there’s a small mean for the count-dependent variable, or there’s a large

proportion of low counts, including zeroes, then perhaps an OLS might be inappropriate. It could very easily

lead to biased estimates and therefore one should use

discrete distributions instead and Poisson and negative

binomial are the two that are the most commonly

used models in this context. I should say, the two most

commonly used distributions. Invariably, when you use

one of these distributions, the model gets labeled

by the distribution, so when you use a Poisson distribution, we say it’s a Poisson model, and when we use a negative

binomial distribution, we say it’s a negative binomial model. Let’s start by looking at

the Poisson distribution and this idea of over-dispersion. So one of the features of

the Poisson distribution is that the mean of the

Poisson distribution is equal to its variance. So that lambda that we saw earlier and I described as the mean

of the Poisson distribution, is actually also equal to its variance. And this property is

called equi-dispersion. It turns out that

equi-dispersion is something that we are assuming, and

a lot of real-world data actually don’t contain

equi-dispersed counts. So what we then uncover in real-world data is this phenomenon of over-dispersion, where the variance of the

data is greater than the mean. It’s much more common
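To see what over-dispersion looks like, here is a small simulation sketch with hypothetical parameter values: pure Poisson counts are equi-dispersed, while Poisson counts whose underlying rate varies across observations (a gamma mixture, which is one way the negative binomial arises) have variance well above the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, alpha = 100_000, 5.0, 0.5       # hypothetical mean and dispersion

# Pure Poisson: equi-dispersed, variance equals the mean.
y_pois = rng.poisson(mu, n)

# Gamma-mixed Poisson (the negative binomial construction):
# rates vary across units, so Var(y) = mu + alpha * mu^2 > mu.
rates = rng.gamma(shape=1.0 / alpha, scale=alpha * mu, size=n)
y_nb = rng.poisson(rates)

print(f"Poisson: mean={y_pois.mean():.2f} var={y_pois.var():.2f}")
print(f"Mixed:   mean={y_nb.mean():.2f} var={y_nb.var():.2f}"
      f"  (theory: {mu + alpha * mu**2:.1f})")
```

Comparing the sample variance to the sample mean of your own dependent variable is a quick first diagnostic for over-dispersion.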

to see over-dispersion than under-dispersion, and over-dispersion is very common in real data, as I mentioned earlier. The solution to this is to actually use the negative binomial model,

suggested by Long and others, and very often used in

actual empirical work in order to deal with this problem. I’ll talk about negative binomial

models in just a second, and what you’ll find

is that the negative binomial model is a little bit more flexible. It allows us to have means and

variances that are different. So what I’d like to bring our focus to is not simply a question of whether there is over-dispersion

or equi-dispersion, but what that means for the consistency of the estimates that we

get from such a model. So it turns out that Poisson models, as well as the commonly used

negative binomial model, which is the NB-2 model, I’ll say more about that

in just a little bit, they’re both from the

linear exponential family. And so coefficient

estimates from this family turn out to be consistent,

so long as the mean is specified correctly. So refer back to my discussion earlier about correct mean specification. However, the problem with

this over-dispersion, where you’re not actually

finding that the variance of the data is equal to the mean, is that the standard error

estimates may be inconsistent. So essentially what we’re saying, is that if the distribution

is not Poisson, or in the case of negative binomial, if it’s not a negative binomial, then the standard error

estimates may be inconsistent. And in particular, in the Poisson case where you have over-dispersion, the variance is larger than

what Poisson is allowing, you might end up estimating relatively small standard errors, which is essentially

something that creates a bias towards finding significance

in your results, even though the significance

might not truly be there. So the key issue here

that we need to focus on is the consistency of estimates, and not just a sort of

vague idea of realism as to whether a particular count model sort of actually maps

on to true data or not. I think this idea can be

underscored a little bit more by talking about the negative

binomial distribution, and ultimately, what we should do about distributional assumptions. So it turns out that negative

binomial distributions are essentially a modified version of the Poisson distribution

where the variance is allowed to be not equal to the mean. In general, the way we deal with this is by having a functional

form for the variance, where the variance is a function of the mean, so you have

mu I, which is the mean, but also some other parameters. So there’s this parameter alpha, which is some parameter

that can be estimated from the data. There’s also this parameter P. And the parameter P is one

that we specify in advance. If P is specified to be one, then that yields the NB-1 model, the negative binomial one model. If it’s two, it yields the

negative binomial two model, etc. Now this NB-2 model is what most people go around calling the
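Written out, the variance function being described is commonly expressed as:

```latex
\operatorname{Var}(Y_i) \;=\; \mu_i + \alpha\,\mu_i^{\,P},
\qquad
P = 1 \;\Rightarrow\; \text{NB-1: } \operatorname{Var}(Y_i) = (1+\alpha)\,\mu_i,
\qquad
P = 2 \;\Rightarrow\; \text{NB-2: } \operatorname{Var}(Y_i) = \mu_i + \alpha\,\mu_i^{2},
```

where alpha is estimated from the data and P is fixed in advance; setting alpha to zero recovers the equi-dispersed Poisson in either case.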

negative binomial model, it’s the common version of

the negative binomial model. Turns out that that NB-2 distribution also belongs to the

linear exponential family, just like Poisson, and so therefore it’s gonna yield consistent

coefficient estimates. So just as in the Poisson case, the standard errors of the NB-2 model may also be inconsistent

if the data aren’t truly distributed as NB-2. But this concern is

mitigated to some degree because of the alpha. What the alpha does is

allow over-dispersion. And this

parameter is estimated from the data, so it allows a fair

degree of over-dispersion. So you might believe that

the negative binomial model allows itself to conform more to data than the Poisson model does, and therefore, perhaps this concern about inconsistency of the standard errors might be lower in the

negative binomial case. Now, I wanted to recommend a couple of different approaches. One that sort of takes on the

negative binomial seriously, but then also provide an alternative. The first thing that I recommend here is that the mean function needs

to be specified correctly. All the stuff that we do about

distributional assumptions matters very little if the mean function isn’t specified correctly. Now, of course, the true mean function may never be really known, but at the very least as researchers, one of the things that we can do, is to try and compare

alternative mean functions and see which one fits the data better. And as I pointed out in almost every case that I’ve seen, in my own data, it’s always the Log-Log model that tends to fit the data better. So starting with that,

a step that a lot of people skip, which I think is a very valuable step, is to actually estimate the

Poisson regression model. Now the standard errors in this model might be inconsistent,

but at least it generates a set of consistent coefficient estimates, so you get some idea of what

the coefficient estimates are. The Poisson

model is also a very robust model: it’s relatively more robust than some of the other models,

and it’s a simple model, so you tend to sort of put

fewer constraints on the data, and you get a relatively

good, sort of first take, of what the coefficients look like. And then you might also compute and report measures of over- or

under-dispersion in the data, and really try to get your hands around how much of a problem is over-dispersion in my particular data? And of course, if you

have over-dispersed data, you need to do something about that. There are two approaches

that I’d recommend. One, of course, is to use the NB-2 model, the commonly-used negative binomial model, if the data are over-dispersed. But also, another approach

that I am quite a fan of, which is to continue to

use the Poisson model, as the relatively simple and robust model I described earlier, but to scale the standard errors, to essentially deal with the

standard errors separately and ensure the consistency of the standard errors,

particularly by using something like the

Huber-White standard errors, which are generally

considered to be quite robust, and therefore consistent. Now there are a couple

of specialized models that one could think

about within the context of count models, and one

involves count models with panel data. One of the neat features about

count models and panel data is that very often when one controls for the unit specific

effects in the model, which is typically what we

do in panel data analysis, the residual over-dispersion

might be quite small. In fact, there’s some theories about why do we see over-dispersion, which sort of arise from

these unit level differences. So there are actually

unobserved differences between units, and that’s why we end up with over-dispersion. And so once you sort of control for those unit level differences

by actually controlling for the unit itself, you might

have less over-dispersion. But the second thing that I really want to almost warn you about, is that there is a fixed effects negative binomial model that has been derived

and is used very commonly in a lot of different settings. But there’s a real problem with that fixed effects negative binomial model. The problem is that in

order to condition out the fixed effects, a

simplifying assumption was made by the scholars who put

together that research, and essentially what that

simplifying assumption does, is it models the fixed

effects in the variance term. Now, most of us, when we think about a fixed effects model, almost never think about fixed effects in the variance term. It’s a very unusual type of specification. Usually, we think about fixed effects in the mean term, in the average. And so one would

strongly sort of advise against that particular

fixed effects model. I think you have to have very

unusual or unique assumptions about what you’re looking for in the data in order to use that fixed

effects specification. Instead, there is another specification which sort of does something similar to what we discussed earlier, which is to compute robust standard errors in order to eliminate issues

relating to inconsistency, and it’s a fixed effects

Poisson model effectively. It’s called the quasi-maximum

likelihood estimator, so the fixed effects quasi-maximum

likelihood estimator, as implemented in Stata using xtpqml; if you look up that particular Stata command, it’ll explain what this

estimator’s all about. And I very strongly advocate

using that if you’re looking for a fixed effects panel

model for count data. And last but not least, I

want to also give a shout out to zero inflated models. It turns out that very often

when we have count data, we might have relatively

large frequencies of zeroes relative to the rest of the distribution. So the counts often are not distributed in the way that a typical Poisson or negative binomial distribution

might be distributed. And so using zero inflated models, either the Poisson version or

the negative binomial version, sort of helps deal with these

excess zeroes in the data. Really valuable to use when
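A quick way to see “excess zeroes” in your own data is to compare the observed share of zeroes with what a Poisson with the same mean would predict; a small sketch with hypothetical numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
n, pi_zero, lam = 100_000, 0.3, 2.0   # hypothetical inflation prob. and rate

# Zero-inflated Poisson: with prob. pi_zero the count is a structural zero,
# otherwise it is drawn from a Poisson(lam).
structural_zero = rng.random(n) < pi_zero
y = np.where(structural_zero, 0, rng.poisson(lam, n))

obs_zero_share = (y == 0).mean()
# A plain Poisson with the same overall mean predicts P(0) = exp(-mean).
pois_pred_zero = np.exp(-y.mean())

print(f"observed P(y=0): {obs_zero_share:.3f}  Poisson-implied: {pois_pred_zero:.3f}")
```

A large gap between the observed and Poisson-implied zero shares is the kind of pattern that motivates a zero-inflated specification.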

you have lots of zeroes, which are disproportionate to the number of counts in your data. So, just to summarize really quickly the issues that I sort of raised in this short video lecture, and wanna highlight for you. First of all, pay attention

to mean specification, especially to the functional form. I think that’s the number

one thing that I think about when I think about

doing count models well. Particularly this Log-Log versus Log-Lin specification choice. The default ends up being Log-Lin. It might not fit most of the phenomena that we’re looking at. And then, based on that,

to appropriately model curvilinear relationships

as well as interactions. And finally, to pay attention to distributional assumptions. And typically when we think about this, we think about Poisson versus

the negative binomial choice. Both of these distributional assumptions yield consistent coefficient estimates. It’s important to recognize that. However, there’s a

challenge with the estimates of the standard errors,

especially with over-dispersion. The Poisson standard

errors might be too low. The estimated standard

errors might be too low. And so we could address in

a number of different ways, and one of the ways in which

you could address this, of course, is to use the

negative binomial model, but also we could address this by using Huber-White standard

errors, for example. And last but not least,

again I wanna highlight this issue with panel models, that especially for fixed

effects panel models, you don’t want to use the

negative binomial version of that. Instead, use the fixed effects quasi-maximum likelihood version, which is actually a very

robust and very nice model. Thank you, I hope this was helpful. I look forward to seeing you sometime.