Replications and Reproducibility in IO-OB-HR Research: How Are We Doing, and Can We Do Better?


– If I was to just go through my slides it would be just short
and not as interesting so, feel free to interrupt, you
know, at any point in time. I’ll try to manage the discussion. I do have some things to say and I feel like before I start, I feel like I have to run off and
lawyer up, as an editor. (audience laughs) And also, as an editor of one
of APA’s largest journals. I hope I can also provide perspective that’s a little different from
the APA’s journal and others but the perspectives I’m gonna share are primarily around… So I’m gonna sort of play
that from multiple perspectives, not just as an author, as a scholar, as an editor, reviewer as well, but I’m gonna try to
basically get into some evidence in terms of what is, what I believe is sort of strong evidence versus
some doubtful evidence. So it may not be, I think
that they pull the issues of replication or replicability, oh sorry I’m running ahead. I think we need to
approach it like scientists in a sense that just because somebody has published something suggesting
we don’t do science right doesn’t mean we should
accept it at face value. And yet there is evidence that suggests we do have some issues that we need to address so, I don’t subscribe to one position or another. I just think that we
need to think about it and self improve the
science as we go forward. So that’s the kind of
perspective I’m gonna be taking. I’ll talk a little bit about replication, a little bit about reproducibility. Where just for, so we
are on the same page. Replication is really about
somebody has done a study and then you conduct
another study to try to see if you come up with similar results. Reproducibility is
somebody has done a study you basically verify the result so that’s you know, people can
define these differently but just wanna say that
before we get going. But take a look because we have different subdisciplines of management here so I wanna make sure we’re on the same page. Yeah so I’ll spend some of the time talking about the evidence and looking at it with a
critical eye if you will. And then I’ll also try
to think of solutions for how we may improve
the way we do science. Some of the solutions I
think are already in place others are you know,
still a work in progress. And I should mention up front that you hear the term crisis,
you know the reproducibility, replication crisis, and different crisis. I think the fact that we’re
having these discussions both in the literature
and conferences elsewhere, I think for me that’s an
indication we don’t have a crisis. We are actively engaging in self improving and self correcting the way we do science and that’s a positive thing. Doesn’t mean that our science is perfect, but I’m not one to subscribe
that we have a crisis. So I’ll say that up front but you’ll see that I will share some evidence suggesting different opinions on that. So let me start with Nosek, I don’t know if that Nosek
article is gonna come now, we’re three for three in
terms of summarizing it. So I’m not gonna spend a lot of time because I think both Joe and Rich talked about it a little bit. What I would add to the
conversation is the fact that this article is not without criticism and Joe and others have
published their own opinion on it and he mentioned maybe
25% versus, you know, so even if we are 75% successful that still could be a problem. Well, depends, depends how you look at it. Alright I wanna highlight
one comment on the article by Gilbert et al, that noted some noteworthy problems
with the Nosek article. For example, you know, trying to replicate a study of American stereotypes
of African Americans using an Italian sample,
now there’s a rebuttal, a reply to the comment that you know, also tries to defend some of this. But there’s been some
problems with the Nosek article. Other things that I would
mention haven’t been discussed. There is something to be
said about the way they did that study, I mean they basically have, they basically supported graduate students and scholars to try to
engage in this effort. There’s some issues of
conflict of interest that could be there. I should also mention that
this is not the only effort for replication, there’s many lab studies. There’s multiple efforts like that. Social psychology, the
journal of social psychology actually had, I think they looked at
10 very, to Rich’s earlier point, they looked
at 10 classic findings of social psychology
and they were actually able to replicate it pretty well with much higher percentage. So there’s mixed evidence
out there in terms of the replication issue, I
don’t doubt that everything, I don’t think everything replicates but I think that, my own
sense is that Nosek’s article is a little overly pessimistic. And they have a shtick, they have sort of, they wanted to prove something and I think we need to look
at it a little critically. Alright so, and feel free to
interrupt and jump in okay. Otherwise I’ll kinda go forward
and if you don’t jump in, I have some places where
I’ll ask you for questions. Yes, Jeff? – [Woman] Catchbox. – [Man] The catchbox. – Here. – [Jeff] There you go.
(applause) – Alright Jeff. – [Jeff] I maybe should
know the answer to this but were the studies that
were in the Nosek paper, randomly selected to be replicated or were they ones where, I
find this hard to believe I’m gonna see if I can replicate them. – Yeah, I don’t know, I don’t know. I haven’t read it carefully enough. There’s a lot of issues in my
opinion when you look at it. For example, I know one
scholar in management that was one of the people involved and you know, that person will
try to replicate something in learning and memory and
he’s not a psychologist. He’s not an expert in that area. For me, somebody coming from, different field try to replicate us there’s some issues there,
you know what I mean? But like I said, it’s a good effort. It’s still the part, it’s a
good way to kinda go about it but there’s criticism of it and there are other aspects out there. The other thing I would
mention about Nosek is that one thing that does bother me a little bit is that you see a lot
of articles in our field that the beginning couple pages would be, we have a crisis and it would take this as a word of god, you
know like this is like, it’s well known it’s been publicized and we have an issue and
I think it’s important for us to look at it with a critical eye. That’s my only point, okay. – [Man] (chuckles) Okay, – [Student] Good catch. – [Man] One handed, yeah. (laughter) – No, I think it comes back
to, with replications again, some of the things that
we saw when we did in SMJ is almost the opposite
of p-hacking going on and that the sexy
replications don’t replicate. And to find something that’s there and sort of do the same thing and find it. That’s a little less
fun it seems to people. And so they almost go out of their way. – I’m sorry the sexy replication,
the original studies of. – Yeah, the sexy replication
cast doubt on the original study to actually go to the original study and find it with all that effort in that. – Yeah, well I’ll go back to some issues that may shed more light on that. So let me move forward, so one. Yeah, so Scott Maxwell and his colleagues, this is just two articles. I really like the way
he approaches this topic because he looks at the
question of what replication is very thoughtfully in my opinion. So this is just a couple of studies, one in American Psychologist, where they’re basically saying look, what are we looking for when
we’re trying to replicate? So if Steve does a study and finds that the relationship
between X and Y is .5 and then Paul goes and does
a study and he finds .49, is that a replication? You know, or if the effect
size is, you know, d of .2 in one study versus d of .5 in another. What is a replication? And then in a follow up
Psych Methods article, Anderson and Maxwell basically said there are multiple criteria
we could apply there, okay. From confidence intervals
overlapping et cetera, et cetera. So, and given just what we know about sampling error and
distributions of effects, it’s not that simple to even know what replicates and what does not. So that’s all I wanna say about that, you can read the article
and there’s more in there but it’s not that easy to
replicate to begin with so even the benchmark
for replication is high and that’s one of the messages and then there are multiple
ways of looking at it, okay. So that’s Maxwell’s
article, now the other point I wanna make about replication is that it’s not an equal playing field for different subdisciplines
of psychology in this case. And I would argue even within management, some things are more easily
replicable than others. So for example, this study by
Mitchell, which was actually published in Perspectives on
Psychological Science in 2012, and what they did is
they basically correlated effect sizes in the lab
with similar effect sizes in the field and, I’m not just showing you
because I’m an IO psychologist (audience laughs)
and IO shows really well here. But you see variety here,
you see social psychology you know is not doing as well as IO. That negative relationship
with developmental is bizarre to me, I don’t
know how they came up with a negative
relationship, in other words, the more likely you are to find
a positive effect in the lab the less likely you are, I
don’t know how they found that. But I think it’s important to
keep in mind that distinction. Now… It was mentioned earlier
about APA versus APS. So APA actually started, a few journals started doing
those replication sections, it’s online only, so JPSP’s doing it and the Journal of Experimental
Psychology: General started doing it. And in JEP General they actually, I think that they require
preregistration to do replications, and if you go and look at that, I haven’t done it for a year but I’ve looked at the first year or two when they’ve done it, they actually found that a lot of that research didn’t replicate. Now to your question, Miles right? Is it because we select studies that we think are not gonna replicate and then find it or is it not, but I think the other lesson from this is one of the reasons I think
that in IO psychology and OB a lot of our studies
may be more replicable is that the construct validity
of the studies is higher. So we’re not just going
after internal validity and other aspects of validity but it’s the aspect of construct
validity that is higher. If you look at it, the original
Cook and Campbell defined construct validity differently than Shadish, Cook and
Campbell, the newer version. One of the new aspects that they added is that part of construct validity is the mimicking of the real phenomena. It’s almost like slipping
into external validity and I think that if you compare the kind of research we do
with the phenomena in mind it’s more likely that you capture it in a way that other people
can pick it up and mimic it. Whereas in more basic aspects of psychology it may be that it’s harder to mimic it because the construct validity of the original studies is lower, okay. Either because you kind of have
to make some conceptual leap into the phenomena, the
phenomena is not as easily found. Or because of other factor,
maybe like in JPSP for example, which is sort of the holy grail of social psychology in a way. They emphasize internal
validity above all else and I think that what’s
one of the costs may be in that field is that you
don’t pay enough attention to capturing the phenomena carefully. So that’s one of my own inferences. I don’t have anything to back it up except you know, reading thousands
of papers in our field and seeing how important that is. But I think that’s one of these. I’m happy to hear other thoughts on this but any other thoughts on that idea? Alright, it’s either overly controversial or not controversial
enough so I’ll move on. (laughter) Yeah so let me move to the journal side, so this is an article from
the centennial issue of JAP last year and Cortina,
Aguinis, and DeShon you know, after much toning down in
the process still said that there’s a problem in our journals in a sense that we look
too much for novel things and we underemphasize rigor in terms of like replicability and so forth. Well I think that they’re
partially right, ‘kay. And sorry Jason for throwing
AMJ and I know that AMJ has a broader mission statement that talks about the
fact that theory testing can be a contribution and yet, those of you who publish a
lot and review a lot for AMJ know that the, a lot of the reviewers
emphasize theoretical boldness and novelty above other things. And it’s not just AMJ. At JAP
I work really really hard with the AEs particularly
to try to make sure that we realize there are different
kinds of contributions that could be on equal grounding, okay. So testing a theory
that’s never been tested can be a really important contribution. And yet, you know if you
wanna be theoretically bold for one reason or another
we sometimes confuse it, and it’s not just AMJ
it’s JAP, other journals. Sometime reviewers view it
as a lesser contribution. So that’s a case where I would argue that journals can play a role in terms of appreciating multiple
kinds of contributions and the fact that replication, particularly constructive replication,
can be quite beneficial. Let me give some examples of
constructive replication at JAP so, and you can find other
journals for these as well. So the first paper has, in my
opinion, has absolutely zero theoretical contribution
in terms of novelty. And yet it’s really important. What they did is they basically came up with a three hour workshop of
manipulating goal orientation and in three hours they
were able to basically increase substantially the
chances of reemployment of Dutch unemployed at the time. Now, it also hit at the right time ’cause it was right as the Great Recession was setting in and so forth. So that’s an example of a
constructive replication right? You kind of show the
generalizability of the phenomena that we already know, or the
theory that we already know. It’s not theoretically bold,
it’s practically bold, okay. So again, an example of that. Schmidt and DeShon, you know sorry Jeff. Well they kind of qualify
some of Jeff’s findings right? They basically showed that Jeff is right in terms of self efficacy
negatively predicting performance when performance is ambiguous, but it positively relates
to subsequent performance when performance is unambiguous, okay. So they kinda qualified
that debate between Vancouver’s work and
Bandura and Locke right. That’s sort of a constructive replication, it shows when an effect
replicates and when it doesn’t, okay. And then Schultze et al, they basically had a direct replication
that failed to replicate a classic escalation
of commitment finding. They basically showed that
no it’s not what we thought. It’s not that we escalate commitment because we search for biased information and the reason that it’s not
the search for biased information is I think it’s because
it’s too obvious for others and you try to appear fair but
it’s more implicitly based. It’s how we evaluate information
which is more affected by implicit biases that we don’t control. So they failed to replicate
in a couple of studies and then in a couple of other studies showed that detailed alternative mechanism. So these are just some examples
of how we can do replication in a constructive way in our field. And I think those are pretty well, at least some of these are well cited. The Schmidt and DeShon one I think has quite a few citations because of the contribution they made. It was just a research report in JAP that, you know, was able to
basically do some of that. So anyways, so just to kind of summarize my thoughts on replication. So there’s certainly examples of findings that don’t replicate and I
think that the previous two, I think Joe you highlighted
some that didn’t replicate like the, you know, the
power poses and others. You know, there’s certainly
examples in the literature of things that don’t replicate. But there’s likely variance
across different phenomena, different subdisciplines that
we need to take into account. That’s one of the points. Now in terms of top down
and bottom up influences so, you do as an editor, even
though I’m not lawyered up like Rich suggested, you
do hear a lot of criticism. You should do this, you should
do that, you should do this. I do think that we could do
some things as journal editors and journals in a policy to affect things but there is no one simple solution there. For example, if I was
to write an editorial that you should submit
replication studies. It doesn’t mean that reviewers
are gonna agree with it. It doesn’t mean that we’re gonna see more. And the thing is though,
the people who argue that we don’t publish
replications, that’s just not true. I gave three example I can give more. There are quite a few, we don’t
always call it replication but you know, there’s reason
why we can do meta analysis. You know the fact that there
are different variation of it. The other thing I would
mention, in that sort of, also about the three
way interaction is that it’s harder to do direct replication, in the more applied
aspects of our science. So our science has more basic all the way to applied kind of continuum. So to your question earlier, about how do you do replication like that? Well if I do a big study on expatriates, I did this study on newcomer
expats okay, so we looked at I don’t know, 70 expatriates, as they joined the
assignment over time, okay, in one large oil and gas company. What would be a direct replication of it? I don’t know, I don’t know how you can directly replicate it. So that’s a complexity that
is very hard to deal with. And then there’s those more
complex kind of effects right. So very few studies in our field now are just about bivariate relationships. Typically you see some
mediation, moderation, those complexities, it’s very
very hard to replicate that. It’s very hard to find those things but you see so in terms
of editorial policy. It’s not editorial policy
but I do see reviewers and editors asking for, hey that’s a really nice three way
interaction, can you replicate it? Or at least explain it theoretically. So you do see a little bit of a push in terms of trying to, of demanding more from those complex
findings but I also think that replication is tougher when it’s more complex phenomena than
simpler phenomena, okay. Yeah, so that’s all I have. Yeah questions. – It’s an easy catch. Hey so, one of the things
I’m thinking about this is well our field does have this culture of origination bias right, so
whenever you publish a study, you supported something,
then that study is held as the gold standard, you
kind of put it on the pedestal. – Yep, yep.
– Right. So anyone who, even if it’s
a very cool question, like it’s actually very
worth re-exploring, anyone who’s gonna try to do a new study or replication on it, they’re held against the old finding. You have to say why your new finding, like first of all why your study is better, or why you have to add new things to it. So how do you see this? – Now that’s a great
point, and as an author I face that often where, you know, there’s one study that found x and people think, there’s
a tendency of thinking we already know about
it, nothing new, move on. And that is something that I think we do all share, that bias, as reviewers, authors, and editors. And I think the question then is, what makes it an important
contribution to replicate, or constructively replicate. – I mean because statistically
you can always argue, hey your sample is a
convenience sample right. So then it’s not
necessarily representative. And I’m not even talking about
crossing different contexts. It’s just even from the same population, it’s a convenience sample,
so does that give grounds for people to do an exact replication? However, exact replication
may not be very efficient. – So let me give you one example. Recently, I was, I don’t
know, fourth or fifth author on a paper I got published in AMJ with John Matthew and his student. Where one of the thing we argued for, is an interaction between I think it was unit level empowerment and
psychological empowerment predicting job performance, and I had a previous paper in 2007 that found that interaction
in a different way. So one of the thing that we argued and thankfully the AE’s
and reviewers at AMJ were open to that argument was saying, “Look, this is a very different context “and because it’s a different
context it’s more complex” Sort of dynamic nursing kind of situation. We argued for a different
form of interaction, okay. And now we have N of 2 that found interaction in different ways, we had sort of an explanation
for why it’s important to replicate it and why we would
expect something different. And luckily reviewers,
I think what helped us is that context, the
context was really important to replicate it in. It wasn’t an easy battle but
that’s the kind of argument I think you have to
make to replicate right. So, on the other end
of the spectrum though. Like in medicine… The question of should
we do direct replication, why it’s important, especially
when you’re dealing with, you know like drugs or things
that can really save lives, in a way it’s easier to
show why it’s important. In our field, it’s not always that easy. Nobody dies if you don’t replicate. (laughter)
Rarely do. You know what I mean but
it requires more thinking in terms of why is it
that it’s important to look at that, okay? So that would be one way to think of it. – Yeah well, not every study
that’s published deserves or merits replication,
this goes back to the point that Rich was making, I mean
what are the most important things that we really need to know. And we don’t have a great
deal of agreement on that at any of the particular
levels of analysis that are represented in this conference. So having some targets
around core phenomena that we really need to have
clarity about would help a lot, put some credibility around the effort, invested in doing some kind of replication or constructive replications which I agree are more likely to be found in our field. Rather than, anything
is equally meritorious. That would help a lot on the journal side. It’d help a lot on the
evaluating faculty side and the other untapped, if you want people to be replicating stuff, I
don’t know how many of you have masters level people
trying to do theses but a lot of the times
they don’t have a clue what they wanna do and a lot
of those things die anyway because it’s not a great idea, and if they were just to do
a constructive replication, you’d have a lot more raw
material to work with. – Absolutely, so yeah and I
think couple of other criteria that we can apply to this is, you know obviously finding
something important that begs more replication, that has real practical
implication I think that, like the example from the job seeking. That would be an example
where it’s actually not I mean there’s other job
seeking kind of study that did focus more on self efficacy and other things that
have done experimental, field experimentation but that’s one area where I think, you know and Paul, I know you were very
sympathetic to that as well in terms of if you do large
scale field interventions it’s easier to convince people of the importance of replication, and it doesn’t always replicate. We published a couple of
papers of large scale field interventions where a lot of findings were non-significant but
it was still important to show that okay, so that’s one. The other place where you see it more, it’s more within study replication is, is in those complex phenomena like three way interactions, okay, or you know, or just complex kind of, especially if it can be done with relatively convenient
samples if you will. Where you see it less is the example of the expatriate studies,
it’s very hard to conduct studies like that and replicate, but a lot of times those
studies in and of themselves have a replication component, okay. They replicate more basic research in a more complex phenomenon. I have some other things to
say about that later but yeah. – I wanna throw it.
(laughter) – No the whole point, it’s padded so you can toss it at each other. – Doesn’t trust.
– Yeah. – Liability to catch. Okay, so I agree that I
think it’s really good that you’re making the point that replication attempts
have to be of high quality. So they have to be highly powered. They have to be well designed. A lot of the OSF, in their 2015 paper, it was such a big endeavor, that they couldn’t really
do quality control very well but there have been lots of other studies that I highlighted in
my talk that I think are very high quality replications
that failed to show. But my question here is, how do you do a constructive replication if the original effect
is completely false? Like always false? ‘Cause it seems to me the way you define a constructive replication
is you’re able to say, it replicates under this condition but not under this one, but
if it never replicates, then you can’t do a
constructive replication because it’s just not true full stop. – Yeah, so there’s couple
of points about it. First of all, there are
different definition of constructive replication out there. For me, constructive
replication is any variation in either the sample or the measures or manipulations
or anything like that right. So even if you, and it
could be a variety of that, but if the original study is wrong. Well I mean there are
examples like that right. I mean that’s, and I’ll talk about it more in the reproducibility
part where I think that comments, allowing
comments in our journal, I think that actually is an
antidote for a lot of this. I think that that’s a
case where you could, you know so Tammy just published a study that showed that something
affects something else, and you think she’s wrong, and you do a series of studies to show that. But the example I’ve given from the escalation of commitment sort of did what you were questioning. It was an assumed effect
that they suggested was wrong and they did both the direct replication, the best they could, then they did a highly powered direct replication, in both cases they failed to replicate, but I think what made it
more of a contribution, they didn’t just show that
something didn’t work. They didn’t just show that
one mechanism is wrong, they showed that an
alternative mechanism is right. And I think that’s a more
powerful contribution in a sense that it doesn’t just apply negativity saying Gloriana
you are just wrong on that but it also shows when she might be right when she might be wrong,
you see what I’m saying? – [Man] Yeah but I’m worried
that if that’s the standard and if that’s more
likely to get published, now I’m incentivized to p hack
in order to get something. So basically, I can show
that the original finding is not true under the original conditions but then I’m motivated in
order to get this published, I have to then show,
– So let me go back. – Another mechanism that is true. – To later because,
– Okay. – One of the other
suggestions I have later would speak more to that, okay. Which is about different journals, okay, that’s the other caveat
but I’ll get to that later. Alright, let me talk about reproducibility and then I have couple of more slides where I revisit
suggestions and we can talk about more question and
solution then, alright? So reproducibility, so a couple of efforts. So the, I don’t know how
to pronounce that name. Nuijten et al, they basically
developed this R program that kinda went fishing, literally, and looked at sort of whether results were statistically correct or not, and they found that in about, so they looked at eight
psychology journals, including JAP, finding
13% of inconsistency. About half of 30,000 studies
had statistical errors and in about 13% of the cases the conclusions were wrong, okay. Now there’s those two folks,
Daniel Lakens and Thomas, there’s these blogs where they kinda went and they found some real
problems with the R program. So, this is again a case
where somebody claims that we’re doing irreproducible science, and other people question the validity of their methods, okay. PubPeer, that’s another kind of effort. PubPeer is not meant to be anonymous, it’s meant to be like, it’s just an open, you could go and post yourself and you know basically say
Jen Jen that study you did, I think had a mistake and
you, can we engage in that? Anyway, however in our field, particularly in IO-OB-HR, it was taken to be an anonymous
platform, it was really, I don’t know who posted
because it’s anonymous, but basically in the last, I think it’s quieted down a little bit but there were like two to three years where they would basically
contact primarily, well-known researchers in our field and it would almost be
like a public shaming, anonymous shaming, saying you know Paul, here is a list of studies
that Paul believes, he wasn’t actually there ’cause
he does careful research. (laughter)
– [Man] Probably all true. – And you have a mistake
here, here, here and here. And this such and such
study but there’s no way to engage in discussion
because they’re anonymous. In fact in a few cases, authors
responded non anonymously saying if you let me know who you are, I’ll send you the data
so that you can reproduce and see that I was actually
right but they went quiet. They went further so as
an editor for a while I would get those emails
that would be something like Arrow and JP at gmail.com
or something like that. And it would basically say,
the following paper that you published in that
journal was wrong for the, you know, and they would also for a while they went after interaction effects, and in that case many times
they were actually wrong. Those interaction effects were right, as verified by multiple authors, but there’s no way to
engage in a civil discussion because it was anonymous, and when I raised that issue to them once,
once they finally responded I said, if you let me know
who you are we can discuss. They basically said, well
COPE prevents me from, COPE is this, what is it? – [Man] Code of.
– Code of. (man stutters answer) – Something like that, so there’s a whistle blowing caveat there. Now, I’ll say it later, that’s not
really what COPE suggests. You know, and in fact we’ve
handled quite a few cases, me as an author, as an editor,
I’ve handled a few of those where somebody asks, raises
issues about an article and oftentimes that leads to corrections. It even led to a retraction, okay. So to say that you can’t approach authors, but if you don’t provide your name, then we don’t know whether
there’s a conflict of interest and all that, so anyway those are efforts that I would put as suggestive that there are reproducibility problems but
they’re very hard to evaluate because they’re anonymous, they have like statistical
errors and so forth. Moving forward, these are places, Jose Cortina, is on
the first one and then, you know they basically
find mixed findings. Like in the mediation
analysis, if you report correlation
tables, you could kinda tell whether mediation is plausible and we found that we do
pretty well in terms of matching indirect effects to
the pattern of correlations. But they’re correct in
saying that we’re not often very good at distinguishing
partial and full mediation so that’s one area that we have a caveat. The last two articles here basically noted some problems in terms of reporting of degrees of freedom and, SEM kind of test both in
OB HR and then strategy. I don’t doubt that in terms of reporting, we don’t do enough to report well. For example, take interactions. It’s still very rare to
report the correlations and descriptive statistics
with the product terms, and therefore you cannot reproduce it unless you engage in those R programs that have a lot of problems in them. So that’s something, but,
you know, if you look at the main effects, and the
interaction, and the plot, you can kinda tell
whether they match, okay. So there are ways for us to verify it, but we could do better on that. And then in terms of the CFAs and SEMs, confirmatory factor analysis and
structural equation modeling, that’s a case where I think that there are mistakes out there in terms of reported results, but they very, very rarely
make a substantive difference. Very few of those cases
that were highlighted in those papers would change
the discussion section. Okay, so I would just point that out. Alright, so just moving forward. So, retractions. Leadership Quarterly, JAP,
had its first retraction, and the common theme there is that journals are taking steps to do something about results
if we’re wrong, okay. I can tell you, and I have
the scars to show for it, it’s not easy, it is very
very hard to do retractions, but now I can go to those
PubPeer folks and say, you know, I had an anonymous request that I’ve addressed, and if
they’re right, they’re right. If they’re not right,
we tell them. In most cases people raise issues that are minute. So the response could be very easy, or it can be done with corrections, but in severe cases it can
lead to retractions, okay. Commentaries, so I give
Steve a lot of credit for putting a policy
in place at JAP in 2011 for handling comments and replies. And we’ve published several of these under Steve’s editorship
and under my editorship, and that’s a great way of
enhancing rigor in the science, because basically
most of the comments are about, to your point Joe, you know, it would be sort of, here
is a problem we’re seeing in that study, and then you
get the authors to reply. In this case it’s Zigerell,
and Ryan and Nguyen, so the original study was a meta-analysis and they actually had a
mistake in one of the tables. The tables that summarize
effect sizes from studies, they had a mistake. Fortunately for them, that
mistake didn’t translate to the results: the meta-analysis was right, the description of the studies was wrong. So it was a way to correct the record, and the two of them, to their credit, in both the comment and the reply, got into
a very thoughtful exchange about basically file drawer
problems in meta-analysis, so there’s a way to sort
of correct the record and move the field forward in
terms of thinking about that. And there’s another example in T testing and other kinds of back and forth. I think that what that does, both retractions and commentaries, and then also APA came out with this recent version of the journal
article reporting standards. All of this together gives us vehicles to report statistics more accurately, but it also creates a
climate of accountability where we have a vehicle to
hold authors accountable but do it in a thoughtful
way that moves things forward. Okay, so those are just
some other thoughts. Just to summarize reproducibility. I do think that we can
do better as a field in terms of reporting statistics. Those of you in the room
who work with me at JAP know that I spend a
considerable amount of time working with AEs to address reporting. So every time there’s an R and R at JAP, I skim it the best I can, and if I see that we didn’t ask for a certain thing, I ping the AEs and we
try to correct that. I’m mentioning that
because all these solutions you see out there in terms of ways to address reproducibility
and replication and so forth, at the end of the day
it’s about reviewers and AEs and editors working
to address those issues, and it is very time consuming. We’re dealing with, I think Jason, you have what, 1600 submissions a year? We’re at about 1100 now. You multiply it by R and Rs. There’s a lot of
possibility for errors there, but I find that the most we can do is really create a climate where the editor and the associate
editors discuss issues and we try to address them, but
then at the end of the day, if it slips through the
cracks and it gets published and something is wrong,
we have corrections, comments and replies, and when
we need to, retractions. I think that’s the way
science should work, but we can’t expect
everything to be caught at that point. I don’t think there’s
widespread evidence for a problem of reproducibility in our field. I think that happens occasionally, but I think that, you know, the evidence
is more mixed on that. You can find your examples
and highlight them, but I doubt that that’s a
major problem in our field. In part because we don’t know. (chuckles) We don’t know how much
of a problem that is, but that’s something
that I think as a field we’re paying more attention to and
doing better and better in. Any other thoughts on reproducibility? Before I move on. Alright, let me go over
some summary and suggestions and then we can maybe
talk a little bit about ways to move the field forward on that. Okay, so in terms of replication. I really think it begins with, so one of the things I would
say I like the most about this conference and the
whole exchange that we see in our field or not is,
it kinda moves the needle. I think that in looking
back at our science, and I reflect particularly
on IO psychology, okay. So it started as a very practical field, and if you look at the early 1900s when it all started until
really the 50s and 60s, it was really about applying
psychological principles to help organizations
with practical problems. So it started with very much
being a very rigorous science focusing on practical problems, but then somewhere in
there in the 60s and 70s we started realizing that
we were doing a little bit of dust bowl empiricism,
and we started caring more about theory, and that’s
also when you start seeing that in sociology, in economics, and other places where theory
started being more critical. I do think that somewhere along the way we became a little too theory fetish in a way. I mean,
we moved a little too much to the side of theoretical novelty. And what we see now with
the open science movement is a push back to rigor, which
I think is a positive shift. It’s not that we’re gonna
be dust bowl empiricists, but I think that it’s a nice shift to see. And I think that’s a good thing, but I think that goes to sort
of what do we appreciate? So not every replication
should be published. Okay, so there could be different journals where they may be published, so a little bit of the experimentation that JPSP and Journal of Experimental
Psych are doing where they publish online
versions of replications. That could be one solution. There could be other journals that do that, but you’re not gonna change the mission of journals like AMJ, JAP,
you know, Psych, OBHDP, by basically saying, you’re just about reproducibility and replication, because we also need to
move the knowledge forward. So somewhere in there,
between different journals and different sections of journals,
there may be a solution. Transparency of method and reporting I think is really critical. That’s a starting point
for any replication. If you don’t know what’s been
done, you can’t replicate. And that’s something that I
think you see improvement on. Let’s see… Research reports. I’ve seen
many cases where studies that I would consider more on the constructive replication side end up getting published
as research reports at JAP but probably would have
been rejected otherwise. It’s not always like that, and we publish research reports
that don’t do replications, we publish feature articles
that do replication, so it depends how you look at it, but at least it gives you a portfolio of magnitudes of contribution. So at JAP we have research reports, feature articles, monographs,
and then comments. So we have a portfolio approach. And I think that gives you a
little bit more flexibility in terms of how you address those issues. It’s not a perfect solution, but it gives you a
little more output there. Other journals, you wouldn’t see that for a variety of reasons, but maybe we’ll have other journals come up that do more replications and so forth. So those are all some thoughts I have. Other thoughts in terms
of improving replications, if you will, and I don’t know, Joe, if I answered your
question completely yet, but any other thoughts on that? Steve you can’t sit on
the football. (laughs) – [Man] And this is
kind of to Joe’s thing, this is gonna be something
that would be true in, say, social psychology
or maybe the kind of work that I do, it’s more lab based. Where often the idea is, what you’re doing with a constructive replication is saying, so here’s this finding, I bet it might not generalize to this, I
bet there’s a moderator. And so you do the study
where you have one condition as the exact replication
and then another condition as the other level of the variable. Now suppose what happens
is that first cell doesn’t replicate, and
social psychologists say, yeah, this happens all the time, and this is one of the reasons they got to this crisis issue. Well, to me the answer to that is, okay, now you’re gonna
design your next study, which is going to be focused
on alternative explanations for why you’re not finding the effect. So one of the main reasons we don’t get nonsignificant results
published is because we didn’t focus on the
alternative explanations for the null results. If we focus on that, then there are fewer
things that the reviewers can come back and say,
oh but you got a null because of x, y, and z.
So you go to replicate, essentially you’re now saying, I’m now thinking that that
original thing isn’t really true. Study one showed me that it didn’t work out
here, I’m gonna do study two. Try to show it’s not true, do study two. One of two things can happen. It could come out not replicated, and then you’re onto something. And you’re onto a sexy thing
for the replicators, right. It’s like, oh here’s four studies that couldn’t replicate this. And that’s a publishable
paper, certainly now in JPSP. And I think.
– So you know, so a couple of thoughts, and I don’t know. There are a lot of issues you raised. So let me just address a couple of things. So first in terms of JPSP, it’s funny, the editors there are reconsidering
whether to keep it up or not. So one of the issues they’re facing, and it’s because of the
nature of the journal and the studies they publish there, is that it’s very, very hard for them to separate the regular articles from the replication studies,
because they all do this, you know, multi-study thing,
and reviewers and editors really struggle with the contribution issue. And what’s the difference between a replication section study
versus a regular article? So that’s one of the
things they struggle with. The other thing you mentioned, and that’s something that I
didn’t mention on the slide but I should raise, I think
there is an assumption out there that significant results are
more likely to be published. (sighs) Maybe. I can’t tell you how many
papers I’ve seen rejected. So as an editor of JAP, I’ve seen I think over 4,000 papers, and if you add associate
editor, I don’t know, it’s probably like, I don’t
know, over 5,000 papers by now. I have seen, I don’t have a number, but it’s a significant portion
of studies that get rejected where everything is supported. And I also have examples of papers where a lot of things didn’t get supported. Now it’s very hard to study it explicitly, but I think that it’s a misnomer that just because you
found a significant effect, that would get you
published or not published. – [Man] I agree with that actually. – The other thing that I would say, if I was to do like a Venn diagram of what accounts for R and Rs, I would say, I wouldn’t
put more than 20 to 30% of the chances of succeeding
and getting an R and R on whether things were supported or not. I think that much greater variance, probably close to 50% plus, is whether you’re addressing
an important issue. That important issue can be an issue of replicating something
that we’ve seen in the lab with a nice field intervention, or what not, but I think that accounts for more. And one of the downsides of the
open science movement, I think, is that it overblows the
importance of significance tests relative to the broad portfolio of what reviewers and
editors pay attention to. I think it’s important and it certainly is a nontrivial portion, but I think it’s blown a little out of proportion. That’s one of the things
that I think, and it’s very, you know, I actually tried
to help Jose Cortina get access to, he wanted to do a study that would verify what I just said. For legal reasons, we can’t release papers for him to do that. So that’s, maybe APS can
do that, I don’t know. – [Man] I actually wasn’t
quite done with my comment. – Okay. – [Man] (laughs) Because what could also, and it’s basically, the
question’s just to Joe. What could also happen is
you do that second study and you do replicate it,
and then you gotta go, okay, what’s making the difference? And you do the third study
to try to find the moderator, and maybe you’ll get
a clean answer at the end, and maybe you won’t, and if you don’t, you don’t even try to get it published. And if you do, you try
to get it published. – I see.
– And so, that’s the game, does that
seem like a reasonable game? I’m gonna throw it back, oh you got it. – [Joe] Yeah, I think
that’s a reasonable game, it’s just that you don’t have control over whether the original
effect is ever true or not, so. I mean, I’ve just done this
game with a student of mine where we got the original
materials from a study in JPSP. The original materials differed
from what they described as happening during their
study, so then we were like, maybe the moderator is
whether they did the study the way they reported or
the way they actually did it. And we were excited to find that because we wanted to
find that as a moderator. We ran the study, nothing happens, ever. And then we ran the study
again. They did it on MTurk, we did it on MTurk, they had 100 per cell, we had 300 per cell, it does not happen. It is untrue. I think the policy should be, this paper’s published in JPSP, I think the editors at
JPSP have a responsibility that if we submit this
paper for publication, and if we demonstrate that our replication is highly powered and well
designed and well done, after the authors serve as reviewers and they have a chance
to give their two cents, I think they have an
obligation to publish failures to replicate the stuff that’s published in their own journals. Otherwise, these findings
will never be corrected. I should say, I say that with the fact being that we are not going to submit it to a journal, because I don’t want my graduate student to make enemies before he gets a job. – So I have a similar kind of, so I teach a methods overview
for all our PhD students, in both strategy and OB
HR, and I picked an article that seemed like something
that would be very easy to do a direct replication of, like
an MTurk kind of study. I’ve been doing it for
four or five years now. Nobody ever replicates it. And when I look at the original article, it didn’t make sense to me,
there are some flaws in it. Now, I don’t want my students
to start their careers by trying to publish it, but I do think that having like a
comment and reply section allows you to do that. Unfortunately that journal
does not have that. There’s another journal, I
have a colleague from strategy who actually found something wrong in another journal there, and that journal doesn’t have that outlet, so I think that that’s where, if that journal
had a comment and reply thing, I would probably consider
it, because I agree with you. That’s something that should be done. Now, I like what Rich mentioned in terms of, it’s not that the original
effect is true or not true. It’s evidence that is refutable, and then you accumulate evidence and you find whether, you know, we keep finding
the same thing or not, but I never look at it
as true or not true. I don’t think it’s ever as simple as that, but that’s maybe a language kind of thing. But I do think that as a culture it would be very helpful to
do what you suggested, right. But I think that the
alternative to that would be, and that’s something I might
consider doing in that case, to write an article, more
like the Thomas Schultz kind of article on escalation
of commitment, that explains why it didn’t replicate and shows what might actually operate
as something that works. I think that would make
more of a contribution, but there could be more
of a comment section for what you propose, right. Let me just talk about
reproducibility a little more. Yeah so, that’s where I have
a little more concrete things in terms of a culture of
statistical reporting. So simple things like, so we have a lot of those
mediation-moderation kinds of findings in our field now, right. A mediator-moderator model
of your choice of topics. The easiest is to require that
authors report a correlation table before they go into model
9 or 11 or 12 in that, what is that SAS macro? (Man speaks answer)
Hayes model, yeah. Before you show all those PROCESS results, show us the regression
table or HLM, whatever. So that by the time we get
to the complex, the simple leads us to understand the complex. Especially with mediation,
it’s very easy to do, and tests of indirect effects are
overly powerful, for example. Okay, if you don’t have
the correlation table and the regression, it’s
very hard to verify that. Stuff like that, or report
statistics in regression, HLM, SEM, what not: if you report the unstandardized estimate
and the standard error, I can verify your t value, okay. So that’s a very simple
form of verification. You’d be surprised, I’ve been asking that, since being an associate editor, I’ve always asked authors to report that. I’ve seen quite a few mistakes where people had a star next to an effect that, when you report
the b and the standard error, they don’t find that, okay, but that’s the kind of reporting. These are things that are very simple. You have no idea how much
time I spend as an editor working with AEs to do that. It takes time to build that, but I do that. Encourage data sharing, that’s
a really, really tricky issue. So we actually, I’m about to
publish an editorial at JAP. It’ll come out
when APA finishes the production, but one of the things
that we added there is, now APA has partnered with the
Open Science Framework, so you can do preregistered reports, you can share data, and so forth. We don’t require it though, and the reason we don’t require it is, it’s really complicated in our field. So if you look at some of
the work Paul has done with the military, or
that Steve is doing now with NASA and the like, these are really sensitive data, okay. To be able to share
everything is very, very hard, but if you report a correlation table, you already share some of the data that allows reproducibility. Now, if you share the real data that allows reproduction of results,
that’s sharing data, but you’re not giving the whole forum out (audio
stutters) using it, okay. So you see movement on that. I’m very much in favor of doing that, but I don’t want us to miss out on cases where people have really important findings but for whatever reason they
cannot share the data, okay. So there’s a balancing act there. And that’s in line,
I’m not saying anything that is not within APA policy. APA policy has that caveat that yes, you should share your data, unless there are legal or other
reasons not to do that, okay. So that’s something
that I think would help to do more, and you will
see it more and more. Not just in the APA’s journals
but also in our field, but in our field I think there is a lot of sensitive data out there. The data transparency stuff
that JAP’s doing, okay. All of those kinds of things,
you see movement there, but it’s a process that we as a field need to struggle with a
little bit so we do it right and we don’t rush into it in a way that could harm us in some ways. Paul? – [Paul] Gilad, I was just gonna say. You know, it turns out, as you alluded to, to be very easy
just to take someone’s means, standard deviations,
and correlation table to reproduce their data, right? As long as you’re only dealing
with the main effects, right. So two concrete things that might be done. One you talked about a little bit, but just require that
they put the cross product in the correlation table,
because you can’t use that same technique to
uncover the correlation, you have to have that, so that would help. And the other one would be just potentially to think about requiring that they put three decimal places on the correlations, ’cause when I’ve done this in the past, sometimes to get their
results you have to go back to their correlation table, because in fact that half of a point that might
have rounded it up or down actually causes you to be unable
to replicate their things. I mean, what does it matter that you put one more decimal place in? – Yeah so, I’m just thinking
of the amount of work I’m already doing, (laughs) we’re simple. I’ll leave that to the next editor to do, but I agree with you, I mean,
that was a good suggestion. Now the reason I’m not focusing too much on that interaction
term correlation per se is that if you know the main effects and you see the interaction
and the interaction term, you can kinda tell whether it’s in the right direction or not. Not everybody can, but that’s something that we should certainly think about. Those are good suggestions. So, I think I’m about out of time, but I think that the overall message here that I wanna leave you
with is, I don’t think there’s one silver bullet
that makes a difference here. I think it does take a
village in terms of ideas and discussions, I think
there are multiple aspects that need to be considered, but I hope that I shared some ideas that could be helpful. So thank you, I appreciate it. (applause)
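
Two of the checks described in the talk are simple enough to sketch in code: recomputing a t value from a reported unstandardized estimate and its standard error, and recomputing regression weights from a reported correlation table. What follows is a minimal illustration and not the procedure used at any journal. The function names and every number are hypothetical and invented for the example, the p value uses a normal approximation that is only reasonable with large degrees of freedom, and the regression is limited to two predictors so that the closed-form solution can be used.

```python
import math
from typing import Tuple


def t_from_estimate(b: float, se: float) -> float:
    """Recompute the test statistic as estimate divided by standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return b / se


def two_tailed_p_normal(t: float) -> float:
    """Two-tailed p value under a normal approximation (assumes large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


def standardized_betas(r_y1: float, r_y2: float, r_12: float) -> Tuple[float, float]:
    """Standardized weights for y regressed on two predictors.

    Uses only the three correlations a reader would find in a
    correlation table: r(y, x1), r(y, x2), and r(x1, x2).
    """
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("predictors are perfectly collinear")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Check 1: does a "starred" effect survive recomputation?
# Hypothetical reported values: b = 0.12, SE = 0.07.
t = t_from_estimate(b=0.12, se=0.07)
p = two_tailed_p_normal(t)
print(f"t = {t:.2f}, approximate two-tailed p = {p:.3f}")

# Check 2: how much does rounding the correlation table matter?
# Hypothetical correlations reported to three decimals ...
three_dec = standardized_betas(r_y1=0.304, r_y2=0.296, r_12=0.554)
# ... and the same table as it would appear rounded to two decimals.
two_dec = standardized_betas(r_y1=0.30, r_y2=0.30, r_12=0.55)
print("betas from 3-decimal table:", [round(v, 3) for v in three_dec])
print("betas from 2-decimal table:", [round(v, 3) for v in two_dec])
```

In this made-up case the two predictors have different weights when the table carries three decimals and identical weights once it is rounded to two, which is the kind of discrepancy Paul describes. A product term cannot be handled this way unless its correlations with the other variables are also reported, which is why the cross product needs to appear in the table.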
