Foreword by Jill Macchiaverna, Exaptive Media Specialist:
When we came across this podcast that so beautifully and concisely addressed many of the challenges our collaborators face, we knew we had to find a way to share it. A huge thank you to AIBS and BioScience Talks for allowing us to post their September 12, 2018, episode called “Big Data is Synergized by Team and Open Science.” Listen via the embedded media player or check the show out here, where it was originally published. We’ve also included a transcript below.
James Verdier: Hi, I'm James Verdier and welcome to the American Institute of Biological Sciences’ BioScience Talks which is a forum for integrating the life sciences. On the second Wednesday of each month we discuss the latest bioscience publications. And as a reminder if you'd like to read more point your browser to academic.oup.com/bioscience.
For today's episode, I'm joined by Dr. Pat Soranno, who's a professor with the Fisheries and Wildlife Department at Michigan State University, and Dr. Kendra Spence Cheruvelil, who's also a professor at Michigan State University with the Fisheries and Wildlife Department, and also with Lyman Briggs College.
They joined me to talk about three subjects that are very near and dear to the podcast, and those are data intensive science, open science, and team science. More specifically, though, they discussed their article, which describes the ways in which those three types of science can produce synergy when they're implemented together. I'll let them explain.
Hi, Kendra and Pat! Thank you very much for joining me today.
Dr. Kendra Spence Cheruvelil: Hi, thank you.
Dr. Pat Soranno: Hi, great to be here.
James Verdier: Okay before we start talking synergy and about your work in particular, I was hoping you could tell us a little bit about the three main themes that we're going to be discussing today. So those are data intensive science, open science, and team science. And I recognize those are very broad categories that you couldn't possibly explain you know in the short period of time we have here, but if you could just give our listeners a little bit of an introduction to them in the way that we'll be talking about them today.
Dr. Pat Soranno: Sure, I’ll start with data intensive science. So we define data intensive science as the science that uses large volumes of heterogeneous data and that these data are central to the research question. And you know data intensive science also goes by ‘big data’ and there's a lot of, you know, debate about the value of big data, but the reality in environmental science and ecology right now is increasingly our research is becoming more data intensive. So that's data intensive science.
And then open science is the science that is transparent, reproducible, and inclusive, and that results in publicly available data products and articles that are available to any scientists. And we also argue that open science is becoming increasingly common.
Dr. Kendra Spence Cheruvelil: And finally team science is science that’s done collaboratively, but it's more than just collaborative science. It's science that leverages the expertise of a really wide range of professionals, so often very interdisciplinary, and incorporates practices to maximize team functioning. So the sort of roots of team science are in fields like organizational psychology and a newer discipline called the science of team science. And so the idea is that ecologists have been doing collaborative science for a long time but more recently this collaborative science has to really explicitly include information about how to help those teams work together to be effective.
James Verdier: You know, what is the kind of the difference between collaborative science and team science, you know, if collaboration is the norm. What new challenges are being handled now that perhaps weren't there in the past?
Dr. Kendra Spence Cheruvelil: Yeah, well, one of the things that’s even more ‘now’ than it used to be, is the fact that the teams that are doing this environmental research are bigger, more and more diverse, and more and more interdisciplinary.
The reality is that the scientific questions we're trying to answer now really require us to bring together people with very different expertise and backgrounds to be able to tackle them. And so one of the things that we talk about is the fact that team science, there's sort of a gradient of adoption and that not everyone will do sort of the highest levels of team science. But if you were going to do very high levels of team science, you'd be creating a team that's really diverse across many different dimensions and so you'd actually be going out selecting people because of their diversity and backgrounds and experiences.
You'd be writing down and revising and assessing all sorts of team policies and procedures and really practicing and assessing team functioning. These are things that are not typically something that environmental science or an ecological team would be doing. So they're sort of outside the bounds of what we typically think about as a collaborative part of a collaboration and so that's sort of one of the differences between collaboration per se and a collaboration that's using team science.
James Verdier: Okay, so we're seeing this need for team science. You know, is that resulting more from the types of questions that are being required of scientists to answer? Or is that, you know, a function of, say, the data types that are now coming through? You know, or are there other factors at play?
Dr. Kendra Spence Cheruvelil: Yeah, that’s a good question. I think it is a combination of those things. I think that the questions we’re asking are really, really complex and that leads to needing more people with different backgrounds. In fact we talked about the fact that there are lots of links between team science, open science, and data intensive science, and that you're going to get sort of the biggest bang for the buck when you're working in a data intensive way if you're practicing both team science and open science as well. And so really the three of them are very highly linked.
James Verdier: And so what kind of questions are those? You know, I think we’ll probably have some idea that it may involve things like climate change that we so often talk about. But you know what are those areas? What are the questions that are being asked now that perhaps haven't been asked in the past?
Dr. Pat Soranno: I would say a lot of the types of questions that span disciplines that span broad scales of space and time and that are a lot more complex than the type of questions we could ask in the past.
So for example, there's a lot of interest right now in being able to characterize how inland water bodies, lakes and streams, contribute to global carbon cycling. You know, much of the global carbon modeling occurs at the terrestrial and atmospheric interface, kind of ignoring inland waters. But there's a lot of recent research over the past 10-15 years that suggests inland waters are contributing a lot. They process a lot of carbon as water flows from the headwaters and mountains down to the ocean.
And so that's a great example of where we need to study fresh waters in a wide range of different regions and continents to be able to then scale up those site-based studies to be able to say across the globe, “lakes, streams, and wetlands contribute this much carbon to the atmosphere.” That's a really hard question to answer. And it takes a data intensive approach. It takes interdisciplinary scientists and obviously it takes team science to be able to work together to answer that kind of question.
James Verdier: And, you know, what are the challenges that are faced when you conduct that sort of operation? Well, I understand that it hasn't been done before, and that this is a new practice for many scientists, but are there any ingrained you know kind of cultural practices in the way that science is conducted that makes this more challenging than it would otherwise be?
Dr. Pat Soranno: You know, I'll start with just an example. Just about two years ago there was a very prominent article written in the Proceedings of the National Academy of Sciences that argued that no good ideas come from groups of scientists working together. And that we need to return to the good old days of maverick scientists doing science. And that just really frustrated us when we read that just two years ago, this perception that science is done by lone individual geniuses. I think it’s really damaging to how science needs to move forward to answer these really complicated problems. So that's one example of a huge barrier, and that kind of highly prominent, high-profile article, you know, reaches out to the public and I think the public still perceives science as a very lone enterprise.
Dr. Kendra Spence Cheruvelil: And I guess I would add to that it's frustrating at times that the human condition sort of is that we tend to think in extremes. And so we have people who think, like Pat was just talking about, that we need only lone geniuses working in the lab, and then people who talk about people working in huge teams. But the reality is, right, there's everything in between. And the same goes for data intensive science.
So we like to use an analogy that we think is helpful for people and that is that: you think about all the medical doctors out there treating individual patients. Somebody comes in with a cancer for example and they're focused on that individual person and trying to make sure that they can help them recover and live a healthy life. Well there's also this whole group of people called epidemiologists who are working at the very large scale of populations trying to figure out why certain people come down with that cancer, right? And so we do the same thing. Pat and I work on lakes where we work at the population scale trying to understand why some lakes and some groups of lakes respond differently to stressors like acid rain than others. But we're not saying that the medical doctor should go away, right?
The reality is that we need people who work at the individual lake scale and the individual person scale to be working in concert with people like us who are doing the data intensive research and working on the scale of thousands of water bodies. And so that part is frustrating, that culture sort of gets in the way because people want to think it's one or the other, when the reality is we need all of it and we need people communicating and working together for us to really be able to tackle the environmental problems that we face.
James Verdier: And this may be a little bit less of a question of a culture and more maybe one of practice, but you know how does open science relate to this? You know obviously if the data are not available then you've got a problem. But are there ingrained practices that you know make that more challenging as well?
Dr. Pat Soranno: Yeah, I would say, so another example from our research is where we wanted to study what predicts water quality across half of the United States. And so one way to answer that is to compile lots and lots of databases from small regional studies into one integrated database to be able to answer the question of ‘what predicts water quality at regional and continental scales?’ Well if people are not willing to share their small regional datasets, then there's no way you could answer that question.
And in our research we did this. We compiled 87 data sets from state agencies, citizen scientists, and university researchers. And it took us five years to do it, but people were willing to share their data. And we are able now to study water quality at this very broad scale.
So that's one way that open science is essential to do this continental scale and global scale research. We also argue that it's a way to make our science more inclusive. By making data and research products available to all scientists, at all institutions, no matter how much money they have at their institution or in the country they may work at, it's a way to democratize science a little bit more by leveling the playing field a little bit and making data and tools and models and even research publications available to a wider range of scientists.
Dr. Kendra Spence Cheruvelil: And that's definitely, there's parts of it that are cultural and parts of it like you said that are just sort of practice. And we are seeing changes in both those realms. For example, just a couple weeks ago the NSF put out a, was it a Dear Colleague letter?
Dr. Pat Soranno: No, it was sort of a revised, their requirements for scientists submitting proposals to the Division of Environmental Biology.
Dr. Kendra Spence Cheruvelil: Yeah, and so their revised requirements actually say that in the proposal, there's a section called results of prior support. So when scientists submit a proposal to this division of NSF they have to say, well, we've had these previous grants and these are the outcomes of them. NSF is now going to be requiring people to provide DOIs for the data that have come out of those prior grant proposals. And so that's a big change in practice, and we're going to be really interested to see what happens as a result of that. And that is a piece. That the practice can inform the culture and vice versa. So they're definitely linked.
James Verdier: So this is a case in which NSF is saying, “you know, in your grant proposal you have to point us to the digital object identifier of the data sets in your prior grants?”
Dr. Pat Soranno: Yes.
Dr. Kendra Spence Cheruvelil: Yes, exactly.
James Verdier: So presumably your data in that case had better be published and available.
Dr. Kendra Spence Cheruvelil: Exactly.
Dr. Pat Soranno: Yep.
Dr. Kendra Spence Cheruvelil: It needs to be available in a repository with that digital object identifier.
Dr. Pat Soranno: And this is the division that funds a lot of ecology research at the U.S. scale. And so this is a really exciting development. And as Kendra said this just happened two weeks ago. So this area's quickly evolving which we find really exciting because I think it will only benefit our science.
James Verdier: You know, that's interesting because it seems to be sort of a coming around of the incentive structure. We've talked in the past at AIBS and on this show about you know data publication and researchers having some fear of being scooped if they publish their data. Or even you know if not being scooped directly others mining that data for research findings that the original collectors may have hoped to have made themselves. You know those sorts of considerations. Are we now building the incentives to actually publish that data in a way that's more robust?
Dr. Pat Soranno: Well, one thing I would say is you know we're not, we had this idea that you know if we share our data we'll get scooped. I'm not sure how much evidence there is to that. I think it's an area that really should be researched.
But one area that has been researched has shown that if you put your data in a public repository, citations to your article that published those data are higher than if you don't share the data. So there is growing evidence that citations -- which, as scientists, are one of the important currencies for career advancement -- so data sharing results in increased citations. That's a really good thing.
James Verdier: And so that's showing you're getting you know this incentive in terms of citations of your research articles in addition to citations of the data themselves which may not be as noted by you know those in charge of making hiring decisions, etc.
Dr. Pat Soranno: Right, right. And I would argue, and in fact anytime I'm mentoring early career scientists I am strongly encouraging them to share their data, because I think it's also a way for other scientists to reach out to them and potentially ask them to be collaborators. Right? I think there are a lot of people out there who are interested in collaborating with people who collected the data. It's not a requirement with open data but I do think again our culture is changing and I think a lot of scientists would want to include the people who actually have some knowledge of the data.
So I think there's a lot of benefits and we can sort of dwell on this you know looming idea of being scooped. But I think we should also be thinking about the positive aspects: you know, name recognition for early career scientists, increased citations, potentially growing your collaborative network. And I think those far outweigh any potential for scooping in my opinion.
James Verdier: And that makes a lot of sense. And what you're pointing to you know in terms of the collaboration sounds a little bit like a synergy. So can you talk to us a little bit about you know the potential synergies between these three types of science? You know data intensive, open, and team. You know and kind of how they relate and may help you know feed into even greater and more positive outcomes than you would otherwise get you know if any of these were sort of implemented, to whatever extent would be possible, in isolation.
Dr. Kendra Spence Cheruvelil: Yeah, you know, I think the easiest way to talk about that is probably with an example. Pat and I have been collaborating for quite a number of years. And we have slowly over time sort of shifted toward more use of team science. More use of open science. And more of our research program being data intensive science. And so that's how we came to write this article is through our own experiences that were sort of organic, realizing how much they're linked and how much synergy there can be by practicing all three of these sciences together.
So for example Pat mentioned that we built this large database of lake and landscape data and we couldn't have done that without all of the citizens and the citizen science groups and the tribes and the state agencies sharing their data with us. And so that's an example of open science really leading into data intensive science. But what that also meant was when we were pulling together our team of people to work on the research questions of understanding how lakes changed through space and time, for example, we really needed to pull together people from fields like eco-informatics and people who had strong skills in GIS, geographic information systems, for example. We needed statisticians. We needed people who are experts in data mining. Along with of course the people that we were used to collaborating with, which were ecologists.
And so when we started pulling together these teams of people that started out as maybe 12-ish and then went up to about 18 and now we're managing a team of about 25. What happened was we realized, ‘Wow, this is hard.’ [laughs] And we realized that we needed to read literature about how to you know manage effective teams. And so that brought us to sort of articles in the new science of team science field and articles from business and organizational psychology to really understand issues around how teams work and the functioning of teams.
And so we started doing things like having our team of 15 people work through an exercise where we talked about how to facilitate conversations for example. And again this is not exactly the norm for how environmental scientists work. You know we get together for a week and they were pretty surprised that we were going to spend four hours of that week, an hour each day, talking about how the team was going to function, right? That's pretty different. But then the benefit of that was just so, so huge. We saw right away that every time we spent time on team functioning and having those sorts of discussions and building skills for teamwork, we saw it come back to us in terms of the scientific productivity that we were experiencing as a team. And so that's sort of an example of how that team science can really catalyze data intensive science.
Dr. Pat Soranno: And I would just add, as far as open science, we've progressed you know to be increasingly open really just to make our research easier. So for example you know when you have a team of 25 individuals, you need to be able to share data, models, code, methods. And so we've increasingly set up the infrastructure to share within our team, and it was just a matter of clicking private to public for example on a code-sharing platform called GitHub to make all of that information now public. And so that transition from sharing sort of within the team to outside the team is being made much easier by technology such as GitHub.
And so I would say open science has completely facilitated our ability to work together as a team on data intensive questions. And I will say, most of our developments in open science have been initiated by the early career individuals on our team. They're the ones learning these methods. They are very strongly advocating for open science approaches. So it's been really neat seeing the early career scientists on our team really taking the lead and moving our team to be even more open.
James Verdier: Okay, and I'm sure that we've got some you know scientists of various stripes listening to us right now. What advice would you give to them? You know, those who are you know out there and looking to do more collaborative research. What sorts of things should they be looking for? You know, who should they be reaching out to? What should they be doing with their data? Those types of things, you know. Is there a good starting point for them?
Dr. Pat Soranno: Well, one thing I'll say is to just start small. You know just do it. But don't jump to the end. Just start small on each one. And that's really another reason we wanted to write this article because we have the Level One adoption where you know it's the easy steps you can make to start going down this path. That's one thing I would say.
Dr. Kendra Spence Cheruvelil: And I would echo what Pat said in terms of early career scientists is that I think a lot of us can learn from those who are, who just completed their graduate training for example. And so if someone listening is trying to figure out how to enter into open science for example, I really think reaching out to early career scientists is the way to go. And it helps. It's such a nice mentoring sort of relationship, right? Because the early career scientists can help that person with open science but then that early career person gets a lot in return in terms of if the person is a very established scientist, right? That's going to help them with their network and their career. And so I think it's a really nice example of how we can all learn from each other regardless of career stage.
James Verdier: Okay, so start small build incrementally with an eye toward improving along the way. That sounds like great advice to me, and also a good place to leave it for today. Thank you both very much for joining me today.
Dr. Pat Soranno: Thank you.
Dr. Kendra Spence Cheruvelil: Thank you, it was fun.
James Verdier: And that concludes this episode of BioScience Talks. Just a reminder, the journal BioScience is published by Oxford University Press on behalf of the American Institute of Biological Sciences, and is made possible by the support of our members and donors.
And just one last note: if you'd like to learn more about what we talked about today, please check out the show notes for a link to AIBS’ Enabling Interdisciplinary and Team Science Professional Development Program.
Thank you and talk to you next time.
- Read the article
- AIBS's Team Science Event