The goal: investigate huge amounts of research data in new ways. The pool for teams: neuroscientists, data scientists, and software developers. The result: answering questions we didn’t even know we had.
The Setup
Exaptive built a software platform: the Cognitive City. Just like physical cities, the Cognitive City has utilities: data and analysis tools any citizen can access. In the same way people make each physical city unique, each Cognitive City is unique due to the mission and expertise of the community. Researchers can even use this shared virtual space to meet, collaborate, and develop new analysis tools. The more work is done in the Cognitive City, the ‘smarter’ it becomes, as algorithms are refined to measure what matters. The Cognitive City then can make suggestions for collaboration partners, tools, and data sets.
|Exaptive CEO Dave King kicks off the hackathon with SMEs and developers|
But until there was a hackathon, this was all in theory. Would it work? Exaptive data scientist Alanna Riederer took point on coordinating all the technical aspects. “What I spent most of the time on was getting some data access components together. And there were some specific data sets that Terri and Dan [Knudsen] had wanted.” Alanna worked with other Exaptive developers to upload the data sets to Amazon S3 so they could be accessed right away.
|Developers listening during a knowledge transfer session|
We had the software set up. The data had been imported and/or the API calls were ready. The next thing to do was test the theory. How much could be accomplished in eight hours on a topic as complex as neuroscience?
The first step was to get a high-level overview of the problem space: brain science. Terri spent about 90 minutes talking about the goals and challenges of brain science and explaining the data sets that were available for the hackathon. For the purposes of testing the system, we used only publicly available data sets. Dave King, founder and CEO of Exaptive, asked Terri to explain four things about each data set, “Here’s the study. Here was maybe the motivation; the sorts of questions [asked initially]. Here’s sort of technically what the data means... And then tell us everything that’s wrong with the data.”
Terri gave enough context with such clarity that before the end of the overview, programmers were already eagerly asking questions to refine their new ideas for analyzing the data. Teams self-selected around four people who each knew right away which data interested them the most; every team ended up with at least five members.
A common pattern emerged as soon as the teams started ideating. Teams were forming around the goals of (1) connecting data to (2) algorithms and (3) displaying the results with the right visualizations, all in (4) the context of answering a specific question.
|Dave and Alanna's team whiteboard their ideas|
Dave & Alanna’s Team: Alanna knew right away she was interested in a data set that showed gene expression data in human brains that was collected at different ages. Terri, trained as a neuroscientist, was the subject matter expert in this group.
Team Cheese Tray (self-assigned name because they worked around the table that had the cheese tray.): Data scientist Frank Evans immediately wanted to see if there was another way to investigate the pairing of affected patients to control patients in a data set about traumatic brain injury (TBI). (The controls were +/- TBI and were matched based on age, sex, and PMI, the post-mortem interval.) BRAIN Commons neuroscientist Daniel Knudsen joined this group.
Team Imajen Dragons (self-assigned name inspired by working with OpenSeadragon.): Developer Josh Southerland was ready to tackle the challenges that come with analyzing large image files within software. In this case, the image files were histological images and MRI scans. BRAIN Commons Associate Director of Data Science, Deepti Cole, was this group’s link to brain science.
Team Four: Exaptive information technologist Bob Barstead spent many years in research labs during a previous career as a geneticist. This team was the only one without a member from the BRAIN Commons. After running through several ideas that -- upon further investigation -- seemed to have already been turned into applications, they were able to get feedback from the subject matter experts and focus on creating a modular component that would connect to the Gene Ontology Consortium and retrieve all available information about a specific gene.
Building on Ideation
An incredible example Terri gave of how biologists and data scientists can work together came from the Allen Institute. The team at the Allen collected gene expression data for all the genes across the entire mouse brain using a popular staining technique. One of their bioinformaticians suggested integrating all the expression data instead of looking at one gene at a time, to understand what the gene expression signal could reveal about the structure of the brain. Biologists were skeptical until they saw the resulting “Anatomic Gene Expression Atlas.” Terri noted, “The biologists at the Allen generated really extraordinary data, but it was the information [scientist] or the data scientist, the bioinformaticians, who were able to visualize it in a way that the biologists could see it; see something new out of their data - something that explained how the brain was put together, and that wouldn’t have been visible had they just had the disparate data to view.”
|Team Cheese Tray dives into a TBI data set|
As the teams refined their ideas and decided how they would build useful tools, much of the initial conversation was question-and-answer. Team members had to figure out a shared vocabulary that could span programming and brain science. Developers asked brain scientists about what questions they wanted to ask of the data. Brain scientists asked developers to explain the limitations of data science. Terri noted, “Brain scientists understand the brain. They understand their aspect of study in the realm of the wet biology. In the process of doing that, they generate a lot of data. … The entire realm of information science and data is not typically what you learn as a biologist.”
It’s easy to see how the knowledge transfer process between scientists and programmers could get derailed in the everyday operations of a lab. Bob explained the situation really well afterward, “The problem is that the cycle time, the iteration between ‘here’s what I think I want as a subject matter expert’ and then the data scientist goes off and produces something and doesn’t really understand what they’re supposed to produce. And it takes weeks and weeks and weeks and in the meantime people are kind of losing track of what they’ve been attempting to do. They come back to the subject matter expert with this pile of data and try to explain it as best as they can but they don’t really know the subject.
“And what that should lead to is another round, because now we understand it better, and can communicate it better. But because the cycle time is so long, that communication is very difficult. When you have these kinds of dialogues, there’s a shelf life to the conversation. You think you kind of understand what’s going on, and then two weeks later you’ve kind of lost track of that entire conversation. And one of the great things about Exaptive[‘s platform], is that the cycle time for developing a data application is hours or days and not weeks.” With the direct exchanges during the hackathon, team members on all sides got answers so quickly that they could generate new questions they might never otherwise have reached, and certainly not nearly so fast.
|Team Imajen Dragons included two developers who work remotely, one in NH and one in CA|
The devs also soon discovered the challenges of working with brain data. A data set might have hundreds of data points, but the data points may be self-reported traumatic brain injury, which doesn’t include which part of the brain was affected. “The most surprising thing I learned during the hackathon was how little information there actually is with which to work as a brain researcher,” said developer Cory White. Cory was on Dave and Alanna’s team. “For all of the volumes of data or images that you might be able to dig up, there doesn't seem to be much that you can do with it using manual processes. I realized how much of an opportunity there is to produce things that are really valuable in that space.”
After working intensely together for less than 24 hours, the room was practically vibrating with anticipation during the demos. Terri observed later, “Working on the edges of two different disciplines I think is where some of the best, most productive friction happens. I’m not a dev. I’m not a software developer. I don’t know how to build those kinds of tools. But I could direct and I could see things, and there were questions I could ask that had me feel like I could contribute to the ‘magic’ that got produced by the devs.”
Developer Austin Schwinn from Team Cheese Tray revealed the tool built from the original question of how study pairs are made, matching a patient with traumatic brain injury to a control patient without one. The team created a minimum viable product (MVP) that let the user choose a data set, then choose a variable to split the data into two cohorts, and finally choose a second variable to examine how splitting on the first variable affected the distribution of the second.
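The workflow Team Cheese Tray demoed — split a data set into two cohorts on one variable, then compare the distribution of a second — can be sketched in a few lines of pandas. This is a hypothetical reconstruction, not their code: the function name, columns, and data are invented for illustration.

```python
import pandas as pd

def split_and_compare(df, split_col, compare_col, threshold):
    """Split df into two cohorts on split_col, then summarize
    the distribution of compare_col within each cohort."""
    cohort_a = df[df[split_col] <= threshold]
    cohort_b = df[df[split_col] > threshold]
    return {
        "cohort_a": cohort_a[compare_col].describe(),
        "cohort_b": cohort_b[compare_col].describe(),
    }

# Invented example data: age vs. a symptom-severity score
df = pd.DataFrame({
    "age": [25, 34, 41, 52, 63, 70],
    "severity": [2.1, 3.4, 2.8, 4.0, 4.5, 5.1],
})
summary = split_and_compare(df, "age", "severity", threshold=45)
```

Comparing the two `describe()` summaries side by side is exactly the kind of quick sanity check the tool made interactive: pick a split, see immediately how the second variable's distribution shifts.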
|Bob's Team focuses on a tool to find data about genes|
Subject matter experts noted it was very helpful for adding context to the numbers and identifying dependent and independent variables. Developers appreciated their feedback. “It was amazing how much neuroscience information we were able to cover in such a short time and it really helped frame what the data meant and what it could be used for,” said developer and statistician Kent Morgan, also of Team Cheese Tray. “And it was super helpful having the subject matter experts in the room -- or near enough to it -- as we worked to get immediate feedback on what was useful.”
Josh demonstrated Team Imajen Dragon’s two prototypes. The first tool allowed huge histology files (125+ MB) to be quickly retrieved and viewed with fast, clean zooming and panning. Their viewer performed even better than viewing the image from the local hard drive. Using OpenSeadragon for this tool showed the team that it was possible to do what they wanted to do. Deepti noted that the team recognized a more secure way to transfer the images will have to be incorporated going forward to meet the high security standards for medical research data.
The second tool Josh demonstrated allowed researchers viewing MRI images in a Cognitive City to annotate the images as they made observations. Every observation added a node to the user’s profile. This meant that team members could look at their map of collaborators and see which ones were making observations -- and about which data. “It was really cool to see the process of building things and fitting it with my emerging understanding of where we’re going with BRAIN Commons,” said Dan, sounding blown away. “Especially in this space of interacting with the data, doing some interesting analysis, but also in people space. It was really cool to see [Josh’s] team connecting people with the annotations on the brain. [That] was not a use-case that I had thought of at all for the Cognitive City. At all. I thought it was all about people and, yeah, search terms that they might use and things like that. But seeing that a particular output of a particular xap was capturing some output that might be relevant to other people and connecting people on that? That was super cool and gives me all sorts of ideas.”
|Exaptive engineer Mark Wissler chuckles at some proposed team names|
“Ours is more of a utility tool, to be integrated into other tools,” said developer Stephen Arra, who drove the demo for Team Four. Bob explained later, “Because we didn’t have anyone from CVB on our team, mostly we addressed problems that I thought would be of interest. Or would have been of interest to my colleagues when I was at the [Oklahoma Medical Research Foundation].” Rather than focus on a tool that better utilized specific data sets available, Team Four created a tool that could supplement any data analysis tool by incorporating the capability to aggregate external data about specific genes.
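The shape of Team Four's utility — look up a gene by name and pull back everything an external service knows about it — might look something like the sketch below. The endpoint shown is the public mygene.info query API, used here only as a stand-in; the source doesn't say which service or response fields the team actually used, so treat both as assumptions.

```python
from urllib.parse import urlencode

# Public gene-annotation API, used as an illustrative stand-in
MYGENE_BASE = "https://mygene.info/v3/query"

def build_gene_query(symbol, species="human"):
    """Construct a query URL for a gene symbol (e.g. 'BDNF')."""
    params = {
        "q": f"symbol:{symbol}",
        "species": species,
        "fields": "symbol,name,summary",
    }
    return f"{MYGENE_BASE}?{urlencode(params)}"

def extract_gene_info(response_json):
    """Pull the fields a researcher would want from the (assumed) response shape."""
    hits = response_json.get("hits", [])
    return [{"symbol": h.get("symbol"), "name": h.get("name")} for h in hits]

url = build_gene_query("BDNF")

# Synthetic response standing in for a real network call
sample = {"hits": [{"symbol": "BDNF",
                    "name": "brain derived neurotrophic factor"}]}
info = extract_gene_info(sample)
```

Because the component only needs a gene identifier in and structured annotations out, it can bolt onto any analysis tool — which is exactly what made it reusable during the final demos.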
Dave shared his laptop screen for the final demos. Dave and Alanna’s Team had created a tool that could show gene expression over time by structure as the human brain grows. The tool also clustered gene expression signatures that had the same temporal profile. But all the gene names were short strings of letters and numbers. Wouldn’t it be great if there was a way to get more information about what scientists know about those genes of interest without having to leave the tool? It was a question Dave would never have thought to ask before the hackathon.
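Clustering gene expression signatures by shared temporal profile, as Dave and Alanna's team did, amounts to grouping expression-over-time vectors that rise and fall together. A minimal sketch follows; the greedy correlation-based grouping and the data are invented for illustration and stand in for whatever algorithm the team actually used.

```python
import numpy as np

def cluster_by_profile(expression, threshold=0.9):
    """Greedily group genes whose expression-over-time vectors are
    highly correlated (Pearson r at or above threshold)."""
    clusters = []
    for gene in expression:
        placed = False
        for cluster in clusters:
            rep = expression[cluster[0]]  # compare against the cluster's first member
            r = np.corrcoef(expression[gene], rep)[0, 1]
            if r >= threshold:
                cluster.append(gene)
                placed = True
                break
        if not placed:
            clusters.append([gene])
    return clusters

# Invented expression values at four developmental time points
expression = {
    "GENE_A": np.array([1.0, 2.0, 3.0, 4.0]),  # rising
    "GENE_B": np.array([2.0, 4.1, 5.9, 8.0]),  # rising, tracks GENE_A
    "GENE_C": np.array([4.0, 3.0, 2.0, 1.0]),  # falling
}
clusters = cluster_by_profile(expression)
```

Genes with matching temporal shapes end up in the same cluster even when their absolute expression levels differ, which is what lets a tool like this surface co-regulated groups across brain development.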
Exaptation: From Millennia to Minutes
In biology, exaptation is when a feature that evolved for one purpose starts serving a completely new purpose. The go-to example is feathers. Birds evolved feathers for warmth, but then feathers became important for flight. In software development at Exaptive, exaptation is when someone writes a piece of code or algorithm to do one job, but later someone else finds a way to repurpose it.
|Exaptive data scientist Alanna Riederer coordinated the data ahead of the hackathon|
As Dave finished his part of the demos, he realized he could use the component built by Team Four in the xap built by his team. After an almost imperceptible moment of hesitation, he said, “Let me pause my [screen]share for a second.” Dave decided to try adding the component in real-time while the rest of the room watched, “Yeah, we’re gonna try this live.”
He used the Exaptive Studio to find Team Four’s component and reconfigured his team’s tool to include it. Three minutes and 19 seconds later, the new feature worked exactly the way he wanted it to. Everyone in the room applauded. The relief of having a successful live demo wasn’t the only reason. The cheer was loaded with validation of all the work of the previous seven years that had created a community-informed, data-driven, rapid-prototyping virtual space: the Cognitive City.
We’ll Do This Again
Having the subject matter experts and the developers in the same conversation made all the difference. It gave teams a chance to create several unique tools that improved the ability of researchers to query their data and gain insights that would have otherwise remained hidden. Hackathons are usually exciting because of the competition. Ours ended up being transformative because of the collaboration.
|Neuroscientist & Engagement Director Dr. Terri Gilbert works with a team on the possibilities for new tools|
Terri is already looking to the future. “I could see holding virtual hackathons through the Cognitive City to allow people to meet up with subject matter experts, have developers from all over the world, and have Exaptive tutors as well, to be able to create really extraordinary tools that would forward the whole field,” she said.
Dave noticed that having scientists embedded in the teams served as a constant voice directing developers away from creating novel tools just to be creative. “The coolness of the technology, the size of the data set, the complexity of the algorithms. [Having embedded subject matter experts] turned us away from those things that I think programmers get excited about, and directed us towards what the real excitement is, which is these things can make a difference for people dealing with disease. Or they can advance the science and they can advance the understanding.”
We don’t think this experience has to be unique to brain science. This could be a model for innovation in many areas of research. The Cognitive City is a virtual space where transdisciplinary hackathons could happen at any time. Check back in with us for new articles as we do more hackathons in the future and make more discoveries about how they impact innovation.