Outstanding research by outstanding women
Women in Data Science (WiDS) and Stanford Earth hosted a symposium to highlight the research done by women who use data science to assess a range of topics in the geosciences, including Earth processes, hazards, climate, and sustainability.
When energy resources engineering Professor Margot Gerritsen declined an offer to speak at a 2015 data science conference due to scheduling issues, conference planners said they couldn’t find another woman to fill the spot. That’s when Gerritsen knew she had to do something to make the burgeoning data science research conducted by women more visible. She created a “revenge conference,” Women in Data Science (WiDS), which was first held at Stanford in 2015 and gives women a supportive forum to network and share their research.
Since then, WiDS has grown to include nearly 200 annual or regional conferences in more than 55 countries, a yearly datathon, and a podcast series. On Nov. 1, Gerritsen brought a regional WiDS event to the School of Earth, Energy & Environmental Sciences (Stanford Earth), with the Women in Data Science @ StanfordEarth symposium focused on the Earth sciences. The broader 2020 global conference will also be back at Stanford next March.
The November gathering drew more than 70 participants and featured nine speakers from around the world. The three main areas of research covered were 1) Climate & Sustainability, 2) Geophysics, and 3) Earth & Planetary Sciences. Beyond the technical talks, the day also featured a panel on education, research collaboration, and career planning.
Making room at the table
Data science is becoming critical in every decision made, so “it is a really bad idea to have half of the world population not well-represented,” Gerritsen said. “If you have a group that is similar, you tend to be very narrow in the way that you probe.”
With the share of women in data and computer science hovering between 10 and 15 percent, women are sorely underrepresented. “I think once you reach critical mass, these problems begin to solve themselves,” Gerritsen said. “But we aren’t there yet.” So to help move things along, she developed WiDS “to showcase outstanding research in the field of data science being done by outstanding women.”
“We need data with a heart,” said Stanford’s Sally Benson, a professor of energy resources engineering, who gave the opening address. Benson advocated for the translation of data into actionable information for decision makers. “We don’t always know how people make decisions, and data can help us to learn those things.”
Addressing climate and sustainability
Three researchers discussed ways in which data science intersects their work in the areas of climate change and sustainability. Dorit Hammerling, an associate professor of applied mathematics and statistics at the Colorado School of Mines, discussed ways to reduce climate data storage sizes while preserving scientific integrity. “We have so much data available now that has become essential to our work, but the bottleneck is actually data storage,” she said. Imme Ebert-Uphoff, a research faculty member in electrical and computer engineering at Colorado State University, reviewed the opportunities and challenges that machine learning brings to the study of weather and climate, as well as some of the promising strategies she has identified for its use.
Senior research engineer Newsha Ajami of the Stanford Woods Institute for the Environment discussed how she harnesses data to achieve water security. “Climate change is affecting water quality and availability, and we depend on outdated systems to deliver water in the growing global issue that is water scarcity,” she said. “We’re using data to unravel more about this human-water dynamic because obviously how we thought we would use water is not how we have actually used water.”
Understanding natural hazards
Three researchers shared the relationship between data science and their work in geophysics. Lindsey Heagy, a postdoctoral researcher in statistics at the University of California, Berkeley, talked about how geoscientists can keep up with the avalanche of available data by using Project Jupyter, software that enables geoscientists to visualize and collaborate on their research more easily. “If we want to be at the cutting edge of geoscience, we need to figure out ways to create collaborations and knowledge in all of these domains,” she said.
Harvard University Earth and planetary sciences postdoctoral researcher Karianne Bergen, Computational and Mathematical Engineering MS '15, PhD '18, covered the ways that big data enables her to better monitor earthquakes. “How do we go back to look at existing seismic data and find the events that seismometers missed?” she asked. “We’re looking at huge data sets over huge time periods to identify more of these microseismic events. It’s like finding a needle in a haystack.” Virginia Tech seismologist Eileen Martin, Geophysics MS '17, Computational and Mathematical Engineering PhD '18, added: “Seismology is changing, spurred by the development of new sensors that can detect new signals, visit new places, and revisit the algorithms used in Earth sciences. In the past we never had as much data as we have now.”
New insights in Earth and planetary sciences
Another three researchers illuminated how data science enables their work in Earth and planetary sciences. Grethe Hystad, an assistant professor of statistics at Purdue University, set the stage for this section: “There are billions and billions of planets and we can only sample a few, so planetary science is a statistical problem by default.” Hystad said data science also enables her to model the natural distributions of the minerals of Earth.
Hannah Kerner, an assistant research professor of Earth and space exploration at the University of Maryland, shared actionable insights from remote sensing as informed by machine learning. “There’s high variability in how crops are planted on Earth, which is hard for data analysis, so we’re using machine learning to train models that can be generalized and adaptable to many contexts,” Kerner said. University of Tasmania geophysics Professor Anya Reading revealed how she maximizes the insights she gains from using machine learning in her geoscientific research. “Most uses of machine learning, like self-driving cars, take in lots of information and expect a single result,” she said. “In research, in geoscience particularly, our goal is much more complicated. We don’t simply expect success or disaster. We’re aiming for robust quantitative metrics that give us insight.”
Showing up for the next generation
The last panel of the day focused on education, research, and career planning. While many of the day’s speakers had expertise in both computational tools and their own scientific domains, each acknowledged the importance of their collaborators from other disciplines. One major takeaway: The future of education, research, and the careers of many people in the room would depend on their ability to communicate and collaborate.
“To make discovery in this day and age, we need to reduce the friction of working in interdisciplinary teams … we really are students of each other’s fields even while remaining incredibly skilled in our own,” Heagy said.
The panel also discussed how to deal with skeptics of data science, new technologies to support collaboration, and the role organizational structure can play in encouraging interdisciplinary work.
“Data science is biased per definition because it’s the types of questions you’re asking, it’s the type of data you’re going to collect, and it’s then how you’re going to be looking at the data,” said Gerritsen, who is also senior associate dean of educational affairs at Stanford Earth and a senior fellow at the Precourt Institute for Energy. “We don’t just want 50 percent men and 50 percent women. We want people from different cultures and different backgrounds, those who are domain experts and people that are computer science experts. We’re going to change this.”