Pay attention to the man behind the curtain: David Helmers and the servers that keep SILVIS going!

Posted 01/30/14

SILVIS is a lab that depends on the analysis of large datasets. This kind of work requires multiple servers with massive amounts of storage and processing capability. David Helmers is the man behind the scenes who keeps the servers running and data flowing to SILVIS researchers far and wide. This is the story of the data that we use at SILVIS, the servers that store and manipulate these data, and the guy who keeps it all connected.

Dave Helmers preparing to install a cluster of new file and application servers

What do you do when you have seventy researchers on six continents trying to collaborate on projects, all while working with massive datasets that contain satellite imagery or habitat data covering multiple countries? The answer is that you use servers, lots of servers, to ensure everyone has the data storage and computational power they need. David Helmers and the IT staff of Russell Labs are the folks responsible for maintaining, and providing access to, the digital infrastructure that keeps SILVIS researchers in touch with their data and with one another. The work SILVIS does requires massive amounts of data storage and processing power to house and manipulate large, complex datasets. Without servers – and a team of folks to keep the servers talking to one another and to the researchers who depend on them – it would be impossible for SILVIS to answer the questions we ask.

Simply put, a server is a computerized process that shares a resource among users. This resource can be a single file, like a Word document, or an entire nationwide database that multiple users need to access. For SILVIS, this essentially means several very fast, high-capacity computers that are kept in a cold room to ensure they don’t overheat. Because SILVIS focuses on the analysis of spatial data and conservation applications, we depend on a variety of datasets such as satellite images, census data, land cover information, or maps of soil types for every county of the United States. These data are needed to understand how natural and human communities will respond in the future to the loss of biodiversity and an increasingly altered biosphere. The servers are where we store the data and the programs that manipulate them, in a central location that all researchers can access. The servers are also essential for running analyses that would be prohibitively slow or impossible to perform on a standard desktop computer.

Countries (highlighted in blue) from which researchers and collaborators currently connect remotely to the SILVIS servers.

Established in February of 2013, the current SILVIS network consists of six servers with eighty-four processing cores and 700 gigabytes of RAM, running a range of operating systems. By December of 2013, the server network was essentially full of data. How full is full? Currently, over 160 terabytes of data are stored on the SILVIS servers. To put that in perspective, it is as if SILVIS researchers had created as much data as the entire printed collection of the Library of Congress, 16 times over. In ten months. This seems like an absurdly short amount of time to generate so much data until you consider the types of data we work with. A single high-resolution satellite image can be many gigabytes of data. A recent effort by Dr. Patrick Culbert to use satellite images to characterize habitats important to avian biodiversity across the contiguous United States required 14 terabytes of data on SILVIS servers alone. Once the kinds of demands SILVIS researchers place on the servers become clear, it is easier to see how they can generate so much data in such a short period.

An image of global Dynamic Habitat Index – an example of a large dataset produced using the SILVIS servers.

The SILVIS servers are not just boxes that are connected to a data network and then left to hum along on their own. It takes significant effort to ensure that different users in different places have reliable access to the data and programs they need for their analyses, and that everything is backed up and protected from malicious software and unauthorized users. All of this happens within the broader UW-Madison computing network infrastructure, and keeping the data flowing through the intertubes can be a big job. Occasionally servers malfunction, generating a deluge of frantic emails to David from 40-50 panicked researchers who have suddenly lost access to data or processing capabilities. In spite of such infrequent crises, David insists that his biggest problem is coming up with unique names for the servers. Past names have included the Three Stooges, comic book heroes, Star Trek characters and birds, but David hasn’t yet decided on names for future servers.

The work SILVIS does is not possible without the servers that give us the data storage and processing capabilities we need to manipulate the large datasets we work with on a daily basis. Without the wonderful wizardry of David and the Russell Labs IT staff working behind the scenes to keep them up and running, these servers would not exist. And without SILVIS servers, researchers from Madison to Moscow and Chile to China would be scrambling to find alternatives.

Story by James Burnham