Senior Data Engineer

The Engineering team plays a crucial role in our organization: it is responsible for building all of our products. H1 has an engineering-first culture and mindset, and anyone who joins will help drive the future of our company.

This role focuses on building SaaS products that help healthcare and life science companies make more informed scientific decisions through access to critical data. Here’s what we’re looking for:

Overview

Our team is building a suite of machine learning tools to help solve problems in the healthcare and life science space. This includes matching researchers and physicians to their scholarly research, simulating how effective drug compounds will be, and much more.

We're growing fast in a field that is also growing fast, so we're looking for people who want to grow fast too. We think a supportive, collaborative, and sophisticated environment is the key to making this happen.

Experience and Skills

Our data engineers do these kinds of things:

  • Pull, clean, augment, and master data coming from a variety of public and private sources. This goes well beyond simple ETL: we’re preparing data sets that will power machine learning algorithms, not just pulling data out of a database to display in a graph.

  • Work within a Spark/Scala environment, from notebooks through mature pipelines. Our data engineers are equally comfortable writing Scala code to normalize names and pulling together disparate notebook functions into a reusable class architecture.

  • Explore new ways to extract signal from data, including novel algorithms and approaches. Our most challenging data engineering problems require a good understanding of both computer science and math (especially linear algebra and probability).

  • Build out scalable data pipelines on top of AWS infrastructure. Familiarity with Terraform/Ansible is a plus, but experience with the AWS stack (especially Elastic Beanstalk and EMR) is required.

Above all, the data engineering we do at H1 is code-centric, not tool-centric. ETL tools such as Talend and Informatica are great, but our use case is much more about data analysis than about moving and joining data. For this reason, we’re looking for candidates who are comfortable working in Scala and Spark.