Data Engineering plays a key role in insitro’s approach to rethinking drug development. The team ensures that the robots and instruments in our biological data factory produce high-quality data, optimizes the storage, querying, and analysis of petabytes of experimental results, and builds the infrastructure to train powerful models that solve key problems in the drug development process. You will work closely with a cross-functional team of scientists, bioengineers, and data scientists to identify areas where data engineering can make a difference, and to develop data architectures and systems on cutting-edge, high-throughput platforms that enable our scientists to be maximally productive. You will design, implement, and deploy novel methods that use a broad spectrum of data engineering approaches, including techniques at the forefront of the field. You will work as part of a team to rigorously design our data platform, identify key architectural performance improvements, and support ongoing discovery and automation platforms.
You will be joining the founding team of a biotech startup that has long-term stability thanks to significant funding, yet is still very much in formation. A lot can change in this early and exciting phase, providing many opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!
- BS, MS, or Ph.D. in computer science, statistics, mathematics, physics, engineering, or equivalent practical experience
- Expertise in one or more general-purpose programming languages (such as Python, C/C++, or Go)
- Demonstrated ability to write high-quality, production-ready code (readable, well-tested, with well-designed APIs)
- Familiarity with cloud computing services (AWS or GCP)
- Significant experience with at least one high-end distributed data processing environment (Hadoop, Spark, etc.)
- Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions
- Proficiency in a Linux environment (including shell scripting), experience with database technologies (e.g., SQL, NoSQL), and experience with version control practices and tools (Git, Mercurial, etc.)
- Passion for making a difference in the world
Nice to Have
- Experience with biological data (DNA sequences, RNAseq, proteomics, microscopy images, etc.)
- Experience with medium-sized data sets (100TB+)
- Experience with the SciPy/PyData ecosystem (numpy, pandas, scipy, dask, etc.)
- Demonstrated ability to develop novel data engineering methods that go beyond assembling existing code, and to apply problem-solving skills to complex issues
- 4+ years of real-world work experience in software development for high-end data processing engines
Benefits at insitro
- Excellent medical, dental, and vision coverage
- Open vacation policy
- Team lunches (catered daily)
- Commuter benefits
- Paid parental leave
insitro is an exciting startup company that aims to take a new approach to drug development: one with big data and machine learning at its core. We plan to build on the ground-breaking innovations that have occurred in life sciences to develop large data sets that are designed from the start to allow machine learning to address fundamental bottlenecks in the drug development process. Our goal is to cure more people, sooner, and at a much lower cost.
We are fortunate to have strong support from top investors in both biotech and tech: ARCH Ventures, Foresite Capital, A16Z, GV, and Third Rock Ventures. We are building a remarkable team that embodies a new type of culture, one based on a true partnership between scientists, engineers, and data scientists. Together we are working to define the problems, design experiments, analyze the data, and derive the insights that will lead us to new therapeutics. Join us, and help make a difference to patients!