DataStack Jobs

Data Engineer

We are a venture-backed health tech startup, supported by leading investors and scientists, on a mission to improve the lives of the 400 million people around the world who suffer from genetic diseases. Our platform uses artificial intelligence and synthetic biology to decipher how changes in human DNA lead to disease. Through this platform, we enable genetic testing labs to provide clear and fast diagnoses to more patients.


To achieve this, we have assembled one of the best teams of experts in genetic data, machine learning, and business development. Our people aren't just part of a team; they're part of something bigger. As a community of creative thinkers and doers, we're paving the way for a new generation of genetic healthcare. Our people are what make us great, so finding the best people is everything to us.

 

Your role

We are looking for a Data Engineer to lead the improvement of our data infrastructure and take ownership of our data management. You will be responsible for setting up a data lakehouse infrastructure that integrates ETL processes and workflows for a range of scientific and business applications.


You will be embedded in our Artificial Intelligence team and work closely with our Computational Genomics, Software and Product teams, creating innovative solutions for handling large datasets in data science pipelines and machine learning analytics. This role calls for strategic thinking, a collaborative working style and a hands-on mentality.


We foster a flexible work environment and encourage applications from candidates who are either based in Berlin or would work remotely and are willing to travel to Berlin occasionally.


Requirements

Essential

  • Degree in a quantitative subject such as computer science, engineering, physics, mathematics or a related discipline
  • Competent Python skills (2+ years of experience)
  • Experience with software engineering best practices, such as version control (e.g. Git) and test-driven development
  • Comfortable working with relational databases such as Postgres
  • Able to write complex SQL queries (e.g. using efficient joins, aggregations and window functions)
  • Experience handling large volumes of data within computational workflows
  • Experience using workflow orchestration tools such as Apache Airflow
  • Experience with AWS services such as S3, Athena, RDS and DMS
  • Experience building end-to-end data lakehouse-style pipelines in AWS
  • Exposure to the 'PyData stack' (e.g. pandas, NumPy, scikit-learn, Matplotlib and/or Seaborn)
  • Ability to take ownership and responsibility as well as find pragmatic solutions to potentially complex problems
  • Skills in verbal and written communication
  • Highly motivated and able to work in a fast-paced, multidisciplinary and collaborative environment
  • Willingness to learn and openness to feedback

Nice-to-have

  • Interest in data science and machine learning
  • Hands-on experience using Docker and Kubernetes
  • Knowledge of how to manage data and compute services on AWS using infrastructure-as-code (e.g. AWS CloudFormation templates and/or Terraform)
  • Start-up experience


What we value in our team

Our team reflects the interdisciplinary collaboration required to solve this big challenge, ranging from software and data science to genetics and healthcare. We are a proudly diverse, international group of creative problem-solvers and humble learners who care about having a positive impact on society and are aware of the trust placed in us. This is why we value transparency, kindness and taking ownership, and why we encourage your personal growth:

  • Develop your personal skills and knowledge with resources like books and courses to learn continuously
  • Dynamic and flexible work environment that you can design (incl. remote work options)
  • Participate in our success with equity options
  • Possibility to grow with the company and influence our direction
  • Regular social activities: participate only if you feel like it
  • Additionally, for our Berlin-based offices, we offer good coffee, a selection of healthy snacks and company-subsidized public transport


We see diversity as a core feature of our team, and we especially encourage you to apply if you are from an underrepresented group.

Company

Nostos Genomics

Location

Worldwide, Remote

Job type

Full-Time

Category

Data Engineering

Tags

Airflow, AWS, Engineering, ETL

Copyright © 2021
