Hiring: Data Engineers, Data Scientists

First things first, check out our way of working.

So here's the deal at the time of this writing: we've got a newly created sales team that is starting to fill up our newly created sales funnel. We still don't know exactly what's going to come out of the end of the funnel, but it's very likely we're going to need some more people in the coming months.

We like to hire generalists, so if you've got a skillset that combines Data Science and Data Engineering, you're in. If you've got one or the other, no problem, we can fill in the gaps with on-the-job training. So, what are the requirements for each of the roles?

Minimum requirements

Although we may hire people with only a few years of experience, we can't really go much less than that. The reason is that as soon as you're hired, you start working. The more junior you are, the more supervision you will get, but our business model doesn't allow us to spend months teaching a person the basics of, for example, programming or statistical analysis. If you apply for a role, you should have at least a few years of experience in it.

Job description: Data Scientist

We define the field of DS as an interdisciplinary applied field that combines software engineering, statistical analysis, data visualization, and machine learning. We call it applied because it's not likely that you'll develop ML models from scratch or come up with novel ways of analyzing data that aren't already available in one of the many widely available open source packages.

You should be able to receive some business requirements and some data and take it from there. You'll pick and choose (with help, of course) which libraries to use, how to analyze the data, how to present the conclusions to the client, whether or not to train a model, and how to evaluate said models, and then ship high-quality, tested code implementing all of this.

The main tools that we use for data science tasks are sklearn, pandas, and dashboarding tools such as Metabase. We may prototype something a bit more visually interactive using streamlit or dash. We also must take GREAT care in communicating our results to the client, which involves the usual GSuite tools to create high-quality presentations and documents.

Since we are a services company and like to work with a wide variety of companies and use cases, it's difficult to say exactly what fields you will be working in. Sometimes it may be image processing, other times it may be NLP, but most of the time the data is tabular and structured. Whatever we have available, you'll have a choice as to whether or not you want to work on it.

Job description: Data Engineer

There is no Data Science without Data Engineering. Clean, accurate, and accessible data is the foundation for anything data-driven. Everything from dashboarding to ML models requires documented and accessible datasets. Data Engineering is how we make these datasets available.

The vast majority of this job is implementing ETLs. Implementing ETLs at DareData requires knowledge of cloud infrastructure, infrastructure as code, basic networking, SQL, and CI/CD. We mainly use tools such as digdag and airflow for scheduling. We will work with any of the big cloud providers (GCP, Azure, AWS), though we are most experienced with AWS. Snowflake is our data warehousing tool of choice.
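To give a rough idea of what "implementing an ETL" means in its smallest form, here is a minimal sketch in Python. It is not one of our actual pipelines: the CSV data, the `orders` table, and the function names are all made up for illustration, and an in-memory SQLite database stands in for a real warehouse such as Snowflake so that the example runs with nothing but the standard library.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in a real project this would come from a client system.
RAW_CSV = """order_id,customer,amount
1,acme,10.50
2,acme,
3,globex,7.25
"""


def extract(raw):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """Transform: drop rows with a missing amount and normalize the types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records instead of loading bad data
        clean.append(
            (int(row["order_id"]), row["customer"].upper(), float(row["amount"]))
        )
    return clean


def load(conn, records):
    """Load: write the records into the (illustrative) orders table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keyed on order_id keeps reruns from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, raw=RAW_CSV):
    """Run the three steps in order and return the number of rows loaded."""
    records = transform(extract(raw))
    load(conn, records)
    return len(records)
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two complete rows and skips the one with a missing amount. In a real job the same three steps would typically be wrapped in a scheduler task and pointed at cloud storage and a warehouse rather than a string and SQLite.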

Common to both roles

There are a few things that are common across both roles. The most important one is Python. We are largely a Python shop. Most data engineering tools, ML algorithms, and any utilities required to put them into production are available in Python. We don't care what all the hipsters say about the highbrow theoretical or speed advantages of other languages; we are here to deliver, and Python is the language that has the libraries to do just that.

We expect everyone to be (or become very quickly) an excellent software engineer. Code is the medium by which we implement our ideas, so the code delivered to the client must be created in a systematic way that increases the chances of success and decreases the chances of bugs making it to the client. Linting, testing, documentation, review processes, and many others are how we do this, and you are expected to use all of these. There is no place at DareData for the ML researcher who thinks they are too good to write a few lines of code.

Application process

Phase 1: Make contact

Send me an email at sam@daredata.engineering with your GitHub, GitLab, LinkedIn, CV, and anything else that might describe your professional self. I'll take a look to decide if we should move to the next phase.

Phase 2: Ensure alignment

The next phase is that we do a call. We will discuss our way of working to make sure that it's a good fit for you. You will describe to me what you want out of your professional life and how DareData can help you achieve it. If we both feel that the relationship is likely to be mutually beneficial, we can move to the next phase.

Phase 3: Technical assessment

This part depends on your experience level. The less experienced you are, the more likely we will give you a coding challenge that will take a few days. The more senior you are, the more examples of your work you should have available to show us.

Then you'll get put on a real live project. Doing paid client work is considered part of your assessment. If for any reason the project you are on doesn't work out, you will be replaced with someone else and we will decide if and how to move forward from there. If it doesn't work out, no hard feelings whatsoever. It's just business.

If it does work out, then the next steps should make sense. We don't have a defined career ladder to climb so how we move forward can be tailored to achieve what works best for both parties.

Hope to hear from you soon!