10 Technical Questions Asked in a Junior Data Analyst Interview

Introduction

photo source: Picpedia.org

To meet the requirements for getting an interview in the first place, you can dig into my personal website for some ideas on building a personal brand. Now to the topic: the 10 technical questions I was asked in my junior data analyst interview.

10 Technical Questions

  1. What is your checklist when you receive a new quantitative dataset?
  2. Explain the difference between list and tuple in Python
  3. Explain the difference between set and dict in Python
  4. Explain the difference between variance and bias in statistics/machine learning
  5. Following on from question 4, how do you balance variance and bias in machine learning?
  6. Briefly explain the job description of a data analyst
  7. Briefly explain how data analysts create actual value in a company
  8. Describe a data analysis project you have conducted or been involved in
  9. Some follow-up questions on the limiting factors of this project (question 8)
  10. Propose at least three questions related to the role and the company.

Scroll down if you want to know how I answered them and got the position.

photo source: https://pixabay.com/

Details & Answers to the 10 Technical Questions

1. Checklist for a new quantitative dataset

1. Check the file format and open it in a data analysis tool, e.g. Excel, R, Python…

2. Check the data type of each column. If there are any mistakes, redefine the data type, e.g. a column of currency amounts that is defined as a string.

3. View the whole dataset, or only its head and tail, depending on the size. Summarise the dataset and make graphs to detect any unreasonable outliers.

4. Find out whether there are any missing values in each column, and fill them according to the scenario, e.g. a missing air temperature can be filled with the average temperature.

5. Find inconsistent names or labels, such as misspellings and extra spaces, and correct them, e.g. NicolasCage, NicolasCave, Nicolas Cage.
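As a minimal sketch of steps 2, 4 and 5 above, here is how they might look in Python using only the standard library. The sample data, the column names (name, amount, air_temp) and the helper functions are all made up for illustration, and in practice a library such as pandas would usually do this work.

```python
import csv
import io
import statistics

# Hypothetical raw data: amounts arrive as strings, one temperature is
# missing, and the same name appears with extra spaces or without a space.
RAW_CSV = """name,amount,air_temp
Nicolas Cage,12.50,21.0
 Nicolas Cage ,7.25,
NicolasCage,3.00,23.0
"""

def to_float(text):
    """Convert a string cell to float; return None when the cell is empty."""
    text = text.strip()
    return float(text) if text else None

def clean(raw_csv, label_fixes):
    rows = list(csv.DictReader(io.StringIO(raw_csv)))

    # Step 2: redefine the data type of numeric columns stored as strings.
    for row in rows:
        row["amount"] = to_float(row["amount"])
        row["air_temp"] = to_float(row["air_temp"])

    # Step 4: fill missing air temperatures with the column average.
    known = [r["air_temp"] for r in rows if r["air_temp"] is not None]
    mean_temp = statistics.mean(known)
    for row in rows:
        if row["air_temp"] is None:
            row["air_temp"] = mean_temp

    # Step 5: strip extra spaces, then map known misspellings to one label.
    for row in rows:
        name = row["name"].strip()
        row["name"] = label_fixes.get(name, name)
    return rows

cleaned = clean(RAW_CSV, {"NicolasCage": "Nicolas Cage"})
for row in cleaned:
    print(row)
```

Filling with the mean is only one option; whether it is reasonable depends on the scenario, as noted in step 4.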

2. Python: the difference between List and Tuple

The answer to this question can be summed up in one key sentence:

A list is mutable and a tuple is immutable, which means a list can be modified after it is created but a tuple can’t.

Of course, they differ in other characteristics too, each with its own pros and cons. You can check online sources like Real Python.
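A short sketch of that key sentence in code may help. The variable names are arbitrary examples.

```python
# A list can be changed in place after it is created.
scores = [70, 85]
scores.append(90)      # add an element
scores[0] = 75         # replace an element

# A tuple cannot: item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Because this tuple is immutable and holds only hashable items,
# it can serve as a dict key, which a list cannot.
distances = {point: 5.0}

print(scores, point, tuple_changed, distances)
```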

3. Python: the difference between set and dict

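A dict is like a set with key-value pairing: every key must come with a value, while a set stores the elements alone. Neither a set nor a dict can hold two identical keys, and in both the keys must be hashable. A set is unordered, whereas a dict keeps insertion order (since Python 3.7).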
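The comparison can be sketched in a few lines of Python. The names and values are only examples.

```python
# A set stores unique elements only; duplicates collapse into one.
tags = {"python", "sql", "python"}

# A dict stores unique keys, each paired with a value.
# Assigning to an existing key overwrites its value.
stock = {"apple": 3, "pear": 5}
stock["apple"] = 10

# Membership tests look the same for both.
has_sql = "sql" in tags
has_pear = "pear" in stock      # checks keys, not values

# Dicts keep insertion order (Python 3.7+); sets make no such promise.
key_order = list(stock)

# Set elements and dict keys must be hashable, so a list is rejected.
try:
    bad = {["a", "b"]}
    list_allowed = True
except TypeError:
    list_allowed = False

print(tags, stock, has_sql, has_pear, key_order, list_allowed)
```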

4–5. Machine learning: variance and bias

Source: https://en.wikipedia.org/

The bias-variance trade-off is a classic problem in machine learning, and it is closely related to overfitting. The answer to question 4 is as below:

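Bias: the error caused by a model that is too simple to capture the pattern in the data. A high-bias model predicts poorly even on the training data it was fitted to (underfitting).

Variance: the error caused by a model that is too sensitive to the particular training data used. A high-variance model predicts well on the training data but poorly on data outside it (overfitting).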
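For reference, the standard decomposition of the expected squared error makes the two terms explicit. The notation here is an assumption for illustration and was not part of the interview: the true value is y = f(x) + ε with noise variance σ², f̂(x) is the model's prediction, and the expectation is taken over different training sets.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```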

Then, according to the graph above, the best model complexity is the one that achieves a relatively low total error, with both variance and bias reasonably low (neither is at its own minimum, which is why it is called a trade-off). The approach to question 5 is

To divide the whole dataset into training, validation, and test sets. Use the training set to fit the model and the validation set to tune it and choose the model complexity, then use the test set only once, to determine the final performance after training.
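A minimal sketch of such a split, using only the standard library, could look like this. The function name split_dataset, the 60/20/20 proportions and the fixed seed are assumptions chosen for illustration, not a rule.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and cut them into train / validation / test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed for a repeatable split
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains
    return train, val, test

data = list(range(100))                     # stand-in for 100 samples
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the rows are sorted in some way. For time series such as weather data, a chronological split is often the safer choice.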

6. Job description of data analyst

By detecting, analysing and explaining the hidden facts behind any kind of data, a data analyst enables the company to take the corresponding actions and so creates actual value for it.

7. Actual value created by data analyst

Data analysts usually cooperate with other departments in the company. We make daily/weekly/monthly reports for the executive team. These reports provide crucial information for decision-making, avoiding mistakes and fixing problems.

8–9. A data analysis project you have done

The project I presented is a machine learning application for real-time micro-climate forecasting. You can check some of the details on my GitHub.

Of course, to verify that I had really done this project, follow-up questions on the details came next. At this point, I was asked about the limitations, the difficulties, the problems I met and my corresponding solutions.

The most important thing when answering these questions is to respond to the recruiter's doubts with confidence. The point is not to prove that you can do things perfectly. It is to show that you understand that in real life things usually won’t go smoothly, and that you know how to cope with various situations properly and logically.

10. Three questions from you

1. What kinds of programming tools will I use in this role, so that I can brush up on them or improve my skills in them? (This indicates your willingness to learn.)

2. What will the daily work be like? How will I interact with different departments in the company?

3. What training can I expect as a junior data analyst?

Reflection

In addition, courses in data analysis are not necessary, but they help you to see the bigger picture and may cover some knowledge you haven’t used in your past projects. It is beneficial to take such courses just before the interview. For me, they covered all of the basic entry-level technical questions and made me feel confident and less nervous. However, you should consider your memory. “Do not take them too early, or you will only have a fuzzy memory of them in your interview!”

Last but not least, some advice for those who have no related work experience in data analysis but have completed training in it. “Start building your profile!” It is the key to being invited to job interviews, which will be your runway.

Fashion Runway, photo source: https://fashion.luxury


Programmer | Agriculturist | Environmentalist
