When I talk to SharpestMinds mentors, I like to ask them about the mistakes they see aspiring data scientists make. One common answer: too much focus on tools and state-of-the-art algorithms and not enough on the fundamentals.
As far as I can tell, the fundamentals for data-related roles can be grouped into four broad categories:
SQL
Statistics / model development
Programming / engineering
Communication / business acumen
The importance—and the necessary depth—of each category will depend on the type of data role. For Machine Learning Engineers and Data Engineers, for example, the clue is in the titles—both require more of an engineering skill-set.
This is part 1 in a series going through each of the categories above, with advice sourced from the SharpestMinds community.
SQL
Some knowledge of SQL is basically a must-have for all data-related roles. Structured Query Language is the most popular method for accessing data in databases. If you can’t access the data, you can’t do anything useful with it. Hence its importance.
But just telling someone to learn SQL is not very helpful. “It depends on what you want to use it for,” SM mentor Marwan told me recently. “How do you know if you have grasped enough to be industry-ready?” Marwan is a BI Engineer at Amazon, where he works with petabytes of data. At that scale, being able to write efficient SQL queries is essential.
Your first goal should be to understand the relative importance of SQL in the jobs you’re looking for. Typically, the more your work touches databases, the more important SQL will be. It’s usually at the top of the list of requirements for data engineers and data analysts. But you should confirm this by looking at job descriptions that interest you and seeing how often SQL comes up. Augment this by chatting with folks in the industry.
Of course, there will be exceptions. Not every company has petabytes of data to work with, and some data types (like images) aren’t best served by relational databases. Ray Phan, a computer vision engineer, says he rarely uses SQL on the job, and when he does he usually has to go back and refresh his memory.
Step two is to practice. Then practice some more. According to Alex Strick, a machine learning engineer, learning SQL is as much about muscle memory as it is about understanding concepts (though you need both). “Make sure—however you're learning—to do so with lots of actual typing out of SQL queries.” Alex’s advice is to pick one ‘flavour’ and stick to it, “PostgreSQL usually seems like the best choice for a first flavour. That or SQLite.”
Joey Berkwowitz is an analytics engineer (a relatively new job title which sits somewhere between data analyst and data engineer). He uses SQL in his role more than any other language or technology. In the SharpestMinds Slack, he shared some tips that helped him level up his SQL skills beyond the basics:
Start writing common table expressions (CTEs). They'll help you organize your more complex queries into digestible bits and pieces, keep your SQL consumable for other people, and allow you to tie together a number of techniques in succession. CTEs > sub-queries all day every day.
Once you've got the intermediate techniques mastered (aggregations, case statements and joins), start practicing window functions. Window functions will help you elevate the kinds of analysis you can do with SQL. They also come up a lot on tech interviews for analytics-focused roles, in my experience.
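To make Joey's two tips concrete, here is a minimal sketch that combines a CTE with a window function. It uses Python's built-in sqlite3 module so it runs without installing a database. The orders table, its columns, and the data are all made up for illustration, and window functions need SQLite 3.25 or newer, which recent Python builds normally bundle.

```python
import sqlite3

# Hypothetical toy data: the table, columns, and values are invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ana', '2023-01-05', 20.0),
        ('ana', '2023-01-20', 35.0),
        ('ana', '2023-02-11', 15.0),
        ('ben', '2023-01-09', 50.0),
        ('ben', '2023-02-02', 10.0);
""")

query = """
-- The CTE gives the aggregation step a name, so the final SELECT stays readable.
WITH monthly AS (
    SELECT customer,
           substr(order_date, 1, 7) AS month,
           SUM(amount) AS total
    FROM orders
    GROUP BY customer, substr(order_date, 1, 7)
)
-- The window function adds a per-customer running total without collapsing rows.
SELECT customer,
       month,
       total,
       SUM(total) OVER (PARTITION BY customer ORDER BY month) AS running_total
FROM monthly
ORDER BY customer, month;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same result could be written with a sub-query in the FROM clause, but naming the intermediate step as monthly is what keeps the query digestible once more steps get chained on.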
Here are some resources that the SharpestMinds community recommends for learning and mastering SQL:
SQL Fiddle - a playground environment that lets you create tables and run SQL queries in the browser
SQL Bolt - An interactive tutorial great for beginners
Select Star SQL - An interactive tutorial
SQL Murder Mystery - For intermediate/advanced SQL. Solve a murder mystery by running SQL queries
SQL Indexing for Devs - Indexing is an important concept for making SQL queries more efficient. This blog series provides a good introduction
SQL Zoo - Another interactive tutorial
The SQL Tutorial for Data Analysis - Another great tutorial that segments topics by beginner, intermediate, and advanced
Thanks to Joey, Alex, Ray, Marwan, and Amber for helping inspire and craft this post!
More from the SM Community
Active Learning in Machine Learning Explained by Vatsal Patel
Object detection made easier with IceVision - Meghal Darji