The 5 tools a data analyst can’t do without
In this post, we will cover the 5 tools an analyst needs to know to succeed in the field. These tools will help any analyst work faster and better when it comes to analysing data, solving problems and providing insights.
- Excel
Across all industries, Excel is the tool that sits on the Iron Throne as the most popular amongst analysts and decision makers. It’s so widely used because:
- Everyone has Excel on their machine
- It provides great functionality for non-technical people
- Data can be manipulated to suit the needs of the user
- It’s generally simple to learn and use
So, data analysts will need to be masters of Excel as they are likely to be using it daily. Some practical examples are:
- Running / Refreshing / Creating new reports
- You need additional data for your analysis and it lives in an Excel file within the Marketing team -> you will need to get that file and load it into your database
- You need to answer a quick ad hoc request showing sales per day -> you can quickly summarise the data using Excel
- You want to create a lookup from scratch (a product or store lookup, for example) -> you can build it in Excel and then load it into your database
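That sales-per-day request is a classic pivot-table job in Excel. As a rough illustration of the same aggregation in script form (the dates and figures below are invented for the example):

```python
from collections import defaultdict

# Toy sales rows as (date, amount) pairs -- in Excel this would be
# a pivot table over the raw export; the numbers here are made up.
sales = [
    ("2024-01-01", 120.0),
    ("2024-01-01", 80.0),
    ("2024-01-02", 200.0),
]

# Sum sales per day -- the same aggregation a pivot table performs
per_day = defaultdict(float)
for day, amount in sales:
    per_day[day] += amount

for day in sorted(per_day):
    print(day, per_day[day])
```

Knowing both ways of doing it lets you pick whichever is quicker for the request at hand.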
Therefore, I suggest you make Excel your best friend if you want to pursue a career in analytics.
- SQL (or other databases)
A SQL database is most likely where your data will be stored, so you will need to know how to use it if you want to work with data. SQL (Structured Query Language) is the language data analysts use to interact with that data. Usually, there is a database administrator who is responsible for the maintenance of the database; making sure new data is correctly stored in tables and all updates happen flawlessly. If you join a smaller organisation you might have to do this yourself (heads up!); don’t worry though, it’s an opportunity for you to understand how databases work in general.
Data analysts will be using SQL for the following reasons:
- Data input – all the relational raw data will be inputted into SQL
- Data cleaning – this involves removing empty columns/rows, removing duplicate values, updating data, creating error checks, triggers & deleting unnecessary data
- Data transformation – this involves bringing multiple different data sources together or pivoting the data in different ways
- Data modelling – This involves creating rules for your data or creating calculated fields
- Data storing – saving the output of your analysis or the raw data into tables
- Creating views – these views will be a set of queries that bring the data together, transform it and model it for a specific problem (sales Vs target for example). Then you can connect these views with your visualizations tool or Excel and create an automated dashboard
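As a minimal sketch of the cleaning and view-creation steps above, here is what they might look like in SQLite (the table, column names and figures are invented for the example; your database flavour will differ in the details):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Raw sales data, including a duplicate row that needs cleaning
cur.execute("CREATE TABLE sales (store TEXT, day TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "2024-01-01", 100.0),
        ("North", "2024-01-01", 100.0),  # duplicate
        ("South", "2024-01-01", 150.0),
    ],
)

# Data cleaning: remove duplicate rows by rebuilding with DISTINCT
cur.execute("CREATE TABLE sales_clean AS SELECT DISTINCT * FROM sales")

# Creating a view: transform and summarise for a specific question
cur.execute(
    "CREATE VIEW sales_per_day AS "
    "SELECT day, SUM(amount) AS total FROM sales_clean GROUP BY day"
)

print(cur.execute("SELECT * FROM sales_per_day").fetchall())
```

A dashboard tool (or Excel) would then connect to a view like `sales_per_day` rather than the raw table.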
There are more things you can do with SQL, but the above list covers the basics an analyst will need. SQL will also help you in the future as data moves to the cloud (big data tools), because it integrates very well with the major cloud services such as Azure, AWS and GCP.
- Visualisations tools (Power BI, Tableau and Qlik Sense)
Visualisation tools such as those above are still relatively new to most businesses, but they are starting to gain traction. As an analyst, these tools will help you communicate your results in a more modern and graphically rich environment versus Excel. Additionally, these tools provide more functionality than Excel, in that they:
- Offer more table / graph options to use
- Are more interactive and interconnected
- Offer more advanced analytics by integrating with Python & R
- Have the ability to handle big data better
The biggest mistake I see with these tools is that they are mistaken for databases, because they offer a ‘data model’ option with multiple connections (joins) and calculated fields. If you have medium-to-big data and try to do your modelling there, your visualisation tool will be slow and clunky, and people will revert to Excel. Best practice is to build your model in SQL or a big data database and then bring one table/view into your visualisation tool. Yes, it will take more space, but the report will be fast and usable.
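To make the best practice concrete, here is a sketch of doing the joins and aggregation in the database so the visualisation tool receives one flat, report-ready view (again using SQLite, with invented table names and figures):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two source tables that a viz tool would otherwise have to join itself
cur.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE stores (store_id INTEGER, region TEXT)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [(1, 100.0), (1, 50.0), (2, 75.0)])
cur.executemany("INSERT INTO stores VALUES (?, ?)",
                [(1, "North"), (2, "South")])

# Do the join and aggregation in the database, producing one flat
# view that Power BI / Tableau / Qlik Sense can consume directly
cur.execute(
    "CREATE VIEW report AS "
    "SELECT s.region, SUM(a.amount) AS total "
    "FROM sales a JOIN stores s ON a.store_id = s.store_id "
    "GROUP BY s.region"
)
print(cur.execute("SELECT * FROM report ORDER BY region").fetchall())
```

The visualisation layer then only has to render `report`, not compute it.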
- Python and/or R
Python and R will help you when it comes to machine learning (ML) projects such as making predictions, forecasts and run rates. Usually, these projects are run by more advanced data analysts or data scientists.
Personally, I believe that the urge to learn ML will come having mastered the basics mentioned above. In order to start learning, I recommend:
- Setting up your Python or R environment (it’s free) using Anaconda and Jupyter Notebooks
- Start by running simple statistics like Pearson correlations – PLEASE try to interpret the results
- Then try running basic supervised models like linear regression, logistic regression, Naïve Bayes and decision trees, which are easy to understand
- Then learn some unsupervised models like K-Means or hierarchical clustering
- If you master the above, move to deep learning models and start using TensorFlow, Keras or PyTorch on your GPU or virtual machines.
Fun right? The last bullet point is maybe a bit advanced for an analyst, but nothing that cannot be achieved with practice (and lots of googling). And you’ve always got the data analyst/scientist community, who are amazing at sharing knowledge and always happy to help.
Please note that the ability to learn new software fast is critical as it might be Python and R today but could quickly be something else tomorrow.
- Big Data tools like Microsoft Azure, Amazon Web Services (AWS) and Google Cloud Platform (GCP)
Traditional on-premise databases cannot handle the ‘big data’ organisations now generate, hence the move to cloud-based solutions like Azure, AWS and GCP. You will still be using SQL, Python and R alongside these big data tools, as they integrate very well together. Typical challenges include:
- Your organisation has a data lake where all the data is stored, and you need to know how to get there or how to connect to it. Or maybe you want to store data there yourself.
- Maybe you want to spin up a cluster of virtual machines to run a big data query or model. You will need to know how to configure the cluster yourself, which is relatively straightforward: select the processing power and RAM you need and you are good to go.
The point is that, as we move to the world of big data, there will be new ways of doing things that, as an analyst, you should be among the first to know how to use and apply. Plus, these big data cloud skills are in very high demand!
I hope you enjoyed this read! If you did, then please consider subscribing to my YouTube channel as I will be sharing more content on data analytics. If you have different views or there is something you want to comment on or even just ask me a question, please let me know in the comments section.