Learning how to analyze data can allow your business to make the most profitable, risk-controlled decisions.
What is Data Analysis?
Data analysis is the process of drawing meaningful conclusions from large quantities of data and presenting the information to businesses to help them make valuable decisions.
The process of data analysis involves:
highlighting outliers/important data points
looking at data distribution
finding patterns/trends and
observing correlations in the data
Why is Data Analysis Important?
Data analysis is important for two reasons: research purposes and business purposes. Businesses can make changes or decisions to maximize their revenue/profits whilst researchers can make objective interpretations of the data. A more in-depth explanation with examples:
Why is Data Analysis Important in Business?
Data analysis is important for businesses because it helps them increase profits and cut costs.
Performing market research and analyzing the results lets businesses identify market gaps and reduce risk when making major decisions.
Example of how analysis can benefit a company:
Imagine you run a food chain store that sells burgers, scallops and fish & chips.
Each week, you buy hundreds of dollars of ingredients.
You find that most people who visit your store are only interested in the fish and chips, so you quickly run out of ingredients for it and lose sales.
The ingredients for the burgers and scallops go to waste because few customers are interested in them.
Next time, you buy more ingredients for fish and chips and spend less on burgers and scallops to save money.
In a nutshell, this is how data analysis can be useful to businesses: it saves time and money and increases profits.
Why is Data Analysis Important in Research?
Data analytics is a core component of the “scientific research methodology” because it allows researchers to make objective interpretations of the data without relying on gut feeling and false assumptions.
Imagine an experiment with 2 groups:
Group A: the control group
Group B: the experimental group
You administer a drug to group B and give group A the placebo pill.
Afterwards, both groups do a test. On average, Group B scored 75/100 whilst Group A scored 70/100.
Did Group B actually do better than Group A? Or was that due to luck? A standard way to answer this is with a t-test. A t-test can provide objective statements like: “If the drug had no effect, a difference this large would appear by chance less than 5% of the time,” which is what researchers mean by a statistically significant difference between Group A and Group B.
Sometimes data can be misleading so it’s important for researchers to conduct analysis to form objective statements about their findings.
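To make the Group A vs. Group B comparison concrete, here is a minimal sketch of the arithmetic behind a two-sample t-test. It uses Welch's version of the test (which does not assume both groups have the same variance) and only Python's standard library. The scores are invented for illustration so that the group means match the 70 and 75 above, and `welch_t` is a hypothetical helper name, not part of any library.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two independent samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)

    # Squared standard error of each group's mean.
    sq_se_a, sq_se_b = var_a / n_a, var_b / n_b
    t_stat = (mean_b - mean_a) / math.sqrt(sq_se_a + sq_se_b)

    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (sq_se_a + sq_se_b) ** 2 / (
        sq_se_a ** 2 / (n_a - 1) + sq_se_b ** 2 / (n_b - 1)
    )
    return t_stat, df

# Hypothetical test scores, invented for illustration.
group_a = [68, 72, 65, 74, 70, 71, 69, 71]  # placebo, mean 70
group_b = [73, 78, 70, 79, 76, 74, 75, 75]  # drug, mean 75

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The larger the t statistic, the harder it is to explain the gap by luck alone. Turning t into a p-value requires the t-distribution, which the standard library does not include. If SciPy is installed, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` runs the same Welch test and returns the p-value directly.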
Types of Data Analysis
There are 4 types of data analysis:
Descriptive Analysis: “What Happened?”
Diagnostic Analysis: “Why did it happen?”
Predictive Analysis: “What will happen?”
Prescriptive Analysis: “How can we make it happen?”
1. Descriptive Analysis
Descriptive analysis is the core of analyzing data. It’s the most common form of data analysis and deserves the most attention.
What is descriptive analysis?
Descriptive analysis asks: “What happened?” It is the process of extracting insights from data and describing what the data says.
Types of descriptive analysis:
Measures of frequency: How frequently an event occurs (e.g. count, frequency)
Measures of central tendency: Looking at the mean (average), mode and median
Measures of dispersion: How the data is distributed (e.g. standard deviations)
Measures of position: Looking at percentiles and quartiles
Descriptive analysis is mostly done through data visualization. Bar charts, scatter plots and bullet graphs are the most common techniques for this.
For businesses: KPI dashboards and monthly revenue analysis are common use cases of descriptive analysis.
Example of Descriptive Analysis
Imagine you run a food chain store. A descriptive analysis can look at frequency: to see how many burgers were sold vs. scallops vs. fish & chips. It can also be used to identify how many customers were male vs. female.
A descriptive analysis can also look at central tendency: the average age of your customers, the average number of items sold each day, the average profit each week etc.
Finally, it can also look at dispersion and position: the age distribution of your customers, the distribution of how much they spend and so on.
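The four kinds of descriptive measures can be sketched in a few lines with Python's built-in `statistics` module. The order log below is made up for illustration; in practice the rows would come from your own sales records.

```python
import statistics
from collections import Counter

# Hypothetical one-day order log: (item, customer_age), invented for illustration.
orders = [
    ("fish & chips", 34), ("fish & chips", 41), ("burger", 19),
    ("fish & chips", 52), ("scallops", 38), ("fish & chips", 27),
    ("burger", 23), ("fish & chips", 45),
]

items = [item for item, _ in orders]
ages = [age for _, age in orders]

# Measures of frequency: how often each item was ordered.
item_counts = Counter(items)

# Measures of central tendency.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
most_ordered = statistics.mode(items)

# Measure of dispersion: sample standard deviation of customer age.
age_sd = statistics.stdev(ages)

# Measures of position: the three quartile cut points.
q1, q2, q3 = statistics.quantiles(ages, n=4)

print(item_counts.most_common())
print(f"mean age {mean_age}, median age {median_age}, top item {most_ordered}")
print(f"age SD {age_sd:.1f}, quartiles {q1}, {q2}, {q3}")
```

Each of these numbers answers a “what happened” question without explaining why, which is exactly the scope of descriptive analysis.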
2. Diagnostic Analysis
Diagnostic analysis is the next step to analyzing data.
What is diagnostic analysis?
Diagnostic analysis looks at “why” something happened. Basically, a diagnostic analysis takes the insights found from the descriptive analysis and tries to find the root cause of these outcomes.
Example of Diagnostic Analysis
So you ran a food chain store and noticed your fish & chips were selling very well, but burgers were selling poorly. Why did your food chain store sell so many “fish & chips” but not many burgers?
Perhaps it was the pricing? Perhaps the images of the burger didn’t look appealing. Or perhaps people around the area weren’t interested in burgers.
A diagnostic analysis looks into things like examining market demand, trying to explain customer behaviour and identifying internal issues.
3. Predictive Analysis
So you just performed a diagnostic analysis and found out that the images of the burger didn’t look appealing and it was priced too high - that’s why it wasn’t getting many sales. The third step towards analyzing data is predictive analysis.
What is predictive analysis?
Predictive analysis looks at the future: “what will happen?” It uses existing historical data to try and predict future outcomes by forecasting trends and creating mathematical models.
Example of Predictive Analysis
You forecast that sales will be higher during the holiday season because the area gets more customers. Based on last year's data from similar food chains, you predict sales will be up by 50%.
Now you're ready to move onto the next step:
4. Prescriptive Analysis
The final step is to perform a prescriptive analysis.
What is a Prescriptive Analysis?
A prescriptive analysis looks at “what should we do next?” and tries to figure out the optimal course of action.
Example of Prescriptive Analysis
You forecast that sales will go up by 50% during the holiday season. You also anticipate that burger sales will go up if you lower prices and create more appealing images. Now it’s time to figure out the next plan of action.
To accommodate your predictions, you buy more ingredients for your store: more fish & chips, more scallops and a lot more burgers.
You also add in an extra table to the shop as you’re expecting more customers at once.
The end result: Hopefully everything went as planned and your sales and profits went up. However, there'll often be unexpected circumstances and it's important to keep re-evaluating the data by going through this process.
Data analysis isn't a one-time thing, but a continuous process.
Types of Data
There are 4 types of data. Remember the acronym NOIR (Nominal, Ordinal, Interval, Ratio):
Nominal data is data that’s not related to numerical measurements and has no ordering. For example: Colors.
Red, blue, green aren’t numbers and have no ordering. You can’t say red is greater than blue, therefore it is considered nominal data.
Examples of nominal data include: colors, gender, car models and countries.
Ordinal data have an order, but no numeric value. An example is test grades: A+, A, B+, B, C+, C etc.
You can order these, but they have no numeric value.
For ordinal data, you can perform median calculations on them, but not mean. You can’t find the average of a bunch of grades, but you can find the middle value.
Examples of ordinal data include: grades, income brackets (e.g. $50-100k), age groups, and Likert scale surveys (very satisfied, satisfied, neutral, unsatisfied, very unsatisfied).
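Here is a small sketch of taking the median of ordinal data. The grades are invented, and `GRADE_ORDER` is an assumed ranking from lowest to highest. The only thing the code relies on is the ordering, never a numeric value for each grade.

```python
# Assumed ordering of grades, lowest to highest.
GRADE_ORDER = ["C", "C+", "B", "B+", "A", "A+"]
rank = {grade: position for position, grade in enumerate(GRADE_ORDER)}

# Hypothetical class results, invented for illustration.
grades = ["B", "A+", "C+", "B+", "A", "B", "C"]

# Sort by position in the ordering, then pick the middle element.
# With an even number of grades this picks the lower of the two middle values.
ranked = sorted(grades, key=lambda grade: rank[grade])
median_grade = ranked[(len(ranked) - 1) // 2]

print(ranked)
print("median grade:", median_grade)
```

A plain alphabetical sort would not work here, because “A+” and “B+” would not land in grade order. That is why the explicit ranking is needed.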
Interval data have a numerical value and are evenly spaced apart, allowing us to perform mean calculations. However, we can’t perform ratio calculations on interval data because it doesn’t have a true “zero point.”
An example is temperature in Celsius or Fahrenheit. 0 degrees Celsius is not a “true” zero point, since it doesn’t mean there is no heat and you can go below it into negative degrees.
Because of this, 10 degrees Celsius is NOT twice as hot as 5 degrees Celsius, even though the gap between 5 and 10 degrees is the same size as the gap between 10 and 15 degrees.
Examples of interval data include: Temperature in Celsius/Fahrenheit, IQ scores, time of each day, voltage.
Ratio data is like interval data, except it has a true “zero point.”
For example, temperature measured in a Kelvin scale is considered ratio data. The “zero point” represents a total lack of thermal energy and you can’t go below it.
This allows us to make ratio comparisons between the data. E.g. 10 Kelvins is twice as much as 5 Kelvins.
Examples of ratio data include: size of land, volume of water, age, soccer goals: values that are evenly spaced apart, but can’t go below zero.
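A quick worked example shows why the “true zero” matters. Converting Celsius to Kelvin (by adding 273.15) reveals what the ratio between 10 and 5 degrees Celsius really is:

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

# Differences are meaningful on an interval scale: both gaps are 5 degrees.
same_gap = (10 - 5) == (15 - 10)

# Ratios are not: dividing the Celsius numbers suggests "twice as hot"...
naive_ratio = 10 / 5

# ...but on the Kelvin scale, which has a true zero, the ratio is close to 1.
kelvin_ratio = celsius_to_kelvin(10) / celsius_to_kelvin(5)

print(f"naive Celsius ratio: {naive_ratio}")
print(f"ratio in Kelvin:     {kelvin_ratio:.3f}")
```

So 10 degrees Celsius is only about 2% “more temperature” than 5 degrees Celsius, not 100% more.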
10 Data Analysis Techniques
A list of useful data analysis techniques and their use cases:
1. Data Discovery
Polymer Search’s Auto-Insights is a data discovery tool used for finding top performing combinations in business and marketing data.
Let’s say you ran some Facebook Ads and wanted to find the most profitable target audience.
Polymer’s Auto-Insights tool will generate a report of all the different factors (age, gender, country, device, targeting type) that influence your profits. It’ll find the best combinations for you without you having to do hours of work.
2. Pivot Tables
Pivot tables can be created using Excel, Google Sheets or Polymer Search. Pivot tables allow you to aggregate data and quickly answer questions you have about the data.
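For readers who prefer code to spreadsheets, the aggregation a pivot table performs can be sketched in plain Python. The sales rows are invented, and `pivot_sum` is a hypothetical helper written for this example, not a function from Excel, Google Sheets or Polymer Search.

```python
from collections import defaultdict

# Hypothetical sales rows, invented for illustration.
sales = [
    {"day": "Mon", "item": "burger", "revenue": 40.0},
    {"day": "Mon", "item": "fish & chips", "revenue": 120.0},
    {"day": "Mon", "item": "fish & chips", "revenue": 80.0},
    {"day": "Tue", "item": "burger", "revenue": 30.0},
    {"day": "Tue", "item": "fish & chips", "revenue": 150.0},
    {"day": "Tue", "item": "scallops", "revenue": 25.0},
]

def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    return table

# Rows are items, columns are days, cells are total revenue.
pivot = pivot_sum(sales, "item", "day", "revenue")

for item in sorted(pivot):
    print(item, dict(pivot[item]))
```

This answers a question like “how much revenue did each item bring in on each day?” in one pass over the data, which is the same job a spreadsheet pivot table does.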
3. T-Tests
A t-test is a hypothesis testing technique used to determine if there’s a significant difference between the means of 2 groups.
It is often used in scientific research, but can also be used for business data.
An example is A/B testing. You changed the copy of your landing page and want to test if it converts better than the previous version. You notice that the new page converts 7% better, but how do you know this wasn’t due to chance?
A t-test can tell you how likely a difference this large would be if the two pages actually performed the same, by looking at the data distribution. It might come up with a result like “p = 0.08”: if there were no real difference, you would still see a gap this big about 8% of the time. That is above the usual 5% cut-off, so you can’t yet conclude that Page B converts better than Page A.
SPSS is a popular tool for t-testing.
4. Factor Analysis
Factor analysis is a technique to break down incredibly large datasets. Think: breaking 100 column datasets into 10-20 columns.
Imagine you conducted a marketing survey where you asked 100 questions such as “what is your household income?” and “how much are you willing to spend on organic food each month?”
Looking at every column and comparing them isn’t practical.
Instead, the smarter thing to do is to look at the covariance between the columns and combine related columns into a single factor.
Covariance measures how two variables move together. Here, you’re looking for variables that strongly correlate with each other e.g. household income and willingness to spend might strongly correlate.
Once you find these, you can group these variables into a single factor “purchasing power.”
The end result is you can reduce 100 columns into 10-20 columns which will make the data analysis process much more practical.
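The first step described above, spotting columns that move together, can be sketched as follows. This is not a full factor analysis, which needs a statistics package. It only computes the Pearson correlation for each pair of columns and flags the strongly related ones. The survey numbers, the column names and the 0.8 threshold are all assumptions made for illustration.

```python
import math
import statistics
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Hypothetical survey answers from six respondents, invented for illustration.
survey = {
    "household_income": [40, 55, 70, 85, 100, 120],  # thousands per year
    "organic_spend": [50, 80, 90, 130, 150, 200],    # dollars per month
    "hours_of_tv": [3, 1, 4, 2, 4, 2],               # hours per day
}

THRESHOLD = 0.8  # assumed cut-off for "strongly correlated"

correlated_pairs = []
for col_a, col_b in combinations(survey, 2):
    r = pearson(survey[col_a], survey[col_b])
    if abs(r) >= THRESHOLD:
        correlated_pairs.append((col_a, col_b, r))

for col_a, col_b, r in correlated_pairs:
    print(f"{col_a} and {col_b} move together (r = {r:.2f})")
```

Columns flagged this way are candidates for merging into one factor such as “purchasing power”, while unrelated columns like TV hours stay separate.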
5. Bar Charts
The bread and butter of data analysis. I’ve been analyzing data for over 10 years and there has not been a single dataset where a bar chart wasn’t used.
Bar charts are a data visualization technique useful for seeing the connection between 2 variables: a “categorical” variable and a “numerical” variable.
Categorical variables are things like: color, gender, country and age brackets.
Numerical variables are “continuous measurements” e.g. cost, conversion value, profit and age (not age brackets).
6. Scatter Plots (Linear Regression)
Scatter plots are also very common in data analysis. They allow you to find correlations between two numerical variables (e.g. age vs. income).
They’re also useful for spotting clusters, outliers and making predictions by drawing a line of best fit.
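The line of best fit on a scatter plot is usually an ordinary least squares line. Here is a minimal sketch of how it is calculated and used for a prediction. The age and income figures are invented, and `fit_line` is a hypothetical helper name.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data points, invented for illustration.
ages = [22, 28, 35, 41, 50]
incomes = [32, 41, 50, 58, 70]  # thousands per year

slope, intercept = fit_line(ages, incomes)

# Use the fitted line to predict income for an age that is not in the data.
predicted_income_at_45 = slope * 45 + intercept

print(f"income = {slope:.2f} * age + {intercept:.2f}")
print(f"predicted income at age 45: {predicted_income_at_45:.1f}k")
```

A prediction like this is only as good as the fit, so it is worth looking at the scatter plot first to check that the points really do follow a roughly straight line.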
7. Time Series
A time series is used for seeing trends over time. In a time series, the x-axis is always “time” (usually measured daily, weekly, monthly, quarterly or yearly). Occasionally it’ll be measured hourly to see trends at different times of the day.
The y-axis is a numerical variable such as “number of sales.” This will show the trend of sales over a time period.
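Daily numbers are often noisy, so one common way to make the trend in a time series easier to see is a moving average. The sketch below is one possible approach, not the only one. The sales figures are invented and `moving_average` is a hypothetical helper.

```python
from collections import deque

def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    averages = []
    recent = deque(maxlen=window)  # keeps only the last `window` values
    for value in values:
        recent.append(value)
        if len(recent) == window:
            averages.append(sum(recent) / window)
    return averages

# Hypothetical number of sales per day over two weeks, invented for illustration.
daily_sales = [20, 22, 19, 25, 40, 55, 48, 21, 24, 20, 27, 43, 58, 50]

# A 7-day window smooths out the weekday/weekend swings.
weekly_trend = moving_average(daily_sales, window=7)

print([round(value, 1) for value in weekly_trend])
```

Plotting `weekly_trend` against time instead of the raw daily values makes it easier to tell whether sales are genuinely rising or just higher on weekends.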
8. Heatmaps
Heatmaps show where all the volume is coming from by representing numerical data using colors.
A common use for heatmaps is seeing where users click on a website. Tools like Hotjar are great for this.
9. Distribution Analysis (Standard Deviation)
A standard deviation tells you how spread out the data is around the mean.
A low SD means the data is grouped closely together and each number is close to the mean/average (e.g. 4, 4, 4.2, 4.5).
A high standard deviation means the data is spaced apart (4, 9, 9, 20, 25).
Excel can be used to calculate standard deviation. Standard deviation is important for seeing how the data is distributed and is also used in t-tests.
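The two example datasets above can be checked with Python's `statistics` module. This sketch uses the sample standard deviation (`statistics.stdev`), which is an assumption on my part. The population version (`statistics.pstdev`) gives slightly smaller numbers.

```python
import statistics

tightly_grouped = [4, 4, 4.2, 4.5]
spread_out = [4, 9, 9, 20, 25]

# Sample standard deviation of each dataset.
low_sd = statistics.stdev(tightly_grouped)
high_sd = statistics.stdev(spread_out)

print(f"tightly grouped: mean {statistics.mean(tightly_grouped):.2f}, SD {low_sd:.2f}")
print(f"spread out:      mean {statistics.mean(spread_out):.2f}, SD {high_sd:.2f}")
```

The first dataset has an SD of roughly 0.24 and the second roughly 8.73, which matches the intuition that its values sit much further from the average.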
10. Sentiment Analysis
Sentiment analysis is an advanced data mining technique used by data scientists to analyze qualitative data.
Not all data will come in the form of spreadsheets (rows and columns). Sometimes, you get data by conducting in-person interviews, asking for long answer responses, and scraping the web for comments left by users.
This is where sentiment analysis can be useful. Sentiment analysis is used to classify the emotions within the text. Types of sentiment analysis:
Fine-grained sentiment analysis: Breaks down sentences to try to identify the topic (target) of a sentiment. With fine-grained sentiment analysis, you can identify who talks about a product, and how they perceive it (negative, neutral, positive).
Emotion detection: A branch of sentiment analysis that tries to extract emotions from text using emotion detection models, which try to identify key words associated with certain emotions such as anger, jealousy, frustration, and excitement.
Aspect-based sentiment analysis: Imagine the sentence “The customer service was good, but the product was poor.” An aspect-based sentiment analysis can recognize not only the sentiment, but also which object that sentiment is directed at e.g. customer service = good, product = bad.
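To give a feel for the aspect-based idea, here is a deliberately tiny sketch that uses hand-made keyword lists. Real sentiment analysis tools rely on trained language models or much larger dictionaries. The word lists, the aspect names and the `aspect_sentiment` function are all invented for illustration.

```python
import re

# Tiny hand-made lexicons, invented for illustration only.
POSITIVE = {"good", "great", "excellent", "friendly"}
NEGATIVE = {"poor", "bad", "terrible", "slow"}
ASPECTS = {"customer service", "product", "delivery"}

def aspect_sentiment(text):
    """Map each known aspect found in the text to a sentiment label."""
    results = {}
    # Split into clauses so a sentiment word only counts for an aspect
    # mentioned in the same clause.
    for clause in re.split(r"[,.;]|\bbut\b|\band\b", text.lower()):
        words = set(re.findall(r"[a-z']+", clause))
        for aspect in ASPECTS:
            if aspect in clause:
                if words & POSITIVE:
                    results[aspect] = "positive"
                elif words & NEGATIVE:
                    results[aspect] = "negative"
                else:
                    results[aspect] = "neutral"
    return results

review = "The customer service was good, but the product was poor."
print(aspect_sentiment(review))
```

A keyword approach like this breaks down quickly on negation (“not good”) and sarcasm, which is one reason sentiment analysis is treated as an advanced technique.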
How to Analyze Data (5 Steps)
Step 1: Define your goals
Step 2: Clean & reformat the data
Step 3: Choosing the Right Tools
Step 4: Perform data analysis techniques
Step 5: Interpret & Present the Data
Step 1: Define Your Goals
Before diving into analyzing data, it’s important to define what you want to achieve from the data analysis.
Create a list of questions you want answered. Very often, you’ll have some questions that you’re dying to find the answer to. Writing down these questions will make the data analysis process so much easier.
Define your end goals. Ultimately what are you trying to achieve? What problems are you trying to fix?
Define ways to measure these goals.
Perhaps you found out your burgers aren’t selling very well and want to make them more popular. In this case, you’ll need to collect data to find out why. The easiest way would be to collect data through a customer survey designed around your end goals and questions, then analyze the data.
Step 2: Clean & Reformat the Data
Data is messy. Before you can start analyzing data, you’ll need to inspect it to make sure everything is properly formatted.
Example of data cleaning:
You have some data on residential addresses. Problem is, everybody enters addresses in a different format. Sometimes it’s like this:
80 Street, Suburb, Postcode, State
Other times it’s like this:
Street, 80, State, Country
And quite often, people will use abbreviations such as “St, Rd, Ave” which could make the same address seem like different ones.
Analyzing this kind of data is impossible without cleaning it by putting everything into the same format.
Only when everything is ordered correctly and formatted the same way can we use analysis tools to start splitting this information into different columns for “street”, “suburb” and more.
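One piece of this clean-up, expanding abbreviations so the same address always looks the same, can be sketched with a regular expression. The abbreviation list is a small assumed sample and `normalize_address` is a hypothetical helper. Re-ordering the address parts is a harder problem and is not attempted here.

```python
import re

# Assumed sample of street-type abbreviations; a real list would be longer.
ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}

def normalize_address(raw):
    """Collapse extra whitespace and expand common street-type abbreviations."""
    cleaned = re.sub(r"\s+", " ", raw.strip())

    def expand(match):
        word = match.group(0)
        # Look the word up without its trailing full stop, ignoring case.
        return ABBREVIATIONS.get(word.rstrip(".").lower(), word)

    # Match whole words, optionally followed by a full stop (e.g. "St.").
    return re.sub(r"\b[A-Za-z]+\b\.?", expand, cleaned)

print(normalize_address("80  Main St., Suburb"))
print(normalize_address("12 High Rd"))
```

One limitation to keep in mind: “St” can also mean “Saint”, so a rule this simple would need extra checks before being used on real address data.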
Step 3: Choosing the Right Tools
The best tools for data cleaning are:
Best tools for data analysis:
Polymer Search is primarily designed for analyzing marketing, sales and business data. It’s an upgrade from using simple spreadsheets - everything is interactive and it provides you with powerful data analysis features that even a beginner can learn in minutes. Excel is a well-rounded tool for data manipulation and data analysis. Both of these are great beginner tools.
SPSS is great for analyzing sampled data - usually surveys or scientific data. It’s very popular among social sciences and designed for users with intermediate statistics knowledge.
R and Python are more advanced tools for data scientists and allow you to perform more complex analysis like text mining.
Step 4: Perform Data Analysis Techniques
The type of analysis you perform will depend on what kind of data you’re dealing with.
Usually I like to start with data exploration: seeing how the data is distributed, creating bar charts for each column, and finding outliers, which often reveal important information about the dataset.
If I have any questions about the dataset, I'll use pivot tables, bar charts and scatter plots to answer them.
Next, I move onto data discovery, using Polymer's Auto-Insights feature, which helps me discover the relationships between variables. Then I create bar charts to visualize the relationships.
Learning to distinguish between the different types of data will help you decide which data analysis techniques will be best.
Step 5: Interpret & Present the Data
The most important part of being a data analyst is being able to interpret the data and communicate it to your team & stakeholders. What separates a “good” data analyst from a “great” one is the ability to visualize data and tell a story with it.
Experienced data analysts know how to use color theory to highlight points within their data and understand the importance of being able to communicate their findings to non-data-savvy people. They also know how to create interactive dashboards rather than just presenting their data using static graphs.
Although building interactive dashboards and graphs might sound time consuming, there are several tools out there which basically automate this process for you.
With Polymer Search, you can just connect your spreadsheet to the web tool and it’ll automatically turn your data into an interactive web application within seconds. No need to waste time configuring a bunch of settings.
There are many kinds of different datasets out there, and each one will have a different approach.
The great thing about analyzing data is that you can choose how deeply you want to analyze the data. A beginner might just be looking for valuable insights on the surface level which doesn't require any stats knowledge, whilst a data scientist can look into the same datasets and dig deeper for complex information.
The most common types of datasets you'll come across are:
Sales data is defined as any data involving the sales process, whether it be restaurant food sales, software sales or real estate.
Learning how to analyze sales data can benefit businesses by improving team efficiency, finetuning the sales process, and allowing businesses to plan their resources accordingly through sales forecasts.
Polymer Search can provide valuable insights from your sales data through its artificial intelligence and powerful data analysis features. Here's an interactive tutorial on how to perform a sales analysis - completely beginner-friendly.
If your business runs any kind of paid advertising: Facebook Ads, Twitter Ads, Google Ads or paid media, your ultimate goal is to increase conversions whilst reducing costs.
You need to analyze dozens of factors to find the optimal strategy for your campaigns: demographics (age, gender, country, device, interests), ad creative, ad placements, bidding strategy etc.
With so many variables to analyze, it might seem like a daunting task, but here's a great guide on how to utilize Polymer's Auto-Insights to analyze your marketing data.
Surveys are a powerful tool for businesses to understand their target personas and conduct market research. They're cheap to administer and allow you to get a fairly big sample size.
Here's a guide to analyzing surveys where I break down each type of survey question you can ask (multiple choice, long answers, linear scales, ranking questions etc.) and show you how to analyze each one.
Is data analysis hard?
Data analysis is extremely broad, but most types of data analysis aren’t hard. Most business/marketing data is quite easy to analyze, especially when tools like Polymer Search and Excel exist.
On the other hand, if you’re trying to build predictive models or do data mining in R or Python, then it will require some advanced knowledge that will take a few months to learn.
Tools like SPSS, used for scientific research and some business applications, require an intermediate knowledge of statistics to operate.
How long does it take to become a data analyst?
You can become a data analyst capable of analyzing most types of data within 3 months of self-teaching.
Learning how to analyze business and marketing data might take only 1 week if you pick up tools like Excel and Polymer Search, which are quick to learn.
Studying how to analyze scientific data can take several weeks if you’re self-taught, although universities tend to drag this process out several years - usually offering 1 statistics course each semester.
3 months of learning R and Python should be enough for most people to be able to do advanced data mining tasks and build predictive models.
Can Data Analysis be Automated?
Data analysis can be automated with programming languages such as Python and VBA (Visual Basic for Applications).
If you only spend 30 minutes a week analyzing spreadsheets, then picking these up might not be worth it. However, if you spend an hour each day analyzing data, then learning VBA or Python can save a lot of time.
The easiest way to learn data analysis is to dive in and get started analyzing data. Once you get started practicing and experimenting with stuff, you’ll realize data analysis isn’t that difficult!
Sign up to Polymer Search and you’ll be equipped with powerful, easy-to-use data analysis tools, created by a former Tech Lead of Machine Learning at Google Adwords, as well as some sample datasets for you to practice on.