So you've just collected all this valuable data. Now what? How do you analyze it to draw conclusions and find correlations? This guide will teach you how to analyze your dataset, even with zero prior knowledge of statistics. Let's get started!
Survey data analysis is the process of extracting meaning from the dataset you've gathered. Using survey analysis techniques, you can find correlations, patterns, trends and other insights that help businesses guide their decision-making.
For example: You run an ecommerce store and conducted a survey asking your customers why they bought your product. You find out that most of your customers were on the fence about buying the product since there weren't any reviews online, but a Reddit post convinced them to buy it. You then invest more money into online reputation management & PR.
Surveys are useful for four main reasons:
They are relatively cheap, easy to administer and allow you to get a good sample size quickly.
In marketing and sales, surveys are a great tool for understanding audience personas and conducting market research. They can also be used to monitor trends over time. Analyzing survey data is often closely related to analyzing marketing data.
Surveys are also a key component of social science research into human behaviour. For example, you might want to find out how happy people are living with their partner and how that differs from country to country.
Sometimes surveys are used for academic or personal reasons. Example: you want to find out who people's favorite Game of Thrones characters are, and whether this differs by demographic.
Surveys can provide really important insights to businesses that give them a market edge.
The easiest (and often the best) way to analyze survey data is by using univariate and bivariate analysis.
Univariate analysis is the analysis of one variable (one spreadsheet column).
Bivariate analysis is the analysis of two variables (two spreadsheet columns).
Let's take a look at an example:
To get a feel for your data, it's best to start off with a univariate analysis, by seeing how many males, females and other genders participated in the survey. To perform univariate analysis, bar charts are your best friend!
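Behind a univariate bar chart is nothing more than a frequency count. Here is a minimal Python sketch, assuming the gender answers have already been read into a list; the responses below are made up for illustration:

```python
from collections import Counter

# Hypothetical answers to "What is your gender?" (one entry per respondent)
genders = ["Male", "Female", "Male", "Other", "Female", "Male"]

# Count how many respondents gave each answer: these are the bar heights
gender_counts = Counter(genders)

# Print each answer with its count and its share of all respondents
for answer, count in gender_counts.most_common():
    share = count / len(genders)
    print(f"{answer}: {count} ({share:.0%})")
```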
Univariate analysis is a good starting point for exploring your data, and it's often the clearest way to show which demographics took your survey. However, bivariate analysis is where all the interesting analysis happens.
Analyzing survey data is a matter of cross-checking how different variables interact with each other, for instance:
These are all examples of bivariate analysis. The graphs you use will differ depending on the types of data you have.
How do I know which variables to cross check? Oftentimes, intuition is the best way, but there are some methods which I'll show you later.
The first step towards analyzing survey data is identifying the type of data you're dealing with.
There are 3 types of survey data: numerical, categorical and sentences.
Sentence answers can be as long as several paragraphs, or even several documents.
Numeric answers are the easiest to analyze, followed by categorical answers; long text answers are the most difficult.
The most common types of survey questions are:
Multiple choice example: What is your gender?
Multiple choice questions provide you with categorical answers.
Linear scale example: On a scale of 1-5, how severely do you experience VR motion sickness?
It might seem confusing, but answers on a linear scale can be considered both categorical and numerical. This is because there are only 5 possible answers: 1, 2, 3, 4, 5 and therefore these can be classified into categories.
This type of data is known as ordinal data, meaning 3 is higher than 2, but the distance between them is unknown i.e. we can't say that 3 is 50% higher than 2.
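Because only the order of ordinal answers is meaningful, a frequency table and the median are common, safe summaries. A minimal sketch using Python's standard library, with made-up 1-5 ratings:

```python
from collections import Counter
from statistics import median

# Hypothetical answers to "On a scale of 1-5, how severely do you experience VR motion sickness?"
ratings = [1, 2, 2, 3, 5, 4, 2, 1, 3]

# Treat the scale as categories: how many people chose each point (0 if nobody did)
counts = Counter(ratings)
for point in range(1, 6):
    print(point, counts.get(point, 0))

# The median relies only on the ordering of answers, so it suits ordinal data
print("Median rating:", median(ratings))
```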
Ranking question example: Rank your most used VR locomotion styles:
Checkboxes example: What do you use VR for? Select all that apply:
Sometimes these questions take the form "select up to 5." Checkboxes provide you with categorical answers, but the format is different from multiple choice questions: each respondent can pick several options, so a single cell in your export can hold several answers separated by commas.
Analyzing these types of "list data" can be tricky, but we’ll show you a neat little trick that makes it extremely easy!
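Whatever tool you use, the core idea is to split each cell into its individual options and count every option separately. A minimal Python sketch, assuming options are separated by commas (as in this survey) and that no option label itself contains a comma; the responses are hypothetical:

```python
from collections import Counter

# Hypothetical export: one cell per respondent, options separated by commas
responses = [
    "Gaming, Fitness",
    "Gaming",
    "Social, Gaming, Watching videos",
    "",  # respondent skipped the question
]

option_counts = Counter()
for cell in responses:
    # Split the cell into individual options, trimming spaces and ignoring blanks
    options = [opt.strip() for opt in cell.split(",") if opt.strip()]
    option_counts.update(options)

print(option_counts.most_common())
```

Because one person can tick several boxes, the counts add up to more than the number of respondents. When you turn them into percentages, divide by the number of respondents rather than the total number of ticks.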
Short answer responses example: What VR experience causes the worst motion sickness for you?
“Damn rollercoasters! I nearly vomited one time because of them. Also driving vehicles in VR.”
Short answer responses (and longer answer responses) are unstructured data. We’ll need to convert them into categorical answers.
Dates: What year were you born?
Dates are a weird one. They don’t fall into either categorical or numerical data. Just treat them as ‘dates.’
Geographic: What country are you from?
Geographic data is categorical, but it can also be visualized differently from other data types (e.g. with geographical heatmaps).
Sentence answers are incredibly difficult to analyze. That's why we often convert them into categorical answers whenever possible. Here's how to do that:
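One simple approach is keyword coding: decide on a handful of categories, list the words that signal each one, and tag every answer with the categories it mentions. The sketch below is only an illustration of that technique, not necessarily the method used for this survey; the categories and keywords in `CODEBOOK` are hypothetical and should come from reading a sample of your own responses first:

```python
# Hypothetical coding scheme: category -> keywords that signal it
CODEBOOK = {
    "Rollercoasters": ["rollercoaster", "roller coaster"],
    "Vehicles": ["driving", "vehicle", "flying"],
    "Smooth locomotion": ["smooth locomotion", "joystick"],
}

def code_response(text):
    """Return every category whose keywords appear in a free-text answer."""
    lowered = text.lower()
    # Plain substring matching: "rollercoaster" also matches "rollercoasters"
    matches = [category for category, keywords in CODEBOOK.items()
               if any(keyword in lowered for keyword in keywords)]
    return matches or ["Uncategorized"]

answer = ("Damn rollercoasters! I nearly vomited one time because of them. "
          "Also driving vehicles in VR.")
print(code_response(answer))
```

Substring matching is crude (a short keyword can match inside an unrelated word), so skim the "Uncategorized" answers and a sample of the tagged ones by hand before trusting the counts.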
How do you know which variables to compare? And which tools should you be using to analyze the data?
Analyzing survey data is a matter of cross-checking every variable against every other variable and seeing which comparisons make sense to analyze.
If your survey has 10 questions or fewer, it's fine to do this manually. For larger surveys, however, Polymer's Auto-Explainer feature can speed up the process greatly.
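To see why manual cross-checking stops scaling, count the pairs: n questions give n(n - 1)/2 possible two-variable comparisons, so 10 questions already yield 45 and 30 questions yield 435. A short Python sketch that lists them; the column names are hypothetical:

```python
from itertools import combinations

# Hypothetical short column names for a 10-question survey
columns = ["age group", "gender", "country", "VR sickness", "headset",
           "hours per week", "locomotion", "VR use", "susceptibility score", "birth year"]

# Every unordered pair of columns is one possible bivariate comparison
pairs = list(combinations(columns, 2))
print(len(pairs))  # n * (n - 1) / 2
for left, right in pairs[:3]:
    print(f"{left} vs. {right}")
```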
Now you’ll have a good idea of which variables you want to compare. Here are the techniques you can use for each of them:
In order to make this step easier, we'll be using an online web tool called Polymer Search which allows us to generate AI insights about our data.
Analyzing surveys is a matter of comparing variables against each other. There are many different combinations for comparison, e.g. "gender vs. height" or "IQ vs. income vs. gender."
Polymer Search allows us to see all the different combinations and automatically ranks them from highest variability to lowest. More on how to use the tool later.
After doing conversions, we should have:
In general, these are the tools at your disposal:
For this example, we’ll be analyzing VR Heaven’s survey on motion sickness which contains all the different types of questions you’ll see.
The questions asked:
To clean your data, first address 'null' or 'missing' fields.
Sometimes respondents skip questions; other times there are data collection errors.
Because this was an online survey, a few respondents had never used VR before, i.e. they weren't qualified to take the survey. Delete these rows entirely, since their answers aren't relevant.
Next, there'll be answers like 'N/A' or 'Not applicable' or just a dash (-). These values can be deleted and left blank, whilst keeping the rest of the row intact. Leaving them empty can make the analysis step simpler.
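To delete these in Excel, press CTRL + F, open the Replace tab, type 'Not Applicable' in the Find field and leave the Replace field empty. Repeat for 'N/A' and '-'. Under Options, tick 'Match entire cell contents' so that hyphens inside other answers aren't removed.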
There are other methods of dealing with missing data, and it entirely depends on the situation. Sometimes you delete the entire row, sometimes you leave it blank and other times it's appropriate to get an estimate for that value.
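The two cleaning rules above (drop unqualified respondents, blank out placeholder answers) can be written as a short Python sketch. It assumes the sheet has been exported to rows of text, for example with `csv.DictReader`; the column names, including the screening column "used VR", are hypothetical:

```python
# Hypothetical rows as dictionaries (e.g. from csv.DictReader); column names are illustrative
rows = [
    {"used VR": "Yes", "VR sickness": "Sometimes", "boats": "N/A"},
    {"used VR": "No", "VR sickness": "", "boats": "Never"},
    {"used VR": "Yes", "VR sickness": "Rarely", "boats": "-"},
]

# Placeholder answers that should become empty cells (compared case-insensitively)
MISSING = {"n/a", "not applicable", "-"}

cleaned = []
for row in rows:
    # Drop respondents who never used VR: they don't qualify for the survey
    if row["used VR"] == "No":
        continue
    # Blank out placeholder values but keep the rest of the row.
    # Whole-cell match only, so a hyphen inside a longer answer is left alone.
    cleaned.append({key: ("" if value.strip().lower() in MISSING else value)
                    for key, value in row.items()})

print(cleaned)
```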
Quantifying qualitative data will make the analysis step tenfold easier!
This particular survey was designed around the MSSQ-S (the short form of the Motion Sickness Susceptibility Questionnaire), a series of questions researchers use to calculate a motion sickness susceptibility score. Following its scoring scheme, we can transform the qualitative answers into quantitative ones:
This allows us to calculate the VR sickness score and get an average, something we were unable to do before.
Using Excel, we can easily edit this data by pressing CTRL + F and opening the 'Replace' tab. Find all instances of 'Never' and replace them with the number 0. Find 'Rarely' and replace it with the number 1. Do the same for 'Sometimes' and 'Frequently.'
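The same replacement can be done in code with a lookup table. In this sketch, Never = 0 and Rarely = 1 come from the steps above, while Sometimes = 2 and Frequently = 3 are assumed to continue the same pattern; blanks stay blank instead of silently becoming 0:

```python
# Frequency labels mapped to numbers. Never = 0 and Rarely = 1 follow the text;
# Sometimes = 2 and Frequently = 3 are assumed to continue the same pattern.
SCORES = {"Never": 0, "Rarely": 1, "Sometimes": 2, "Frequently": 3}

answers = ["Never", "Sometimes", "Rarely", "", "Frequently"]  # "" = blank after cleaning

# Convert labels to numbers, keeping blanks as None so they aren't counted as 0
numeric = [SCORES[a] if a in SCORES else None for a in answers]

# Average only over the answers that were actually given
answered = [n for n in numeric if n is not None]
average = sum(answered) / len(answered)
print(numeric, average)
```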
The current column names are too long because the data was imported from Google Forms, which uses each survey question as the column name. Long column names are harder to work with, so we make them short and sweet:
"Do you experience motion sickness in VR?" -> VR sickness
Do the same for the rest of the columns.
Adding more columns = adding more dimensions for analysis.
Whilst it's useful to have scores for motion sickness in vehicles (cars, boats, planes), it'll be more useful to have a score that provides an overall motion sickness value.
We followed the steps in the MSSQ-S and created a new column called 'susceptibility score' using some basic Excel formulas which you can find online:
This score tells us the person's overall susceptibility to motion sickness in vehicles and will be a crucial component to analyzing this dataset.
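The exact formulas used here aren't shown, so the sketch below follows the MSSQ-Short style of scoring as we understand it: add up the 0-3 item scores, then scale by (number of items) / (number of items the respondent actually experienced), so that "never travelled by boat" doesn't count as "never felt sick on a boat." Treat it as an assumption and check the published scoring instructions before relying on it; the item names are hypothetical:

```python
def susceptibility_score(item_scores):
    """Scale summed 0-3 item scores to the full item count, skipping unanswered items.

    item_scores: dict of item name -> score (0-3), or None if the respondent
    has never experienced that item (e.g. never travelled by boat).
    """
    answered = [s for s in item_scores.values() if s is not None]
    if not answered:
        return None  # nothing to score
    return sum(answered) * len(item_scores) / len(answered)

# Hypothetical respondent who has never been on a boat
respondent = {"cars": 1, "buses": 2, "trains": 0, "planes": 1, "boats": None}
print(susceptibility_score(respondent))
```

If your survey asks about both childhood and adulthood, as the full MSSQ-Short does, apply the function to each section separately and add the two results.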
Here's another example of how Alex Almedia creates more dimensions in his dataset to find top converting Facebook ads.
A question like "how many people experience VR motion sickness?" is a good starting point. Pie charts are great for yes/no answers:
It's also a good idea to get to know your demographics, so find out the age and gender of the people who took your survey.
Head over to the 'visualize' tab in Polymer and input 'gender' and 'age group' into the y-axis (which is reserved for categorical variables):
Cross-check every variable against every other variable, using some logic to decide which comparisons make sense.
Start by cross-checking categorical variables against numeric variables.
We’ve identified the following categorical variables:
Whilst the numerical variables are:
Cross-checking these, we find the following comparisons useful:
Bar charts are your best friend for categorical vs numerical variables.
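The numbers behind such a bar chart are simply the average of the numeric variable within each category. A minimal Python sketch with made-up (age group, VR sickness score) pairs, not this survey's data:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (age group, VR sickness score) pairs after the 0-3 conversion
rows = [("18-24", 2), ("18-24", 1), ("25-34", 1), ("25-34", 0),
        ("35-44", 2), ("35-44", 2), ("18-24", 3)]

# Group the numeric answers by category
groups = defaultdict(list)
for age_group, score in rows:
    groups[age_group].append(score)

# One bar per category: its height is the group's average score
for age_group in sorted(groups):
    scores = groups[age_group]
    print(age_group, round(mean(scores), 2), f"(n={len(scores)})")
```

Printing n next to each average is a cheap safeguard: a group with only a handful of respondents can produce a tall bar by chance.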
Again, head over to the ‘visualize’ section in Polymer and let’s set up a bar chart for age vs. VR sickness.
It’s immediately apparent that age is strongly associated with how often someone experiences VR sickness.
Now let’s do the same for gender:
Again, there’s a big gap between males and females, with “other” falling in between.
If you only care about seeing male and female, you can filter out results by using the left sidebar.
Conclusion: In this survey, men reported experiencing less VR sickness than women.
Overall: Bar charts are your best friend when it comes to survey analysis! Use them well!
Let's say you asked the question: "What do you use VR for? Select all that apply:"
And you want to compare these answers to "age group" and "gender" to see whether different demographics have different uses for VR.
Analyzing this can be a real pain in Excel and can be tricky even for professional data analysts, but Polymer Search makes it ultra simple. Polymer automatically recognizes that the answers are separated by commas, so all you have to do is put this data into the 'visualize' section (just like we did before) and you'll have your answer!
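Outside Polymer, the same comparison comes down to splitting each answer list and counting options within each demographic group. A stdlib Python sketch with made-up rows (this illustrates the idea; it is not how Polymer implements it):

```python
from collections import Counter, defaultdict

# Hypothetical rows: (gender, comma-separated answers to "What do you use VR for?")
rows = [
    ("Male", "Gaming, Fitness"),
    ("Female", "Fitness, Social"),
    ("Male", "Gaming"),
    ("Female", "Gaming, Social"),
]

crosstab = defaultdict(Counter)  # gender -> Counter of VR uses
respondents = Counter()          # gender -> number of people in that group

for gender, cell in rows:
    respondents[gender] += 1
    for use in (opt.strip() for opt in cell.split(",")):
        if use:
            crosstab[gender][use] += 1

# Share of each group that selected each use (one person can pick several uses)
for gender in sorted(crosstab):
    shares = {use: f"{count / respondents[gender]:.0%}"
              for use, count in crosstab[gender].most_common()}
    print(gender, shares)
```

Dividing by the number of people in each group, rather than by the number of ticks, keeps groups of different sizes comparable.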