How to Analyze Survey Data: Step-by-Step Guide (with Example)

So you just did a survey and collected all this valuable data, but now what? How do you analyze it to draw conclusions and find correlations in the data? This guide will teach you how to analyze your dataset even with zero prior knowledge about stats. Let's get started!

How to Analyze Survey Data

We recently wrote up a 7-step process that lets you analyze almost any dataset without any technical knowledge.

If you have not read that post yet, you can check it out here: how to analyze data [7 step process]

But to summarize, these are the important steps for survey data analysis:

  1. Identify the survey data types
  2. Clean the data
  3. Convert the data
  4. Propose a question and find the answer
  5. Explore the data
  6. Visualize

Types of survey data:

There are 3 types of survey data: numerical, categorical and sentence answers.

Numerical Answers:

  • 53 kg
  • 170 cm
  • 100 IQ
  • Score: 50 points

Categorical answers: 

  • Yes/no
  • Male/female
  • USA, Canada, UK, Australia
  • Age ranges: 16-24, 25-34, 35-44
  • Never, rarely, sometimes, frequently

Sentence answers:

  • "I think the app can be improved if it loads faster."
  • "Customer service was rude."
  • "I loved the app!"

Numeric answers are the easiest to analyze, followed by categorical answers then sentence answers (which are difficult to analyze).

Converting Sentence Answers into Categorical Answers

Sentence answers are incredibly difficult to analyze. That's why we often convert them into categorical answers whenever possible. Here's how to do that:

  1. First, go through each answer one by one and take note of any commonalities.
  2. Next, create a new column.
  3. Lastly, use the new column to assign each sentence to a specific category.
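
If you'd rather script the labeling than do it by hand, here is a minimal Python sketch of the same idea. The categories and keywords are hypothetical; in practice you would pick them after reading through your own answers, and anything that matches no keyword should still be reviewed manually.

```python
# Hypothetical keyword rules; real categories should come from your own answers.
CATEGORY_KEYWORDS = {
    "performance": ["loads", "slow", "fast", "crash"],
    "customer service": ["customer service", "support", "rude"],
    "praise": ["loved", "love", "great"],
}

def label_answer(answer):
    """Return the first category whose keyword appears in the answer."""
    text = answer.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorized"  # no keyword matched: flag for manual review

answers = [
    "I think the app can be improved if it loads faster.",
    "Customer service was rude.",
    "I loved the app!",
]

# The "new column": one category label per sentence answer.
labels = [label_answer(a) for a in answers]
for answer, label in zip(answers, labels):
    print(f"{label:18} | {answer}")
```

Keyword matching is crude (it misses sarcasm and misspellings), so treat the output as a first pass rather than a final labeling.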

Sometimes it isn't possible to categorize a set of sentences, so you'll need to use text mining to analyze the data. Text mining is an advanced technique used by data scientists and is beyond the scope of this article.

Converting Categorical Answers into Numerical Answers

Sometimes, it's even possible to convert categorical answers into numerical ones.

This is usually only possible in scientific surveys which follow a research method. For instance, the 'Big 5 Personality test' asks the subject 120 multiple choice questions where they can answer: strongly disagree, disagree, neutral, agree or strongly agree.

These answers are then scored according to set criteria:

[Image: Big 5 personality traits radar chart]

Survey Data Analysis

To make this step easier, we'll be using a web tool called Polymer Search, which generates AI insights about our data.

Analyzing surveys is a matter of comparing variables against each other. There are many different combinations for comparison, e.g. "gender vs. height" or "IQ vs. income vs. gender."

Polymer Search allows us to see all the different combinations and automatically ranks them from highest variability to lowest. More on how to use the tool later.

After doing conversions, we should have:

  • numerical data
  • categorical data

In general, these are the tools at your disposal:

  • Pivot tables -> quickly get answers about your data.
  • Polymer's Auto Explainer tool -> generate AI insights about your data to find top rankings, anomalies and summaries.
  • Bar charts -> compare categorical data vs. numerical data.
  • Scatterplots -> compare numerical data vs. numerical data.
  • Heatmaps -> see where all the volume is coming from.
  • More about data visualization techniques.

How to Analyze Survey Data (Example to Learn From)

Today, we'll be looking at a survey about the prevalence of virtual reality motion sickness, which was featured in many big publications and magazines, including VentureBeat. If you're a marketing person, learning how to analyze data and present the findings is an ultra valuable skill to have.

This survey is classified as structured data, collected from Google Forms. It contains a combination of qualitative and quantitative metrics, but mostly qualitative ones.

The questions asked:

1) How often do you experience motion sickness in VR?

A) Never

B) Rarely

C) Sometimes

D) Frequently

2) How often do you experience motion sickness in cars/boats/planes?

A) Never

B) Rarely

C) Sometimes

D) Frequently

These are examples of qualitative measurements. They're subjective, so we can't perform the usual mathematical calculations on them, i.e. it doesn't make sense to say 'sometimes' multiplied by 2 = 'frequently.'

Another question was:

3) Did the motion sickness go away as you got used to VR?

A) Yes

B) No

C) Not applicable

This measurement was also qualitative.

4) What is your age and gender?

Gender = qualitative (Male/Female/Other)

Age = quantitative measurement

The first step in analyzing this data is to clean and reorganize it:

Step One: Clean the Data

To clean your data, first address 'null' or 'missing' fields.

Sometimes respondents forget or refuse to answer one of the questions. Other times there are data collection errors, and lastly, there are 'null answers.'

Because this was an online survey, a few respondents had never used VR before, i.e. they weren't qualified for the survey.

Delete these rows entirely. Their answers don't matter.

Next, there'll be answers like 'N/A' or 'Not applicable' or just a dash (-). These values can be deleted and left blank, whilst keeping the rest of the row intact. Leaving them empty can make the analysis step simpler.

To delete these, press CTRL + F -> Replace -> Find 'Not Applicable' and leave the replace field empty.

There are other methods of dealing with missing data, and it entirely depends on the situation. Sometimes you delete the entire row, sometimes you leave it blank and other times it's appropriate to get an estimate for that value.
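
The same two cleaning rules (delete unqualified rows, blank out null-style answers) can also be scripted. The sketch below uses Python's built-in csv module on a small made-up export; the column names, including the `used_vr` screening column, are assumptions for illustration and won't match the real survey file.

```python
import csv
import io

# Hypothetical export; column names and rows are made up for illustration.
raw_csv = """used_vr,vr_sickness,got_used_to_it,age
Yes,Sometimes,Not applicable,24
No,Never,N/A,31
Yes,Rarely,-,19
Yes,Frequently,No,42
"""

NULL_MARKERS = {"n/a", "not applicable", "-"}

def clean_rows(rows):
    """Drop unqualified respondents and blank out null-style answers."""
    cleaned = []
    for row in rows:
        if row["used_vr"].strip().lower() != "yes":
            continue  # never used VR: remove the whole row
        cleaned.append({
            key: "" if value.strip().lower() in NULL_MARKERS else value.strip()
            for key, value in row.items()
        })
    return cleaned

rows = clean_rows(csv.DictReader(io.StringIO(raw_csv)))
for row in rows:
    print(row)
```

Only the cell is blanked for a null answer, so the respondent's other answers stay available for analysis.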

Step Two: Convert Qualitative Data into Quantitative (if possible)

Quantifying qualitative data will make the analysis step far easier!

This particular survey was designed around the MSSQ-S, a series of questions that researchers use to determine a motion sickness susceptibility score. Following the research, we're able to transform the qualitative measurements into quantitative measurements:

  • Never = 0 points
  • Rarely = 1 point
  • Sometimes = 2 points
  • Frequently = 3 points

This allows us to calculate the VR sickness score and get an average, something we were unable to do before.

Using Excel, we can easily edit this data by pressing CTRL + F -> 'Replace' tab. Find all instances of 'Never' and replace it with the number 0. Find 'Rarely' and replace it with the number 1. Do the same for 'sometimes' and 'frequently.'
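
If you prefer code to find-and-replace, the same conversion is a dictionary lookup. This is a minimal Python sketch using the point values listed above; the sample answers are made up, and blank answers are kept as None so they don't drag the average down.

```python
# Point values from the list above.
FREQUENCY_POINTS = {"never": 0, "rarely": 1, "sometimes": 2, "frequently": 3}

def to_points(answer):
    """Convert a frequency answer to points; blank or unknown answers become None."""
    return FREQUENCY_POINTS.get(answer.strip().lower())

responses = ["Never", "Sometimes", "frequently", "", "Rarely"]  # illustrative answers
scores = [to_points(r) for r in responses]

# Average only over the respondents who actually answered.
answered = [s for s in scores if s is not None]
average = sum(answered) / len(answered)
print(scores, average)
```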

Rename columns:

The current column names are too long. This is because the data was imported from Google Forms, which uses the survey question as the column name. Long column names are harder to work with, so we make them short and sweet:

"Do you experience motion sickness in VR?" -> VR sickness

Do the same for the rest of the columns.
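
In a script, renaming comes down to a lookup table from the long question text to a short name. In this Python sketch the first mapping is the one shown above; the second short name ("Vehicle sickness") is a hypothetical choice.

```python
COLUMN_RENAMES = {
    "Do you experience motion sickness in VR?": "VR sickness",
    # Hypothetical short name for the vehicle question.
    "How often do you experience motion sickness in cars/boats/planes?": "Vehicle sickness",
}

def rename_columns(row):
    """Return a copy of the row with long question headers swapped for short names."""
    return {COLUMN_RENAMES.get(name, name): value for name, value in row.items()}

row = {"Do you experience motion sickness in VR?": "Rarely", "Age": "24"}
renamed = rename_columns(row)
print(renamed)
```

Columns without an entry in the table (like "Age" here) keep their original name.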

Step Three: Add More Columns

Adding more columns = adding more dimensions for analysis.

Whilst it's useful to have scores for motion sickness in vehicles (cars, boats, planes), it'll be more useful to have a score that provides an overall motion sickness value.

We followed the steps in the MSSQ-S and created a new column called 'susceptibility score' using some basic Excel formulas, which you can find online.

This score tells us the person's overall susceptibility to motion sickness in vehicles and will be a crucial component to analyzing this dataset.
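
To show what adding a derived column looks like in code, here is a rough Python sketch. It assumes hypothetical per-vehicle columns (`car`, `boat`, `plane`) already converted to points, and it uses a plain sum as a stand-in. It is not claimed to be the MSSQ-S scoring formula, so follow the questionnaire's own instructions for real work.

```python
# Hypothetical per-vehicle columns, already converted to 0-3 points.
VEHICLE_COLUMNS = ("car", "boat", "plane")

def add_susceptibility(row):
    """Return a copy of the row with a 'susceptibility score' column added."""
    answered = [row[c] for c in VEHICLE_COLUMNS if row[c] is not None]
    new_row = dict(row)
    # Plain sum as an illustrative stand-in, not the official MSSQ-S formula.
    new_row["susceptibility score"] = sum(answered) if answered else None
    return new_row

rows = [
    {"car": 1, "boat": 3, "plane": 0},
    {"car": 0, "boat": None, "plane": 0},
    {"car": None, "boat": None, "plane": None},
]
scored = [add_susceptibility(r) for r in rows]
for r in scored:
    print(r)
```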

Here's another example of how Alex Almedia creates more dimensions in his dataset to find top converting Facebook ads.

Step Four: Test Hypothesis

If you don't have a hypothesis or any relevant questions, skip to step 5.

Hypothesis: women are more likely to be affected by VR motion sickness than men.

Identify the key variables from the hypothesis. These are:

  • Gender
  • VR sickness score (averaged within each gender)

Head into your Polymer Search dashboard and enter the above variables into the 'Smart Pivot' section.

Results:

In two clicks, you're able to get these results:

  • The average motion sickness susceptibility score for women is 1.34, whilst for men it is 0.78.
  • Other genders fall somewhere in the middle.

The #COUNT# column tells us our sample size: there were 127 women, 144 men and 15 other in the dataset. That is a reasonable sample for comparing women and men, although 15 respondents is a small group to draw conclusions from.
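
Under the hood, this kind of pivot is just "group by one column, then average another." The Python sketch below shows the idea with made-up rows (not the survey's real data), returning both the average and the count for each group so you can judge the sample size alongside the score.

```python
from collections import defaultdict

def average_by(rows, group_key, value_key):
    """Return {group: (average, count)}, skipping rows with a missing value."""
    groups = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            groups[row[group_key]].append(row[value_key])
    return {g: (sum(v) / len(v), len(v)) for g, v in groups.items()}

# Made-up rows for illustration; not the survey's real data.
rows = [
    {"gender": "Female", "vr_sickness": 2},
    {"gender": "Female", "vr_sickness": 1},
    {"gender": "Male", "vr_sickness": 0},
    {"gender": "Male", "vr_sickness": 1},
    {"gender": "Male", "vr_sickness": None},
    {"gender": "Other", "vr_sickness": 1},
]

summary = average_by(rows, "gender", "vr_sickness")
for gender, (avg, count) in summary.items():
    print(f"{gender:8} average={avg:.2f} count={count}")
```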

In conclusion: in this sample, women reported a noticeably higher average motion sickness score than men, which supports the hypothesis.

Hypothesis confirmed!

Alternative Method (Visualization)

Another way of hypothesis testing is to just visualize the data by creating a bar chart or scatterplot.

Hypothesis: VR motion sickness is correlated with real life vehicle sickness (cars, boats, planes).

Very similar to the pivot table method, we just input these variables into Polymer's "visualize" feature and we get a bar chart of the results.

There seems to be a clear positive correlation here: on average, the higher your susceptibility score, the higher your VR sickness score will be, i.e. if you get car sick, sea sick or plane sick, you'll be more likely to get VR sickness.
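
A chart shows the trend, but you can also put a number on it. One common choice (not something the tool above is known to use) is Pearson's correlation coefficient, which ranges from -1 to 1. Here is a small Python sketch with made-up scores:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up scores for six respondents, for illustration only.
susceptibility = [0, 1, 2, 3, 4, 6]
vr_sickness = [0, 0, 1, 1, 2, 3]

r = pearson(susceptibility, vr_sickness)
print(f"r = {r:.2f}")
```

A value close to 1 means the two scores rise together. It says nothing about which one causes the other, and it is undefined if either list has no variation.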

Step Five: Data Digging

To draw more conclusions from your data, it helps to know how to dig into it.

Identify the main variable you're trying to measure - usually, it's one that you're trying to maximize or minimize.

We're trying to analyze the "motion sickness scores" to see who has the highest (or lowest - it doesn't really matter).

Head over to the auto-explainer tool:

  • Input the key variable (motion sickness score) into 'metric to maximize'
  • Set the 'operation' to either SUM or AVERAGE (depending on your case). Here we don't want to choose SUM, because it adds up the motion sickness scores, so the groups with the most respondents end up with the largest totals. We want the AVERAGE.

Choosing MIN or MAX will allow you to find outliers.
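
To see why SUM can mislead, compare it with AVERAGE on two groups of different sizes. The numbers in this Python sketch are made up purely to illustrate the effect.

```python
# Made-up scores: one large group with low scores, one small group with high scores.
groups = {
    "18-24": [1, 1, 1, 1, 1, 1],
    "45-54": [3, 2],
}

totals = {name: sum(scores) for name, scores in groups.items()}
averages = {name: sum(scores) / len(scores) for name, scores in groups.items()}

top_by_sum = max(totals, key=totals.get)
top_by_average = max(averages, key=averages.get)
print(totals, averages)
print("Highest by SUM:", top_by_sum, "| highest by AVERAGE:", top_by_average)
```

The larger group wins on SUM simply because it has more respondents, even though the smaller group has the higher score per person.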

The program will output results where there is a good chance of correlation with other variables.

The results at the top will have the highest chance of having some pattern/correlation whilst the results at the bottom have the lowest chance.
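
How the tool ranks its results internally isn't described here, so the following Python sketch is only one plausible way to do something similar by hand: for each column, compute the average score per group, then rank the columns by the gap between their highest and lowest group average. The rows are made up.

```python
from collections import defaultdict

def group_means(rows, column, metric):
    """Average of `metric` for each distinct value of `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[metric])
    return {g: sum(v) / len(v) for g, v in groups.items()}

def rank_columns(rows, columns, metric):
    """Rank columns by the gap between their highest and lowest group average."""
    spreads = {}
    for column in columns:
        means = group_means(rows, column, metric).values()
        spreads[column] = max(means) - min(means)
    return sorted(spreads.items(), key=lambda item: item[1], reverse=True)

# Made-up rows for illustration only.
rows = [
    {"age_group": "16-24", "gender": "Male", "score": 0},
    {"age_group": "16-24", "gender": "Female", "score": 1},
    {"age_group": "35-44", "gender": "Male", "score": 2},
    {"age_group": "35-44", "gender": "Female", "score": 3},
]

ranking = rank_columns(rows, ["gender", "age_group"], "score")
print(ranking)
```

A big gap only hints at a pattern. Small groups can produce large gaps by chance, so check the counts before trusting a ranking like this.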

Auto-explainer put 'age' at the top, followed by 'gender.'

Click on 'See details' for each one.

Looking at them closely, we can see some correlation!

We already found that gender correlates with VR sickness, but age was a new discovery. It turns out that older people are more prone to motion sickness than younger people!

The average VR sickness score rises with each age group. For a clearer picture, you can plot these results using a bar chart.
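
If your dataset stores exact ages rather than age groups, you'll need to bin them first. This Python sketch uses hypothetical, non-overlapping bands and made-up respondents to show the binning and the per-band average.

```python
from collections import defaultdict

# Hypothetical, non-overlapping age bands.
AGE_BANDS = [(16, 24), (25, 34), (35, 44)]

def age_band(age):
    """Return the label of the band containing age, e.g. '25-34'."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "45+" if age > 44 else "under 16"

# Made-up (age, VR sickness score) pairs, for illustration only.
respondents = [(19, 0), (22, 1), (30, 1), (33, 2), (41, 2), (58, 3)]

scores_by_band = defaultdict(list)
for age, score in respondents:
    scores_by_band[age_band(age)].append(score)

band_averages = {band: sum(s) / len(s) for band, s in scores_by_band.items()}
for band, avg in band_averages.items():
    print(f"{band:9} {avg:.2f}")
```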

Using this method, we also discovered that people who experienced greater motion sickness were also less likely to develop their 'VR legs' - meaning they were unable to overcome their VR sickness over time.

Breakdown by Segments

We can take this analysis even further using 'breakdown by segments.' This feature allows us to look at multiple variables at the same time.

Say we want to see the interaction between gender and developing 'VR legs' (the ability to overcome VR sickness). We can break the results down by both variables at once.

Conclusions:

  • Women are less likely to overcome VR sickness than men (looking at the number of results).
  • Women who answered 'no' experience the highest degree of motion sickness (average score 2.42) followed by men who said 'no.'
  • Other genders lie in the middle between men and women.
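
A breakdown by segments is a group-by on two columns instead of one. Here is a minimal Python sketch with made-up rows; the `overcame` column is a hypothetical name for the "did the motion sickness go away" answer.

```python
from collections import defaultdict

def breakdown(rows, keys, metric):
    """Return {(value of each key, ...): (average, count)} per combination."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[metric])
    return {combo: (sum(v) / len(v), len(v)) for combo, v in groups.items()}

# Made-up rows for illustration; not the survey's real data.
rows = [
    {"gender": "Female", "overcame": "No", "score": 3},
    {"gender": "Female", "overcame": "No", "score": 2},
    {"gender": "Female", "overcame": "Yes", "score": 1},
    {"gender": "Male", "overcame": "No", "score": 2},
    {"gender": "Male", "overcame": "Yes", "score": 1},
    {"gender": "Male", "overcame": "Yes", "score": 0},
]

segments = breakdown(rows, ["gender", "overcame"], "score")
for (gender, overcame), (avg, count) in segments.items():
    print(f"{gender:7} overcame={overcame:3} average={avg:.2f} count={count}")
```

Each extra variable splits the data into smaller groups, so keep an eye on the counts as you add segments.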

And that's how you analyze survey data!

Posted on
December 22, 2021
Written by
Ash Gupta
Co-Founder & CEO of Polymer Search. Previously Tech Lead for Machine Learning at Google AdWords and a quant developer on Wall Street.