10 min read

How to analyze any dataset (with zero programming)

Data analysis can help executives solve problems and make smarter strategic business decisions in a controlled environment, but can be time consuming and expensive. Here we propose a simple solution for analyzing almost any kind of data with zero coding required.

Data analysis is a key driver in decision-making for businesses. It can help executives solve problems and make smarter strategic decisions with controlled risks.

There are many kinds of data in this world: business data, survey data, scientific data, sales/marketing data, and analyzing them can be overly convoluted due to the many different approaches to analyzing data and the technical skills required.

Here we propose a simple solution for analyzing almost any kind of data with zero coding required:

7 Steps to analyzing data

There are 7 steps to analyzing data:

  1. Clean & re-organize your data
  2. Classify data types
  3. Convert qualitative data into quantitative (if possible)
  4. Test hypothesis
  5. Discover other patterns/trends/correlations (data exploration)
  6. Visualize results
  7. Interpret results: correlation = causation?

Step One: Clean & re-organize data

Cleaning and re-organizing your data is the first thing you should do before diving into analysis. There are dozens of ways of cleaning data, but here are three useful tips:

First, Address missing fields: often datasets will have missing fields due to various reasons: Maybe you conducted a survey and not all respondents answered the question. Or maybe it was due to human errors during data entry tasks. Whatever the case, you need to get these missing values addressed.

excel missing data

How to address missing fields:

  1. Delete the whole row entirely - the simplest way, but often not the best way.
  2. Make/keep the field empty. Make sure it doesn't say anything like 'N/A.'
  3. Predict the value based on averages (this depends on the circumstance)

Option 1 can work if you have a large enough dataset where you can afford to delete some rows. Sometimes, there are too many missing values for this method to work.

Option 2 & 3 are the preferred methods, but it depends on what you're analyzing.

Rename columns

It's important to have good naming conventions for columns: generally, you want them to be short (1-3 words) and easily understandable.

BAD: "What is your favorite color?"

GOOD: "Favorite Color."

Add More Dimensions for Analysis

Although it seems counterintuitive, adding more columns can actually make data analysis easier. Example: When you're analyzing marketing data, you might have columns for 'amount of sales' and 'cost of each time.' It'll be beneficial to create a column called 'total cost of products sold' by multiplying both columns.

Adding more columns also allows you to derive important conclusions from the data that weren't there before.

You can add more dimensions to your data by classifying existing information into columns. For, example, if you have a list of grocery sales (ribeye, butterheads, arrowroots), you can create another column that classifies these foods into food types (meats, vegetables, cookies).

Alex Almeida, a PPC specialist, uses this method to find high converting Facebook ad creatives: See how he does it here.

Step Two: Classify Data Types

structured vs unstructured data

Is the data structured or unstructued? 

Which measurements are qualitative, which are quantitative? This will determine what types of analyses you should perform.

Are the measurements nominal, ordinal, interval, or ratio? This will determine what kinds of calculations you can perform.

I've covered the different data types in detail in this blog post, but to summarize:

  • Structured = organized into rows/columns (e.g. Excel datasets), unstructured = not organized into rows/columns (e.g. a collection of emails/documents).
  • Quantitative = numeric measurements (e.g. 62 kg, 100 points), qualitative = non-measureable qualities e.g. male/female, red/blue, happy/sad.

Structured and quantitative (numeric) data are the easiest data types to analyze.

Try converting unstructured data to structured, and qualitative measurements into quantitative.

Otherwise, unstructured, qualitative data will require advanced analysis techniques like text mining which will require programming knowledge. It's something that's beyond the scope of this article.

Step Three: Convert Qualitative Data into Quantitative Data

There are two ways you can make qualitative data easier to analyze: by converting it to quantitative data or by finding ways to quantify it.

1. Converting qualitative in quantitative:

It's not always possible to convert qualitative into quantitative, and usually only happens in scientific research.

Scientific surveys/studies usually have scoring criteria for answers that are non-quantities.

For example, an HR manager might use personality tests to help find suitable candidates for the job. They might ask you questions like:

"What would you do in this situation?" A, B, C, or D? 

With a scoring criteria:

  • A = 1 point in openness
  • B = 1 point in extraversion
  • C = 1 point in agreeableness
  • D = 1 point in conscientiousness

Whether you agree with these results or not, this type of conversion is possible because there's been scientific research linking these traits to workplace performance.

Before designing the survey, make sure to do some research to see if this type of conversion is possible.

2. Quantifying qualitative data:

If you aren't doing a scientific survey, usually this approach is best.

Let's say you conducted a customer satisfaction survey and asked them:

  • Rate Customer Service 1-10
  • Reason for Rating

Rating = quantitative (e.g. 8/10)

Reason = qualitative (e.g. "they didn't solve my issue")

Here's how to quantify these statements:

  1. Read through each stat and start taking note of common themes (i.e. reason for their unhappiness). The common themes in this case might be stuff like "slow customer service and "unhelpful customer service"
  2. Start labeling each statement into these common themes.
  3. There'll be some statements that don't fall under either any categories. Put them as 'other' or 'uncategorized.'

Aftermath:

convert qualitative data into quantitative

Although you won't be able to perform calculations like sum, divide, multiply, average on the category, you can calculate the frequency at which it occurs.

Use Polymer's visualize feature to graph these results for a better understanding:

interactive bar chart data visualization

Results:

  • Most cases of customer dissatisfaction is due to 'other' reasons
  • We also see there are many cases of customer service being unhelpful and slow at resolving issues. Perhaps better training is needed.

Instead of showing the frequency of these cases, we can also map the x-axis with the (average) score:

This tells us bad manners/etiquette and unhelpful customer support result in the lowest customer satisfaction score.

Step Four: Test hypothesis

Often, you'll begin analyzing your data with a hypothesis or question.

Let's say we're running some Google ads.

Question: Where are our conversions coming from?

It doesn't matter if you have a hypothesis or question, as long as you have a starting point. I usually just start with a question I have about my data.

To test this, we can either:

  • Graph the results (like above) or
  • Use Smart Pivot, a tool within Polymer which allows us to get our answers in a few clicks.

To use Smart Pivot, all we have to do is input these two variables: "country user" and "conversion value" into Smart Pivot and it'll give us our answer:

creating pivot tables in polymer

Results: Most of the conversions are coming from Indonesia ($134,835,983 total, which is 450.19% above average), followed by Brazil and United States. The middle column (#COUNT#) tells us our sample size.

We can take this further and see how multiple variables interact with each other. If we want to look at both gender and country, we just input both those variables in:

pivot table

Step Five: Data exploration - Discovering patterns/trends/correlations

After testing your hypothesis, you might want to draw other conclusions from your data.

This is where Polymer's auto-insights feature comes in handy. It allows you to dig into your data to learn more about it.

Using auto-insights can save you hundreds of hours on Excel.

How to use auto-insights:

Input your main variable into the "Metric to Maximize" field:

How to use auto explainer

Examples of what goes in this field are: products sold, conversion rate, customer satisfaction scores, i.e. key metrics you want to maximize

Auto-explainer will display other variables such as age, gender, country, device that will influence the main variable (conversion value). It will be ordered from highest variability (highest chance of correlations) to lowest variability.

Auto Explainer

Since 'country user' is at the top, there is high variability in 'country user' - meaning country seems to have a big impact on our conversion value. Second highest variability is campaign bid strategy type and the list goes on...

"See Details" allows you to see the full list of countries.

It's highly recommended that if you find an interesting correlation that you plot these results onto a graph (this will be demonstrated later).

See how multiple variables interact with each other

The above method only tells you how variable A correlates with variable B (e.g. correlation between conversion value and country).

If you want to look at the interaction between multiple variables, you'll need to use 'breakdown by segments.'

Imagine if you ran Facebook ads and your Excel data contains: the number of conversions, gender, country, age, device type, audience interests and cost of acquisition. The goal would be to find which combinations provide the highest return on investment (ROI).

This type of analysis is a nightmare to do in Excel; but only takes a few clicks on Polymer.

Like before, input the 'conversion value' into the 'metric to maximize' field. But this time, we want to calculate the ROI, so also add in the cost of these ads into the 'metric to minimize' field.

Then put the demographics info into 'breakdown by segments' field:

We've sorted the ROI from highest to lowest, so we can see which demographics bring in the highest ROI.

Step Six: Visualize the Results

Visualizing data can help us see correlations more clearly and allow us to present the data to clients.

With Polymer Search, you can make beautiful-looking graphs/charts/tables which are fully interactive with no coding required - just a few clicks. Many agencies use Polymer just for reporting purposes since it's a great way to show off to clients!

Here's how to do it:

Create Bar Chart
  • Bar charts are a good way to visualize pivot tables
  • Scatterplots are a good way to visualize correlations
  • Time series are good for finding trends over time
  • Heatmaps are good for finding the 'hotspots' where there is high volume

Presenting the Data

You've now analyzed your dataset, it's time to present it.

There are 2 ways to share your visualizations on Polymer:

  1. Creating a shareable link - kind of like sharing Google docs. People can just click on the link and they can see the bars/graphs you created.
  2. Embedding the results on your own website (for branding purposes).

These options should be available at the top right:

scatterplot correlation

Step Seven: Interpret the Results:

Maybe you've noticed targeting Canada brings a higher return on investment than the US. Or maybe you noticed certain products sell better than others.

After seeing all the correlations/trends the data has to offer, it's time to decide: which correlations are meaningful?

This is where some thinking is required.

One common mistake is seeing a correlation between two variables, but not accounting for other possible reasons for why this occurs. Example: Canada had a higher ROI than the US because the ads were mostly shown to people living in rich cities in Canada. If you notice a correlation, always be on the lookout for other variables that might influence the correlation.

To test for causality, you'll need to perform more experiments whilst controlling variables.

Examples to Learn From:

The best way to learn is through example.

If you have any questions, feel free to reach out to hello@polymersearch.com

Get started with Polymer.


Posted on
under Blog
Written by
Ash Gupta
Co-Founder & CEO of Polymer Search. Previously Tech Lead for Machine Learning at Google AdWords and a quant developer on Wall Street.

Latest Stories