How to load MailChimp's data into Tables and DataFrames

Email marketing campaigns using a SaaS platform like Mailchimp generate a large number of data that can be used to:

  1. Better understand our customers
  2. Monitor and optimize our email marketing campaigns

But to do that, we first need to understand the data we have to deal with. In this post we will go through the data of Mailchimp, what we can pull out of it using its API, understand and properly model the data. Usually, when data are exposed through a web API, it is optimized for this particular job, so when data is pulled, it first has to be properly re-modeled for the database or analytics system that we plan to use. To do that, we need to both understand the data we are working with but also the requirements of the system we are going to use. For this post we’ll consider a relational database for storing the data and thus the data will be modeled accordingly. Also, if you work with a language like R you can consider each table as a DataFrame, the structure and the process for preparing the data will be the same.

Mailchimp resources & main entities

The resources that Mailchimp exposes and that you should replicate on your database are the following:

Resource

Description

Campaign

The core concept of Mailchimp is a Campaign, whenever you implement an email campaign, a Campaign is created. From this resource you get all its related data.

Conversations

Conversations contains all the responses you got from recipients of your email campaigns.

Messages (Conversation sub-resource)

The messages that were exchanged as part of the conversations.

Lists

In order to execute an email campaign in Mailchimp you need a pair of a Campaign and a List of recipients where the mails will be delivered.

Members (lists sub-resource)

The members of the lists, here you will find the actual information of your recipients like emails and demographic information.

Reports

Campaign performance related reports.

Campaign Abuse (Reports sub-resource)

Information about campaign abuse complaints.

Campaign Advice (Reports sub-resource)

Feedback based on campaign’s statistics.

Click Reports (Reports sub-resource)

Reports related to the URLs you have included in your email campaigns.

Domain Performance (Reports sub-resource)

Performance of your campaigns, useful in cases that you are sending mails from different domains.

Email activity (Reports sub-resource)

Probably the most important report, it contains all the interactions of your users with your emails.

Unsubscribes (Reports sub-resource)

Reports related to the recipients you have unsubscribed from your lists.

Ideally, each one of the above resources will end up as a table in your database. But before we load the data into our tables we need to consider a few things as we will see below.

Data format & serialization

Mailchimp delivers its data in JSON and for performance reasons some of the responses contain nested objects, something that makes it difficult to directly store the data in a tabular form. As in the case of the Intercom data we have to consider the following cases:

  • data type conversions
  • Nested objects
  • Arrays

Data types are usually straight forward to convert, special care should be taken for dates and timestamps but the rest are quite easy to implement. Nested objects, if they do not contain any other objects or arrays are also easy to flatten out into a number of columns, where each sub_field is merged with the name of the parent object to form a column in your table or data frame.

Arrays are a bit more difficult to handle and they usually fall into one of the following categories:

  • Arrays of primitive types
  • Arrays of objects

For the first case a possible solution is to flatten out the array into one value by concatenating the array members. The best strategy here is to create one string with a known delimiter and generate only one value. Now in the case of an Array of objects things are a bit more complicated, you will actually have to extract the array and create a new table where each array member will be mapped to one table row. In this last case, make sure that you include information from the parent object as a foreign key, for example an ID of the parent. Now let’s see for each of the Resource mentioned earlier how to handle the delivered data.

Mapping Mailchimp resources to relational tables

The Campaign table

The Mailchimp Campaign entity has the following properties that can be mapped into table columns.

Name

JSON Type

RDBMS Column Type

id

String

Text

type

String

Text

create_time

String

Text

archive_url

String

Text

long_archive_url

String

Text

status

String

Text

emails_sent

Integer

Numeric

send_time

String

Text

content_type

String

Text

recipients

Object

Can be flattened out if some fields are removed.

settings

Object

It can be flattened out.

variate_settings

Object

Can be flattened out if some fields are removed.

tracking

Object

Can be flattened out.

rss_opts

Object

Can be flattened out.

ab_split_opts

Object

Can be flattened out.

social_card

Object

Can be flattened out.

report_summary

Object

Can be flattened out.

delivery_status

Object

Can be flattened out.

Recipients: field contains a total of 5 fields from which 4 are of primitive types and can be directly converted in an appropriate data type and one field named segment_opts which is of type Object needs to be flattened out appropriately. This last field contains a sub-field named conditions which is an Array and which makes things a harder in representing the resource in a table. There are two options, the first one is to remove the conditions field completely, this will result in a much simpler representation of the resource in our database and it can be done when it is known that the information it carries will not be used. The second option, is to extract the field and create a new table called condition_types where each element of the array will be presented as a row. In this case which adds some additional complexity to our data model, we guarantee that we maintain all the information we can get from Mailchimp. As the embedded array does not include any fields that associate its elements with the parent resource, during the extraction and preparation of the data you need to make sure that you will add information from the parent record to each one of the array elements, a good candidate for this is the list_id field.

Settings: can be easily mapped into a relational table, it contains 16 fields where 15 of them are of primitive types and can be easily converted while one of them the auto_fb_post when present, is an array that can be directly flattened out by concatenating its values.

Variate_settings: is more similar to the case of Recipients. It included 11 fields from which 6 of them are Arrays, 5 of these arrays can be flattened out by concatenating their values but the field combinations needs to either be removed or represented as an additional table as it contains objects. Again if you choose to create an additional table, make sure that you have included an id from the parent resource to use as a foreign key for joining the data.

Tracking: contains a total of 10 fields, where 7 of them are of primitive types while the rest named salesforce, capsule & highrise when present are objects containing only additional fields of primitive types, so it is easy to flatten out these fields and store them as additional columns in the table.

Rss_opts: contains 5 fields, from which only one is not a primitive type, named schedule and which contains another object inside with only fields of primitive types. Although a bit deep as a hierarchical resource, the fact that every object is not contained in an array and it contains only primitive types, makes it easy to flatten out all the fields and store them in one table.

Ab_split_opts: contains 14 fields, all of them of primitive types so mapping into table rows is a trivial task.

Report_summary: contains 7 fields, where one of them is an object containing only primitive types, so it will require to be flattened out and embedded in the original table.

The conversations table

The conversations entity has the following structure:

Name

JSON Type

RDBMS Column Type

id

String

Text

message_count

Integer

Numeric

campaign_id

String

Text

list_id

String

Text

unread_messages

Integer

Numeric

from_label

String

Text

from_email

String

Text

subject

String

Text

last_message

Object

Can be flattened out

The last_message object contains only sub-fields of primitive types so everything is quite straight forward on how to convert conversations into a table.

The Messages tables

Messages are a part of conversations, if you check the API documentation of Mailchimp you will see that messages are exposed as a sub-resource of conversations. The structure of this resource is the following:

Name

JSON Type

RDBMS Column Type

conversation_messages

Array

Array, see below

id

String

Text

conversation_id

String

Text

list_id

Integer

Numeric

from_label

String

Text

from_email

String

Text

subject

String

Text

message

String

Text

read

Boolean

Boolean

timestamp

String

Text

Apart from the conversation_id field, everything else is included in objects inside the conversation_messages array. By having an array it means that each response contains multiple messages that we need to extract and save them as separate rows in our table. So each member of the conversation_messages array will become one row in the Messages table. The rest of the fields have a one to one mapping to columns.

The List table

Lists are the second most important resource of Mailchimp, it actually contains information related to your recipients (customers). It has the following structure.

Name

JSON Type

RDBMS Column Type

id

String

Text

name

String

Text

contact

Object

Can be flattened out

permission_reminder

String

Text

use_archive_bar

Boolean

Boolean

campaign_defaults

Object

Can be flattened out

notify_on_subscribe

String

Text

notify_on_unsubscribe

String

Text

date_created

String

Text

list_rating

Integer

Numeric

email_type_option

Boolean

Boolean

subscribe_url_short

String

Text

subscribe_url_long

String

Text

beamer_address

String

Text

visibility

String

Text

modules

Array

Can be flattened out.

stats

Object

Can be flattened out.

List is a bit more complex as an entity and it contains a number of nested objects that we need to take care of.

Contact, campaign_defaults and stats can be flattened out as they do not contain any nested objects nor any arrays, all their fields are primitive types.

Modules: As an array it can be flattened out by concatenating its value and maintaining its name as a column name with data type Text.

Members table

List members are the way of grouping together recipients, or your customers, for fine tuning your email campaigns. Members are actually a sub-resource of Lists.

Name

JSON Type

RDBMS Column Type

id

String

Text

email_address

String

Text

unique_email_id

String

Text

merge_fields

Object

Can be flattened out

interests

Object

Can be flattened out

stats

Object

Can be flattened out

ip_signup

String

Text

timestamp_signup

String

Text

ip_opt

String

Text

timestamp_opt

String

Text

member_rating

Integer

Numeric

last_changed

String

Text

language

String

Text

vip

Boolean

Text

email_client

String

Text

location

Object

Can be flattened out

last_note

Object

Can be flattened out

list_id

String

Text

list_id

String

Text

All the fields of type Object do not contain any arrays or nested objects and can be easily flattened out.

The reports table

The Reports resource contains information about the performance of your campaign and it has the following structure.

Name

JSON Type

RDBMS Column Type

id

String

Text

campaign_title

String

Text

type

String

Text

emails_sent

Integer

Numerical

abuse_reports

Integer

Numerical

unsubscribed

Integer

Numerical

send_time

String

Text

bounces

Object

Can be flattened out

forwards

Object

Can be flattened out

opens

Object

Can be flattened out

clicks

Object

Can be flattened out

facebook_likes

Object

Can be flattened out

industry_stats

Object

Can be flattened out

list_stats

Object

Can be flattened out

ab_split

Object

Can be flattened out

timewarp

Array

Array

timeseries

Array

Array

share_report

Object

Can be flattened out

ecommerce

Object

Can be flattened out

delivery_status

Object

Can be flattened out

All the fields of type object contain simple sub-fields and thus it is straightforward on how to flatten them out and include them in the initial table. The only exception is ab_split which has two sub-objects that contain simple sub-fields, again it is easy to flatten out if you consider one additional level for your objects.

Timewarp & timeseries: are Arrays of objects and thus you have two options, one to remove them if you do not intend to use these fields or to create separate tables for each one. Their objects are simple so nothing special on how to handle them but you will have to include an Id from the reports table to make the reference between the different tables.

The Campaign Abuse report table

This report contains information about abuse mentions for your campaign. Its structure is the following.

Name

JSON Type

RDBMS Column Type

id

Integer

Numerical

campaign_id

String

Text

list_id

String

Text

email_id

String

Text

email_address

String

Text

date

String

Text

This report does not contain any nested objects or arrays and thus there’s a one-to-one mapping to a database table fields and the object fields.

The Campaign advice table

This resource contains advices given from Mailchimp regarding your campaigns and how you can improve their performance, the structure of the table will look like this:

Name

JSON Type

RDBMS Column Type

advice

Array

Numerical

type

String

Text

message

String

Text

campaign_id

String

Text

The API response always contains an array of advices so you will have to parse it and create one row on the table for each response found. The rest of the fields do not need anything special to map them directly into columns.

The Click Reports table

This resource contains information about the interaction between your recipients and any links that you might have included inside your emails. It has the following structure.

Name

JSON Type

RDBMS Column Type

urls_clicked

Array

Array

id

String

Text

url

String

Text

total_clicks

Integer

Number

click_percentage

Number

Number

unique_clicks

Integer

Number

unique_click_percentage

Number

Number

last_click

String

Text

ab_split

Object

Can be flattened out

campaign_id

String

Text

Each response contains an array of URLs that you will have to extract and store as a new row in the table. All the fields have a straight forward one-to-one mapping to a table column except ab_split which can be flattened out just as it was described in the case of the Campaign Report earlier.

The Domain Performance table

This resource contains information about the performance of your domains in the case where you send mails from more than one different domains. It has the following structure.

Name

JSON Type

RDBMS Column Type

domains

Array

Array

domain

String

Text

emails_sent

Integer

Number

bounces

Integer

Number

opens

Integer

Number

clicks

Integer

Number

unsubs

Integer

Number

delivered

Integer

Number

emails_pct

Number

Number

bounces_pct

Number

Number

opens_pct

Number

Number

clicks_pct

Number

Number

unsubs_pct

Number

Number

Each response of the API delivers an array of objects, you need to parse the array and create one table row per object. The object has a simple structure that is easily mapped to table columns.

The Email activity table

Probably one of the most interesting resources of Mailchimp as it contains an “event log” of all the different interactions of your recipient with your emails, so it contains information about their behavior. Useful data for understanding how your customers behave against your marketing campaigns. It has the following structure.

Name

JSON Type

RDBMS Column Type

campaign_id

String

Text

list_id

String

Text

email_id

String

Text

email_address

String

Text

activity

Array

Array

The really valuable information inside this report is inside the activity array, which contains one entry for each interaction of your user. For this reason the best approach is to create one row on your table for each entry of the activity array and enrich its objects with the data coming from the report like the list_id and the email_address. All the fields of the activity array have a simple one-to-one mapping to table columns.

The Unsubscribes Report table

This report contains information related to the requests of your recipients to be excluded from any future campaigns you might run. Its structure is the following:

Name

JSON Type

RDBMS Column Type

unsubscribes

Array

Array

email_id

String

Text

email_address

String

Text

timestamp

String

Text

reason

String

Text

campaign_id

String

Text

list_id

String

Text

You should create one table row for each entry of the “unsubscribes” array which contains the fields that are presented in the rest of the above table. Their mapping to table columns is quite straight forward and they do not require any kind of preprocessing.

Final thoughts

Mailchimp offers a rich dataset that we can use to perform our own analysis to better understand our customers and to optimize our marketing campaigns. Richness comes with a price though, the data model is quite complex and thus it requires to invest some time in better understand the data and how we can represent them more efficiently. Hopefully the above post will help you to understand the data faster and offer you a possible representation, especially if you plan to work with databases or Dataframes (tabular data).

If you would like to automate the process of pulling your data consistently from Mailchimp and having it always at your disposal for analysis, do not forget to check Blendo.

Would you like a demo on how to use Blendo? Reach out to us and we’ll show you around!