AppsFlyer Cost ETL to S3 and breaking it down further

AppsFlyer, one of the most popular MMPs (mobile measurement partners), provides APIs to pull attribution and analytics reports programmatically. This is particularly useful when you run internal analytics tools and want deeper insight into specific metrics. While those reports give an aggregated view, AppsFlyer also offers a granular, transaction-level breakdown in the form of Cost ETL.

You can set up Cost ETL delivery to S3 as described in AppsFlyer's documentation; once the integration is complete, batches are pushed to your bucket automatically each day. The tricky part is partitioning the data by date so it can be processed downstream. The Python script below does that, or at least gives you a starting point: it reads the Parquet files and repartitions them by date. As new reports arrive, each date partition is overwritten to reflect the latest data, as AppsFlyer recommends.

import os
import pandas as pd

def partitionRawFilesIntoDateParquets(fromDate):
    # Assuming the fourth batch is available and the raw files
    # have been downloaded to your local directory
    s3Path = "cost_etl/v1/dt=" + fromDate + "/b=4/channel/"
    dateFrames = {}

    for file in os.listdir(s3Path):
        if file.endswith('.parquet'):
            df = pd.read_parquet(os.path.join(s3Path, file))

            # Split the frame by the 'date' column and merge each
            # day's rows into the running per-date dictionary
            for date, frame in df.groupby('date'):
                if date in dateFrames:
                    dateFrames[date] = pd.concat([dateFrames[date], frame])
                else:
                    dateFrames[date] = frame

    for date, content in dateFrames.items():
        datePath = "/datePartitions/" + date
        os.makedirs(datePath, exist_ok=True)
        # Overwrite the partition so re-delivered data replaces the old file
        content.to_parquet(datePath + "/gz.parquet", compression='gzip')
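The core of the script is the groupby-and-concat pattern that merges rows from multiple batch files into one frame per date. You can exercise it in isolation with synthetic data; the "date" and "cost" columns here are illustrative stand-ins, not the full Cost ETL schema:

```python
import pandas as pd

# Two toy frames standing in for two raw parquet files from a batch
batch1 = pd.DataFrame({"date": ["2024-01-01", "2024-01-02"], "cost": [1.0, 2.0]})
batch2 = pd.DataFrame({"date": ["2024-01-02", "2024-01-03"], "cost": [3.0, 4.0]})

dateFrames = {}
for df in (batch1, batch2):
    # Same merge logic as the script: split each file by date,
    # then append to any frame already collected for that date
    for date, frame in df.groupby("date"):
        if date in dateFrames:
            dateFrames[date] = pd.concat([dateFrames[date], frame])
        else:
            dateFrames[date] = frame

# 2024-01-02 appears in both files, so its partition holds both rows
print(sorted(dateFrames))
print(len(dateFrames["2024-01-02"]))
```

Each value in dateFrames is then ready to be written out as one per-day Parquet file, exactly as the loop at the end of the script does.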
