
Showing posts from February, 2023

AppsFlyer Cost ETL to S3 and breaking it down further

AppsFlyer, one of the most popular MMPs (mobile measurement partners), provides APIs for pulling attribution and analytics reports from your own code. This is particularly useful when you have internal analytics tools and want deeper insight into specific metrics. While the reports give an aggregated view, AppsFlyer also offers a granular breakdown of each transaction in the form of Cost ETL. You can set up Cost ETL to push to S3 as shown in the link above; once the integration is complete, batches are pushed to your bucket automatically every day. The tricky part is partitioning the data by date so that it can be processed. Hopefully this Python script does that for you, or at least helps you get started. It reads a Parquet file and partitions it by date. As new reports are pushed, the partition for each date is overwritten to reflect the new data, as recommended by AppsFlyer.

import pandas as pd
import os, sys, json

def partitionRawFilesIntoDateParquets():
    # Assuming the fourth batch is availabl...
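A minimal sketch of the partition-and-overwrite step with pandas. It assumes the Cost ETL rows carry a date column (called `date` here; check the real column name in your files), that each new batch contains the complete data for every day it covers, and a `dt=YYYY-MM-DD/` directory layout. `split_by_date` and `write_date_partitions` are hypothetical helper names, not AppsFlyer or pandas APIs.

```python
import os
import shutil
from typing import Dict, List

import pandas as pd


def split_by_date(df: pd.DataFrame, date_column: str = "date") -> Dict[str, pd.DataFrame]:
    """Group rows by calendar day; keys are 'YYYY-MM-DD' strings in sorted order.

    Rows whose date cannot be parsed become NaT and are dropped by groupby.
    """
    days = pd.to_datetime(df[date_column]).dt.strftime("%Y-%m-%d")
    return {day: group.reset_index(drop=True) for day, group in df.groupby(days, sort=True)}


def write_date_partitions(df: pd.DataFrame, out_dir: str, date_column: str = "date") -> List[str]:
    """Write one Parquet file per day, replacing any existing partition for that day."""
    written = []
    for day, part in split_by_date(df, date_column).items():
        part_dir = os.path.join(out_dir, f"dt={day}")  # hypothetical layout
        # Overwrite semantics: remove the old partition for this day before writing.
        if os.path.isdir(part_dir):
            shutil.rmtree(part_dir)
        os.makedirs(part_dir)
        path = os.path.join(part_dir, "part-0.parquet")
        part.to_parquet(path, index=False)  # requires pyarrow or fastparquet
        written.append(path)
    return written
```

For example, `write_date_partitions(pd.read_parquet("batch.parquet"), "partitions/")`, where both paths are placeholders. Replacing the whole day directory instead of appending to it is what lets a re-delivered day overwrite the earlier version rather than duplicate its rows.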

S3 Functions in Python using Boto

1. Copy from one folder in a bucket to another using Boto3. Configure Boto3 with your AWS credentials as described in the documentation here.

import boto3
import os, sys

bucketName = "YOUR BUCKET NAME"
# set this to True if you want the objects in copyToPathDirectory to be deleted first
# set to False if the object being copied has the same name
deleteObjectsFirst = False

# Assumes Boto3 is already configured with credentials
s3 = boto3.resource('s3')
s3Client = boto3.client('s3')
bucket = s3.Bucket(bucketName)

copyToPathDirectory = "the/directory/youwant/to/copyto/"
copyFromPathNames = ["path1/to/copyfrom/", "path2/to/copyfrom/", "and/so/on/"]

# WARNING: This will delete all objects in your folder
# Please check your path before running
if deleteObjectsFirst:
    delete_key_list = []
    for copyFromPath in copyFromPathNames:
        for obj in bucket.objects.filter(Prefix=copyFromPath):
            deleteKey = copyToPathDirectory + obj.key
            delete_key_list...
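A minimal sketch of the copy step itself, assuming (as the `deleteKey` line in the script does) that each destination key is the destination directory followed by the full source key. It uses the low-level client's `list_objects_v2` paginator and `copy_object`; the helper names `list_keys`, `plan_copies` and `copy_prefixes` are hypothetical. All keys are listed before anything is copied, so objects created during the run are never picked up again, and keys already under the destination directory are skipped.

```python
from typing import Iterable, Iterator, List, Tuple


def list_keys(s3_client, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def plan_copies(keys: Iterable[str], source_prefixes: List[str], dest_dir: str) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs, deduplicated and in listing order."""
    seen = set()
    pairs = []
    for key in keys:
        # Skip duplicates (overlapping prefixes) and anything already in the destination.
        if key in seen or key.startswith(dest_dir):
            continue
        if any(key.startswith(p) for p in source_prefixes):
            seen.add(key)
            pairs.append((key, dest_dir + key))
    return pairs


def copy_prefixes(s3_client, bucket: str, source_prefixes: List[str], dest_dir: str) -> List[Tuple[str, str]]:
    """Copy every object under the source prefixes into dest_dir in the same bucket."""
    keys = [k for p in source_prefixes for k in list_keys(s3_client, bucket, p)]
    pairs = plan_copies(keys, source_prefixes, dest_dir)
    for src, dst in pairs:
        # copy_object handles objects up to 5 GB; larger ones need a multipart copy.
        s3_client.copy_object(Bucket=bucket, Key=dst, CopySource={"Bucket": bucket, "Key": src})
    return pairs
```

With real credentials this would be called as `copy_prefixes(boto3.client("s3"), bucketName, copyFromPathNames, copyToPathDirectory)`. Keeping the key mapping in a pure function like `plan_copies` lets you print the planned copies and check the paths before touching the bucket.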