AWS S3: Upload a large Excel file

Could anyone help me upload a large Excel file to S3 using multipart upload, and then read the data back from the uploaded file?
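
For the reading side, here is a minimal sketch. It assumes a boto3 S3 client is passed in and that the key's extension distinguishes CSV from Excel. The function name `read_uploaded_table` is hypothetical. Once `complete_multipart_upload` succeeds, S3 stores one ordinary object, so it is read back with a plain `get_object` and the parts play no further role. Because an .xlsx file is a ZIP archive, it has to be downloaded and parsed as a whole rather than part by part.

```python
import io

import pandas as pd


def read_uploaded_table(s3_client, bucket: str, key: str) -> pd.DataFrame:
    """Download a completed S3 object and parse it with pandas.

    Hypothetical helper: assumes `s3_client` is a boto3 S3 client (or any object
    with the same get_object method) and that the key ends in .csv or .xlsx/.xls.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # Body is a stream; read the whole object into memory
    data = response["Body"].read()
    if key.lower().endswith(".csv"):
        return pd.read_csv(io.BytesIO(data))
    # pd.read_excel needs an engine such as openpyxl installed for .xlsx files
    return pd.read_excel(io.BytesIO(data))
```

With boto3 this would be called as `read_uploaded_table(boto3.client("s3"), "my-bucket", "uploads/report.xlsx")`, where the bucket and key are placeholders.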

Currently I upload the file in parts. I load its bytes into a pandas DataFrame, split the DataFrame with numpy's `array_split`, and re-serialize each chunk to CSV or Excel before uploading it as a part. Is this approach correct, or is there an alternative? Converting the bytes into pandas just to do the multipart upload takes a lot of time.
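
Multipart upload operates on raw bytes, and S3 joins the parts in part-number order. That means the original file bytes can be sliced directly without a round trip through pandas. Below is a minimal sketch that assumes the whole file is already in memory. `split_into_parts` and the 8 MiB default are hypothetical choices. S3 requires every part except the last to be at least 5 MiB and allows at most 10,000 parts per upload.

```python
from typing import Iterator, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 limit on parts per multipart upload


def split_into_parts(data: bytes, part_size: int = 8 * 1024 * 1024) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs over raw file bytes, numbering from 1."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size must be at least 5 MiB")
    # Ceiling division, but always at least one part (even for empty data)
    num_parts = max(1, -(-len(data) // part_size))
    if num_parts > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    for index in range(num_parts):
        start = index * part_size
        yield index + 1, data[start:start + part_size]
```

Each yielded chunk can be passed unchanged to `upload_part` as `Body`, which removes the DataFrame conversion step entirely.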

Here is my code:
```python
import io
from typing import Any

import numpy as np
import pandas as pd

# get_boto3_s3_client, S3UtilsBase, settings, UserSettings, RootModel,
# AttachmentResponseFromS3 and AWS_S3 are project-specific and not shown here.


def multipart_upload(self, filename: str, user_settings: UserSettings, model: RootModel, location: str,
                     content_buffer: Any, content_df: pd.DataFrame, content_type: str):

    s3_client = get_boto3_s3_client()
    chunksize = 5 * 1024 * 1024  # 5 MiB: S3's minimum size for every part except the last

    part_number = 0
    chunk: pd.DataFrame
    parts_info = []

    key_name = S3UtilsBase().prepare_s3_key_path(filename=filename,
                                                 location=location,
                                                 model=model,
                                                 user_settings=user_settings)

    multipart_upload_resp = s3_client.create_multipart_upload(
        Bucket=settings.AWS_S3_BUCKET, Key=key_name)

    excel_file_types = S3UtilsBase().get_excel_types()
    # np.array_split needs at least one section; files under 5 MiB would otherwise give 0
    num_chunks = max(1, len(content_buffer.getvalue()) // chunksize)

    for chunk in np.array_split(content_df, num_chunks):
        buffer = io.BytesIO()
        part_number += 1
        # Each chunk is written as a complete, standalone CSV/Excel file (own header and index)
        if content_type == 'csv':
            chunk.to_csv(buffer)
        elif content_type in excel_file_types:
            chunk.to_excel(buffer)

        chunk_resp = s3_client.upload_part(Bucket=settings.AWS_S3_BUCKET,
                                           Key=multipart_upload_resp['Key'],
                                           PartNumber=part_number,
                                           UploadId=multipart_upload_resp['UploadId'],
                                           Body=buffer.getvalue())

        # S3 needs each part's number and ETag to assemble the final object
        parts_info.append({
            'PartNumber': part_number,
            'ETag': chunk_resp['ETag']
        })

    parts_info = sorted(parts_info, key=lambda x: x["PartNumber"])
    cmp_multipart_upload_resp = s3_client.complete_multipart_upload(Bucket=settings.AWS_S3_BUCKET,
                                                                    Key=multipart_upload_resp['Key'],
                                                                    UploadId=multipart_upload_resp['UploadId'],
                                                                    MultipartUpload={"Parts": parts_info})

    paths = cmp_multipart_upload_resp["Key"].split("/")
    prefix = "/".join(paths[:-1])
    return AttachmentResponseFromS3(
        Prefix=prefix,
        Error=False,
        Message='',
        # VersionId is only returned when versioning is enabled on the bucket
        Version=cmp_multipart_upload_resp.get("VersionId"),
        Host=f'{AWS_S3}://{cmp_multipart_upload_resp["Bucket"]}',
        FileName=paths[-1],
        Parts=part_number
    )
```
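
Here is a minimal sketch of the same upload done on the raw bytes instead of on DataFrame chunks. If any step fails, the upload is aborted, because S3 keeps (and bills for) the parts of an unfinished multipart upload until it is aborted. The sketch assumes `s3_client` is a boto3 S3 client and `buffer` is the in-memory file, like `content_buffer` above. `upload_buffer_multipart` and the 8 MiB part size are hypothetical choices. If a hand-written loop is not required, boto3's `upload_fileobj` combined with `boto3.s3.transfer.TransferConfig` (its `multipart_threshold` and `multipart_chunksize` settings) switches to multipart upload automatically for large files.

```python
import io

PART_SIZE = 8 * 1024 * 1024  # must be >= 5 MiB for every part except the last


def upload_buffer_multipart(s3_client, bucket: str, key: str, buffer: io.BytesIO,
                            part_size: int = PART_SIZE) -> dict:
    """Upload the bytes in `buffer` to s3://bucket/key as a multipart upload."""
    upload_id = s3_client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        buffer.seek(0)
        part_number = 1
        while True:
            chunk = buffer.read(part_size)
            # Stop once the buffer is exhausted, but always send at least one part
            if not chunk and parts:
                break
            response = s3_client.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                                             UploadId=upload_id, Body=chunk)
            parts.append({"PartNumber": part_number, "ETag": response["ETag"]})
            part_number += 1
            if len(chunk) < part_size:
                break  # a short read from BytesIO means this was the last part
        return s3_client.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                                   MultipartUpload={"Parts": parts})
    except Exception:
        # Discard any parts already stored so they do not linger in the bucket
        s3_client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

Called as `upload_buffer_multipart(boto3.client("s3"), "my-bucket", "uploads/report.xlsx", content_buffer)` (placeholder bucket and key), this uploads the workbook bytes unchanged. The resulting object is therefore the original .xlsx file and can be read back with `pd.read_excel`.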

DEV Community