Why Refactoring?
Refactoring Guru defines it on their website: "Refactoring is a systematic process of improving code without creating new functionality that can transform a mess into clean code and simple design." I need to refactor my existing code because the business process and the web API currently live in one project. I also want to extend the functionality to support more storage providers. Previously, the PDFMerger only supported Google Drive, and I want to use Amazon S3 so I can automate the process. After refactoring, I can change the storage provider easily. If you want to know more about my project, the GitHub repository is open source. Feel free to give any feedback. Please see my initial changes in this PR.
Refactoring Steps
The application has three steps.
- Download the files from a folder and store them in memory
- Merge the PDFs from the list of files in memory
- Upload the merged PDF to the target directory
I moved those functions into a new project. The interfaces (the contracts) live in one project and the integrations in another. For example, I have IDownloader to provide the download function, IPdfMerger to merge PDFs, and IUploader to provide the upload function. Besides those contracts, I have IMerge, which is implemented by the business process.
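To make the split concrete, here is a minimal sketch of the contracts and the business process, written in Python rather than taken from the project's code. The method names, signatures, and the MergeProcess class are my own assumptions for illustration; only the four interface names come from the project.

```python
from io import BytesIO
from typing import List, Protocol


class Downloader(Protocol):
    """Counterpart of IDownloader (method signature is an assumption)."""

    def download(self, folder: str) -> List[BytesIO]: ...


class PdfMerger(Protocol):
    """Counterpart of IPdfMerger (method signature is an assumption)."""

    def merge(self, files: List[BytesIO]) -> BytesIO: ...


class Uploader(Protocol):
    """Counterpart of IUploader (method signature is an assumption)."""

    def upload(self, merged: BytesIO, target: str) -> None: ...


class MergeProcess:
    """Hypothetical business process (the IMerge role).

    It depends only on the contracts, so the storage provider can be swapped
    without touching this class.
    """

    def __init__(self, downloader: Downloader, merger: PdfMerger, uploader: Uploader) -> None:
        self._downloader = downloader
        self._merger = merger
        self._uploader = uploader

    def run(self, source_folder: str, target: str) -> None:
        files = self._downloader.download(source_folder)  # step 1: download into memory
        merged = self._merger.merge(files)                # step 2: merge the PDFs
        self._uploader.upload(merged, target)             # step 3: upload the result
```

Switching from Google Drive to Amazon S3 then means passing different Downloader and Uploader implementations to the same process.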
My previous application downloads the merged PDF to the local computer and requires an OAuth 2.0 login. That login step prevents me from automating the process.
I decided to create a new console project that uses Amazon S3 to read from and write to a bucket. Honestly, it's quite challenging because I am also adding some functionality after the refactoring.
I'm going to add functionality to download the files concurrently. Currently, the application downloads the files sequentially.
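The concurrent download is still a plan, so the following is only a sketch of one possible approach, in Python with a thread pool. The download_all function and its fetch parameter are hypothetical names; fetch stands for whatever downloads a single file.

```python
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO
from typing import Callable, List


def download_all(
    keys: List[str],
    fetch: Callable[[str], BytesIO],
    max_workers: int = 4,
) -> List[BytesIO]:
    """Download every key concurrently and return the streams in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `keys`, even when the
        # downloads finish out of order, so the merge order stays predictable.
        return list(pool.map(fetch, keys))
```

Keeping the result order stable matters here because the merge step combines the files in list order.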
More Details
Let's focus on the Amazon S3 implementation. I use the AWS SDK to integrate with Amazon S3. For the download process, the application first lists all files (the file ID, or key) using the ListObjectsV2 API. I only provide BucketName and Prefix; the Prefix gives the folder location. Since the result might span more than one page, I also loop over the pages and read NextContinuationToken to make sure every file is listed and then downloaded.
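The paging loop can be sketched like this in Python. It is an illustration, not the project's code: list_keys is a hypothetical helper, and client is assumed to be an S3 client such as the one returned by boto3.client("s3").

```python
from typing import Any, Dict, List


def list_keys(client: Any, bucket: str, prefix: str) -> List[str]:
    """Return every object key under `prefix`, following continuation tokens."""
    keys: List[str] = []
    kwargs: Dict[str, str] = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        # "Contents" is missing from the response when nothing matches
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        # Ask for the next page, starting where this one stopped
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```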
After getting the list, the application downloads each file using the GetObject API. Each file is stored in a list of MemoryStream objects.
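As a rough Python equivalent, with io.BytesIO standing in for MemoryStream: download_object is a hypothetical helper, and client is again assumed to be a boto3 S3 client.

```python
from io import BytesIO
from typing import Any


def download_object(client: Any, bucket: str, key: str) -> BytesIO:
    """Read one S3 object fully into an in-memory stream."""
    response = client.get_object(Bucket=bucket, Key=key)
    # "Body" is a streaming object; read() pulls the whole file into memory
    return BytesIO(response["Body"].read())
```

Reading everything into memory keeps the code simple, but it assumes the PDFs are small enough to fit.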
How about the upload? It's simple. The application uses the PutObject API to upload the MemoryStream.
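A matching upload sketch in Python, under the same assumptions: upload_pdf is a hypothetical helper, client is a boto3 S3 client, and setting the content type is my own addition rather than something the project is known to do.

```python
from io import BytesIO
from typing import Any


def upload_pdf(client: Any, bucket: str, key: str, merged: BytesIO) -> None:
    """Upload the merged in-memory PDF as a single object."""
    client.put_object(
        Bucket=bucket,
        Key=key,
        # getvalue() returns the full buffer regardless of the stream position
        Body=merged.getvalue(),
        ContentType="application/pdf",
    )
```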
So, what do you think? Again, if you have any feedback, feel free to share it in the comment section.
Permissions
I'm using these permissions for the policy. I created a new policy that covers what the application needs.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:PutObjectRetention",
        "s3:GetObjectVersionTagging",
        "s3:GetObjectAttributes",
        "s3:GetObjectTagging",
        "s3:ListBucket",
        "s3:GetObjectVersionAttributes",
        "s3:GetObjectVersion"
      ],
      "Resource": "*"
    }
  ]
}
Cron Functions
I currently use GitHub Actions for the cron function, and I'm going to move it to AWS. I plan to use Lambda with container images and trigger the Lambda using Amazon EventBridge. If you have any recommendations, feel free to comment on this post. If you want to see the cron setup in GitHub Actions, feel free to visit this page and the source code.
Thank you
Thank you for reading. I won't dive deep into the code here. I just want to share my experience and the process of migrating. Rewriting all of the code in this post would be too much.