Simon Green


Part 4b: Problem solving and debugging

Included in this chapter...

  • Objective
  • My process for resolving technical issues...
  • Issue 1: API Gateway Internal Server Error
  • Issue 2: Malformed Lambda proxy response
  • Issue 3: Git push, file exceeding limit
  • Issue 4: GitLab CI Variables
  • Issue 5: CloudFront - Failed to contact the origin
  • Issue 6: API Gateway: Missing Authentication Token
  • Issue 7: CSS intermittently not loading
  • Conclusion

Objective

This post follows on from Part 4a: Terraform IaC, GitLab CI and automated testing. Here I explain seven key technical issues I faced while developing the AWS / Terraform IaC part of the challenge, and how I resolved each one.


My process for resolving technical issues...

Issues can arise when adding a new implementation or as a result of making changes to an existing configuration.

My approach to handling an issue goes something like this:

  • Gather the available error logs
  • Understand the information flow and where in that flow the error is being generated
  • Research
  • Testing
  • Documenting

Research can mean reading the output logs from various sources (or creating them if necessary), official documentation, blogs, community forums and so on. Blogs and forum posts are often out of date, and it's good to remember that the technology in question changes fast, but they can still provide solutions or clues worth investigating further.

When resolving issues, I typically make one change at a time and test frequently to help identify the cause of the issue, and take steps to be able to roll back to a previous state if necessary.

The research and testing steps are typically a continuous cycle until the problem is resolved. Sometimes breaking the issue down into smaller elements for testing works, sometimes I find that a different approach is needed to achieve the required outcome, but "all" issues can be fixed with persistence and patience.

With that said, here are some issues I experienced...


Issue 1: API Gateway Internal Server Error

With the API Gateway and Lambda created as IaC, I tried to invoke the Lambda function from the API Gateway console using the ‘Test Resource’ feature and was continuously receiving a “500 Internal Server Error” response code, as seen below:

Internal server error message 1

Internal server error message 2

The logs indicate invalid permissions on the Lambda function, which pointed to a possible IAM issue to resolve in the Terraform config.


Issue 1: Solution

From the Terraform documentation:

execution_arn - Execution ARN part to be used in lambda_permission's source_arn when allowing API Gateway to invoke a Lambda function, e.g., arn:aws:execute-api:eu-west-2:123456789012:z4675bid1j, which can be concatenated with allowed stage, method and resource path.

After some testing I resolved this issue by appending '/*' to the API Gateway execution ARN in the output value. With this change, the permission allows invocation from any API Gateway stage, method or resource path.

Before:

output "api_gateway_source_arn" {
 value = aws_api_gateway_rest_api.api_gateway_tf.execution_arn
}

Corrected:

output "api_gateway_source_arn" {
  value = join("",[aws_api_gateway_rest_api.api_gateway_tf.execution_arn,"/*"])
}
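To see why the trailing '/*' matters, here is a small illustrative sketch. It assumes the ARN of an incoming request has the form execution ARN / stage / method / path (the 'prod' stage and 'GET/resource' route are hypothetical), and it uses Python's fnmatch only as a rough stand-in for how wildcards in source_arn are evaluated. It is not AWS's actual matching logic.

```python
from fnmatch import fnmatchcase

# Example execution ARN taken from the Terraform documentation quoted above.
EXECUTION_ARN = "arn:aws:execute-api:eu-west-2:123456789012:z4675bid1j"

# Hypothetical ARN of one incoming request: stage "prod", method GET, path /resource.
REQUEST_ARN = f"{EXECUTION_ARN}/prod/GET/resource"


def permission_allows(source_arn_pattern: str, request_arn: str) -> bool:
    """Approximate the source_arn check with a shell-style wildcard match."""
    return fnmatchcase(request_arn, source_arn_pattern)


# Without the wildcard the pattern only matches the bare execution ARN.
before = permission_allows(EXECUTION_ARN, REQUEST_ARN)

# With "/*" appended, any stage, method and path under the API matches.
after = permission_allows(f"{EXECUTION_ARN}/*", REQUEST_ARN)

print(before, after)
```

Under these assumptions the bare execution ARN does not match the request, while the pattern ending in '/*' does.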



With this update, API Gateway could successfully invoke the Lambda function. However, this then led to issue 2, detailed below.


Issue 2: Malformed Lambda proxy response

When trying to invoke the Lambda function again, I now received a 502 Bad Gateway response: a configuration error caused by a Malformed Lambda proxy response.

From the logs I could see that the request was being sent successfully (as it returned the ‘count’ value) however it was failing at the response.



Issue 2: Solution

It took quite a bit of trial and error, but I resolved this in the api_gateway.tf file by changing the 'type' argument of the aws_api_gateway_integration resource from type = "AWS_PROXY" (Lambda proxy integration) to type = "AWS" (AWS service integration).

With this change, the API returned the updated visitor count without any error.

resource "aws_api_gateway_integration" "integration" {
  rest_api_id                   = aws_api_gateway_rest_api.api_gateway_tf.id
  resource_id                   = aws_api_gateway_resource.resource.id
  http_method                   = aws_api_gateway_method.method.http_method
  integration_http_method       = "POST"
  # type                        = "AWS_PROXY"       # <--- OLD
  type                          = "AWS"         # <--- UPDATED
  uri                           = var.lambda_function_arn
}

NOTE: I later switched this config back to integration type "AWS_PROXY". To resolve the original error, I needed to update the Lambda (Python) response to match the format that the API Gateway proxy integration requires. I now understand that this is an ideal way to receive a response in API Gateway when invoking Lambda.
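As a rough illustration of what the proxy integration expects, here is a minimal sketch of a handler that returns a response in that shape. It is not the handler used in this project: the function names, the JSON body layout and the hard-coded count are assumptions for the example. The key points are that statusCode is an integer and body is a string; returning a bare number or a nested dict as the body is a common cause of the Malformed Lambda proxy response error.

```python
import json


def build_proxy_response(count: int, status_code: int = 200) -> dict:
    """Wrap a visitor count in the shape a Lambda proxy integration expects."""
    return {
        "statusCode": status_code,                        # must be an integer
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"count": count}),             # must be a string
        "isBase64Encoded": False,
    }


def lambda_handler(event, context):
    # Hypothetical placeholder: a real handler would read and update
    # the count in a data store before responding.
    count = 42
    return build_proxy_response(count)
```

With integration type "AWS" (non-proxy), API Gateway passes the function's return value through its own mapping, which is why the earlier workaround appeared to fix the error.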


Issue 3: Git push, file exceeding limit

For Terraform to work correctly with AWS, it downloads an AWS provider binary file (409MB) into the local repo in the background. This file exceeds GitLab's upper limit on what you can upload, so pushing the latest updates to GitLab failed.


Issue 3: Solution

Using the Terminal I searched for large files within the project folder using this command:

find cloud-resume-challenge -type f -size +200M

I then added these files to the .gitignore file, essentially preventing the upload of the specified files.

If you were to clone the repo onto another machine, Terraform would download these binary files again automatically when it is initialised, so nothing is lost by ignoring them.
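For anyone who prefers a script to the find command, the same search can be sketched in Python. This is an equivalent I am suggesting rather than what I ran at the time; the function name and the byte threshold parameter are my own choices.

```python
from pathlib import Path


def find_large_files(root, min_bytes):
    """Return files under root that are larger than min_bytes, largest first."""
    sized = [
        (path.stat().st_size, path)
        for path in Path(root).rglob("*")
        if path.is_file()
    ]
    large = [(size, path) for size, path in sized if size > min_bytes]
    large.sort(key=lambda item: item[0], reverse=True)
    return [path for _size, path in large]
```

Calling find_large_files("cloud-resume-challenge", 200 * 1024 * 1024) roughly mirrors the +200M filter used above, and the returned paths are candidates for the .gitignore file.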


Issue 4: GitLab CI Variables

Whilst configuring the GitLab automation pipeline in the .gitlab-ci.yml file, I had configured AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY under the GitLab CI/CD settings → Variables. However, I was continually seeing errors like:

upload failed: ./foo.txt to s3://gitlab-test-bucket-sg/foo.txt An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.

…or…

upload failed: ./foo.txt to s3://gitlab-test-bucket-sg/foo.txt Unable to locate credentials

Issue 4: Solution

I found the solution was simply to uncheck the “Protect variable” option in the Variable configuration within the GitLab UI. Protected variables are only exposed to pipelines running on protected branches or tags, as the documentation states:

“Protect variable Optional. If selected, the variable is only available in pipelines that run on protected branches or protected tags.”

With the "Protect variable" option unchecked the pipeline worked 🙂
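A quick way to confirm this kind of problem is to check, inside the job, whether the variables are present at all. The sketch below is a hypothetical helper, not part of my pipeline. It assumes GitLab's predefined CI_COMMIT_REF_PROTECTED variable is available in the job environment and reports "unknown" if it is not.

```python
import os

REQUIRED_VARIABLES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_variables(env, required=REQUIRED_VARIABLES):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def describe(env):
    """Summarise whether the credentials reached the job environment."""
    missing = missing_variables(env)
    if not missing:
        return "All required variables are set."
    # Assumption: GitLab sets CI_COMMIT_REF_PROTECTED to "true" or "false".
    protected = env.get("CI_COMMIT_REF_PROTECTED", "unknown")
    return f"Missing: {', '.join(missing)} (ref protected: {protected})"


if __name__ == "__main__":
    # Only names are printed, never the secret values themselves.
    print(describe(os.environ))
```

If the variables are reported missing on an unprotected ref, the "Protect variable" setting is a likely cause.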



Issue 5: CloudFront - Failed to contact the origin

To help test the CloudFront configuration, the CloudFront console provides a 'Distribution domain name' URL which, in my case, should open the 'index.html' file in the browser.


Instead, the Terraform-created CloudFront distribution was opening an error page, indicating that it had Failed to contact the origin.



Issue 5: Solution

The resolution to this issue was two-fold.

First, I found a formatting error in the local value in the cloudfront.tf file: I had used a hyphen before the region variable, whereas it should be a period. See below for the example.

Issue 1 of 2

locals {
    # Before and after:
    # s3_domain_name = "${var.bucket_name}.s3-website-${var.region}.amazonaws.com"
    s3_domain_name = "${var.bucket_name}.s3-website.${var.region}.amazonaws.com"
}



Issue 2 of 2

With this error corrected I was still unable to open the index.html file using the 'Distribution domain name'. After further investigation I found that, in the aws_cloudfront_distribution resource configuration, the origin_protocol_policy should be set to http-only and not https-only, because S3 static website endpoints only accept HTTP connections.

resource "aws_cloudfront_distribution" "cloudfront_dist_tf" {
  origin {
    origin_id   = local.s3_origin_id
    domain_name = local.s3_domain_name
    custom_origin_config {
      http_port              = 80
      https_port             = 443
      # origin_protocol_policy = "https-only" # <--- Before
      origin_protocol_policy = "http-only" # <--- Updated
      origin_ssl_protocols   = ["TLSv1"]
    }
  }
  # ... remaining distribution arguments omitted
}

With this second issue now corrected, the CloudFront Distribution domain name now successfully opened the index.html file.


Issue 6: API Gateway: Missing Authentication Token

With the Terraform structure mostly created, the API Gateway and Lambda were working well. However, when trying to open the API Gateway Invoke URL in the browser, I received the error {"message":"Missing Authentication Token"}.


In my case, the expected result is the raw, updated visitor count returned to the browser. This is the value that is passed to the index.html file.



Issue 6: Solution

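After a fair amount of investigation and trial and error with all types of API Gateway configuration, I figured out that the 'Invoke URL' provided by API Gateway simply needed to be updated to match my Terraform settings. The provided URL points at the stage root, and API Gateway returns this error when a request does not match a configured resource and method.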

In creating the Terraform aws_api_gateway_resource, I named the path_part as "resource".


resource "aws_api_gateway_resource" "resource" {
  path_part   = "resource"
  parent_id   = aws_api_gateway_rest_api.api_gateway_tf.root_resource_id
  rest_api_id = aws_api_gateway_rest_api.api_gateway_tf.id
}

So, to be able to open the Invoke URL, I needed to append '/resource' to the end of the URL in the output, which I did in the following way:

output "invoke_url_full" {
  value = "${aws_api_gateway_deployment.deployment.invoke_url}/resource"
}



In addition, I configured Terraform to write this updated URL to a local text file, which is uploaded together with the other website files to the GitLab repo. Because the URL is expected to change, the index.html file reads this text file to get the most recent Invoke URL, even if the Terraform resource is destroyed and re-created.

With these two changes made, the updated URL finally returns the count in the browser.
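The URL construction itself is simple enough to sketch outside Terraform. The helper below is purely illustrative: the function name and the sample URL are hypothetical, and it assumes the stage Invoke URL has no query string.

```python
def full_invoke_url(stage_invoke_url: str, path_part: str) -> str:
    """Join a stage Invoke URL and a resource path_part with a single slash."""
    return f"{stage_invoke_url.rstrip('/')}/{path_part.strip('/')}"


# Hypothetical stage URL, for illustration only.
example = full_invoke_url(
    "https://abc123.execute-api.eu-west-2.amazonaws.com/prod", "resource"
)
print(example)
```

Requesting the stage URL without the path part is what produced the Missing Authentication Token message in my case.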



Issue 7: CSS intermittently not loading

This one was a front-end issue: after uploading the website files to S3 and opening the domain in the browser, the page would sometimes initially open with CSS, but on subsequent refreshes it would load with no CSS.

Testing locally it worked fine, confirming the styles.css was good.

Testing online and assuming it was a caching issue, I tried hard page reloads and other browsers, but still got the HTML page with no CSS. The console output in Chrome Developer Tools showed that the CSS file was loaded with status code 200, yet no styles were being applied.


Issue 7: Solution

Digging further, I ran a curl request which showed that the content type of the styles.css file was “text/html”, as this is how it was configured when uploaded to the bucket by Terraform. Browsers can refuse to apply a stylesheet that is not served as text/css.

curl -I https://www.simon-green.uk/styles.css | grep -i content-type


In Terraform I updated the file type for the styles.css file to be 'text/css' when uploaded to S3, and thankfully this resolved the issue.


Output before:
content-type: text/html

Output after:
content-type: text/css
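If files are uploaded by a script rather than by Terraform, the content type can be derived from the file extension instead of being set by hand. Here is a minimal sketch using Python's standard mimetypes module; the function name and the fallback type are my own choices, not part of my Terraform config.

```python
import mimetypes


def content_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a Content-Type from the file extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default


for name in ("index.html", "styles.css"):
    print(name, content_type_for(name))
```

The point is the same as in the Terraform fix: each object needs the content type that matches what it actually contains.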


It's also worth noting that, to make this a more robust flow, I ended up updating the Terraform configuration so that it no longer uploads any website files to the bucket. Instead, Terraform creates an empty bucket and the files are then added using Git (after passing Cypress testing).


Conclusion

In this post, I've shared my approach to resolving technical issues I encountered during the AWS/Terraform Infrastructure as Code (IaC) part of the challenge. It's all about methodical problem-solving, starting with gathering information, researching, and testing solutions.

From API Gateway errors to CloudFront misconfigurations, each challenge required persistence and patience. By making one change at a time and testing frequently, I was able to pinpoint the root causes and implement effective solutions.

Ultimately, navigating technical hurdles demands a blend of technical know-how and perseverance. By embracing a systematic approach and leveraging available resources, we can overcome any obstacle in our development journey.
