DEV Community

Yasuhiro Matsuda for AWS Community Builders

Story about migrating Keycloak using Fargate

This article is about migrating Keycloak to Fargate, but it also describes how to scale out Fargate according to load and according to the time of day, so it can also serve as a reference for those who want to control scale-out on Fargate.

What is Keycloak

Open source identity and access management software for single sign-on and API access authentication and authorization control.

What is Fargate

It is an AWS managed service that runs containers. There are also ECS on EC2, which runs containers on EC2 instances, and Amazon EKS, which is a managed Kubernetes service, but AWS Fargate is recommended when you simply want to run containers without maintaining the container execution infrastructure.

About this story

This article covers two things: migrating Keycloak services built on ECS on EC2 to Fargate, and upgrading Keycloak from v6 to v19.


The story of migrating Keycloak's services that were built on ECS on EC2 to Fargate

Migrating from ECS on EC2 to Fargate has the following three advantages, so we carried out this migration before the version upgrade described below.

  • No need to maintain container infrastructure
  • Minimizes the cost of scaling out
  • Faster startup time during scale-out, making it easier to follow spikes

The story of migrating Keycloak from v6 to v19

Keycloak automatically migrates its database at startup, so in principle you only need to start the new version. However, some incompatibilities arise from DB constraints, so each time an error occurs you need to investigate it and correct the inconsistent data.

In the following example, the migration failed because of duplicate group names. Such duplicates can be detected with SELECT REALM_ID, NAME, COUNT(*) FROM KEYCLOAK_GROUP WHERE PARENT_GROUP IS NULL GROUP BY REALM_ID, NAME HAVING COUNT(*) > 1;

ERROR [org.keycloak.connections.jpa.updater.liquibase.conn.DefaultLiquibaseConnectionProvider] (ServerService Thread Pool -- 67) Change Set META-INF/jpa-changelog-9.0.1.xml::9.0.1-KEYCLOAK-12579-add-not-null-constraint::keycloak failed. Error: Duplicate entry 'school- -ks' for key 'SIBLING_NAMES' [Failed SQL: UPDATE authdbdev.KEYCLOAK_GROUP SET PARENT_GROUP = ' ' WHERE PARENT_GROUP IS NULL]
FATAL [org.keycloak.services] (ServerService Thread Pool -- 67) java.lang.RuntimeException: Failed to update database
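
The duplicate check above can be tried out safely before touching the real database. The following is a minimal sketch that runs the same query against an in-memory SQLite table; the table here is a simplified, hypothetical stand-in for KEYCLOAK_GROUP (only the columns used by the query), and the sample rows and helper names are assumptions for illustration.

```python
import sqlite3

# Same query as in the text: top-level groups (no parent) whose name
# appears more than once within a realm.
DUPLICATE_TOP_LEVEL_GROUPS_SQL = """
SELECT REALM_ID, NAME, COUNT(*)
FROM KEYCLOAK_GROUP
WHERE PARENT_GROUP IS NULL
GROUP BY REALM_ID, NAME
HAVING COUNT(*) > 1
"""


def find_duplicate_top_level_groups(conn):
    """Return (realm_id, name, count) rows for duplicated top-level group names."""
    return conn.execute(DUPLICATE_TOP_LEVEL_GROUPS_SQL).fetchall()


def build_sample_db():
    """Create a hypothetical, simplified KEYCLOAK_GROUP table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE KEYCLOAK_GROUP "
        "(ID TEXT PRIMARY KEY, NAME TEXT, PARENT_GROUP TEXT, REALM_ID TEXT)"
    )
    conn.executemany(
        "INSERT INTO KEYCLOAK_GROUP VALUES (?, ?, ?, ?)",
        [
            ("g1", "teachers", None, "school"),
            ("g2", "teachers", None, "school"),  # duplicate top-level name
            ("g3", "students", None, "school"),
            ("g4", "teachers", "g3", "school"),  # has a parent, so not counted
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_duplicate_top_level_groups(build_sample_db()))
```

Rows returned by the query are the ones that would collide once the migration replaces a NULL PARENT_GROUP with a fixed value, so they need to be renamed or merged before retrying the upgrade.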

It is also important to note that the environment variables to be set have changed due to the migration from WildFly to Quarkus.

WildFly      | Quarkus
DB_DATABASE  | KC_DB_URL_DATABASE
DB_HOST      | KC_DB_URL_HOST
DB_PASSWORD  | KC_DB_PASSWORD
DB_USER      | KC_DB_USERNAME
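
When many task definitions or scripts still use the old names, the renaming can be done mechanically. The following is a minimal sketch that applies only the four pairs listed above; the function name and the sample values are hypothetical, and any variable not in the list is reported rather than guessed at.

```python
# Pairs taken from the list above (WildFly-era name -> Quarkus-era name).
WILDFLY_TO_QUARKUS_ENV = {
    "DB_DATABASE": "KC_DB_URL_DATABASE",
    "DB_HOST": "KC_DB_URL_HOST",
    "DB_PASSWORD": "KC_DB_PASSWORD",
    "DB_USER": "KC_DB_USERNAME",
}


def translate_env(old_env):
    """Rename known variables; return (new_env, sorted names left unmapped)."""
    new_env = {}
    unmapped = []
    for name, value in old_env.items():
        if name in WILDFLY_TO_QUARKUS_ENV:
            new_env[WILDFLY_TO_QUARKUS_ENV[name]] = value
        else:
            # Not covered by the list above: leave it for manual review.
            unmapped.append(name)
    return new_env, sorted(unmapped)


if __name__ == "__main__":
    old = {"DB_HOST": "db.example.internal", "DB_USER": "keycloak", "SOME_OTHER_VAR": "x"}
    print(translate_env(old))
```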

When configuring a multi-node cluster with Infinispan, which was defined in standalone-ha.xml under WildFly, the following environment variables must be set under Quarkus (v17 and later).

KC_CACHE="ispn"
KC_CACHE_CONFIG_FILE="cache-ispn-jdbc-ping.xml"

The contents of cache-ispn-jdbc-ping.xml are shown below (for the case where MySQL is used on RDS). owners sets the number of nodes on which each cache entry is kept.

If you scale out from a baseline of at least two nodes to maintain availability, you must decide the number of owner nodes while considering how many nodes may be removed simultaneously when scaling in. (Since you cannot control which nodes are removed when scaling in, you need to make sure the cache is not lost because all the nodes holding an entry are deleted at once.)
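
The relationship between owners and the scale-in step can be written down as simple arithmetic. The following is a minimal sketch, assuming that each entry of a distributed cache is held on exactly owners nodes and that the cluster finishes rebalancing between two scale-in steps; the function names are hypothetical.

```python
def max_safe_simultaneous_removals(owners):
    """Largest number of nodes that can disappear at once while at least
    one copy of every cache entry is guaranteed to survive."""
    if owners < 1:
        raise ValueError("owners must be at least 1")
    # Even in the worst case, where every removed node was an owner of the
    # same entry, one owner is still left.
    return owners - 1


def scale_in_step_is_safe(owners, nodes_removed_at_once):
    """True if removing this many nodes in one step cannot drop all copies."""
    return nodes_removed_at_once <= max_safe_simultaneous_removals(owners)


if __name__ == "__main__":
    print(max_safe_simultaneous_removals(3))
    print(scale_in_step_is_safe(3, 1))
    print(scale_in_step_is_safe(3, 3))
```

Under these assumptions, owners="3" tolerates up to two nodes disappearing at the same moment, so a scale-in policy that removes one task at a time stays well within the limit.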

Also, the max-count of the realms and users caches affects performance. If more entries than max-count need to be held, communication with the DB will occur, so it is better to increase max-count as much as memory allows.

However, when running in distributed mode instead of replicated mode, the cache is rebalanced when scaling in, which can result in out-of-memory errors, so it is necessary to test thoroughly with a load testing tool such as Distributed Load Testing on AWS. For details of the parameters, see Configuring Infinispan caches and urn:infinispan:config:11.0.

<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:11.0 http://www.infinispan.org/schemas/infinispan-config-11.0.xsd"
    xmlns="urn:infinispan:config:11.0">

  <jgroups>
    <stack name="jdbc-ping-tcp" extends="tcp">
      <JDBC_PING connection_driver="com.mysql.cj.jdbc.Driver"
                 connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
                 connection_url="${env.KC_DB_URL}"
                 initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, ping_data VARBINARY(255), constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name));"
                 info_writer_sleep_time="500"
                 remove_all_data_on_view_change="true"
                 stack.combine="REPLACE"
                 stack.position="MPING" />
    </stack>
  </jgroups>

  <cache-container name="keycloak">
    <transport lock-timeout="60000" stack="jdbc-ping-tcp"/>
    <local-cache name="realms">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <memory max-count="10000"/>
    </local-cache>
    <local-cache name="users">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <memory max-count="10000"/>
    </local-cache>
    <distributed-cache name="sessions" owners="3">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="authenticationSessions" owners="3">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="offlineSessions" owners="3">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="clientSessions" owners="3">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="offlineClientSessions" owners="3">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="loginFailures" owners="3">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <local-cache name="authorization">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <memory max-count="10000"/>
    </local-cache>
    <replicated-cache name="work">
      <expiration lifespan="-1"/>
    </replicated-cache>
    <local-cache name="keys">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <expiration max-idle="3600000"/>
      <memory max-count="1000"/>
    </local-cache>
    <distributed-cache name="actionTokens" owners="3">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <expiration max-idle="-1" lifespan="-1" interval="300000"/>
      <memory max-count="-1"/>
    </distributed-cache>
  </cache-container>
</infinispan>

The Dockerfile is as follows:

FROM quay.io/keycloak/keycloak:19.0.3
COPY conf/keycloak.conf /opt/keycloak/conf/keycloak.conf
COPY conf/cache-ispn-jdbc-ping.xml /opt/keycloak/conf/cache-ispn-jdbc-ping.xml

RUN /opt/keycloak/bin/kc.sh build --cache-config-file=cache-ispn-jdbc-ping.xml
WORKDIR /opt/keycloak

ENTRYPOINT [ "/opt/keycloak/bin/kc.sh" ]

The ECS definition in Terraform is as follows. Note that the parts written as _xxxx_ are constants passed in as variables.

resource "aws_ecs_cluster" "keycloak" {
  name     = "clustername"
  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

resource "aws_ecs_service" "keycloak" {
  cluster                            = aws_ecs_cluster.keycloak.id
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
  desired_count                      = _keycloak_desired_count_min_
  enable_ecs_managed_tags            = false
  enable_execute_command             = true
  health_check_grace_period_seconds  = 180
  name             = _servicename_
  platform_version = "LATEST"
  propagate_tags      = "TASK_DEFINITION"
  scheduling_strategy = "REPLICA"
  task_definition     = aws_ecs_task_definition.keycloak.arn

  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 2
    weight            = 1 // From the third task onward, 25% of tasks are started on FARGATE
  }

  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    base              = 0
    weight            = 3 // From the third task onward, 75% of tasks are started on FARGATE_SPOT
  }

  deployment_circuit_breaker {
    enable   = false
    rollback = false
  }

  deployment_controller {
    type = "ECS"
  }

  load_balancer {
    container_name   = "keycloak"
    container_port   = aws_alb_target_group.keycloak.port
    target_group_arn = aws_alb_target_group.keycloak.arn
  }

  network_configuration {
    assign_public_ip = true
    security_groups = [
      aws_security_group.keycloak.id
    ]
    subnets = _cluster_subnets_
  }

  timeouts {}
  lifecycle {
    ignore_changes = [desired_count]
  }
}

resource "aws_ecs_task_definition" "keycloak" {
  container_definitions = jsonencode(
    [
      {
        cpu               = 0
        command           = ["start", "--optimized"]
        disableNetworking = false
        portMappings = [
          {
            containerPort = aws_alb_target_group.keycloak.port
            hostPort      = aws_alb_target_group.keycloak.port
            protocol      = "tcp"
          }
        ]
        environment = [
          {
            name  = "KC_DB_URL_DATABASE"
            value = _KC_DB_URL_DATABASE_
          },
          {
            name  = "KC_DB_URL_HOST"
            value = _KC_DB_URL_HOST_
          },
          {
            name  = "KC_DB_URL"
            value = _KC_DB_URL_
          },
          {
            name  = "KC_DB_PASSWORD"
            value = _KC_DB_PASSWORD_
          },
          {
            name  = "KC_DB_USERNAME"
            value = _KC_DB_USERNAME_
          },
          {
            name  = "JAVA_OPTS"
            value = _JAVA_OPTS_
          },
          {
            name  = "KC_CACHE"
            value = "ispn"
          },
          {
            name  = "KC_HOSTNAME"
            value = _keycloak_fqdn_
          },
          {
            name  = "KC_HOSTNAME_STRICT_BACKCHANNEL"
            value = "true"
          },
          {
            name  = "KC_CACHE_CONFIG_FILE"
            value = "cache-ispn-jdbc-ping.xml"
          },
        ]
        essential = true
        healthCheck = {
          command = [
            "CMD-SHELL",
            "curl -f http://localhost:${_keycloak_port_}/auth/ || exit 1",
          ]
          interval = 30
          retries  = 3
          timeout  = 5
        }
        image       = _ecr_repo_url_
        stopTimeout = 120
        logConfiguration = {
          logDriver = "awslogs"
          options = {
            awslogs-group         = aws_cloudwatch_log_group.keycloak.name
            awslogs-region        = "ap-northeast-1"
            awslogs-stream-prefix = "ecs"
          }
        }
        mountPoints = []
        name        = "keycloak"
        volumesFrom = []
      },
    ]
  )
  cpu                = _keycloak_cpu_
  task_role_arn      = aws_iam_role.ecs_task_role.arn
  execution_role_arn = aws_iam_role.execution_role.arn
  family             = _service_name_
  memory             = _keycloak_memory_
  network_mode       = "awsvpc"
  requires_compatibilities = [
    "FARGATE",
  ]
}

resource "aws_alb_target_group" "keycloak" {
  deregistration_delay          = "115"
  load_balancing_algorithm_type = "round_robin"
  name                          = _clustername_
  port                          = _keycloak_port_
  protocol                      = "HTTP"
  protocol_version              = "HTTP1"
  slow_start                    = 0
  target_type                   = "ip"
  vpc_id                        = _cluster_vpc_id_

  health_check {
    ...
  }

  stickiness {
    cookie_duration = 86400
    enabled         = false
    type            = "lb_cookie"
  }
}

resource "aws_iam_role" "execution_role" {
  name                = "ecs-execution-role"
  managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"]
  assume_role_policy = jsonencode({
    "Version" : "2008-10-17",
    "Statement" : [
      {
        "Sid" : "",
        "Effect" : "Allow",
        "Principal" : {
          "Service" : "ecs-tasks.amazonaws.com"
        },
        "Action" : "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role" "ecs_task_role" {
  name = "ecs-task-role"
  assume_role_policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Sid" : "",
        "Effect" : "Allow",
        "Principal" : {
          "Service" : "ecs-tasks.amazonaws.com"
        },
        "Action" : "sts:AssumeRole"
      }
    ]
  })

  inline_policy {
    name = "SessionManagerRoleForECS"
    policy = jsonencode({
      "Version" : "2012-10-17",
      "Statement" : [
        {
          "Effect" : "Allow",
          "Action" : [
            "ssmmessages:CreateControlChannel",
            "ssmmessages:CreateDataChannel",
            "ssmmessages:OpenControlChannel",
            "ssmmessages:OpenDataChannel"
          ],
          "Resource" : "*"
        }
      ]
    })
  }
}

resource "aws_cloudwatch_log_group" "keycloak" {
  name              = "/ecs/${_keycloak_service_name_}"
  retention_in_days = 180
}
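
The base and weight values in the capacity_provider_strategy blocks of the service definition above decide how tasks are divided between FARGATE and FARGATE_SPOT. The following is a minimal sketch of that arithmetic: base tasks are placed first, and the remainder is shared in proportion to the weights. It returns the expected share as a fraction and does not model how ECS rounds when the remainder is not evenly divisible; the function name is hypothetical.

```python
def expected_split(desired_count, strategies):
    """strategies maps provider name -> (base, weight).
    Returns provider name -> expected number of tasks (may be fractional)."""
    remaining = desired_count
    split = {}
    # Step 1: satisfy the base counts first.
    for provider, (base, _weight) in strategies.items():
        placed = min(base, remaining)
        split[provider] = float(placed)
        remaining -= placed
    # Step 2: share whatever is left in proportion to the weights.
    total_weight = sum(weight for _base, weight in strategies.values())
    if remaining > 0 and total_weight > 0:
        for provider, (_base, weight) in strategies.items():
            split[provider] += remaining * weight / total_weight
    return split


# Values from the service definition above.
STRATEGY = {"FARGATE": (2, 1), "FARGATE_SPOT": (0, 3)}

if __name__ == "__main__":
    print(expected_split(2, STRATEGY))
    print(expected_split(10, STRATEGY))
```

With these values, the first two tasks always run on FARGATE, which keeps the minimum capacity off Spot, and only the tasks added by scale-out are mostly placed on the cheaper FARGATE_SPOT.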

Scaling policies that follow the load can be defined as follows.

resource "aws_appautoscaling_target" "keycloak" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.keycloak.name}/${aws_ecs_service.keycloak.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = _keycloak_desired_count_min_
  max_capacity       = _keycloak_desired_count_max_
  lifecycle {
    ignore_changes = [min_capacity, max_capacity]
  }
}

resource "aws_appautoscaling_policy" "keycloak_scale_out" {
  name               = "keycloak_scale_out"
  policy_type        = "StepScaling"
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
  resource_id        = aws_appautoscaling_target.keycloak.id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 30
    metric_aggregation_type = "Maximum"

    step_adjustment {
      metric_interval_lower_bound = 0
      metric_interval_upper_bound = local.KeycloakCpuHightThreshold
      scaling_adjustment          = _keycloak_desired_count_scaleout_policy_
    }

    step_adjustment {
      metric_interval_lower_bound = local.KeycloakCpuHightThreshold
      scaling_adjustment          = _keycloak_desired_count_scaleout_policy_ * 2
    }
  }
}

resource "aws_appautoscaling_policy" "keycloak_scale_in" {
  name               = "keycloak_scale_in"
  policy_type        = "StepScaling"
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
  resource_id        = aws_appautoscaling_target.keycloak.id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 60
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_upper_bound = 0
      scaling_adjustment          = -1
    }
  }
}
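
The two step_adjustment blocks in the scale-out policy above are easier to follow with concrete numbers. In Application Auto Scaling, the interval bounds are offsets from the threshold of the CloudWatch alarm that triggers the policy, and for a scale-out step the lower bound is inclusive while the upper bound is exclusive. The alarm is not shown in this article, so the threshold and step values in the following sketch are hypothetical, as is the function name.

```python
def pick_scale_out_adjustment(metric_value, alarm_threshold, steps):
    """steps is a list of (lower_bound, upper_bound, adjustment) tuples,
    with bounds expressed as offsets from the alarm threshold.
    None means the bound is open. Returns 0 if no step matches."""
    breach = metric_value - alarm_threshold
    for lower, upper, adjustment in steps:
        above_lower = lower is None or breach >= lower  # inclusive
        below_upper = upper is None or breach < upper   # exclusive
        if above_lower and below_upper:
            return adjustment
    return 0


# Hypothetical values: alarm at 50% CPU, second step starts 20 points higher,
# and the base adjustment is one task (doubled in the second step).
ALARM_THRESHOLD = 50
STEPS = [(0, 20, 1), (20, None, 2)]

if __name__ == "__main__":
    for cpu in (40, 60, 70, 95):
        print(cpu, pick_scale_out_adjustment(cpu, ALARM_THRESHOLD, STEPS))
```

Doubling the adjustment in the upper step lets the service catch up faster when the load jumps well past the threshold instead of creeping over it.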

To scale out in advance according to the time of day, define scheduled actions as follows.

resource "aws_appautoscaling_scheduled_action" "keycloak_time_scaling_start" {
  name               = "keycloak_time_scaling_start"
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
  resource_id        = aws_appautoscaling_target.keycloak.id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
  schedule           = _keycloak_desired_count_time_scaling_start_

  scalable_target_action {
    min_capacity = _keycloak_desired_count_min_ * _keycloak_desired_count_time_scaling_scale_
    max_capacity = _keycloak_desired_count_max_ * _keycloak_desired_count_time_scaling_scale_
  }
}

resource "aws_appautoscaling_scheduled_action" "keycloak_time_scaling_stop" {
  name               = "keycloak_time_scaling_stop"
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
  resource_id        = aws_appautoscaling_target.keycloak.id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
  schedule           = _keycloak_desired_count_time_scaling_stop_

  scalable_target_action {
    min_capacity = _keycloak_desired_count_min_
    max_capacity = _keycloak_desired_count_max_
  }
  depends_on = [aws_appautoscaling_scheduled_action.keycloak_time_scaling_start]
}
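
The effect of the two scheduled actions above is that the minimum and maximum capacity are multiplied by a fixed factor during the busy window and restored afterwards. The following is a minimal sketch of that calculation; the numbers are hypothetical placeholders for the _xxxx_ constants, and the function name is an assumption.

```python
def scheduled_capacity(base_min, base_max, scale, in_window):
    """Return (min_capacity, max_capacity) that the scheduled actions set.
    Inside the busy window both limits are multiplied by scale;
    outside it they return to the base values."""
    factor = scale if in_window else 1
    return base_min * factor, base_max * factor


if __name__ == "__main__":
    # Hypothetical values: min 2, max 10, scale factor 3.
    print(scheduled_capacity(2, 10, 3, in_window=True))
    print(scheduled_capacity(2, 10, 3, in_window=False))
```

Because only the limits are changed, the step scaling policies keep working inside the window and simply operate within the raised range.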

That covers everything you need to run and control Keycloak on Fargate.
