This article is about migrating Keycloak to Fargate, but it also describes how to scale out on Fargate according to load and according to the time of day, so it can also serve as a reference for anyone who wants to control scale-out on Fargate.
What is Keycloak
Keycloak is open-source identity and access management software that provides single sign-on as well as authentication and authorization control for API access.
What is Fargate
Fargate is a managed AWS service for running containers. Other options are ECS on EC2, which runs containers on EC2 instances, and Amazon EKS, which is a managed Kubernetes service, but AWS Fargate is recommended when you want to run containers simply, without maintaining the container execution infrastructure.
About this story
This article covers two migrations: moving Keycloak services built on ECS on EC2 to Fargate, and upgrading Keycloak from v6 to v19.
The story of migrating Keycloak's services that were built on ECS on EC2 to Fargate
Migrating from ECS on EC2 to Fargate has the following three advantages, so we did this migration before the version upgrade described below.
- No need to maintain container infrastructure
- Minimizes the cost of scaling out
- Faster startup time during scale-out, making it easier to follow spikes
The story of migrating Keycloak from v6 to v19
Keycloak automatically migrates its database at startup, so in principle you only need to start the new version. However, some incompatibilities arise from DB constraints, so each time an error occurs you need to investigate it and fix the inconsistent data.
For the error shown below, for example, the following query detects the duplicate group names: SELECT REALM_ID, NAME, COUNT(*) FROM KEYCLOAK_GROUP WHERE PARENT_GROUP IS NULL GROUP BY REALM_ID, NAME HAVING COUNT(*) > 1;
ERROR [org.keycloak.connections.jpa.updater.liquibase.conn.DefaultLiquibaseConnectionProvider] (ServerService Thread Pool -- 67) Change Set META-INF/jpa-changelog-9.0.1.xml::9.0.1-KEYCLOAK-12579-add-not-null-constraint::keycloak failed. Error: Duplicate entry 'school- -ks' for key 'SIBLING_NAMES' [Failed SQL: UPDATE authdbdev.KEYCLOAK_GROUP SET PARENT_GROUP = ' ' WHERE PARENT_GROUP IS NULL]
FATAL [org.keycloak.services] (ServerService Thread Pool -- 67) java.lang.RuntimeException: Failed to update database
Note also that the environment variables to set have changed because of the migration from WildFly to Quarkus.
WildFly | Quarkus |
---|---|
DB_DATABASE | KC_DB_URL_DATABASE |
DB_HOST | KC_DB_URL_HOST |
DB_PASSWORD | KC_DB_PASSWORD |
DB_USER | KC_DB_USERNAME |
If you configured a multi-node cluster with Infinispan defined in standalone-ha.xml under WildFly, the following environment variables must be set under Quarkus (v17 and later).
KC_CACHE="ispn"
KC_CACHE_CONFIG_FILE="cache-ispn-jdbc-ping.xml"
cache-ispn-jdbc-ping.xml is written as shown below (when MySQL is selected for RDS). owners sets the number of nodes on which each cache entry is kept.
If you scale out and in while always running at least two nodes for availability, decide this number with the number of nodes that may be removed simultaneously during scale-in in mind. (Because you cannot control which nodes are removed during scale-in, you need to make sure the cache is not lost when all the nodes holding an entry are deleted at once.)
Also, the max-count of the realms and users caches affects performance. If the number of entries exceeds max-count, communication with the DB will occur, so it is better to increase max-count as much as memory allows.
However, when running in Distributed mode instead of Replicated mode, the cache is rebalanced when scaling in, which can result in out-of-memory errors. It is therefore necessary to test thoroughly with a load test tool such as Distributed Load Testing on AWS. For details of the parameters, see Configuring Infinispan caches and urn:infinispan:config:11.0.
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:infinispan:config:11.0 http://www.infinispan.org/schemas/infinispan-config-11.0.xsd"
xmlns="urn:infinispan:config:11.0">
<jgroups>
<stack name="jdbc-ping-tcp" extends="tcp">
<JDBC_PING connection_driver="com.mysql.cj.jdbc.Driver"
connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
connection_url="${env.KC_DB_URL}"
initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, ping_data VARBINARY(255), constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name));"
info_writer_sleep_time="500"
remove_all_data_on_view_change="true"
stack.combine="REPLACE"
stack.position="MPING" />
</stack>
</jgroups>
<cache-container name="keycloak">
<transport lock-timeout="60000" stack="jdbc-ping-tcp"/>
<local-cache name="realms">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<memory max-count="10000"/>
</local-cache>
<local-cache name="users">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<memory max-count="10000"/>
</local-cache>
<distributed-cache name="sessions" owners="3">
<expiration lifespan="-1"/>
</distributed-cache>
<distributed-cache name="authenticationSessions" owners="3">
<expiration lifespan="-1"/>
</distributed-cache>
<distributed-cache name="offlineSessions" owners="3">
<expiration lifespan="-1"/>
</distributed-cache>
<distributed-cache name="clientSessions" owners="3">
<expiration lifespan="-1"/>
</distributed-cache>
<distributed-cache name="offlineClientSessions" owners="3">
<expiration lifespan="-1"/>
</distributed-cache>
<distributed-cache name="loginFailures" owners="3">
<expiration lifespan="-1"/>
</distributed-cache>
<local-cache name="authorization">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<memory max-count="10000"/>
</local-cache>
<replicated-cache name="work">
<expiration lifespan="-1"/>
</replicated-cache>
<local-cache name="keys">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<expiration max-idle="3600000"/>
<memory max-count="1000"/>
</local-cache>
<distributed-cache name="actionTokens" owners="3">
<encoding>
<key media-type="application/x-java-object"/>
<value media-type="application/x-java-object"/>
</encoding>
<expiration max-idle="-1" lifespan="-1" interval="300000"/>
<memory max-count="-1"/>
</distributed-cache>
</cache-container>
</infinispan>
The Dockerfile is as follows:
FROM quay.io/keycloak/keycloak:19.0.3
COPY conf/keycloak.conf /opt/keycloak/conf/keycloak.conf
COPY conf/cache-ispn-jdbc-ping.xml /opt/keycloak/conf/cache-ispn-jdbc-ping.xml
RUN /opt/keycloak/bin/kc.sh build --cache-config-file=cache-ispn-jdbc-ping.xml
WORKDIR /opt/keycloak
ENTRYPOINT [ "/opt/keycloak/bin/kc.sh" ]
The ECS definition in Terraform is as follows. Parts written as _xxxx_ are constants passed in as variables.
resource "aws_ecs_cluster" "keycloak" {
name = "clustername"
setting {
name = "containerInsights"
value = "enabled"
}
}
resource "aws_ecs_service" "keycloak" {
cluster = aws_ecs_cluster.keycloak.id
deployment_maximum_percent = 200
deployment_minimum_healthy_percent = 100
desired_count = _keycloak_desired_count_min_
enable_ecs_managed_tags = false
enable_execute_command = true
health_check_grace_period_seconds = 180
name = _servicename_
platform_version = "LATEST"
propagate_tags = "TASK_DEFINITION"
scheduling_strategy = "REPLICA"
task_definition = aws_ecs_task_definition.keycloak.arn
capacity_provider_strategy {
capacity_provider = "FARGATE"
base = 2
weight = 1 // From the third task onward, 25% of tasks are started on FARGATE
}
capacity_provider_strategy {
capacity_provider = "FARGATE_SPOT"
base = 0
weight = 3 // From the third task onward, 75% of tasks are started on FARGATE_SPOT
}
deployment_circuit_breaker {
enable = false
rollback = false
}
deployment_controller {
type = "ECS"
}
load_balancer {
container_name = "keycloak"
container_port = aws_alb_target_group.keycloak.port
target_group_arn = aws_alb_target_group.keycloak.arn
}
network_configuration {
assign_public_ip = true
security_groups = [
aws_security_group.keycloak.id
]
subnets = _cluster_subnets_
}
timeouts {}
lifecycle {
ignore_changes = [desired_count]
}
}
resource "aws_ecs_task_definition" "keycloak" {
container_definitions = jsonencode(
[
{
cpu = 0
command = ["start", "--optimized"]
disableNetworking = false
portMappings = [
{
containerPort = aws_alb_target_group.keycloak.port
hostPort = aws_alb_target_group.keycloak.port
protocol = "tcp"
}
]
environment = [
{
name = "KC_DB_URL_DATABASE"
value = _KC_DB_URL_DATABASE_
},
{
name = "KC_DB_URL_HOST"
value = _KC_DB_URL_HOST_
},
{
name = "KC_DB_URL"
value = _KC_DB_URL_
},
{
name = "KC_DB_PASSWORD"
value = _KC_DB_PASSWORD_
},
{
name = "KC_DB_USERNAME"
value = _KC_DB_USERNAME_
},
{
name = "JAVA_OPTS"
value = _JAVA_OPTS_
},
{
name = "KC_CACHE"
value = "ispn"
},
{
name = "KC_HOSTNAME"
value = _keycloak_fqdn_
},
{
name = "KC_HOSTNAME_STRICT_BACKCHANNEL"
value = "true"
},
{
name = "KC_CACHE_CONFIG_FILE"
value = "cache-ispn-jdbc-ping.xml"
},
]
essential = true
healthCheck = {
command = [
"CMD-SHELL",
"curl -f http://localhost:${_keycloak_port_}/auth/ || exit 1",
]
interval = 30
retries = 3
timeout = 5
}
image = _ecr_repo_url_
stopTimeout = 120
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.keycloak.name
awslogs-region = "ap-northeast-1"
awslogs-stream-prefix = "ecs"
}
}
mountPoints = []
name = "keycloak"
volumesFrom = []
},
]
)
cpu = _keycloak_cpu_
task_role_arn = aws_iam_role.ecs_task_role.arn
execution_role_arn = aws_iam_role.execution_role.arn
family = _service_name_
memory = _keycloak_memory_
network_mode = "awsvpc"
requires_compatibilities = [
"FARGATE",
]
}
resource "aws_alb_target_group" "keycloak" {
deregistration_delay = "115"
load_balancing_algorithm_type = "round_robin"
name = _clustername_
port = _keycloak_port_
protocol = "HTTP"
protocol_version = "HTTP1"
slow_start = 0
target_type = "ip"
vpc_id = _cluster_vpc_id_
health_check {
...
}
stickiness {
cookie_duration = 86400
enabled = false
type = "lb_cookie"
}
}
resource "aws_iam_role" "execution_role" {
name = "ecs-execution-role"
managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"]
assume_role_policy = jsonencode({
"Version" : "2008-10-17",
"Statement" : [
{
"Sid" : "",
"Effect" : "Allow",
"Principal" : {
"Service" : "ecs-tasks.amazonaws.com"
},
"Action" : "sts:AssumeRole"
}
]
})
}
resource "aws_iam_role" "ecs_task_role" {
name = "ecs-task-role"
assume_role_policy = jsonencode({
"Version" : "2012-10-17",
"Statement" : [
{
"Sid" : "",
"Effect" : "Allow",
"Principal" : {
"Service" : "ecs-tasks.amazonaws.com"
},
"Action" : "sts:AssumeRole"
}
]
})
inline_policy {
name = "SessionManagerRoleForECS"
policy = jsonencode({
"Version" : "2012-10-17",
"Statement" : [
{
"Effect" : "Allow",
"Action" : [
"ssmmessages:CreateControlChannel",
"ssmmessages:CreateDataChannel",
"ssmmessages:OpenControlChannel",
"ssmmessages:OpenDataChannel"
],
"Resource" : "*"
}
]
})
}
}
resource "aws_cloudwatch_log_group" "keycloak" {
name = "/ecs/${_keycloak_service_name_}"
retention_in_days = 180
}
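Depending on how the cluster was created, the FARGATE and FARGATE_SPOT capacity providers referenced in capacity_provider_strategy above may first need to be associated with the cluster. The following is a minimal sketch of that association, assuming the aws_ecs_cluster.keycloak resource above. The resource label keycloak is an assumption and is not part of the original configuration.

```hcl
# Sketch: associate the Fargate capacity providers with the cluster so that
# the service's capacity_provider_strategy blocks can reference them.
resource "aws_ecs_cluster_capacity_providers" "keycloak" {
  cluster_name       = aws_ecs_cluster.keycloak.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]
}
```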
Scaling policies can be defined as follows.
resource "aws_appautoscaling_target" "keycloak" {
service_namespace = "ecs"
resource_id = "service/${aws_ecs_cluster.keycloak.name}/${aws_ecs_service.keycloak.name}"
scalable_dimension = "ecs:service:DesiredCount"
min_capacity = _keycloak_desired_count_min_
max_capacity = _keycloak_desired_count_max_
lifecycle {
ignore_changes = [min_capacity, max_capacity]
}
}
resource "aws_appautoscaling_policy" "keycloak_scale_out" {
name = "keycloak_scale_out"
policy_type = "StepScaling"
service_namespace = aws_appautoscaling_target.keycloak.service_namespace
resource_id = aws_appautoscaling_target.keycloak.id
scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
step_scaling_policy_configuration {
adjustment_type = "ChangeInCapacity"
cooldown = 30
metric_aggregation_type = "Maximum"
step_adjustment {
metric_interval_lower_bound = 0
metric_interval_upper_bound = local.KeycloakCpuHightThreshold
scaling_adjustment = _keycloak_desired_count_scaleout_policy_
}
step_adjustment {
metric_interval_lower_bound = local.KeycloakCpuHightThreshold
scaling_adjustment = _keycloak_desired_count_scaleout_policy_ * 2
}
}
}
resource "aws_appautoscaling_policy" "keycloak_scale_in" {
name = "keycloak_scale_in"
policy_type = "StepScaling"
service_namespace = aws_appautoscaling_target.keycloak.service_namespace
resource_id = aws_appautoscaling_target.keycloak.id
scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
step_scaling_policy_configuration {
adjustment_type = "ChangeInCapacity"
cooldown = 60
metric_aggregation_type = "Average"
step_adjustment {
metric_interval_upper_bound = 0
scaling_adjustment = -1
}
}
}
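Step scaling policies only act when a CloudWatch alarm invokes them, and the step bounds above are offsets relative to that alarm's threshold. The following is a minimal sketch of alarms that could drive the two policies, assuming the service CPU utilization metric is the trigger. The alarm names, the thresholds (60 and 30), the periods, and the evaluation counts are hypothetical values chosen for illustration and should be tuned by load testing.

```hcl
# Sketch: invoke the scale-out policy when service CPU is high.
resource "aws_cloudwatch_metric_alarm" "keycloak_cpu_high" {
  alarm_name          = "keycloak-cpu-high" # hypothetical name
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Maximum"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 60 # hypothetical; step bounds are offsets from this value
  period              = 60
  evaluation_periods  = 1

  dimensions = {
    ClusterName = aws_ecs_cluster.keycloak.name
    ServiceName = aws_ecs_service.keycloak.name
  }

  alarm_actions = [aws_appautoscaling_policy.keycloak_scale_out.arn]
}

# Sketch: invoke the scale-in policy when service CPU stays low.
resource "aws_cloudwatch_metric_alarm" "keycloak_cpu_low" {
  alarm_name          = "keycloak-cpu-low" # hypothetical name
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "LessThanOrEqualToThreshold"
  threshold           = 30 # hypothetical
  period              = 60
  evaluation_periods  = 5

  dimensions = {
    ClusterName = aws_ecs_cluster.keycloak.name
    ServiceName = aws_ecs_service.keycloak.name
  }

  alarm_actions = [aws_appautoscaling_policy.keycloak_scale_in.arn]
}
```

With these assumed values, the first scale-out step would apply while CPU is between 60 and 60 + local.KeycloakCpuHightThreshold, and the second step above that.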
To scale out in advance according to the time of day, define the following.
resource "aws_appautoscaling_scheduled_action" "keycloak_time_scaling_start" {
name = "keycloak_time_scaling_start"
service_namespace = aws_appautoscaling_target.keycloak.service_namespace
resource_id = aws_appautoscaling_target.keycloak.id
scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
schedule = _keycloak_desired_count_time_scaling_start_
scalable_target_action {
min_capacity = _keycloak_desired_count_min_ * _keycloak_desired_count_time_scaling_scale_
max_capacity = _keycloak_desired_count_max_ * _keycloak_desired_count_time_scaling_scale_
}
}
resource "aws_appautoscaling_scheduled_action" "keycloak_time_scaling_stop" {
name = "keycloak_time_scaling_stop"
service_namespace = aws_appautoscaling_target.keycloak.service_namespace
resource_id = aws_appautoscaling_target.keycloak.id
scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
schedule = _keycloak_desired_count_time_scaling_stop_
scalable_target_action {
min_capacity = _keycloak_desired_count_min_
max_capacity = _keycloak_desired_count_max_
}
depends_on = [aws_appautoscaling_scheduled_action.keycloak_time_scaling_start]
}
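The schedule values above are passed in as constants. As an illustration only, they could look like the following sketch, which assumes a weekday daytime peak. The local names and the times are hypothetical. Application Auto Scaling cron expressions take six fields (minutes, hours, day of month, month, day of week, year) and are evaluated in UTC unless the scheduled action also sets its optional timezone argument.

```hcl
# Sketch: hypothetical schedule constants for the scheduled actions.
locals {
  # Raise min/max capacity at 08:00 on weekdays.
  keycloak_time_scaling_start = "cron(0 8 ? * MON-FRI *)"

  # Return to the normal min/max capacity at 20:00 on weekdays.
  keycloak_time_scaling_stop = "cron(0 20 ? * MON-FRI *)"

  # Value for the optional timezone argument of
  # aws_appautoscaling_scheduled_action (UTC is used when it is omitted).
  keycloak_time_scaling_timezone = "Asia/Tokyo"
}
```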
That covers what you can control when running Keycloak on Fargate.