Testing Background
GitLab is a widely used source code management tool. Earlier versions let users choose between MySQL and PostgreSQL, but official MySQL support was dropped completely as of version 12.1.0.
Many features in newer versions of GitLab are built on PostgreSQL, and GitLab has become a reference application among products that use PostgreSQL as their underlying data store.
Imagine a large company organized into divisions, where each division, or even a small team, maintains its own GitLab instance. Managing these instances at the group level then becomes tricky, for example:
- Versioning issues (open-source vs. commercial editions, newer vs. older versions)
- Fine-grained permission control
- Data backups
- Infrastructure utilization
A single, unified GitLab environment with good scalability and high availability would be the ideal solution. However, a traditional standalone PostgreSQL database cannot meet these needs, so could GitLab run on a distributed database instead?
CockroachDB and YugabyteDB are two relatively well-known, newer open-source distributed databases that implement the PostgreSQL protocol. This series of articles compares how well each of them supports GitLab, which to some extent also reflects their compatibility with standard PostgreSQL.
In the previous article, "System Initialization", we found that GitLab could not be started on CockroachDB because GitLab's setup program failed to create the database schema automatically. On YugabyteDB, all initialization steps completed and GitLab started normally.
In this test, we first import the schema and data of a standard GitLab database into both databases to see whether GitLab starts normally. We then run a comparison test on a subset of GitLab's core business scenarios to see how compatible each database is.
Test Environment
- CockroachDB
defaultdb=# select version();
version
------------------------------------------------------------------------------------
CockroachDB CCL v22.1.0 (x86_64-pc-linux-gnu, built 2022/05/23 16:27:47, go1.17.6)
(1 row)
- YugabyteDB
gitlab=# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 11.2-YB-2.13.2.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 12.0.1 (https://github.com/yugabyte/llvm-project.git bdb147e675d8c87cee72cc1f87c4b82855977d94), 64-bit
(1 row)
- GitLab
GitLab information
Version: 12.1.0-ee
Revision: 1f2e6f3f6d8
Directory: /home/git/gitlab
DB Adapter: PostgreSQL
Test Scenarios
Scenario Type | Scenarios |
---|---|
Read (9) | Project List, Project View, Repository View, Branch List, Issue List, Issue View, Merge Request List, Merge Request View, Project Members |
Write (8) | New Project, GitLab Import, New Commit, Create Branch, Create Issue, Create Merge Request, PR Merge, Add Project Member |
Testing Process
For simplicity, we migrate the data directly with pg_dump.
First, export the schema and data from the source PostgreSQL database to a SQL file:
pg_dump --host 10.3.70.132 --port 32298 --user postgres --no-owner -W gitlabhq_production > /root/gitlabhq_production.sql
1. CockroachDB Data Migration
We use the psql client to import the dump. By default, psql does not stop on errors: a failing statement is skipped, its error message is printed, and execution continues with the next statement.
psql --host 10.3.70.189 --port 26258 --user root gitlab -f /root/gitlabhq_production.sql > pg_import_crdb.log
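psql prints ERROR lines to stderr. With only stdout redirected, as above, they appear on the terminal rather than in pg_import_crdb.log. The following is a minimal sketch for grouping them by message. It assumes stderr was captured to a separate file; crdb_errors.log is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: group psql ERROR lines by message to see which failures dominate.
# Assumes stderr from the import was captured separately, e.g.
#   psql ... -f /root/gitlabhq_production.sql > pg_import_crdb.log 2> crdb_errors.log
# (crdb_errors.log is a hypothetical file name.)

summarize_errors() {
  # Keep only ERROR lines (from the given files, or stdin if none; -h drops file-name prefixes),
  # strip the "psql:<file>:<line>: " prefix so identical messages compare equal,
  # then count them, most frequent first.
  grep -h 'ERROR:' "$@" \
    | sed -E 's/^psql:[^:]+:[0-9]+: //' \
    | sort | uniq -c | sort -rn
}

if [ -f crdb_errors.log ]; then
  summarize_errors crdb_errors.log
fi
```

Each output line is a count followed by the message. Removing the line-number prefix is what lets the many identical operator-class errors collapse into a single entry.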
The error messages in the output fall into two main categories.
DETAIL: source SQL:
CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA public
                               ^
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/74777/v22.1
psql:/root/gitlabhq_production.sql:30: ERROR: at or near "pg_trgm": syntax error: unimplemented: this syntax
DETAIL: source SQL:
COMMENT ON EXTENSION pg_trgm IS 'text similarity measurement and index searching based on trigrams'
Both errors above show that CockroachDB does not support the pg_trgm extension.
psql:/root/gitlabhq_production.sql:31396: ERROR: at or near ".": syntax error: unimplemented: this syntax
DETAIL: source SQL:
CREATE INDEX index_issues_on_description_trigram ON public.issues USING gin (description public.gin_trgm_ops)
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/47420/v22.1
This second error occurs because CockroachDB does not yet support operator classes (here gin_trgm_ops) in index definitions. Both categories of error concern the trigram extension and its indexes, so they are not expected to have much impact on DML operations, and we ignore them for now.
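An alternative to letting psql skip these statements is to remove them from the dump before importing. This keeps the import output focused on unexpected errors. The following is a minimal sketch. It assumes each offending statement occupies a single line, which holds for the three statements shown above. The trigram indexes are simply dropped, so queries that could have used them will run without them. The output file name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: strip the pg_trgm statements that CockroachDB v22.1 rejects.
# Patterns are anchored to the start of the line so that COPY data rows
# which happen to contain these words are left untouched.

strip_trgm() {
  grep -v -E \
    -e '^CREATE EXTENSION IF NOT EXISTS pg_trgm' \
    -e '^COMMENT ON EXTENSION pg_trgm' \
    -e '^CREATE (UNIQUE )?INDEX .*gin_trgm_ops'
}

# /root/gitlabhq_production_crdb.sql is a hypothetical output name.
if [ -f /root/gitlabhq_production.sql ]; then
  strip_trgm < /root/gitlabhq_production.sql > /root/gitlabhq_production_crdb.sql
fi
```

The filtered file can then be imported with the same psql command as before.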
After the import finishes, check the objects in the public schema:
gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
relkind | count
---------+-------
r | 249
i | 890
S | 231
(3 rows)
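All 249 tables and 231 sequences were created, but there are 13 fewer indexes than in the source schema (890 vs. 903; the YugabyteDB result below matches the source). Next, point GitLab's database configuration at the new CockroachDB database, restart GitLab, and check whether the pages open: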
sudo -u git -H editor config/database.yml
sudo /etc/init.d/gitlab restart
source=rack-timeout id=oMeadFm1kN1 timeout=60000ms state=ready
Started GET "/users/sign_in" for 10.3.74.126 at 2022-05-31 16:19:18 +0800
Processing by SessionsController#new as HTML
Completed 200 OK in 55ms (Views: 32.3ms | ActiveRecord: 9.7ms | Elasticsearch: 0.0ms)
source=rack-timeout id=oMeadFm1kN1 timeout=60000ms service=291ms state=completed
The log shows that the sign-in page loads normally (200 OK) without errors. Next, log in with an existing user to check whether the login succeeds.
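The change made in config/database.yml is not shown above. The following is a rough sketch of a production section pointing at CockroachDB, using the host, port, user and database from the psql import command. The key names follow the Rails PostgreSQL adapter. The encoding and pool values are assumptions, and a password key must be added if the cluster requires one.

```shell
#!/usr/bin/env bash
# Sketch: print a production section for GitLab's config/database.yml.
# encoding and pool are assumed values; add "password:" if authentication is required.

print_db_config() {
  local host="$1" port="$2" database="$3" username="$4"
  cat <<EOF
production:
  adapter: postgresql
  encoding: unicode
  database: ${database}
  pool: 10
  username: ${username}
  host: ${host}
  port: ${port}
EOF
}

# CockroachDB endpoint used in this test.
print_db_config 10.3.70.189 26258 gitlab root
```

The same helper covers the YugabyteDB step: print_db_config 10.3.70.189 5434 gitlab postgres.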
2. YugabyteDB Data Migration
Import the SQL file into YugabyteDB in the same way:
psql --host 10.3.70.189 --port 5434 --user postgres gitlab -f /root/gitlabhq_production.sql > pg_import_ygdb.log
The import ran for more than an hour and reported no errors. Check the database objects:
gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
relkind | count
---------+-------
S | 231
i | 903
r | 249
(3 rows)
The counts are consistent with the source PostgreSQL schema.
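Since the YugabyteDB counts match the source, a side-by-side comparison makes the CockroachDB gap explicit. The following is a minimal sketch. It assumes each psql summary has been saved as plain "relkind count" pairs, one per line; any file names used with it are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare two "relkind count" summaries and print
# relkind, source count, target count, and the difference (target - source).

compare_counts() {
  # $1: reference counts (e.g. the source schema); $2: counts from the migrated database.
  awk 'NR == FNR { src[$1] = $2; next }
       ($1 in src) { printf "%s %d %d %d\n", $1, src[$1], $2, $2 - src[$1] }' "$1" "$2"
}
```

With the YugabyteDB counts as the reference and the CockroachDB counts as the target, the index row comes out as i 903 890 -13. The table (r) and sequence (S) rows show a difference of 0.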
After updating the database connection and restarting GitLab, we opened the sign-in page and logged in as an existing user. The login succeeded and redirected to the home page.
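The YugabyteDB import took more than an hour. The reader comment at the end of this article attributes this to online (concurrent) index creation and suggests building the indexes non-concurrently. The following is a minimal sketch of applying that to the existing dump. It assumes each CREATE [UNIQUE] INDEX statement starts a line, as pg_dump writes them, and that nothing else writes to the database during the import. pg_dump emits the index DDL after the COPY data, so the tables are not empty at that point. Whether this speeds up this particular import has not been measured here.

```shell
#!/usr/bin/env bash
# Sketch: rewrite pg_dump index DDL into YugabyteDB's non-concurrent form
# (CREATE [UNIQUE] INDEX NONCONCURRENTLY ...). All other lines pass through unchanged.

make_nonconcurrent() {
  sed -E 's/^CREATE (UNIQUE )?INDEX /CREATE \1INDEX NONCONCURRENTLY /'
}

# /root/gitlabhq_production_yb.sql is a hypothetical output name.
if [ -f /root/gitlabhq_production.sql ]; then
  make_nonconcurrent < /root/gitlabhq_production.sql > /root/gitlabhq_production_yb.sql
fi
```

The rewritten file is then imported with the same psql command used for YugabyteDB above.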
3. Scenario Comparison
Scenario | CockroachDB | YugabyteDB | Result |
---|---|---|---|
Project List | Supported | Supported | Both are supported |
Project View | Supported | Supported | Both are supported |
Repository View | Supported | Supported | Both are supported |
Branch List | Supported | Supported | Both are supported |
Issue List | Supported | Supported | Both are supported |
Issue View | Supported | Supported | Both are supported |
Merge Request List | Supported | Supported | Both are supported |
Merge Request View | Supported | Supported | Both are supported |
Project Members | Supported | Supported | Both are supported |
New Project | Failed: opening the Create Project page returns a 500 error, and the log records "ActionView::Template::Error (PG::UndefinedColumn: ERROR: column "namespaces.rowid" does not exist)" | Supported: after the request is submitted, the page keeps loading with no error in the log, then redirects to the created project page after some time | CockroachDB fails |
GitLab Import | Failed: the New Project page cannot be opened, so no project can be imported | Supported: after the import request is submitted, the page keeps loading with no error in the log. Suspecting a GitLab permission problem, we restarted GitLab as the root user, and the import then succeeded | CockroachDB fails |
New Commit | Supported | Supported | Both are supported |
Create Branch | Supported | Supported | Both are supported |
Create Issue | Supported | Supported | Both are supported |
Create Merge Request | Supported | Supported | Both are supported |
PR Merge | Failed | Failed | Both fail in the same way: after the merge is submitted, the page keeps loading, then shows an error message after a while, and the merge cannot be submitted again. No exception appears in the log |
Add Project Member | Supported | Supported | Both are supported |
Test Conclusion
1. CockroachDB failed 3 of the 17 tested scenarios: New Project, GitLab Import (importing an existing GitLab project), and PR Merge.
2. YugabyteDB failed 1 of the 17 tested scenarios: PR Merge.
Based on these results, YugabyteDB is more compatible with GitLab. Because PR Merge failed on both databases, further investigation is needed to determine whether that failure is related to the database at all.
The Next Step
Next, we will analyze and pinpoint the causes of the failures found in this test. We will then try minimal modifications to the GitLab source code to see whether the failed scenarios can be made to work.
Top comments (1)
Hi, you mentioned one hour to create the schema. By default, indexes are created online (like CONCURRENTLY in PostgreSQL), but this waits between DDL statements to get them synchronized between the cluster nodes. This is OK when creating an index on a large table, but when creating 900 indexes on empty tables, it takes long in total. Better to use CREATE INDEX NONCONCURRENTLY for that, as in: dev.to/yugabyte/create-index-in-yu...
I'm curious about the PR Merge failure. The cause may be performance, in which case looking at the execution plan may help. YugabyteDB has an optimization for distributed nested loop joins (batched nested loop) that can be enabled with set yb_bnl_batch_size=100, as in dev.to/yugabyte/the-best-indexes-f... Note that if range scans are involved, you should range-shard the primary key or indexes, because YugabyteDB supports both hash and range sharding.