DEV Community

dcopensource

Compatibility of GitLab on CockroachDB and YugabyteDB (II) - Read and Write Scenario Testing

Testing Background

GitLab is a widely used source code management tool. Earlier versions let users choose between MySQL and PostgreSQL, but official support for MySQL was dropped completely in version 12.1.0.

Many features in newer versions of GitLab depend on PostgreSQL, which makes GitLab a useful benchmark for products that use PostgreSQL as their underlying data store.

Imagine a large group that is split into divisions, where each division or even each small team maintains its own GitLab instance. Managing these repositories at the group level becomes tricky, for example:

  • Version inconsistencies (open source versus commercial editions, newer versus older releases)
  • Fine-grained permission control
  • Data backups
  • Infrastructure utilization

Having a unified GitLab environment with good scalability and high availability would certainly be the best solution. But the traditional standalone PostgreSQL database does not meet the above needs, so can we consider running GitLab on a distributed database?

CockroachDB and YugabyteDB are two relatively well-known open source distributed databases that implement the PostgreSQL wire protocol. This series of articles compares how well each of them supports GitLab, which to some extent also reflects their compatibility with standard PostgreSQL.

In the previous article, "System Initialization", we concluded that GitLab could not be started on CockroachDB because GitLab's setup program could not create the database schema automatically, while on YugabyteDB all initialization steps completed and GitLab started normally.

In this test, we first import a standard GitLab database (schema and base data) into both databases to see whether GitLab can start normally. We then run a selection of GitLab's core business scenarios against each database to compare how compatible they are.

Test Environment

  • CockroachDB
  defaultdb=# select version();
                                        version
  ------------------------------------------------------------------------------------
   CockroachDB CCL v22.1.0 (x86_64-pc-linux-gnu, built 2022/05/23 16:27:47, go1.17.6)
  (1 row)
  • YugabyteDB
  gitlab=# select version();
                                                                                           version
  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   PostgreSQL 11.2-YB-2.13.2.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 12.0.1 (https://github.com/yugabyte/llvm-project.git bdb147e675d8c87cee72cc1f87c4b82855977d94), 64-bit
  (1 row)
  • GitLab
  GitLab information
  Version:        12.1.0-ee
  Revision:       1f2e6f3f6d8
  Directory:      /home/git/gitlab
  DB Adapter:     PostgreSQL

Test Scenarios

Scenario Type | Scenario Name
read (9)      | Project List
              | Project View
              | Repository View
              | Branch List
              | Issue List
              | Issue View
              | Merge Request List
              | Merge Request View
              | Project Members
write (8)     | New Project
              | GitLab Import
              | New Commit
              | Create Branch
              | Create Issue
              | Create Merge Request
              | PR Merge
              | Add Project Member

Testing Process

To keep it simple, let's do the data migration directly using pg_dump.

First, export the schema and data from the standard PostgreSQL database to a SQL file.

pg_dump --host 10.3.70.132 --port 32298 --user postgres --no-owner -W gitlabhq_production > /root/gitlabhq_production.sql

1. CockroachDB Data Migration

The psql client is used to import the exported SQL file. If a statement fails, psql prints the error message, skips that statement, and continues with the rest of the file.

psql --host 10.3.70.189 --port 26258 --user root gitlab -f /root/gitlabhq_production.sql > pg_import_crdb.log

The error messages in the output fall into two main categories.

Description:  source SQL:
CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA public
                                            ^
Tip:  You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/74777/v22.1
psql:/root/gitlabhq_production.sql:30: ERROR:  at or near "pg_trgm": syntax error: unimplemented: this syntax
Description:  source SQL:
COMMENT ON EXTENSION pg_trgm IS 'text similarity measurement and index searching based on trigrams'

The errors above again come down to PostgreSQL extensions not being supported.

Description:  You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/47420/v22.1
psql:/root/gitlabhq_production.sql:31396: ERROR:  at or near ".": syntax error: unimplemented: this syntax
Tip:  source SQL:
CREATE INDEX index_issues_on_description_trigram ON public.issues USING gin (description public.gin_trgm_ops)

This error occurs because CockroachDB does not yet support operator classes. Both errors relate to indexes (the pg_trgm extension provides the gin_trgm_ops operator class used by the trigram GIN indexes) and are not expected to have much impact on DML operations, so we ignore them for now.
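To check that no other kind of error is hiding in a long import log, the output can be summarized by script. The following is a minimal sketch, not part of the original test: the function name `summarize_import_log` is hypothetical, and it assumes the error stream was actually captured in the log. psql writes errors to stderr, so the plain `>` redirect used above would need something like `2>&1` appended for the log file to contain them.

```python
import re
from collections import Counter

# Matches psql error lines such as:
#   psql:/root/gitlabhq_production.sql:30: ERROR:  at or near "pg_trgm": ...
ERROR_RE = re.compile(r"^psql:(?P<file>.+?):(?P<line>\d+): ERROR:\s+(?P<msg>.*)$")
# Matches the issue links CockroachDB prints for unimplemented features.
ISSUE_RE = re.compile(r"^See:\s+(?P<url>\S+)")


def summarize_import_log(text):
    """Return (errors, issue_counts) for the text of a captured psql import log.

    errors is a list of (sql_line_number, message) tuples;
    issue_counts maps each referenced issue URL to how often it appears.
    """
    errors = []
    issue_counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        match = ERROR_RE.match(line)
        if match:
            errors.append((int(match.group("line")), match.group("msg")))
            continue
        match = ISSUE_RE.match(line)
        if match:
            issue_counts[match.group("url")] += 1
    return errors, issue_counts


if __name__ == "__main__":
    # Assumed log path; adjust to wherever stdout and stderr were captured.
    with open("pg_import_crdb.log", encoding="utf-8") as handle:
        errors, issues = summarize_import_log(handle.read())
    print(f"{len(errors)} failed statements")
    for url, count in issues.most_common():
        print(count, url)
```

Grouping by issue link rather than by message text keeps the summary short, because many different statements can fail for the same unimplemented feature.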

Check the objects in the database after the SQL file has been imported (relkind r = table, i = index, S = sequence).

gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
 relkind | count
---------+-------
 r       |   249
 i       |   890
 S       |   231
(3 rows)

Everything was imported except for 13 indexes (890 here versus 903 in the standard PostgreSQL schema). Now point GitLab's database configuration at this new database, restart the application, and see whether the page opens:

sudo -u git -H editor config/database.yml
sudo /etc/init.d/gitlab restart


source=rack-timeout id=oMeadFm1kN1 timeout=60000ms state=ready
Started GET "/users/sign_in" for 10.3.74.126 at 2022-05-31 16:19:18 +0800
Processing by SessionsController#new as HTML
Completed 200 OK in 55ms (Views: 32.3ms | ActiveRecord: 9.7ms | Elasticsearch: 0.0ms)
source=rack-timeout id=oMeadFm1kN1 timeout=60000ms service=291ms state=completed

The logs show that the login page loads normally without errors. Next, log in with an existing user to check whether login succeeds.


2. YugabyteDB Data Migration

Import the sql file into YugabyteDB in the same way as before.

psql --host 10.3.70.189 --port 5434 --user postgres gitlab -f /root/gitlabhq_production.sql > pg_import_ygdb.log

The import took more than an hour and reported no errors. Check the database objects.

gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
 relkind | count
---------+-------
 S       |   231
 i       |   903
 r       |   249
(3 rows)

The object counts are consistent with the standard PostgreSQL schema.
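The two object counts differ only in indexes: 890 on CockroachDB versus 903 here. One way to find out exactly which indexes are missing is to export the index names from each database (for instance with `SELECT indexname FROM pg_indexes WHERE schemaname = 'public'`, assuming the `pg_indexes` view behaves the same on both systems) and diff the two lists. A minimal sketch follows; the helper name `missing_indexes` and the sample lists are hypothetical and were not part of the original test.

```python
def missing_indexes(source_names, target_names):
    """Return the index names present in the source but absent from the target, sorted."""
    return sorted(set(source_names) - set(target_names))


if __name__ == "__main__":
    # Hypothetical, shortened lists; real lists would be exported from each database.
    source = [
        "index_issues_on_description_trigram",  # named in the import error above
        "example_index_a",                      # placeholder name
        "example_index_b",                      # placeholder name
    ]
    target = ["example_index_a", "example_index_b"]
    for name in missing_indexes(source, target):
        print(name)
```

Comparing names rather than counts also catches the case where an index exists on both sides under different names, which a simple count would hide.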

After changing the database connection and restarting GitLab, we opened the page and logged in with an existing user. The login succeeded and redirected to the home page.


3. Scenario Comparison

Scenario | CockroachDB | YugabyteDB | Result
Project List | OK | OK | Both are supported
Project View | OK | OK | Both are supported
Repository View | OK | OK | Both are supported
Branch List | OK | OK | Both are supported
Issue List | OK | OK | Both are supported
Issue View | OK | OK | Both are supported
Merge Request List | OK | OK | Both are supported
Merge Request View | OK | OK | Both are supported
Project Members | OK | OK | Both are supported
New Project | Failed | OK | CockroachDB: Opening the Create Project page returns a 500 error, and the log shows ActionView::Template::Error (PG::UndefinedColumn: ERROR: column "namespaces.rowid" does not exist). YugabyteDB: The page keeps loading after the request is submitted, the log shows no errors, and after some time it jumps to the created project page.
GitLab Import | Failed | OK | CockroachDB: The new project page cannot be opened, so the project cannot be imported. YugabyteDB: The page keeps loading after the import request is submitted, and the log reports no errors. Suspecting a GitLab permission problem, I restarted GitLab as the root user, after which the import succeeded.
New Commit | OK | OK | Both are supported
Create Branch | OK | OK | Both are supported
Create Issue | OK | OK | Both are supported
Create Merge Request | OK | OK | Both are supported
PR Merge | Failed | Failed | CockroachDB and YugabyteDB behave the same. After the merge is submitted, the page keeps loading. After a while it shows an error message and the merge cannot be submitted again. The log shows no exception.
Add Project Member | OK | OK | Both are supported

Test Conclusion

1. CockroachDB failed 3 of the 17 tested scenarios: New Project, GitLab Import, and PR Merge.

2. YugabyteDB failed 1 of the 17 tested scenarios: PR Merge.

Based on these results, YugabyteDB is more compatible with GitLab. Whether the PR Merge error is related to the database still needs to be investigated.
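The failure counts can be reproduced from the scenario comparison. The sketch below simply transcribes the results reported in this article into a list and tallies them. The `RESULTS` structure and the `count_failures` helper are illustrative only, and the YugabyteDB GitLab Import is counted as a success because it worked after GitLab was restarted as root.

```python
# (scenario, CockroachDB ok, YugabyteDB ok), transcribed from the comparison above.
RESULTS = [
    ("Project List", True, True),
    ("Project View", True, True),
    ("Repository View", True, True),
    ("Branch List", True, True),
    ("Issue List", True, True),
    ("Issue View", True, True),
    ("Merge Request List", True, True),
    ("Merge Request View", True, True),
    ("Project Members", True, True),
    ("New Project", False, True),
    ("GitLab Import", False, True),
    ("New Commit", True, True),
    ("Create Branch", True, True),
    ("Create Issue", True, True),
    ("Create Merge Request", True, True),
    ("PR Merge", False, False),
    ("Add Project Member", True, True),
]


def count_failures(results):
    """Return the failed scenario names as (cockroachdb_failures, yugabytedb_failures)."""
    crdb_failed = [name for name, crdb_ok, _ in results if not crdb_ok]
    yb_failed = [name for name, _, yb_ok in results if not yb_ok]
    return crdb_failed, yb_failed


if __name__ == "__main__":
    crdb_failed, yb_failed = count_failures(RESULTS)
    print(f"CockroachDB: {len(crdb_failed)}/{len(RESULTS)} failed: {crdb_failed}")
    print(f"YugabyteDB:  {len(yb_failed)}/{len(RESULTS)} failed: {yb_failed}")
```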

The Next Step

The next step is to analyze and locate the problems found in this test, and then try minimal modifications to the GitLab source code to see whether the failed scenarios can be made to work.

Top comments (1)

Franck Pachot

Hi, you mentioned one hour to create the schema. By default, indexes are created online (like CONCURRENTLY in PostgreSQL), but this waits between DDL steps so that the change is synchronized across the cluster nodes. That is fine when creating an index on a large table, but when creating 900 indexes on empty tables it adds up to a long time. It is better to create the indexes nonconcurrently in that case, as in:
dev.to/yugabyte/create-index-in-yu...
I'm curious about the PR Merge. The cause could be performance, in which case looking at the execution plan may help. YugabyteDB has an optimization for distributed Nested Loop joins that can be enabled with set yb_bnl_batch_size=100, as in dev.to/yugabyte/the-best-indexes-f...
Note that if range scans are involved, you should range-shard the primary key or indexes, because YugabyteDB has both hash and range sharding.