Testing Background
GitLab is a globally popular source code management tool. Earlier versions let users choose between MySQL and PostgreSQL, but official MySQL support was dropped completely in version 12.1.0.
Many features in newer GitLab versions depend on PostgreSQL, which makes GitLab a useful benchmark for products that use PostgreSQL as their underlying data store.
Imagine a large group that is split into divisions, where each division or even each small team maintains its own GitLab instance. Managing these repositories at the group level becomes tricky, for example:
- Versioning issues (open source and commercial versions, high and low versions)
- Fine-grained permission control
- Data backups
- Infrastructure utilization
A unified GitLab environment with good scalability and high availability would clearly be the best solution. A traditional standalone PostgreSQL database cannot meet these needs, so can we run GitLab on a distributed database instead?
CockroachDB and YugabyteDB are two relatively well-known open source distributed databases that implement the PostgreSQL (PG) wire protocol. Their official websites describe them as follows:
CockroachDB supports the PostgreSQL wire protocol and the majority of PostgreSQL syntax. This means that existing applications built on PostgreSQL can often be migrated to CockroachDB without changing application code. (reference)
YugabyteDB is a high-performance, cloud-native distributed SQL database that aims to support all PostgreSQL features. (reference)
CockroachDB says it supports most PG syntax, and YugabyteDB says it aims to support all PG features. This series of articles compares how well the two databases support GitLab, which to some extent also reflects their compatibility with standard PostgreSQL.
Test Environment
- CockroachDB
defaultdb=# select version();
version
-----------------------------------------------------------------------------------------
CockroachDB CCL v21.2.2 (x86_64-unknown-linux-gnu, built 2021/12/01 14:35:45, go1.16.6)
(1 row)
- YugabyteDB
postgres=# select version();
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 11.2-YB-2.9.1.0-b0 on x86_64-pc-linux-gnu, compiled by gcc (Homebrew gcc 5.5.0_4) 5.5.0, 64-bit
(1 row)
- GitLab
GitLab information
Version: 12.1.0-ee
Revision: 1f2e6f3f6d8
Directory: /home/git/gitlab
DB Adapter: PostgreSQL
GitLab deployed with standard PostgreSQL contains the following database schema:
gitlab_production=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
relkind | count
---------+-------
r | 249
i | 903
S | 231
(3 rows)
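The single-letter relkind codes in this output come from PostgreSQL's pg_class catalog: r is an ordinary table, i is an index, and S is a sequence. As a minimal sketch (the helper name and the pasted sample are illustrative, not part of GitLab or either database), the psql output can be turned into labeled counts like this:

```python
# Sketch: label the relkind counts printed by psql.
# The relkind meanings follow the PostgreSQL pg_class catalog;
# parse_relkind_counts is a hypothetical helper name.

RELKIND_LABELS = {"r": "table", "i": "index", "S": "sequence"}

PSQL_OUTPUT = """\
 relkind | count
---------+-------
 r       |   249
 i       |   903
 S       |   231
(3 rows)
"""

def parse_relkind_counts(text):
    """Return {relkind: count} from psql's table output."""
    counts = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Keep only data rows: two columns, the second one numeric.
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts

counts = parse_relkind_counts(PSQL_OUTPUT)
for kind, n in counts.items():
    print(f"{RELKIND_LABELS.get(kind, kind)}: {n}")
```

Parsing into a dictionary also makes later comparisons independent of row order, which matters because the query has no ORDER BY clause.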
CockroachDB Startup Process
1. Database initialization
Execute the GitLab setup program to generate the required database schema.
dc@dc-virtual-machine:/home/git/gitlab$ sudo -u git -H bundle exec rake gitlab:setup RAILS_ENV=production
This will create the necessary database tables and seed the database.
You will lose any previous data stored in the database.
Do you want to continue (yes/no)? yes
Dropped database 'gitlab'
Created database 'gitlab'
-- enable_extension("pg_trgm")
rake aborted!
ActiveRecord::StatementInvalid: PG::FeatureNotSupported: ERROR: unimplemented: extension "pg_trgm" is not yet supported
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/51137/v21.2
: CREATE EXTENSION IF NOT EXISTS "pg_trgm"
/home/git/gitlab/config/initializers/peek.rb:18:in `async_exec_params'
/home/git/gitlab/config/initializers/peek.rb:18:in `exec_params'
/home/git/gitlab/vendor/bundle/ruby/2.6.0/gems/activerecord-5.2.3/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `block (2 levels) in exec_no_cache'
....
As the output above shows, GitLab initialization relies on the PostgreSQL extension pg_trgm, which CockroachDB does not yet support. Setup therefore fails at the very first step, before any objects are created in the database:
gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
Empty set
2. Visit GitLab
Visiting the main GitLab page now returns a 502 error.
The logs show that the failing SQL statement could not find its target table:
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "geo_nodes" does not exist
: SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname, col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = '"geo_nodes"'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum
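The log excerpt above reports the missing table in a fixed message format. As a minimal sketch (the function name is illustrative and the sample lines are copied from the log in this article), the names of missing relations can be pulled out of a log excerpt with a regular expression:

```python
import re

# Matches the PostgreSQL-style "undefined table" message seen in the log.
MISSING_RELATION = re.compile(r'relation "([^"]+)" does not exist')

def missing_relations(log_text):
    """Return the distinct relation names reported missing, in first-seen order."""
    seen = []
    for name in MISSING_RELATION.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen

sample_log = '''\
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "geo_nodes" does not exist
: SELECT a.attname, format_type(a.atttypid, a.atttypmod),
WHERE a.attrelid = '"geo_nodes"'::regclass
'''

print(missing_relations(sample_log))
```

Running this over a longer log gives a quick list of which tables the application expected but the failed setup never created.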
3. Update database version
Since the CockroachDB version under test is not the latest, a newer release might already support the extension. We therefore upgraded to the latest release at the time, v22.1:
defaultdb=# select version();
version
------------------------------------------------------------------------------------
CockroachDB CCL v22.1.0 (x86_64-pc-linux-gnu, built 2022/05/23 16:27:47, go1.17.6)
(1 row)
Running setup again to create the database produces the same error, "ActiveRecord::StatementInvalid: PG::FeatureNotSupported: ERROR: unimplemented: extension "pg_trgm" is not yet supported", which shows that the new version does not support the extension either.
YugabyteDB Startup Process
1. Database initialization
Modify the GitLab configuration file to point the database connection at YugabyteDB, then initialize a new database in the same way.
dc@dc-virtual-machine:/home/git/gitlab$ sudo -u git -H bundle exec rake gitlab:setup RAILS_ENV=production
This will create the necessary database tables and seed the database.
You will lose any previous data stored in the database.
Do you want to continue (yes/no)? yes
Dropped database 'gitlab'
Created database 'gitlab'
-- enable_extension("pg_trgm")
-> 2.5496s
-- enable_extension("plpgsql")
-> 0.1143s
-- create_table("abuse_reports", {:id=>:serial, :force=>:cascade})
-> 0.3709s
-- create_table("appearances", {:id=>:serial, :force=>:cascade})
-> 0.3022s
...
...
-- create_table("issue_tracker_data", {:force=>:cascade})
-> 3.7627s
-- create_table("issues", {:id=>:serial, :force=>:cascade})
rake aborted!
ActiveRecord::StatementInvalid: PG::InternalError: ERROR: index method "ybgin" not supported yet
HINT: See https://github.com/YugaByte/yugabyte-db/issues/1337. Click '+' on the description to raise its priority
: CREATE INDEX "index_issues_on_description_trigram" ON "issues" USING gin ("description" gin_trgm_ops)
/home/git/gitlab/vendor/bundle/ruby/2.6.0/gems/peek-pg-1.3.0/lib/peek/views/pg.rb:17:in `async_exec'
/home/git/gitlab/vendor/bundle/ruby/2.6.0/gems/peek-pg-1.3.0/lib/peek/views/pg.rb:17:in `async_exec'
The output above shows that setup starts normally and creates the extensions and tables, but after about 20 minutes it fails while creating an index. The statement asks for a "gin" index, which YugabyteDB maps to its own "ybgin" index method, and this version does not support that method yet.
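The failing statement builds a GIN (inverted) index over trigrams of issues.description. As a rough illustration of the idea, and not of either database's implementation, the sketch below extracts trigrams approximately the way the pg_trgm documentation describes (lowercased words, each padded with two leading spaces and one trailing space) and builds a toy inverted index from trigram to row ids. All function names and sample rows are hypothetical.

```python
import re
from collections import defaultdict

def trigrams(text):
    """Return the set of trigrams of `text`, approximating pg_trgm's rules."""
    grams = set()
    # Treat runs of letters/digits as words; everything else separates words.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def build_inverted_index(rows):
    """Map each trigram to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for gram in trigrams(text):
            index[gram].add(row_id)
    return index

def candidates(index, needle):
    """Row ids containing every trigram of `needle` (may include false positives)."""
    grams = trigrams(needle)
    if not grams:
        return set()
    return set.intersection(*(index.get(g, set()) for g in grams))

rows = {1: "Login page returns 502", 2: "Index creation fails", 3: "Slow login"}
index = build_inverted_index(rows)
print(sorted(trigrams("cat")))
print(sorted(candidates(index, "login")))
```

The point of the inverted structure is that a text search only has to intersect a few small sets instead of scanning every row, which is why GitLab asks for this kind of index on large text columns.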
Look at the objects generated by the database up to this point:
gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
relkind | count
---------+-------
S | 113
i | 391
r | 117
(3 rows)
This is somewhat better than the result with CockroachDB, but still far short of the full database schema.
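To put numbers on how far the partial schema falls short, the counts can be compared against the standard PostgreSQL reference from the Test Environment section. This is a minimal sketch using only the figures printed in this article; the function name is illustrative.

```python
# Object counts taken from the psql outputs in this article.
REFERENCE = {"r": 249, "i": 903, "S": 231}  # standard PostgreSQL
PARTIAL = {"r": 117, "i": 391, "S": 113}    # YugabyteDB 2.9.1 after the failed setup

def completion(partial, reference):
    """Return {relkind: percent of reference objects present}, rounded to 0.1."""
    return {kind: round(100 * partial.get(kind, 0) / total, 1)
            for kind, total in reference.items()}

for kind, pct in completion(PARTIAL, REFERENCE).items():
    print(f"{kind}: {PARTIAL[kind]}/{REFERENCE[kind]} = {pct}%")
```

In every category, fewer than half of the expected objects exist when setup aborts.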
2. Visit GitLab
At this point the main GitLab page is still inaccessible. The logs show that the error is again caused by a missing table:
source=rack-timeout id=7gatOugcqB8 timeout=60000ms state=ready
Started GET "/" for 10.3.74.126 at 2022-05-27 16:05:31 +0800
Processing by RootController#index as HTML
Completed 500 Internal Server Error in 78ms (ActiveRecord: 58.8ms | Elasticsearch: 0.0ms)
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR: relation "projects" does not exist
LINE 8: WHERE a.attrelid = '"projects"'::regclass
^
: SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname, col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = '"projects"'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum
):
3. Update database version
Similarly, we upgraded YugabyteDB to the latest version to see whether GIN index support had been completed:
postgres=# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 11.2-YB-2.13.2.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 12.0.1 (https://github.com/yugabyte/llvm-project.git bdb147e675d8c87cee72cc1f87c4b82855977d94), 64-bit
(1 row)
Running the setup program again goes relatively smoothly: about 30 minutes later it exits normally without errors. The database now contains the following objects:
gitlab=# select C.relkind,count(C.relname) from pg_class C left join pg_namespace n on n.oid = C.relnamespace where n.nspname = 'public' group by C.relkind;
relkind | count
---------+-------
S | 231
i | 903
r | 249
(3 rows)
The counts are exactly the same as in the standard PostgreSQL database. Opening the GitLab homepage in a browser redirects to the login page, and the logs show no errors.
After filling out and submitting the user registration form, the new user is registered successfully and redirected to the GitLab main page.
At first glance, basic GitLab functionality is not affected by switching databases. More detailed tests will follow in the next article.
Test Conclusion
1. CockroachDB v21.2 does not support the pg_trgm extension, so GitLab cannot initialize the database and fails to start. The problem persists after upgrading to v22.1, the latest version at the time of testing.
2. YugabyteDB v2.9 does not support GIN (generalized inverted) indexes, so setup fails after creating only part of the schema and GitLab cannot start. After upgrading to v2.13 the problem is solved: the GitLab pages are accessible and users can register normally.
3. YugabyteDB supports the PostgreSQL extension mechanism that GitLab relies on; CockroachDB, in the versions tested, does not.
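Both failures surfaced only partway through a long setup run. As a hedged sketch of how the two blocking features could be probed up front, the function below runs the relevant statements against any DB-API 2.0 (PEP 249) connection and records which ones fail. It assumes the connection is in autocommit mode; the function name, the scratch table name, and the fake connection used for the demonstration are all hypothetical, and the fake says nothing about how a real database behaves.

```python
# Probe statements modeled on the ones that failed during gitlab:setup.
PROBES = [
    ("pg_trgm extension", 'CREATE EXTENSION IF NOT EXISTS "pg_trgm"'),
    ("scratch table", "CREATE TABLE compat_probe (description text)"),
    ("gin trigram index",
     "CREATE INDEX compat_probe_trgm ON compat_probe "
     "USING gin (description gin_trgm_ops)"),
]

def probe_features(conn):
    """Run each probe statement and record 'ok' or the error text."""
    results = {}
    cur = conn.cursor()
    for name, sql in PROBES:
        try:
            cur.execute(sql)
            results[name] = "ok"
        except Exception as exc:  # driver-specific error classes vary
            results[name] = f"failed: {exc}"
    # Clean up the scratch table; a created extension is left in place.
    cur.execute("DROP TABLE IF EXISTS compat_probe")
    cur.close()
    return results

class FakeCursor:
    """Stand-in cursor that rejects statements containing a given marker."""
    def __init__(self, unsupported):
        self.unsupported = unsupported
    def execute(self, sql):
        for marker in self.unsupported:
            if marker in sql:
                raise RuntimeError(f"unimplemented: {marker}")
    def close(self):
        pass

class FakeConnection:
    """Stand-in connection so the sketch runs without a database."""
    def __init__(self, unsupported=()):
        self.unsupported = unsupported
    def cursor(self):
        return FakeCursor(self.unsupported)

results = probe_features(FakeConnection(unsupported=["CREATE EXTENSION"]))
for name, status in results.items():
    print(f"{name}: {status}")
```

Against a real database, the fake connection would be replaced by a connection from an actual PostgreSQL driver, and a failed probe would indicate that the full setup is likely to abort at the same statement.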
The Next Step
Next, we will try to bypass GitLab's schema generation step by importing a standard GitLab database with data into CockroachDB and YugabyteDB. We will then select some frequently used read and write scenarios and compare how compatible the two databases are.
Top comments (2)
CockroachDB will have pg_trgm support in 22.2, due out in another month or so. There are betas up now and an RC should be out in a couple of weeks! github.com/cockroachdb/cockroach/p...
Yes YugabyteDB has GIN indexes as of 2.11 and pg_trgm (supported from the get-go because YugabyteDB query layer is PostgreSQL compatible) can use it with gin_trgm_ops.