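- Unicode: a character set that assigns every character a unique number called a code point, and defines several encodings for storing those code points, such as UTF-8, UTF-16, and UTF-32
- UTF-8 stores each code point in 1 to 4 bytes; common ASCII characters take a single byte (8 bits), so plain ASCII text is also valid UTF-8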
- MySQL's utf8 vs. utf8mb4: utf8 is an alias of utf8mb3, which stores at most 3 bytes per code point. It therefore cannot store characters that need 4 bytes in UTF-8, such as emoji and some less common CJK characters. utf8mb4 stores up to 4 bytes per code point, covers all of Unicode, and is the preferred choice
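To make the 3-byte limit concrete, here is a small Python sketch. It prints each character's code point and UTF-8 byte count, and checks whether a string would fit in utf8mb3, assuming utf8mb3 accepts exactly the characters that need at most 3 bytes in UTF-8 (code points up to U+FFFF). The helper names `utf8_byte_lengths` and `fits_utf8mb3` are hypothetical, made up for this illustration.

```python
def utf8_byte_lengths(text: str) -> list[tuple[str, str, int]]:
    """Return (character, code point, UTF-8 byte count) for each character."""
    return [(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


def fits_utf8mb3(text: str) -> bool:
    """Hypothetical check: True if every character needs at most 3 bytes in UTF-8,
    i.e. its code point is <= U+FFFF, which is what utf8mb3 can hold."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# 'a' (ASCII), 'é', '€', and an emoji, written as escapes to avoid ambiguity.
for ch, cp, n in utf8_byte_lengths("a\u00e9\u20ac\U0001F600"):
    print(ch, cp, n)
# a U+0061 1
# é U+00E9 2
# € U+20AC 3
# 😀 U+1F600 4

print(fits_utf8mb3("caf\u00e9 \u20ac"))  # True: every character is 1 to 3 bytes
print(fits_utf8mb3("hi \U0001F600"))     # False: the emoji needs 4 bytes
```

Only the emoji needs the fourth byte, which is why a utf8mb3 column rejects or mangles it while a utf8mb4 column stores it.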
An example SQL statement that creates a table using the utf8mb4 character set and the utf8mb4_unicode_ci collation:
CREATE TABLE IF NOT EXISTS dbName.tableName (
  `id` int NOT NULL AUTO_INCREMENT,
  `email` varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL,
  `username` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
  `password` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
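- A collation defines the rules for comparing and sorting strings in a character set, including whether comparisons are case-sensitive and accent-sensitive
- Set it with COLLATE, either per column or as the table default; the suffix of the collation name describes its behavior, e.g. ci means case-insensitive, cs means case-sensitive, and bin means binary comparison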
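As a rough illustration of what a collation changes, the Python sketch below contrasts exact code point comparison (similar in spirit to a _bin collation) with a case- and accent-insensitive comparison (similar in spirit to utf8mb4_unicode_ci, which ignores both case and accents). This is only an analogy: MySQL's unicode collations implement the Unicode Collation Algorithm, not this code, and the helper `ci_ai_key` is hypothetical.

```python
import unicodedata


def ci_ai_key(s: str) -> str:
    """Hypothetical stand-in for a case- and accent-insensitive collation:
    decompose accented letters (NFD), drop the combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFD", s)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.casefold()


words = ["resume", "R\u00e9sum\u00e9", "apple", "Banana"]

# Binary-style comparison: code points must match exactly, uppercase sorts first.
print("resume" == "R\u00e9sum\u00e9")  # False
print(sorted(words))                   # ['Banana', 'Résumé', 'apple', 'resume']

# Case- and accent-insensitive comparison: "resume" and "Résumé" compare equal.
print(ci_ai_key("resume") == ci_ai_key("R\u00e9sum\u00e9"))  # True
print(sorted(words, key=ci_ai_key))    # ['apple', 'Banana', 'resume', 'Résumé']
```

In the second sort, "resume" and "Résumé" share the same key, so Python's stable sort keeps them in their original order; a real collation may break such ties differently.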