(1) MNIST(Modified National Institute of Standards and Technology)(1998):
- has 70,000 handwritten digits[0~9] of 28x28 pixels each. *60,000 for train and 10,000 for test.
- is MNIST() in PyTorch.
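As a minimal sketch of what that looks like in practice, the helper below loads both splits through torchvision. The function name `load_mnist` and the `data` root directory are illustrative assumptions, and the call downloads the files on first use.

```python
# Sizes stated above; len(train) and len(test) should match them after download.
EXPECTED_MNIST_SIZES = {"train": 60_000, "test": 10_000}


def load_mnist(root="data", download=True):
    """Return the (train, test) MNIST datasets with images as tensors."""
    # Imported lazily so this sketch can be read and imported without torchvision.
    from torchvision import datasets, transforms

    # ToTensor turns each PIL image into a float tensor of shape (1, 28, 28) in [0, 1].
    to_tensor = transforms.ToTensor()
    train = datasets.MNIST(root=root, train=True, download=download, transform=to_tensor)
    test = datasets.MNIST(root=root, train=False, download=download, transform=to_tensor)
    return train, test
```

Calling `train, test = load_mnist()` and then indexing `train[0]` gives an `(image, label)` pair.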
(2) EMNIST(Extended MNIST)(2017):
- has the handwritten characters(digits[0~9] and alphabet letters[A~Z][a~z]) of 28x28 pixels each, split into 6 datasets(ByClass, ByMerge, Balanced, Letters, Digits and MNIST):
*Memos:
- ByClass has 814,255 characters(digits[0~9] and alphabet letters[A~Z][a~z]). *697,932 for train and 116,323 for test.
- ByMerge has 814,255 characters(digits[0~9] and alphabet letters[A~Z][a, b, d~h, n, q, r, t]). *697,932 for train and 116,323 for test.
- Balanced has 131,600 characters(digits[0~9] and alphabet letters[A~Z][a, b, d~h, n, q, r, t]). *112,800 for train and 18,800 for test.
- Letters has 145,600 alphabet letters[a~z]. *124,800 for train and 20,800 for test.
- Digits has 280,000 digits[0~9]. *240,000 for train and 40,000 for test.
- MNIST has 70,000 digits[0~9]. *60,000 for train and 10,000 for test.
- is EMNIST() in PyTorch.
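The six datasets above map onto the `split` argument of torchvision's `EMNIST`, which takes the lowercase names. The sketch below records the train/test sizes from the list and checks that they add up to the stated totals; `EMNIST_SPLITS`, `emnist_total` and `load_emnist` are illustrative names, not part of torchvision.

```python
# (train, test) sizes as listed above, keyed by the lowercase name
# that torchvision's EMNIST accepts for `split`.
EMNIST_SPLITS = {
    "byclass": (697_932, 116_323),
    "bymerge": (697_932, 116_323),
    "balanced": (112_800, 18_800),
    "letters": (124_800, 20_800),
    "digits": (240_000, 40_000),
    "mnist": (60_000, 10_000),
}


def emnist_total(split):
    """Total number of characters (train + test) in one EMNIST split."""
    train, test = EMNIST_SPLITS[split]
    return train + test


def load_emnist(split, root="data", train=True, download=True):
    """Load one EMNIST split; rejects names that are not in the table above."""
    if split not in EMNIST_SPLITS:
        raise ValueError(f"unknown EMNIST split: {split!r}")
    # Imported lazily so the size table can be used without torchvision.
    from torchvision import datasets

    return datasets.EMNIST(root=root, split=split, train=train, download=download)
```

For example, `emnist_total("balanced")` gives 131,600, matching the Balanced entry.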
(3) QMNIST(2019):
- has 120,000 handwritten digits[0~9] of 28x28 pixels each. *60,000 for train and 60,000 for test.
- is an extended version of MNIST. *I don't know what the Q in QMNIST stands for.
- is QMNIST() in PyTorch.
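Unlike `MNIST()`, torchvision's `QMNIST` selects its subset with a `what` argument. In the sketch below, the 10,000 and 50,000 sizes for `test10k` and `test50k` are inferred from the subset names and are an assumption, as are the helper name `load_qmnist` and the `data` root.

```python
# Subset sizes for the `what` argument. "train" and "test" come from the text
# above; "test10k"/"test50k" are assumed from their names to partition the
# 60,000 test digits. torchvision also accepts "nist" (size not listed here).
QMNIST_SUBSETS = {
    "train": 60_000,
    "test": 60_000,
    "test10k": 10_000,
    "test50k": 50_000,
}


def load_qmnist(what="test", root="data", download=True):
    """Load one QMNIST subset by name."""
    if what not in QMNIST_SUBSETS and what != "nist":
        raise ValueError(f"unknown QMNIST subset: {what!r}")
    # Imported lazily so the size table can be used without torchvision.
    from torchvision import datasets

    return datasets.QMNIST(root=root, what=what, download=download)
```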
(4) ETLCDB(ETL Character Database, where ETL is the Electrotechnical Laboratory)(2011):
- has the handwritten or machine-printed numerals, symbols, alphabet letters and Japanese characters split into 9 datasets(ETL1, ETL2, ETL3, ETL4, ETL5, ETL6, ETL7, ETL8 and ETL9):
*Memos:
- ETL1 has 141,319 characters(digits[0~9], alphabet letters[A~Z], symbols[+-*/=()・,?’] and Katakana[ア~ン]).
- ETL2 has 52,796 characters(digits[0~9], alphabet letters[A~Z], symbols, Katakana letters[ア~ン], Hiragana letters[あ~ん] and Kanji letters).
- ETL3 has 9,600 characters(digits[0~9], alphabet letters[A~Z] and symbols[¥+-*/=()・,_▾]).
- ETL4 has 6,120 Hiragana letters[あ~ん].
- ETL5 has 10,608 Katakana letters[ア~ン].
- ETL6 has 52,796 characters(digits[0~9], alphabet letters[A~Z][a~z], symbols and Katakana letters[ア~ン]).
- ETL7(ETL7L and ETL7S) has 16,800 characters(Hiragana letters[あ~ん], Dakuten[゛] and Handakuten[゜]).
- ETL8(ETL8G and ETL8B2) has 152,960 characters(Hiragana letters[あ~ん] and Kanji letters).
- ETL9(ETL9G and ETL9B) has 607,200 characters(Hiragana letters[あ~ん] and JIS first level Kanji letters).
- isn't in PyTorch, so it has to be downloaded from the ETLCDB website.
(5) Kuzushiji(2018):
- has the cursive style Japanese characters split into 3 datasets(Kuzushiji-MNIST, Kuzushiji-49 and Kuzushiji-Kanji):
*Memos:
- Kuzushiji-MNIST has 70,000 Hiragana letters of 28x28 pixels each, balanced across classes.
- Kuzushiji-49 has 270,912 characters(Hiragana characters and Hiragana iteration marks) of 28x28 pixels each, imbalanced across classes.
- Kuzushiji-Kanji has 140,424 Kanji characters of 64x64 pixels each, imbalanced across classes.
- is KMNIST() in PyTorch but it only has Kuzushiji-MNIST, so Kuzushiji-49 and Kuzushiji-Kanji have to be downloaded separately from the rois-codh/kmnist repository on GitHub.
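Since Kuzushiji-49 has no torchvision class, it has to be read by hand after downloading. The sketch below assumes the files are NumPy `.npz` archives (for example `k49-train-imgs.npz` and `k49-train-labels.npz`) that each store one array under the default key `arr_0`; the file names, the key and the helper name `load_k49` are assumptions to verify against the actual download.

```python
def load_k49(images_path, labels_path):
    """Read a Kuzushiji-49 image/label pair from two .npz archives.

    Assumes each archive holds a single array under the key "arr_0".
    """
    # Imported lazily so this sketch can be imported without NumPy.
    import numpy as np

    with np.load(images_path) as archive:
        images = archive["arr_0"]
    with np.load(labels_path) as archive:
        labels = archive["arr_0"]

    # Sanity checks against the description above: 28x28 images, one label each.
    if images.shape[1:] != (28, 28):
        raise ValueError(f"expected 28x28 images, got shape {images.shape}")
    if len(images) != len(labels):
        raise ValueError("number of images and labels differ")
    return images, labels
```

The returned arrays can be wrapped in a `torch.utils.data.TensorDataset` after conversion with `torch.from_numpy` if a PyTorch dataset is needed.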
(6) Moving MNIST(2015):
- has 10,000 videos of 64x64 pixels each. *Each video has 20 frames with 2 moving digits.
- is MovingMNIST() in PyTorch.
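To get a feel for the size of this dataset, the sketch below works out the frame and pixel counts from the numbers above and wraps the torchvision call. The byte count assumes one byte per pixel, which is an assumption about storage rather than something stated here, and `moving_mnist_stats` and `load_moving_mnist` are illustrative names.

```python
# Figures from the description above.
NUM_VIDEOS, FRAMES, HEIGHT, WIDTH = 10_000, 20, 64, 64


def moving_mnist_stats():
    """Return (total frames, pixels per video, total bytes at 1 byte per pixel)."""
    total_frames = NUM_VIDEOS * FRAMES
    pixels_per_video = FRAMES * HEIGHT * WIDTH
    total_bytes = NUM_VIDEOS * pixels_per_video  # assumes 8-bit grayscale pixels
    return total_frames, pixels_per_video, total_bytes


def load_moving_mnist(root="data", download=True):
    """Load all videos; with split=None each item keeps all of its frames."""
    # Imported lazily so the arithmetic above works without torchvision.
    from torchvision import datasets

    # Each item is documented as a tensor of shape (T, C, H, W).
    return datasets.MovingMNIST(root=root, split=None, download=download)
```

Under that assumption the whole dataset is 200,000 frames and roughly 819 MB, so it is worth checking free disk space before setting `download=True`.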