DEV Community

Vee Satayamas
Vee Satayamas

Posted on

Reading lines using Flexi-streams on a Zstd stream is not fast.

I use Flexi-streams to read 1,000 lines from a ZSTD archive and find the average length of lines. My program looks like the one below.

(defun average-1000-etipitaka-flexi-streams ()                                                                                                                                             
  (with-open-file (f #P"etipitaka.txt.zst" :element-type '(unsigned-byte 8))                                                                                 
    (zstd:with-decompressing-stream (zs f)                                                                                                                   
      (let ((s (utf8-input-stream:make-utf8-input-stream zs)))                                                                                               
        (loop for line = (read-line s nil nil)                                                                                                               
              until (null line)                                                                                                                              
              count 1 into l                                                                                                                                 
              sum (length line) into c                                                                                                                       
              when (> l 1000) do (return (float (/ c l)))                                                                                                    
              finally (return (float (/ c l))))))))    
Enter fullscreen mode Exit fullscreen mode

The text in the file etipitaka.txt.zst looks like below:

สิงสถิต เขตเมืองเวรัญชา พร้อมด้วยภิกษุสงฆ์หมู่ใหญ่ประมาณ ๕๐๐ รูป เวรัญชพราหมณ์                                                                                              
ได้สดับข่าวถนัดแน่ว่า ท่านผู้เจริญ พระสมณะโคดมศากยบุตร ทรงผนวชจากศากยตระกูล                                                                                              
ประทับอยู่ ณ บริเวณต้นไม้สะเดาที่นเฬรุยักษ์สิงสถิต เขตเมืองเวรัญชา พร้อมด้วยภิกษุสงฆ์   
Enter fullscreen mode Exit fullscreen mode

The average line length is 68.8 bytes.

I ran the average-1000-etipitaka-flexi-streams on SBCL 2.2.5-1.1-suse on my laptop with Celeron N4500. It took 1.591 seconds.

Then I change the file to my-data.ndjson.zst, whose average line length is 515.5 bytes. Running average-1000-ndjson-flexi-streams took 4.411 seconds.

So I also tested with my customized utf8-input-stream. Running average-1000-etipitaka-utf8-input-stream, and average-1000-ndjson-utf8-input-stream took 0.019 seconds, and 0.043 seconds respectively, which means utf8-input-stream is 83X faster for short lines, and 102X faster for long lines, than Flexi-streams in these tests.

Oldest comments (0)