DEV Community

Kevin Mungai

Programming Problem

Write a program that, given a number of documents, finds all the documents with words containing the letter "a".

The with-open macro automatically closes the reader once its body finishes. The clojure.string/includes? function checks each line as it comes in and returns true or false depending on whether that line contains the letter "a".

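(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn contains-a?
  [document]
  (with-open [rdr (io/reader document)]
    ;; includes? needs a string, so test each line; some yields nil when
    ;; no line matches, and boolean turns that into false.
    (boolean (some #(str/includes? % "a") (line-seq rdr)))))

;; documents is a collection of file paths, so pass it directly.
(filter contains-a? documents)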

The next steps are to:

  1. succeed as fast as possible, i.e. once the program detects an "a" it should return true and stop reading that file, and

  2. leverage Java's AsynchronousFileChannel plus core.async to parallelize the work across documents, though I am not sure yet how this would work.
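Both steps can be roughed out with nothing but clojure.core. This is only a sketch under stated assumptions: it reads character by character to stop at the first "a", and it swaps in pmap for the AsynchronousFileChannel plus core.async idea, so it is not that design. The names contains-a-fast? and documents-with-a are hypothetical helpers made up for illustration.

```clojure
(require '[clojure.java.io :as io])

(defn contains-a-fast?
  "Hypothetical helper: returns true as soon as the first \"a\" is read,
  false if the end of the file is reached without finding one."
  [document]
  (with-open [^java.io.Reader rdr (io/reader document)]
    (loop []
      (let [c (.read rdr)]            ; next character as an int, -1 at EOF
        (cond
          (neg? c)       false        ; end of file, no "a" found
          (= c (int \a)) true         ; stop here, the rest is never read
          :else          (recur))))))

(defn documents-with-a
  "Hypothetical helper: checks the documents in parallel with pmap and
  returns those that contain \"a\", in their original order."
  [documents]
  (->> documents
       (pmap (fn [doc] [doc (contains-a-fast? doc)]))
       (filter second)
       (mapv first)))
```

Calling (documents-with-a ["doc1.txt" "doc2.txt"]) returns a vector of the paths that matched. pmap runs on Clojure's future thread pool, so a standalone script may need (shutdown-agents) at the end to exit promptly.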

Inspiration
gist

Top comments (1)

Garry Tribure

Here’s a possible answer that refines your initial program using Clojure's core.async and Java's AsynchronousFileChannel to speed up the search for words containing the letter "a" across multiple documents. The idea is to stop scanning a document as soon as an "a" is found, and to parallelize the processing of documents.

(ns async-file-search
  (:require [clojure.core.async :as async :refer [thread <!!]]
            [clojure.string :as str])
  (:import [java.nio ByteBuffer]
           [java.nio.channels AsynchronousFileChannel]
           [java.nio.file OpenOption Paths StandardOpenOption]))

(defn contains-a?
  "Checks if a file contains the letter 'a' using an AsynchronousFileChannel,
  stopping at the first chunk that matches."
  [document]
  (let [buf-size 1024
        ^ByteBuffer buffer (ByteBuffer/allocate buf-size)]
    ;; with-open closes the channel on every exit path.
    (with-open [^AsynchronousFileChannel ch
                (AsynchronousFileChannel/open
                  (Paths/get document (into-array String []))
                  (into-array OpenOption [StandardOpenOption/READ]))]
      (loop [position 0]
        (.clear buffer)
        ;; .read needs an explicit file position and returns a Future;
        ;; .get blocks until the chunk is read and yields -1 at end of file.
        (let [bytes-read (long (.get (.read ch buffer position)))]
          (if (pos? bytes-read)
            (let [data (String. (.array buffer) 0 (int bytes-read))]
              (if (str/includes? data "a")
                true
                (recur (+ position bytes-read))))
            false))))))

(defn process-documents
  "Checks each document on its own thread and returns a channel that
  yields a vector of {:doc ... :contains-a ...} maps."
  [documents]
  (->> documents
       ;; thread rather than go, because contains-a? does blocking I/O.
       (mapv (fn [doc]
               (thread {:doc doc :contains-a (contains-a? doc)})))
       ;; merge closes its output once every input channel has closed,
       ;; which lets into deliver the collected vector.
       (async/merge)
       (async/into [])))

;; Example usage:
(def documents ["doc1.txt" "doc2.txt" "doc3.txt"])
(def results (<!! (process-documents documents)))

(println "Documents containing 'a':"
         (filter :contains-a results))