DEV Community

Bastien Helders


Navigating the uncharted Stream

Wrong bend of the river

When I left school, Java 6 had just been released and hadn't really made it into school curricula yet. But I was content to think that I could simply stay a web developer and forget everything Java is (I hadn't paid that much attention anyway), forget about IDEs, compilers, and all that nonsense. Who needs something like that when you've got your trusty text editor to power your web wizardry? Boy, was I wrong.

For about two years now, I have needed to use Java again. I first went into it with my basic programming knowledge, then rediscovered object-oriented programming, and lately I've been introduced to the Java 8 features. But each time I encountered a new obstacle in my way, I used the new or re-acquainted knowledge more as a Band-Aid, which more often than not degraded the quality of the code a bit. Lately, after a few refactoring sessions to level things out, I realized that I needed to change strategy. So instead of always scratching the surface, I decided to dig deeper.

The uncharted Stream

It first began when I wanted to find specific information in files, to check that their content wasn't affected during the course of a test. I would have used reader = new BufferedReader(new FileReader(file)); to check each line of each file for the specific pattern, had I not found this question on how to find a pattern in files, which used the Stream object introduced in Java 8. As a result, I came up with the following code:

// Searching for <filename:"…"> and extract its value.
Pattern pattern = Pattern.compile("filename:\"(.+)\"");

// List of objects containing information on the files
// generated at the beginning of the test.
List<GeneratedFile> generatedFileList = myTest.getFileList();

for (File file : fileList){
    Optional<String> message = Files.lines(Paths.get(file.toURI()))
                                    .filter(pattern.asPredicate())
                                    .findFirst();
    if (message.isPresent())
    {
        String fileContent = message.get();
        Matcher matcher = pattern.matcher(fileContent);
        matcher.find();
        String value = matcher.group(1);
        GeneratedFile generatedFile = findFileByOriginalName(value, generatedFileList);
        if(generatedFile == null){
            Assert.fail("File "
                        + value
                        + " wasn't found.");
        }
    }
}

private static GeneratedFile findFileByOriginalName(String filename,
                                List<GeneratedFile> generatedFileList){
    for (GeneratedFile generatedFile : generatedFileList){
        if(generatedFile.getFileName().equals(filename)){
            return generatedFile;
        }
    }
    return null;
}

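As an aside, the helper at the end of that snippet is itself a good candidate for the Stream API. A minimal sketch (with a bare-bones stand-in for the GeneratedFile type, which isn't shown in the original):

```java
import java.util.Arrays;
import java.util.List;

public class FindFileDemo {
    // Minimal stand-in for the GeneratedFile type used in the snippet above.
    static class GeneratedFile {
        private final String fileName;
        GeneratedFile(String fileName) { this.fileName = fileName; }
        String getFileName() { return fileName; }
    }

    // The same lookup as findFileByOriginalName, as a stream pipeline:
    // filter on the name, take the first match, fall back to null.
    static GeneratedFile findFileByOriginalName(String filename,
                                                List<GeneratedFile> list) {
        return list.stream()
                   .filter(f -> f.getFileName().equals(filename))
                   .findFirst()
                   .orElse(null);
    }

    public static void main(String[] args) {
        List<GeneratedFile> files = Arrays.asList(
                new GeneratedFile("a.txt"), new GeneratedFile("b.txt"));
        System.out.println(findFileByOriginalName("b.txt", files).getFileName());
        // prints b.txt
    }
}
```

Whether that reads better than the plain loop is debatable, but it has the filter/findFirst shape the rest of the code already uses.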

It became more complicated when I wanted to extract another data point from the file content and the code threw an IllegalStateException, because a Stream cannot be traversed twice. I then discovered the Supplier, which allows me to get a fresh stream of the file's content for each search without worrying about the Stream auto-closing:

Supplier<Stream<String>> linesSupplier = () -> getFileStream(tempfile);

//Replace the pattern for each piece of information searched.
Optional<String> message = linesSupplier.get()
                                        .filter(filenamePattern.asPredicate())
                                        .findFirst();

private static Stream<String> getFileStream(File file)
{
    try
    {
        return Files.lines(Paths.get(file.toURI()));
    }
    catch (IOException e)
    {
        log.error("Exception while trying to supply file "
                  + file.getName()
                  + " content", e);
    }
    return null;
}
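The exception itself is easy to reproduce in isolation: a Stream can only be traversed once, so a second terminal operation on the same stream object fails. A minimal sketch of both the failure and the Supplier workaround:

```java
import java.util.function.Supplier;
import java.util.stream.Stream;

public class ReuseDemo {
    public static void main(String[] args) {
        Stream<String> lines = Stream.of("a", "b", "c");
        lines.findFirst();        // first terminal operation: fine
        try {
            lines.count();        // second terminal operation: fails
        } catch (IllegalStateException e) {
            System.out.println("Stream reuse failed: " + e.getMessage());
        }

        // A Supplier side-steps the problem: each get() builds a fresh stream.
        Supplier<Stream<String>> supplier = () -> Stream.of("a", "b", "c");
        System.out.println(supplier.get().count());  // 3
        System.out.println(supplier.get().count());  // 3 again, no exception
    }
}
```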

Rocky ride

I was quite intrigued and decided to use my newfound knowledge everywhere I thought it applicable. I thought it would be perfect for dealing with any file operation and getting rid of those pesky finally clauses; surely the new and shiny thing would be so much better. It turns out that a Stream doesn't always take care of closing its resources for you, and furthermore my rewrite completely broke the code and turned it into an unreadable mess...

Needless to say, my understanding of Stream was not as good as my understanding of Optional, for which I had already found a use I mostly agree with. At that point, I decided that I should strive to better understand what I use, so I could make the best of it.

So when does one use a Stream? Let me quote an answer to this question as I asked it on Stack Overflow:

If you can easily separate out distinct steps to apply to each dataset member, a Stream seems like a solid bet since that API can combine steps whenever possible.

It means that you need to have a concrete idea of what needs to happen, to the point where you can describe it in discrete steps.

Furthermore, I found out that one should be aware of the order in which the operations are done: first filter, then manipulate, and finally collect. Even if it seems like it wouldn't matter most of the time, there are cases where it is relevant (filtering first means the later steps touch fewer elements), and it makes more sense from a story standpoint.
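To make that three-beat story concrete, here is a minimal, self-contained sketch of the filter → manipulate → collect order on plain strings (nothing from the original code is assumed):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineDemo {
    // The three beats, in order: filter, then process, then collect.
    static List<String> extractValues(List<String> lines) {
        return lines.stream()
                    .filter(l -> l.startsWith("keep: "))      // 1. filter
                    .map(l -> l.substring("keep: ".length())) // 2. manipulate
                    .collect(Collectors.toList());            // 3. collect
    }

    public static void main(String[] args) {
        System.out.println(extractValues(
                Arrays.asList("keep: a", "drop", "keep: b")));
        // prints [a, b]
    }
}
```

Putting the filter first means the map step only ever sees the lines that matter.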

Crashing waves

Let's talk about a concrete example, the one I alluded to at the beginning of the previous section.

I have a template file that I want to populate by replacing placeholder strings with the wanted values. For that, I feed my method two lists, one with the placeholder keys and one with the desired values.

private static void populateTemplateFile(ArrayList<String> sKeys, ArrayList<String> sVals){
    String line = "";
    String newline = System.getProperty("line.separator");
    BufferedReader reader = null;
    FileWriter writer = null;
    StringBuilder storeLineBuilder = new StringBuilder();

    try{
        File file = new File(TEMPLATE_FILE);
        reader = new BufferedReader(new FileReader(file));
        while((line = reader.readLine()) != null){
            storeLineBuilder.append(line + newline);
        }

        String storeLine = storeLineBuilder.toString();
        for (int iCtr = 0; iCtr < sKeys.size(); iCtr++){
            storeLine = storeLine.replaceAll(sKeys.get(iCtr), sVals.get(iCtr));
        }

        writer = new FileWriter(TARGET_FILE);
        writer.write(storeLine);

    } catch(FileNotFoundException e) {
        log.error("The file " + TEMPLATE_FILE + " could not be found!", e);
    } catch(IOException e) {
        log.error("Exception while writing " + TARGET_FILE, e);
    } finally {
        try {
            if(reader != null) {
                reader.close();
            }
            if(writer != null) {
                writer.close();
            }
        } catch(IOException e) {
            log.error("Exception while closing resources", e);
        }
    }
}
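Incidentally, the finally boilerplate in that version can already be avoided without any Stream at all: try-with-resources, available since Java 7, closes the resources automatically. A small sketch of just the resource handling (copyFirstLine and its file arguments are hypothetical, standing in for the template/target pair):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class TryWithResourcesDemo {
    // Hypothetical helper: copy the first line of one file into another.
    // Both resources are closed automatically, even if an exception is thrown.
    static void copyFirstLine(String source, String target) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(source));
             FileWriter writer = new FileWriter(target)) {
            String line = reader.readLine();
            if (line != null) {
                writer.write(line);
            }
        }
    }
}
```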

Here was my first attempt at adapting the method to use Stream:

private static void populateTemplateFile(ArrayList<String> sKeys, ArrayList<String> sVals) {
    File file = new File(TEMPLATE_FILE);
    File trgFile = new File(TARGET_FILE);
    Stream<String> lines = null;

    try
    {
        lines = Files.lines(Paths.get(file.toURI()));

        //Newlines for readability, this is only one statement.
        Stream<String> replaced = lines
            .map(line -> {
                for(int iCtr = 0; iCtr < sKeys.size(); iCtr++){
                    line = line.replaceAll(sKeys.get(iCtr), sVals.get(iCtr));
                }
                return line;});
        Files.write(Paths.get(trgFile.toURI()),
                    (Iterable<String>) replaced::iterator);
    }
    catch (IOException e)
    {
       log.error("Exception while trying to create " + TARGET_FILE, e);
    }
    finally {
        if (lines != null) {
            lines.close();
        }
    }
}

What a mess I made... At this point, I would have expected a three-beat story of filtering, processing, and collecting, but here is what I have:

  1. Get the stream I initialized earlier.
  2. For each key, replace it with its corresponding value. Repeat for each line.
  3. Profit?

It appears quite clunky to me for two reasons: there is no clear collecting step, and we have a double iteration. After researching the subject further, it appears that in most cases like this one, the use of Stream would not be advised.

In the end, I couldn't find a version which would do things better than what is currently used.
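For completeness, here is the closest I could get to a readable Stream version, taking a hint from the comments below: feed the method a Map instead of two parallel lists. This is only a sketch, under the assumption that the placeholders are literal strings rather than regexes (hence replace instead of replaceAll), and the map is still iterated for every line, so the double loop doesn't actually go away:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TemplateDemo {
    // Replace every placeholder key found in a line by its value.
    // Still a loop per line: the double iteration is inherent to the task.
    static String replacePlaceholders(String line, Map<String, String> placeholders) {
        for (Map.Entry<String, String> entry : placeholders.entrySet()) {
            line = line.replace(entry.getKey(), entry.getValue());
        }
        return line;
    }

    // Hypothetical paths stand in for TEMPLATE_FILE / TARGET_FILE.
    static void populateTemplateFile(Path template, Path target,
                                     Map<String, String> placeholders) throws IOException {
        // try-with-resources closes the file handle behind Files.lines.
        try (Stream<String> lines = Files.lines(template)) {
            Files.write(target,
                        lines.map(line -> replacePlaceholders(line, placeholders))
                             .collect(Collectors.toList()));   // collect, then write
        }
    }
}
```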

Parting thoughts

I am still at the beginning of my journey, but now I have much better insight into when to use Stream. To summarize:

  • There should be a clear story as to what you want to do.
  • The story should, in general, follow the pattern filter -> process -> collect.
  • A double loop is a no-go.

In the end, I now feel that I have acquired a real grasp of what Stream is useful for and will be able to use it better from now on.

Until the next bend of the river.

Top comments (4)

Lluís Josep Martínez

Check the accepted answer in stackoverflow.com/questions/285045... it's basically what you need, with less boilerplate and faster.

Bastien Helders • Edited

Still, I think I'll need to use a Supplier, as I'll have to loop through the lists (or better, through a Map) to change each placeholder to its corresponding value.

Furthermore, I will then still use a loop to go through the Map entries, which will make me write the file anew for each of the entries. Unless I'm missing something, which may well be the case, I don't think this would be efficient.

Lluís Josep Martínez

If you want to get rid of the loop (whatever the reason) I guess you need to change the input values. Instead of 2 lists, use a single collection of KeyValue objects or a Map. And you would be able to use map and a closure to perform the replacement.

Regarding efficiency, if the input file is small no problem. But if it's in the order of gigabytes, converting the Stream to String needs to allocate a huge array of bytes in the heap and you risk an out of memory. Writing to the output file as a stream OTOH will be efficient - in memory terms (not CPU).

Lluís Josep Martínez

If the file is huge there is a high chance of getting an OutOfMemoryException. You should write to the output file as a stream to avoid this particular problem.