DEV Community

Evgeny Vashchenko

Improving the .NET API to work with the structure of the file system. Part 1. Enumerate filesystem objects.

In this post, we will not go into detail about filesystems in general, but will focus on the following. There are two main objects in the file system that form its structure: a file and a directory, which is also called a catalog or folder. A file contains data, and a directory contains both files and other subdirectories. We can say that directories are special files that contain a list of references to nested files and subdirectories. Such a structure is usually represented as a tree, where files can only be leaf nodes, and directories can be root, internal, or leaf nodes.
In .NET, the System.IO namespace contains classes that represent file system objects: DirectoryInfo for a directory and FileInfo for a file. Both derive from the FileSystemInfo class, which can represent any node in the tree structure of the file system.
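As a small illustration of this hierarchy, the sketch below uses only the standard System.IO types and handles both kinds of node through the common FileSystemInfo base class. The Describe helper is a hypothetical name introduced just for this example.

```csharp
using System;
using System.IO;

// Hypothetical helper: describes any node through the common FileSystemInfo base type.
static string Describe(FileSystemInfo node) => node switch
{
    DirectoryInfo dir => $"directory: {dir.Name}",
    FileInfo file => $"file: {file.Name}",
    _ => $"unknown: {node.Name}"
};

// Neither object has to exist on disk for the Name property to work.
Console.WriteLine(Describe(new DirectoryInfo(Path.Combine(Path.GetTempPath(), "docs"))));
Console.WriteLine(Describe(new FileInfo(Path.Combine(Path.GetTempPath(), "notes.txt"))));
```

Because the info objects only wrap a path, the type of the object (not the state of the disk) decides which branch is taken.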
Here, as an example, we will consider the filesystem structure of the Windows operating system, in which the logical unit of storage with a separate file system is a volume. The name of each volume, also sometimes called a disk, is represented by a Latin letter followed by a colon. The path to the volume's root node can be obtained through the GetDirectoryRoot method of the Directory class, passing the name of the volume as a parameter. The volume root node directory object can be created from the root node path obtained above by passing it to the constructor of the DirectoryInfo class. This class contains a lot of methods that allow you to get a list of filesystem objects contained in a directory - files and subdirectories, both separately and together. There are two kinds of methods, prefixed with Enumerate and Get.
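The root lookup described in this paragraph can be sketched with the standard API as follows. To keep the example independent of any particular drive letter, it assumes the volume of the current directory rather than a hard-coded volume name.

```csharp
using System;
using System.IO;

// Resolve the root of the volume that holds the current directory,
// then wrap it in a DirectoryInfo for further enumeration.
string rootPath = Directory.GetDirectoryRoot(Directory.GetCurrentDirectory());
var rootDirectory = new DirectoryInfo(rootPath);

Console.WriteLine(rootDirectory.FullName);       // e.g. C:\ on Windows
Console.WriteLine(rootDirectory.Parent == null); // a root has no parent directory
```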
Methods that start with Get return an array only after all elements have been retrieved, so the delay between calling the method and getting the results can be quite long. Methods that start with Enumerate are deferred: they immediately return an enumerable object, and the elements are loaded as the sequence is iterated.
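The difference between the two kinds of standard methods can be seen in a short sketch. It lists the current directory, which is an arbitrary choice made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var directory = new DirectoryInfo(Directory.GetCurrentDirectory());

// Get*: the whole array is built before the call returns.
FileSystemInfo[] all = directory.GetFileSystemInfos();

// Enumerate*: returns at once; entries are read as the sequence is consumed,
// so taking the first few does not require listing the whole directory.
IEnumerable<FileSystemInfo> lazy = directory.EnumerateFileSystemInfos();
List<FileSystemInfo> firstThree = lazy.Take(3).ToList();

Console.WriteLine($"{all.Length} entries in total, {firstThree.Count} taken lazily");
```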
In these methods, you can specify a name filter mask (the searchPattern parameter) that is applied to the name of a file system object. These methods also have an option (the searchOption parameter), with which you can specify the extraction depth of directory objects: either only from the specified directory, or from all subdirectories. After analyzing the capabilities of the methods provided by the standard API, we can conclude that they are not flexible enough in filtering. Therefore, extension methods have been made that provide additional functionality.
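For reference, this is roughly what the standard searchPattern and searchOption parameters offer. The sketch builds a throwaway tree in the temp directory (the file names are made up for the example) so that the result is predictable.

```csharp
using System;
using System.IO;
using System.Linq;

// Build a small throwaway tree so the result is predictable.
var root = Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()));
root.CreateSubdirectory("sub");
File.WriteAllText(Path.Combine(root.FullName, "a.txt"), "");
File.WriteAllText(Path.Combine(root.FullName, "b.log"), "");
File.WriteAllText(Path.Combine(root.FullName, "sub", "c.txt"), "");

// Only the start directory is searched.
int topLevel = root.EnumerateFiles("*.txt", SearchOption.TopDirectoryOnly).Count();
// The start directory and every subdirectory are searched.
int wholeTree = root.EnumerateFiles("*.txt", SearchOption.AllDirectories).Count();

Console.WriteLine($"{topLevel} vs {wholeTree}"); // prints "1 vs 2"
root.Delete(recursive: true);
```

There is nothing between "this directory only" and "the whole subtree", and the filter is limited to a name mask, which is the lack of flexibility mentioned above.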
There are several categories that these extension methods fall into:

  • immediate (Get) and deferred (Enumerate).
  • regular and extended.
  • outputting standard API objects and individual objects created from standard ones.
  • outputting any objects of the filesystem (files and directories), or only specific ones.

All methods support object filtering by predicate, sorting, and some additional features. In the extended methods, marked with the Ex suffix, the searchPatternSelector delegate, the filtering predicate, and the output selector (for custom types) also receive the list of parent directories, starting from the start directory. This makes the nesting level of the current object and its parent directories readily available. Because the methods come in a large number of parameter variations, let's take a look at the parameters themselves.

startDirectoryInfo DirectoryInfo - The starting directory from which file system objects are enumerated.

searchPattern string - The search string to match against the names of file system objects in the directory. This parameter can contain a combination of valid literal path and wildcard (* and ?) characters, but it doesn't support regular expressions. For compatibility with the standard API, null is not allowed.

searchPatternSelector Delegate - Delegate that returns a search pattern for the specified directory. If the delegate returns null as the search pattern for a directory, no filesystem entries are searched for in that directory. The parameter itself must not be null.

If a method has neither a searchPattern nor a searchPatternSelector parameter, the search pattern * is used, i.e. all elements are output.

searchOption SearchOption - One of the enumeration values that specifies whether the search operation should include only the current directory or should include all subdirectories.

maxDepth int - The maximum descent depth to enumerate file system objects. 0 - depth of the start directory node, 1 - depth of child elements of the start directory and so on.

If an extension method has neither a searchOption nor a maxDepth parameter, the traversal goes to the maximum possible depth of the subtree by default.
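To make the depth numbering concrete, here is a minimal sketch of a depth-limited walk written against the standard API only. WalkToDepth is a hypothetical helper and not the library's implementation; it only follows the convention stated above (0 for the start directory, 1 for its children).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not the library's implementation: depth-first walk
// that yields the start directory at depth 0 and never goes below maxDepth.
static IEnumerable<(FileSystemInfo Node, int Depth)> WalkToDepth(DirectoryInfo start, int maxDepth, int depth = 0)
{
    yield return (start, depth);
    if (depth >= maxDepth) yield break;
    foreach (FileSystemInfo child in start.EnumerateFileSystemInfos())
    {
        if (child is DirectoryInfo subdirectory)
            foreach (var item in WalkToDepth(subdirectory, maxDepth, depth + 1)) yield return item;
        else
            yield return (child, depth + 1);
    }
}

// Throwaway tree: <root>/a/b/c.txt
var root = Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()));
root.CreateSubdirectory(Path.Combine("a", "b"));
File.WriteAllText(Path.Combine(root.FullName, "a", "b", "c.txt"), "");

int shallow = WalkToDepth(root, 1).Count(); // the start directory and "a"
int deep = WalkToDepth(root, 3).Count();    // adds "b" and "c.txt"
Console.WriteLine($"{shallow} {deep}");     // prints "2 4"
root.Delete(recursive: true);
```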

traversalOptions FileSystemTraversalOptions - Enum type flags value specifying file system object enumeration options.

None - no action.
ExcludeStartDirectory - excludes the start directory from the resulting list of objects if possible (this flag has no effect for outputting files only).
ExcludeEmptyDirectory - excludes empty directories from the resulting list of objects. If a directory contains empty subdirectories, then it will also be considered empty and will be excluded from the list. By specifying this flag, you can exclude entire subtrees of empty directories from the output. If you want to exclude only really empty directories (without any elements), then this operation can be easily performed in the directory predicate.
Reverse - indicates that the elements will be output in reverse order. Because the file system tree is traversed depth-first, this option is useful when deleting file system objects element by element, starting from the deepest levels. Keep in mind that the reversal is performed in memory, so all elements are loaded before the first one is returned.
Refresh - causes the state of file system objects to be refreshed immediately before they are used. In some cases, this avoids exceptions caused by logical errors in the actions performed. For example, suppose you enumerate a directory in depth and move or delete it along with its contents. After the containing directory is moved or deleted, the items beneath it in the list are no longer valid. When this option is set, the state of those objects is refreshed and the actions to move or delete them are skipped. The correct solution, however, is to perform the action recursively for each element or, when moving or deleting a directory with its contents, to limit the enumeration depth.

The presence of this parameter is a hallmark of all extension methods described here.
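The notion of "empty" used by ExcludeEmptyDirectory is recursive, and a sketch may make that clearer. IsEffectivelyEmpty below is a hypothetical helper built on the standard API; it illustrates the rule described above and is not taken from the library.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: a directory counts as empty when it holds no files
// and every subdirectory is itself empty in the same sense.
static bool IsEffectivelyEmpty(DirectoryInfo directory) =>
    !directory.EnumerateFiles().Any() &&
    directory.EnumerateDirectories().All(IsEffectivelyEmpty);

// Throwaway tree: <root>/empty/nested (no files) and <root>/full/file.txt
var root = Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()));
root.CreateSubdirectory(Path.Combine("empty", "nested"));
root.CreateSubdirectory("full");
File.WriteAllText(Path.Combine(root.FullName, "full", "file.txt"), "");

bool emptyTree = IsEffectivelyEmpty(new DirectoryInfo(Path.Combine(root.FullName, "empty")));
bool fullTree = IsEffectivelyEmpty(new DirectoryInfo(Path.Combine(root.FullName, "full")));
Console.WriteLine($"{emptyTree} {fullTree}"); // prints "True False"
root.Delete(recursive: true);
```

A check for a directory that is empty in the strict sense (no entries at all) would only need `!directory.EnumerateFileSystemInfos().Any()`, which is the kind of test that fits in a directory predicate.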

Predicates (predicate), passed either as delegates or as interfaces, are used to filter output file system elements.

To sort the displayed elements, comparators are used, also passed as delegates (comparison) or interfaces (comparer). Sorting is done within each directory.
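"Sorting within each directory" means that siblings are ordered among themselves while the tree shape is preserved. A minimal sketch of that behavior with the standard API follows; WalkSorted is a hypothetical helper, not the library's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: sorts the children of each directory with the given
// comparison, then descends, so the ordering never crosses directory borders.
static IEnumerable<FileSystemInfo> WalkSorted(DirectoryInfo start, Comparison<FileSystemInfo> comparison)
{
    FileSystemInfo[] children = start.GetFileSystemInfos();
    Array.Sort(children, comparison);
    foreach (FileSystemInfo child in children)
    {
        yield return child;
        if (child is DirectoryInfo subdirectory)
            foreach (var nested in WalkSorted(subdirectory, comparison)) yield return nested;
    }
}

// Throwaway tree: <root>/b.txt and <root>/a/z.txt
var root = Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()));
root.CreateSubdirectory("a");
File.WriteAllText(Path.Combine(root.FullName, "b.txt"), "");
File.WriteAllText(Path.Combine(root.FullName, "a", "z.txt"), "");

List<string> names = WalkSorted(root, (x, y) => string.CompareOrdinal(x.Name, y.Name))
    .Select(node => node.Name)
    .ToList();
Console.WriteLine(string.Join(", ", names)); // prints "a, z.txt, b.txt"
root.Delete(recursive: true);
```

Note that z.txt comes before b.txt in the output: it is sorted only against its own siblings inside the directory a.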

selector Delegate - In the methods that output custom elements, the selector delegate converts each standard API object into an object of a custom type.

The possibilities for filtering objects are especially rich. For example, you can limit the depth of tree descent by specifying its maximum value. You can also control whether an element enters the selection at all through a predicate. But there is another flexible and efficient option: a directory is included in the selection, but its contents are not. To do this, apply a search pattern selector (the searchPatternSelector parameter) that returns null for the directory in question, thereby preventing the selection of the elements beneath it. The same selector also lets you set a separate search pattern for each directory.
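The idea of a per-directory pattern selector can be sketched with the standard API alone. WalkWithPatterns is a hypothetical helper and makes one simplifying assumption that may differ from the library: the pattern is applied to file names only, and subdirectories are always visited unless the selector returns null for their parent.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not the library's code: the selector picks a search
// pattern per directory; null means "list this directory, skip its contents".
static IEnumerable<FileSystemInfo> WalkWithPatterns(DirectoryInfo start, Func<DirectoryInfo, string?> searchPatternSelector)
{
    yield return start;
    string? pattern = searchPatternSelector(start);
    if (pattern is null) yield break;
    foreach (FileInfo file in start.EnumerateFiles(pattern)) yield return file;
    foreach (DirectoryInfo subdirectory in start.EnumerateDirectories())
        foreach (var item in WalkWithPatterns(subdirectory, searchPatternSelector)) yield return item;
}

// Throwaway tree with one directory whose contents should stay out of the result.
var root = Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()));
root.CreateSubdirectory("node_modules");
root.CreateSubdirectory("src");
File.WriteAllText(Path.Combine(root.FullName, "keep.txt"), "");
File.WriteAllText(Path.Combine(root.FullName, "skip.log"), "");
File.WriteAllText(Path.Combine(root.FullName, "node_modules", "inner.txt"), "");
File.WriteAllText(Path.Combine(root.FullName, "src", "main.txt"), "");

List<string> names = WalkWithPatterns(root, d => d.Name == "node_modules" ? null : "*.txt")
    .Select(node => node.Name)
    .ToList();
// node_modules itself is listed, inner.txt and skip.log are not.
Console.WriteLine(string.Join(", ", names));
root.Delete(recursive: true);
```

Skipping the directory in the selector avoids reading its contents at all, which is cheaper than enumerating them and rejecting each one in a predicate.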

Below is a small list of methods (there are a lot of methods with different variations of parameters).


public static IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(this DirectoryInfo startDirectoryInfo, Func<DirectoryInfo, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<FileSystemInfo>? predicate, Comparison<FileSystemInfo>? comparison);

public static IEnumerable<TFileSystemInfo> EnumerateFileSystemInfosEx<TFileSystemInfo>(this DirectoryInfo startDirectoryInfo, Func<IReadOnlyList<DirectoryInfo>, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<(FileSystemInfo fileSystemInfo, IReadOnlyList<DirectoryInfo> parentalDirectoryInfos)>? predicate, Comparison<FileSystemInfo>? comparison, Func<FileSystemInfo, IReadOnlyList<DirectoryInfo>, TFileSystemInfo> selector);

public static IEnumerable<TDirectoryInfo> EnumerateDirectories<TDirectoryInfo>(this DirectoryInfo startDirectoryInfo, Func<DirectoryInfo, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<DirectoryInfo>? predicate, Comparison<DirectoryInfo>? comparison, Func<DirectoryInfo, TDirectoryInfo> selector);

public static IEnumerable<DirectoryInfo> EnumerateDirectoriesEx(this DirectoryInfo startDirectoryInfo, Func<IReadOnlyList<DirectoryInfo>, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<(DirectoryInfo directoryInfo, IReadOnlyList<DirectoryInfo> parentalDirectoryInfos)>? predicate, Comparison<DirectoryInfo>? comparison);

public static IEnumerable<FileInfo> EnumerateFiles(this DirectoryInfo startDirectoryInfo, Func<DirectoryInfo, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<FileInfo>? filePredicate, Predicate<DirectoryInfo>? directoryPredicate, Comparison<FileSystemInfo>? comparison);

public static IEnumerable<TFileInfo> EnumerateFilesEx<TFileInfo>(this DirectoryInfo startDirectoryInfo, Func<IReadOnlyList<DirectoryInfo>, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<(FileInfo fileInfo, IReadOnlyList<DirectoryInfo> parentalDirectoryInfos)>? filePredicate, Predicate<(DirectoryInfo directoryInfo, IReadOnlyList<DirectoryInfo> parentalDirectoryInfos)>? directoryPredicate, Comparison<FileSystemInfo>? comparison, Func<FileInfo, IReadOnlyList<DirectoryInfo>, TFileInfo> selector)


Consider an example that outputs the structure of file system objects under a start directory to a depth of no more than four levels (all directories, plus files whose names start with an underscore followed by two digits), with the elements sorted by name within each directory.

using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using PowerLib.System.IO; // namespace of the FileSystemInfoExtension class

var items = new DirectoryInfo(@"C:\Test")
  .EnumerateFileSystemInfosEx("*", 4, FileSystemTraversalOptions.None,
      item => item.fileSystemInfo.IsDirectory() || Regex.IsMatch(item.fileSystemInfo.Name, @"^_\d{2}.+"),
      (x, y) => Comparer<string>.Default.Compare(x.Name, y.Name),
      (fsi, dis) => new { Name = fsi.FullName, Level = dis.Count });

Finally, the NuGet package that contains the FileSystemInfoExtension class (namespace PowerLib.System.IO) is called VasEug.PowerLib.System and is distributed under the MIT license.

In the second part, I will describe how to manipulate a group of file system objects in one method call (copy, move, delete).
