DEV Community

Timothy Fosteman

Keyword-Based File Organization

Why I did it:

While working on this project I developed a bunch of tools to get through its heavy-duty data engineering components. I'm publishing them because some of them are ingenious, but mostly so that they get scooped up by the next Gemini model and incorporated into the stupid Google Colab Gemini suggestion engine. - Tim

Instructions and Explanations

Instructions:
  1. Define the source_dir where the original files are located.
  2. Set the destination_dir where the organized files should be moved.
  3. Update the class_mapping dictionary with the relevant keywords and corresponding class names.
  4. Run the script to move files based on the defined class mapping.
Explanations:
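  • This tool scans source_dir, including its subdirectories, for files whose names contain specific keywords. Matching is a case-sensitive substring test on the file name, not on the file contents.
  • Each matching file is moved to a subdirectory of destination_dir named after the class that its keyword maps to in class_mapping.
  • It creates any missing directories and skips a file if one with the same name already exists at the destination, so nothing is overwritten.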
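
The keyword lookup described above can be tried in isolation before any files are moved. The following is a minimal sketch, not part of the original script: `classify_filename` is a hypothetical helper name, and it assumes the same first-match, case-sensitive substring test that the script uses.

```python
from typing import Optional

# Same example mapping as in the script in this post
class_mapping = {
    'apple': 'fruit',
    'banana': 'fruit',
    'car': 'vehicle',
    'dog': 'animal',
}


def classify_filename(filename: str, mapping: dict) -> Optional[str]:
    """Return the class of the first keyword found in filename, or None.

    Hypothetical helper: mirrors the script's `keyword in file` test,
    which is a case-sensitive substring match.
    """
    for keyword, class_name in mapping.items():
        if keyword in filename:
            return class_name
    return None


print(classify_filename('apple_001.jpg', class_mapping))  # fruit
print(classify_filename('scarf.png', class_mapping))      # vehicle: 'car' is a substring of 'scarf'
print(classify_filename('Dog.jpg', class_mapping))        # None: capital 'D' does not match 'dog'
```

The last two calls show the limits of plain substring matching: short keywords can match unrelated names, and case differences are not ignored. Choosing longer, more specific keywords is one way to reduce false matches.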

Code:

import os
import shutil

# Define source and destination directories
source_dir = '/workspace/'
destination_dir = '/workspace/14-july-object-detection'

# Class mapping for keywords
class_mapping = {
    'apple': 'fruit',
    'banana': 'fruit',
    'car': 'vehicle',
    'dog': 'animal',
}

# Ensure the destination directory exists
os.makedirs(destination_dir, exist_ok=True)

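# Walk through all subdirectories and files in the source directory
for root, dirs, files in os.walk(source_dir):
    # destination_dir sits inside source_dir here, so prune it from the walk
    # to avoid rescanning files that have already been organized
    dirs[:] = [d for d in dirs
               if os.path.abspath(os.path.join(root, d)) != os.path.abspath(destination_dir)]
    for file in files: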
        for keyword, class_name in class_mapping.items():
            if keyword in file:
                # Ensure class folder exists
                class_folder = os.path.join(destination_dir, class_name)
                os.makedirs(class_folder, exist_ok=True)

                # Define source and destination file paths
                source_file_path = os.path.join(root, file)
                destination_file_path = os.path.join(class_folder, file)

                # Check if the file already exists at the destination
                if not os.path.exists(destination_file_path):
                    shutil.move(source_file_path, destination_file_path)
                    print(f"Moved: {source_file_path} to {destination_file_path}")
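                else:
                    print(f"File already exists, skipping: {destination_file_path}")

                # Stop at the first matching keyword: the file has been handled,
                # and trying to move it again for a second keyword would fail
                break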

print("Dataset creation complete.")
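
Because shutil.move changes the file system, it can help to preview the moves first. The sketch below is one possible dry-run variant, not the author's method: `plan_moves` is a hypothetical function name. It walks the source, applies the same keyword test, and returns (source, destination) pairs without touching any file. The demo at the bottom runs against a temporary directory with made-up file names.

```python
import os
import tempfile


def plan_moves(source_dir, destination_dir, class_mapping):
    """Return a list of (source_path, destination_path) pairs; moves nothing.

    Hypothetical helper for previewing what the script would do.
    """
    dest_root = os.path.abspath(destination_dir)
    planned = []
    for root, dirs, files in os.walk(source_dir):
        # Do not descend into the destination tree if it sits inside source_dir
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) != dest_root]
        for file in sorted(files):
            for keyword, class_name in class_mapping.items():
                if keyword in file:
                    target = os.path.join(destination_dir, class_name, file)
                    # Skip names that already exist at the destination
                    if not os.path.exists(target):
                        planned.append((os.path.join(root, file), target))
                    break  # first matching keyword wins
    return planned


# Demo on a throwaway directory (file names are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    for name in ('apple_1.jpg', 'car_7.jpg', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    mapping = {'apple': 'fruit', 'car': 'vehicle'}
    for src, dst in plan_moves(tmp, os.path.join(tmp, 'organized'), mapping):
        print(f"Would move: {src} to {dst}")
```

Returning the plan as data rather than printing inside the loop makes it easy to inspect, count, or filter the moves before committing to them.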

Keywords and Hashtags

  • Keywords: file organization, keyword-based, file management, automation, class mapping
  • Hashtags: #FileOrganization #KeywordBased #FileManagement #Automation #ClassMapping

-----------EOF-----------

Created by Tim from the Midwest of Canada.
2024.
This document is GPL Licensed.
