DEV Community

Sulman Baig

Originally published at sulmanweb.com

Automate Codebase Documentation: Ruby Script for Markdown Conversion

I have since published a newer, improved version of this script and article at:

As developers, we often face the challenge of understanding a large codebase, whether it's a project we haven't touched in years or someone else's code. With complex folder structures and numerous files, it can be overwhelming to locate specific components or simply grasp the bigger picture. This is especially true when using AI tools like Perplexity or Claude, where attaching an entire codebase for reference isn't practical. What if you could easily convert your entire codebase into a readable Markdown file? Enter my Ruby script that documents a project by turning all of its files into a single Markdown file.

The Idea Behind the Script

The main goal of this script was to create something lightweight that helps navigate a codebase, making it easier to reference files without manually piecing together content. AI tools are great at generating insights, but asking for help often involves attaching large chunks of code. By converting your project into a Markdown file with an organized table of contents, you can easily share a high-level overview of the project—along with specific snippets—to get effective help without overwhelming anyone.

The Script

GIST - Generating a Markdown Documentation for Your Codebase with Ruby

require 'fileutils'

ALWAYS_IGNORE = ['.git', 'tmp', 'log', '.ruby-lsp', '.github', '.devcontainer'].freeze

def read_gitignore(directory_path)
  gitignore_path = File.join(directory_path, '.gitignore')
  return [] unless File.exist?(gitignore_path)

  File.readlines(gitignore_path).map(&:chomp).reject(&:empty?)
end

def ignored?(path, base_path, ignore_patterns)
  relative_path = path.sub("#{base_path}/", '')

  # Check if the path starts with any of the ALWAYS_IGNORE directories
  return true if ALWAYS_IGNORE.any? { |dir| relative_path.start_with?(dir + '/') || relative_path == dir }

  ignore_patterns.any? do |pattern|
    File.fnmatch?(pattern, relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH) ||
      File.fnmatch?(File.join('**', pattern), relative_path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
  end
end

def convert_to_markdown(file_path)
  extension = File.extname(file_path).downcase[1..]
  format = extension.nil? || extension.empty? ? 'text' : extension

  begin
    content = File.read(file_path, encoding: 'UTF-8')
    # File-name heading, then the content in a fence tagged with the extension
    "# #{File.basename(file_path)}\n\n```#{format}\n#{content}\n```\n\n"
  rescue StandardError => e
    "# #{File.basename(file_path)}\n\n[File content not displayed: #{e.message}]\n\n"
  end
end

def sanitize_anchor(text)
  text.gsub(/[^a-zA-Z0-9\-_]/, '-').gsub(/-+/, '-').downcase
end

def process_directory(directory_path, output_file)
  ignore_patterns = read_gitignore(directory_path)
  markdown_content = "# File Documentation\n\n## Table of Contents\n\n"
  file_contents = []

  Dir.glob("#{directory_path}/**/*", File::FNM_DOTMATCH).each do |file_path|
    next if File.directory?(file_path)
    next if ['.', '..'].include?(File.basename(file_path))
    next if ignored?(file_path, directory_path, ignore_patterns)

    relative_path = file_path.sub("#{directory_path}/", '')
    anchor = sanitize_anchor(relative_path)
    markdown_content += "- [#{relative_path}](##{anchor})\n"
    file_contents << "## #{relative_path}\n\n#{convert_to_markdown(file_path)}"
  end

  markdown_content += "\n---\n\n" + file_contents.join("\n---\n\n")

  File.write(output_file, markdown_content)
  puts "Markdown file created: #{output_file}"
end

# Check if correct number of arguments are provided
if ARGV.length != 2
  puts "Usage: ruby ruby_to_md.rb <input_directory> <output_file>"
  exit 1
end

input_directory = ARGV[0]
output_file = ARGV[1]

process_directory(input_directory, output_file)
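As far as I can tell from how `File.fnmatch?` works, the `.gitignore` half of `ignored?` only catches file-level globs such as `*.log` or `.env`. Directory entries like `node_modules/` or `/coverage` never match a path such as `node_modules/left-pad/index.js`, because the relative path has no leading slash and `FNM_PATHNAME` stops `*` from crossing `/`. Below is a minimal sketch of a hypothetical `gitignore_match?` helper (my name, not part of the original script) that skips comments, strips leading and trailing slashes, and tests every parent directory of the path. It is still a simplification of real `.gitignore` rules: negations, escapes, and nested `.gitignore` files are not handled.

```ruby
# Hypothetical helper, not the author's code: a closer (but still simplified)
# approximation of .gitignore matching for a path relative to the project root.
FNM_FLAGS = File::FNM_PATHNAME | File::FNM_DOTMATCH

def gitignore_match?(raw_pattern, relative_path)
  pattern = raw_pattern.strip
  # Blank lines and comments never match; negations ("!foo") are not supported here.
  return false if pattern.empty? || pattern.start_with?('#', '!')

  dir_only = pattern.end_with?('/')           # "build/" only matches directories
  anchored = pattern.chomp('/').include?('/') # "/coverage" or "doc/api" is rooted at the project
  pattern = pattern.chomp('/').delete_prefix('/')
  candidates = anchored ? [pattern] : [pattern, "**/#{pattern}"]

  # Check each parent directory (and the file itself unless the pattern is
  # directory-only), so "node_modules/" also covers every file inside it.
  parts = relative_path.split('/')
  last = dir_only ? parts.length - 1 : parts.length
  (1..last).any? do |n|
    prefix = parts.first(n).join('/')
    candidates.any? { |candidate| File.fnmatch?(candidate, prefix, FNM_FLAGS) }
  end
end

# Example: check a few paths against typical .gitignore lines.
patterns = ['# dependencies', 'node_modules/', '/tmp/*', '*.log']
%w[node_modules/left-pad/index.js tmp/cache/data log/development.log app/models/user.rb].each do |path|
  puts "#{path}: #{patterns.any? { |p| gitignore_match?(p, path) }}"
end
```

To try it, the `ignore_patterns.any?` block in `ignored?` could be replaced with `ignore_patterns.any? { |pattern| gitignore_match?(pattern, relative_path) }`.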

What Does the Script Do?

The script reads all the files in a given project folder, creates a structured table of contents, and converts each file's content into a Markdown-formatted section. The generated Markdown file gives you:

  1. An Easy-to-Navigate Table of Contents: Every file is listed with clickable links, allowing quick access to the contents of each one.
  2. Readable File Content: Each file's content is placed in a fenced code block labeled with the file's extension, so Markdown renderers can apply syntax highlighting.
  3. Filtering of Unnecessary Files: Folders such as .git, .github, .devcontainer, .ruby-lsp, tmp, and log, plus files matching patterns in .gitignore, are automatically skipped.
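One caveat on filtering: `File.read(file_path, encoding: 'UTF-8')` only labels the bytes as UTF-8; it does not validate them, so images, PDFs, and other binaries are read without raising and end up pasted into the output. Here is a minimal sketch of a hypothetical `text_file?` check (not part of the original script), assuming that "contains a NUL byte or is not valid UTF-8" is a good-enough definition of binary. Note that it would also skip genuine text files saved in other encodings, such as Latin-1.

```ruby
require 'tmpdir'

# Hypothetical helper: treats a file as text only if it has no NUL bytes
# and its raw bytes form valid UTF-8.
def text_file?(path)
  bytes = File.binread(path)             # raw bytes, no transcoding
  return false if bytes.include?("\x00") # NUL bytes are a strong binary signal
  bytes.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Example: a small Ruby file passes, a PNG-style header does not.
Dir.mktmpdir do |dir|
  ruby_file = File.join(dir, 'app.rb')
  image_file = File.join(dir, 'logo.png')
  File.write(ruby_file, "puts 'hello'\n")
  File.binwrite(image_file, "\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR".b)
  puts text_file?(ruby_file)  # prints true
  puts text_file?(image_file) # prints false
end
```

In `process_directory`, a line such as `next unless text_file?(file_path)` after the `ignored?` check would keep binaries out without listing every extension in `.gitignore`.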

Here's how you can use the script:

ruby ruby_to_md.rb /path/to/your/project output.md

This generates an output.md file containing every non-ignored file in your project, ready to browse and share.
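If the output path lies inside the project being documented (for example `ruby ruby_to_md.rb . output.md`), `Dir.glob` will pick up the `output.md` left by the previous run and embed it in the new one, unless `.gitignore` happens to exclude it. A small hypothetical guard, `output_file?` (not part of the original script), that compares absolute paths:

```ruby
# Hypothetical guard: true when file_path refers to the output file itself,
# so an earlier output.md is not folded into the next run.
def output_file?(file_path, output_file)
  File.expand_path(file_path) == File.expand_path(output_file)
end

# Example: documenting "." with output.md written into the same directory.
puts output_file?('./output.md', 'output.md')          # prints true
puts output_file?('./app/models/user.rb', 'output.md') # prints false
```

Adding `next if output_file?(file_path, output_file)` to the loop in `process_directory` would skip it.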

Breaking Down the Script

Let's walk through the key parts of the script:

  1. Ignoring Unwanted Files:
    The script skips irrelevant folders such as .git, along with anything matched by patterns in .gitignore, so only the relevant parts of your codebase are documented:

    ALWAYS_IGNORE = ['.git', 'tmp', 'log', '.ruby-lsp', '.github', '.devcontainer'].freeze
    
  2. Markdown Conversion:
    The script reads each file and converts it to a Markdown block that specifies the language of the file based on its extension. This helps Markdown renderers (like GitHub or VS Code) display code with proper syntax highlighting:

    def convert_to_markdown(file_path)
      extension = File.extname(file_path).downcase[1..]
      format = extension.nil? || extension.empty? ? 'text' : extension
    
      begin
        content = File.read(file_path, encoding: 'UTF-8')
        # File-name heading, then the content in a fence tagged with the extension
        "# #{File.basename(file_path)}\n\n```#{format}\n#{content}\n```\n\n"
      rescue StandardError => e
        "# #{File.basename(file_path)}\n\n[File content not displayed: #{e.message}]\n\n"
      end
    end
    
  3. Organizing Everything with a Table of Contents:
    To make navigation easy, the script generates a table of contents at the beginning of the Markdown file:

    def sanitize_anchor(text)
      text.gsub(/[^a-zA-Z0-9\-_]/, '-').gsub(/-+/, '-').downcase
    end
    
    markdown_content = "# File Documentation\n\n## Table of Contents\n\n"
    

    Every file is listed here with an anchor link, so you can quickly jump to the specific part of the Markdown file.
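Two rendering edge cases are worth knowing about here, with the caveat that behavior varies by renderer. First, the table-of-contents links point at `sanitize_anchor` slugs, but heading anchors are generated by the renderer itself; GitHub, for instance, lowercases headings and drops most punctuation, so `app/models/user.rb` would get something like `appmodelsuserrb` rather than `app-models-user-rb`, and the link would not resolve. Second, a file that itself contains a triple-backtick fence (a README, say) closes the script's fence early. Below is a minimal sketch of a hypothetical `markdown_section` helper (not the author's code) that emits an explicit `<a id>` anchor, assuming your renderer allows inline HTML, and picks a fence one backtick longer than the longest backtick run in the file. Under CommonMark a closing fence must be at least as long as the opening one, so shorter runs inside the file stay content.

```ruby
# Same slug rule as the original script.
def sanitize_anchor(text)
  text.gsub(/[^a-zA-Z0-9\-_]/, '-').gsub(/-+/, '-').downcase
end

# Hypothetical helper: a backtick fence longer than any backtick run in
# the content (and never shorter than three).
def fence_for(content)
  longest_run = content.scan(/`+/).map(&:length).max || 0
  '`' * [3, longest_run + 1].max
end

# Hypothetical helper: one file section with an explicit HTML anchor that
# matches the table-of-contents link, regardless of heading slug rules.
def markdown_section(relative_path, content, format)
  anchor = sanitize_anchor(relative_path)
  fence = fence_for(content)
  "<a id=\"#{anchor}\"></a>\n\n## #{relative_path}\n\n#{fence}#{format}\n#{content}\n#{fence}\n\n"
end

puts markdown_section('app/models/user.rb', "class User\nend", 'rb')
```

In the original script this would roughly take the place of both the `file_contents <<` line and the string built in `convert_to_markdown`.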

Use Cases for This Markdown Documentation

  • Sharing Code for Review: When collaborating with others, sharing a single Markdown file that documents the entire codebase can be extremely helpful for reviews and discussions.
  • Getting Help from AI Tools: Some AI tools don't have the ability to analyze an entire project directory, but they do allow you to attach files. Instead of attaching dozens of files individually, you can attach this single Markdown file that documents everything.
  • Better Understanding of a New Project: When working with an unfamiliar codebase, this documentation can serve as an effective way to explore it without getting lost in the file structure.

Next Steps

If this script sounds useful to you, give it a try on your own projects! You can tweak the ALWAYS_IGNORE list or the .gitignore handling to suit your specific needs. I'd love to hear how you use this tool, and I'm open to suggestions for improvements.

Feel free to share your thoughts or even fork it to add new features. Let's make code navigation and understanding easier for everyone!

Top comments (2)

johnblommers

The script will embed PDF, JPG, PNG, WebP, ODS, et al. inside the output.md file. Therefore, make sure your .gitignore file lists anything else that a Markdown viewer won't display:

*.pdf
*.jpg
*.jpeg
*.png
*.webp
*.ods
Sulman Baig