
how many parameters?

Scott Chamberlain · Originally published at recology.info · 5 min read

Functions can have no parameters, a lot of parameters, or somewhere in between. How many parameters is too many? Does it even matter how many parameters a function has?

There’s AFAIK no “correct” answer to this question. And surely the “best practice” varies among programming languages. What do folks say about this, and what should we be doing in R?

From other languages

Many of the blog posts and Stack Overflow posts on this topic cite the book Clean Code by “Uncle Bob” (Robert C. Martin). I’ve not read the book, but it sounds worth a read.

Some of the arguments go like: too many arguments can

  • make it easier to pass arguments in the wrong order
  • reduce code readability
  • make a function harder to test; it’s difficult/time-consuming to test that all the various combinations of arguments work together

An analysis of open source PHP projects was done in 2018, and it found that the most common number of parameters was 5; functions with 10 or more parameters were found in fewer than 20% of projects.

On the other side, some argue that you shouldn’t worry so much about the correct number of parameters, but rather make sure that all the parameters make sense, and are documented and tested.

To the extreme, a number of people quote the Clean Code book:

The ideal number of arguments for a function is zero

Some general threads on this topic:

Data

Data for this post, created below, is in the GitHub repo sckott/howmanyparams.

What about R?

What do the data show in the R language? Just like the PHP blog post above, let’s have a look at a lot of R packages to get a general, data-informed consensus on how many parameters are used per function.

It’s incredibly likely that there is a better way to do what I’ve done below; but this is my hacky way of getting at the data.

What I’ve done in words:

  • Get a list of all available package names on CRAN
  • Install about half of them (didn’t do all of them because it takes time, and I don’t think I need all 15K packages to get a good answer)
  • List the exported functions in each package
  • Count the arguments (parameters) per function in each package
  • Visualize the results

I ended up using 82,489 functions across 4,777 packages.
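The core of the counting step is base R’s formals(), which returns a function’s formal arguments as a named list; a minimal sketch using base functions as examples (n_params is just a helper name for this post):

```r
# formals() gives a named list of a function's parameters; its length
# is the parameter count. Caveat: formals() returns NULL for primitive
# functions like sum(), which counts as 0 here.
n_params <- function(fn) length(formals(fn))

n_params(mean)      # mean(x, ...) -> 2
n_params(Sys.time)  # Sys.time() takes no parameters -> 0
```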

Load packages

library(plyr)
library(dplyr)
library(tibble)
library(ggplot2)

Use a different path from my actual R library location so as not to pollute my current setup, and put binaries into a temporary directory so they are cleaned up on exiting R.

path <- "/some/path"
binaries <- file.path(tempdir(), "binaries")
dir.create(path)
dir.create(binaries)
.libPaths(path)
.libPaths() # check that the path was set

Function do_one() to run on each package:

  • try to load the package
  • if not found install it
  • get a vector of the exported functions in the package
  • count how many arguments each function has, make a data.frame
  • unload the package namespace

do_one <- function(pkg) {
  if (!requireNamespace(pkg))
    install.packages(pkg, quiet=TRUE, verbose=FALSE, destdir = binaries)
  on.exit(unloadNamespace(pkg))
  funs <- paste0(pkg, "::", getNamespaceExports(pkg))
  enframe(vapply(funs, function(w) {
    tt <- tryCatch(parse(text = w), error = function(e) e)
    if (!inherits(tt, "error")) length(suppressWarnings(formals(eval(tt)))) else 0
  }, numeric(1)))
}
# plyr::failwith() wraps do_one() so that failures return an empty tibble
do_one_safe <- failwith(tibble(), do_one)

Get a list of packages. The first time running through, I used available.packages(), which gets you all available packages. After installing packages, though, I used installed.packages() to get the list of packages I’d already installed.

# pkg_names <- unname(available.packages()[,"Package"])
pkg_names <- unname(installed.packages()[,"Package"])

Run each package through the do_one() function. This had to be stopped and re-started a few times, and it failed for quite a few packages. I wasn’t trying to get every single package, just a large set of packages to get an idea of what packages do on average.

tbls <- stats::setNames(lapply(pkg_names, do_one_safe), pkg_names)

Combine the list of data.frames into one data.frame

df <- dplyr::bind_rows(tbls, .id = "pkg")
readr::write_csv(df, "params_per_fxn.csv")

Note: you can get this data at sckott/howmanyparams

df <- readr::read_csv("~/params_per_fxn.csv")

Visualize

All functions across all packages

ggplot(df, aes(x = value)) +
  geom_histogram(bins = 30) +
  scale_x_continuous(limits = c(0, 30)) +
  theme_grey(base_size = 15)

(Figure: histogram of the number of parameters per function, across all functions in all packages)

The mean number of arguments per function across all packages was 4.4, and the most common value was 3. The maximum number of arguments was 209, and there were 5,306 functions (or 6.4%) with zero parameters.
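These summaries come straight off the value column; here’s a sketch on a small toy data.frame standing in for the real df (the real one is in the repo linked above):

```r
# Toy stand-in for the real df built above: `value` is the number of
# parameters per function
df <- data.frame(value = c(0, 2, 3, 3, 4, 7))

mean_params <- mean(df$value)                                 # average
mode_params <- as.numeric(names(which.max(table(df$value))))  # most common value
max_params  <- max(df$value)
n_zero      <- sum(df$value == 0)                             # zero-parameter functions
pct_zero    <- 100 * n_zero / nrow(df)                        # ... as a percentage
```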

Mean params across functions for each pkg

df_means <- group_by(df, pkg) %>% 
  summarise(mean_params = mean(value, na.rm=TRUE)) %>% 
  ungroup()
# arrange(df_means, desc(mean_params))
ggplot(df_means, aes(x = mean_params)) +
  geom_histogram(bins = 50) +
  scale_x_continuous(limits = c(0, 30)) +
  theme_grey(base_size = 15)

(Figure: histogram of the mean number of parameters per function, by package)

Taking the mean within each package first pulls the distribution to the right somewhat, with a mean of 5 arguments and the most common value at 4.

Thoughts

In terms of getting around the too-many-arguments problem, there’s talk of using global variables, which seems like a generally bad idea; unless perhaps they are environment variables that are meant to be set by the user in non-interactive environments, etc.
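That environment-variable escape hatch might look like the following sketch, where a rarely-changed setting is read with Sys.getenv() instead of being yet another parameter (fetch_data and MYPKG_VERBOSE are made-up names for illustration):

```r
# Hypothetical sketch: a rarely-changed setting read from an
# environment variable rather than added to the signature
fetch_data <- function(url) {
  verbose <- identical(Sys.getenv("MYPKG_VERBOSE", unset = "false"), "true")
  if (verbose) message("fetching ", url)
  url  # stand-in for the real download
}

fetch_data("https://example.com")  # quiet unless MYPKG_VERBOSE=true
```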

Other solutions are to use ... in R, or similarly **kwargs or *args in Python (ref.), or the newly added ... in Ruby (ref). With this approach you could have very few parameters defined in the function, and then internally within the function handle any parameter filtering, etc. The downside of this in R is that you lose the automated checks that every function argument is documented, and that every documented argument actually exists in the function.
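The ... approach can be sketched as follows: a wrapper declares only a couple of parameters and forwards everything else (summarise_vec is a hypothetical example, not from any package):

```r
# Hypothetical sketch of the `...` approach: only two parameters are
# declared; everything else is forwarded to mean()
summarise_vec <- function(x, trim = 0, ...) {
  mean(x, trim = trim, ...)  # e.g. na.rm = TRUE passes through untouched
}

summarise_vec(c(1, 2, NA), na.rm = TRUE)  # 1.5
```

The trade-off is exactly the one above: na.rm is now invisible in summarise_vec's signature and documentation unless you describe it by hand.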

I’m not suggesting a solution is needed though; there’s probably no right answer, but rather lots of opinions.

Having said that, the average R function does use about 4 arguments, so if you keep your functions to around 4 arguments you’ll be approaching the sort of consensus of a large number of R developers.

Last, I should admit that some of the functions in my packages have quite a lot of parameters - which was sort of the motivation for this post - that is, to explore what most functions do. For example, brranching::phylomatic has 13 parameters, three functions in the crevents package have 24 parameters … and I wonder about these types of functions. Should I refactor? Or is it good enough to make sure these functions are thoroughly documented and tested?

Discussion


It seems that maybe those functions with large numbers of arguments exist because of 2 things:

1) The function is going to be called a lot, and should be simple: a single call, not a chain of five calls that are mostly the same.
2) Tweaking the operation of the function is complicated, but the defaults are mostly sane. If there is a large number of ways to customise the function, but in practice only a few of them will need to be tweaked in a single call, then it seems to make sense.

 

Sorry for the long delay in responding; I guess dev.to doesn't do email notifications? Makes sense. Curious, do you try to keep your number of parameters small, or do you not really think about it?

 

No sweat. I tend to be able to keep the parameters small as a by-product of consistent refactoring. I also have the happy by-product of being able to work on very small, focused projects. Nothing I write tends to have more than tens of programmers using it, or multiple use cases :)

Right, and I imagine you can change the function interfaces more easily if there's a small set of users.