Derk-Jan Karrenbeld for XP Bytes

Posted on • Updated on • Originally published at xpbytes.com

JavaScript, Ruby and C are not call by reference

🛑 This article is a response to various articles in the wild which state that JavaScript and Ruby are "Call/Pass by reference" for objects and "Call/Pass by value" for primitives.

Many of these articles provide a lot of valuable information, and this article does not unequivocally claim that they should not have been written or that they are useless. Instead, it attempts to explore the semantic, yet pedantic, meanings and definitions of

  • call by reference
  • pass a reference
  • reference type
  • reference

First, I would like to make a few statements, after which I'll try to explore what these statements actually mean and why I've made them, contrary to various articles in the wild.

☕ When you see this emoji (☕), I try to give a non-code analogy to help you better understand what's going on. These abstractions are pretty leaky and might not hold up, but they're only meant in the context of the paragraphs that surround them. Take them with a grain of salt.

Statements

  • JavaScript is always call by value.
  • Ruby is always call by value.
  • C is always call by value.
  • The terminology is confusing and perhaps even flawed.
  • The terminology only applies to function (procedure) parameters.
  • Pointers are an implementation detail and their presence doesn't say anything about the evaluation of function parameters.

History and Definitions

I've tried to look up the origins of the terms as mentioned above, and there is quite a bit of literature out there from the earlier programming languages.

The Main Features of CPL (D. W. Barron et al., 1963):

Three modes of parameter call are possible; call by value (which is equivalent to the ALGOL call by value), call by substitution (equivalent to ALGOL call by name), and call by reference. In the latter case, the LH value of the actual parameter is handed over; this corresponds to the "call by simple name" suggested by Strachey and Wilkes (1961).

It is important to note that here the literature talks about mode of parameter call. It further distinguishes three modes: call by value, call by name and call by reference.

Further literature gives a good, yet technical, definition of these three and a fourth strategy (namely copy restore), as published in Semantic Models of Parameter Passing (Richard E. Fairley, 1973). I've quoted two of the four definitions below, after which I'll break them down and explain what they mean in more visual terms.

Call by Value

[...] Call by Value parameter requires that the actual parameter be evaluated at the time of the procedure call. The memory register associated with the formal parameter is then initialised to this value, and references to the formal parameter in the procedure body are treated as references to the local memory register in which the initial value of the actual parameter was stored. Due to the fact that a copy of the value associated with the actual parameter is copied into the local memory register, transformations on the parameter value within the procedure body are isolated from the actual parameter value. Because of this isolation of values, Call by value can not be used to communicate calculated values back to the calling program.

Roughly, this means that a parameter is completely evaluated before the function (procedure) is called. The resulting value (from that evaluation) is then assigned to the identifier inside the function (formal parameter). In many programming languages this is done by copying the value to a second memory address, making the changes inside the function (procedure body) isolated to that function.

In other words: the original memory address' contents (the one used to store the evaluated expression before passing it into the function) can not be changed by code inside the function and changes inside the function to the value are not propagated to the caller.

☕ When you order a coffee and someone asks for your name, they might write it down incorrectly. This doesn't affect your actual name and the change is only propagated to the cup.
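
The coffee analogy can be sketched in JavaScript. This is only a minimal illustration: the names writeOnCup, myName and cup are made up for this example. The argument expression is evaluated before the call, and re-assigning the parameter inside the function only touches the function's own local variable.

```javascript
// Hypothetical example: all names are illustrative only.
function writeOnCup(name) {
  // `name` is a local variable initialised with the evaluated argument.
  name = name.replace('Derk', 'Dirk') // the barista misspells it
  return name
}

const myName = 'Derk-Jan'

// The argument expression is evaluated first, and the resulting
// value is what the function receives.
const cup = writeOnCup(myName + ' K.')

console.log(cup)    // what ended up on the cup
console.log(myName) // the caller's variable is untouched
```

The misspelling only exists on the cup (the return value); myName still holds the original string.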

Call by Reference

[...] In Call by Reference, the address (name) of the actual parameter at the time of the procedure call is passed to the procedure as the value to be associated with the corresponding formal parameter. References to the formal parameter in the procedure body result in indirect addressing references through the formal parameter register to the memory register associated with the actual parameter in the calling procedure. Thus, transformations of formal parameter values are immediately transmitted to the calling procedure, because both the actual parameter and the formal parameter refer to the same register.

Roughly, this means that, just like before, the parameter is evaluated, but, unlike before, the memory address (address / name) is passed to the function (procedure). Changes made to the parameter inside the function (formal parameter) are actually made on the memory address and therefore propagate back to the caller.

☕ When you go to a support store for one of your hardware devices and ask for it to be fixed, they might give you a replacement device. This replacement device is still yours, you own it just like before, but it might not be the exact same one you gave to be fixed.

Reference (and value) types

This is not the complete picture. There is one vital part left that causes most of the confusion. Right now I'll explain what a reference type is, which has nothing to do with arguments/parameters or function calls.

Reference types and value types are usually explained in the context of how a programming language stores values inside the memory, which also explains why some languages choose to have both, but this entire concept is worthy of (a series of) articles on its own. The Wikipedia page is, in my opinion, not very informative, but it does refer to various language specs that do go into technical detail.

A data type is a value type if it holds a data value within its own memory space. It means variables of these data types directly contain their values.

Unlike value types, a reference type doesn't store its value directly. Instead, it stores the address where the value is being stored.

In short, a reference type is a type that points to a value somewhere in memory, whereas a value type is a type that directly holds its value.

☕ When you make a payment online and enter your payment details, for example your card number, the card itself can not be changed. However, the bank account's balance will be affected. You can see your card as a reference to your balance (and multiple cards can all reference the same balance).

☕ When you pay offline, that is with cash, the money leaves your wallet. Your wallet holds its own value, just like the cash inside your wallet. The value is directly where the wallet/cash is.
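
As a rough sketch of how these two analogies look from the programmer's point of view in JavaScript (the names balance, cardA, cardB, wallet and otherWallet are assumptions made for this example, and it only shows observable behaviour, not how an engine stores things):

```javascript
// Hypothetical names, for illustration only.
const balance = { amount: 100 } // the account itself

// Two "cards" that both refer to the same account object.
const cardA = balance
const cardB = balance

cardA.amount -= 30              // paying with one card...
console.log(cardB.amount)       // ...is visible through the other card

// Cash behaves like a value: assigning it copies the amount itself.
const wallet = 50
let otherWallet = wallet
otherWallet -= 20
console.log(wallet)             // not affected by spending from the copy

console.log(cardA === cardB)    // both names refer to one and the same object
```

Note that none of this involves a function call yet, which is the point of this section: reference types are about how values are held, not about how parameters are passed.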

Show me the code proof

function reference_assignment(myRefMaybe) {
  myRefMaybe = { key: 42 }
}

var primitiveValue = 1
var someObject = { is: 'changed?' }

reference_assignment(primitiveValue)
primitiveValue
// => 1

reference_assignment(someObject)
someObject
// => { is: 'changed?' }

As shown above, someObject has not been changed, because myRefMaybe was not a reference to the variable someObject. In terms of the definitions before: it was not the memory address of someObject that was passed, but a copy of its value.

A language that does support pass by reference is PHP, but it requires special syntax to change from the default of passing by value:

function change_reference_value(&$actually_a_reference)
{
    $actually_a_reference = $actually_a_reference + 1;
}

$value = 41;
change_reference_value($value);
// => $value equals 42

I tried to keep the same sort of semantics as the JS code.

As you can see, the PHP example actually changes the value the input argument refers to. This is because the memory address of $value can be accessed by the parameter $actually_a_reference.
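
A common litmus test for call by reference is whether a swap function can exchange two of the caller's variables. The sketch below (swap, left and right are illustrative names) shows that this does not work in JavaScript, even though both arguments are objects:

```javascript
// If parameters were bound to the caller's variables, this would swap them.
function swap(a, b) {
  const tmp = a
  a = b
  b = tmp
}

let left = { name: 'left' }
let right = { name: 'right' }

swap(left, right)

console.log(left.name)  // still 'left'
console.log(right.name) // still 'right'
```

Only the local parameters a and b are exchanged. The caller's left and right keep referring to the objects they referred to before the call.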

What's wrong with the nomenclature?

Reference types and "boxed values" make this more confusing, and they are also why I believe that the nomenclature is perhaps flawed.

The term call-by-value is problematic. In JavaScript and Ruby, the value that is passed is a reference. That means that, indeed, the reference to the boxed primitive is copied, and therefore changing a primitive inside a function doesn't affect the primitive on the outside. That also means that, indeed, the reference to a reference type, such as an Array or Object, is copied and passed as the value.

Because reference types refer to their value, copying a reference type makes the copy still refer to that value. This is also what you experience as shallow copy instead of deep copy/clone.

Whoah. Okay. Here is an example that explores both these concepts:

function appendOne(list) {
  list.push(1)
}

function replaceWithFive(list) {
  list = [5]
}

const first = []
const second = []

appendOne(first)
first
// => [1]

replaceWithFive(second)
second
// => []

In the first example it outputs [1], because the push method modifies the object on which it is called (the object is referenced from the name list). This propagates because the list argument still refers to the original object that first refers to: the reference was copied and passed as a value, so list holds that copy, but the copy points to the same data in memory, because Object is a reference type.

In the second example it outputs [] because the re-assignment doesn't propagate to the caller. In the end it is not re-assigning the original reference but only a copy.

Here is another way to write this down. 👉🏽 indicates a reference to a different location in memory.

first_array   = []
second_array  = []

first         = 👉🏽 first_array
list          = copy(first) = 👉🏽 first_array
list.push     = (👉🏽 first_array).push(...)

// => (👉🏽 first_array) was changed

second        = 👉🏽 second_array
list          = copy(second) = 👉🏽 second_array
replace_array = []
list          = 👉🏽 replace_array

// => (👉🏽 second_array) was not changed
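
The earlier remark about shallow versus deep copies follows the same logic. A minimal sketch, assuming plain JSON-like data (original, shallow and deep are names chosen for this example, and the JSON round-trip is just one simple way to get a deep copy):

```javascript
const original = { tags: ['a'] }

// Shallow copy: a new outer object, but `tags` is a copied reference
// to the very same array.
const shallow = { ...original }

// Deep copy via a JSON round-trip (only suitable for plain JSON-like data).
const deep = JSON.parse(JSON.stringify(original))

original.tags.push('b')

console.log(shallow.tags) // shares the array, so it sees 'b'
console.log(deep.tags)    // has its own array, so it does not
```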

What about pointers?

C is also always pass by value / call by value, but it allows you to pass a pointer, which can simulate pass by reference. Pointers are implementation details and are, for example, used in C# to enable pass by reference.

In C, however, pointers are reference types! The syntax *pointer allows you to follow the pointer to the value it refers to. In the comments in this code I tried to explain what is going on under the hood.

void modifyParameters(int value, int* pointerA, int* pointerB) {
    // passed by value: only the local parameter is modified
    value = 42;

    // passed by value or "reference", check call site to determine which
    *pointerA = 42;

    // passed by value or "reference", check call site to determine which
    *pointerB = 42;
}

int main() {
    int first = 1;
    int second = 2;
    int random = 100;
    int* third = &random;

    // "first" is passed by value, which is the default
    // "second" is passed by reference by creating a pointer,
    //         the pointer is passed by value, but it is followed when
    //         using *pointerA, and thus this is like passing a reference.
    // "third" is passed by value. However, it's a pointer and that pointer
    //         is followed when using *pointerB, and thus this is like
    //         passing a reference.
    modifyParameters(first, &second, third);

    // "first" is still 1
    // "second" is now 42
    // "random" is now 42
    // "third" is still a pointer to "random" (unchanged)
    return 0;
}
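
JavaScript has no pointers, but the same simulation can be sketched with a wrapper object that plays the role of the pointer. This is an assumption-laden analogy, not an equivalent: the one-slot object with a current property is a hypothetical convention invented for this example.

```javascript
// Hypothetical analogue of the C example: `box` stands in for a pointer.
function modifyParameters(value, box) {
  value = 42       // only the local parameter changes
  box.current = 42 // follows the reference, comparable to *pointer = 42 in C
}

let first = 1
const second = { current: 2 }

modifyParameters(first, second)

console.log(first)          // 1
console.log(second.current) // 42
```

Just like in C, the box itself is passed by value (the reference to it is copied); the caller only observes a change because the function writes through that reference.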

Call by sharing?

The lesser used and lesser known term that was coined is Call by sharing, which applies to Ruby, JavaScript, Python, Java and so forth. It implies that all values are objects, all values are boxed, and they copy a reference when they pass it as value. Unfortunately, in literature, the usage of this concept is not consistent, which is also why it's probably less known or used.

For the purpose of this article, call-by-sharing is call by value, but the value is always a reference.

Conclusion

In short: It's always pass by value, but the value of the variable is a reference. All primitive methods return a new value and thus one can not modify the primitive; objects and arrays can have methods that modify their value, and thus one can modify them.

You can not affect the memory address of the parameter directly in the languages that use call-by-value, but you may affect what the parameter refers to. That is, you may affect the memory the parameter points to.
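
A small sketch of that difference between primitives and objects (shout, addItem, greeting and groceries are names made up for this example):

```javascript
function shout(text) {
  // String methods return a new string; the string that `text`
  // refers to can not be altered, so this call has no lasting effect.
  text.toUpperCase()
  return text
}

function addItem(items) {
  // Array.prototype.push mutates the array that `items` refers to.
  items.push('milk')
}

const greeting = 'hello'
const groceries = []

const result = shout(greeting)
addItem(groceries)

console.log(result)    // same text as before the call
console.log(groceries) // the caller sees the pushed item
```

In both calls the parameter holds a copy of the caller's value; the only difference is whether the thing it refers to offers a way to be modified.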

The statement "Primitive Data Types are passed By Value and Objects are passed By Reference" is incorrect.

Oldest comments (27)

Derk-Jan Karrenbeld • Edited

The post that sparked this article:

This is a great article with a lot of valuable information, especially if you're a beginner. Check it out!

Florian Rand • Edited

Hey I really enjoyed your article, but, if I had to link every comment and article I read about design that "sparks" me, I would need a whole dev.to only for me.

You made your point and it was very instructive, but with all due respect, I find this last comment not necessary. A simple search can give hundreds of examples.

Shameful edit 😅

Derk-Jan Karrenbeld • Edited

If you look at the original article you can see a discussion between me and that author and a link to this article.

That discussion was both very nice and interesting, and he asked me to link him this article so he could link it in his article. The comment here is to link back to that -- no negative feelings whatsoever, which I understand this can be interpreted as :). I've edited the comment to make it more clear.

I actually think his original article, as stated in the comments there, is a great resource.

In general I completely agree with your statement, but the story here is, in my opinion, different.

Florian Rand

Oh now it makes sense. I didn't read the comments on the other article and it looked like a totally different thing. Totally ashamed, my most sincere apologies.

Derk-Jan Karrenbeld • Edited

It happens! We're all only human :)

Being on the spectrum, I find it very difficult to gauge these things, so I'd rather have people call me out so that I can either fix it or apologise than not have that opportunity at all.

Florian Rand

Yes, next time I'll keep my rule to comment the next day and not when tired!

Clay Murray

Great write up.

Derk-Jan Karrenbeld

Thank you! Looking for my next subject, so let me know if there are things you'd like to see.

Valentin Baca

Same for Java.

The way I phrase it is: Java/JS/Ruby/C/etc are always pass by value; for objects the value is the reference.

Derk-Jan Karrenbeld

In JS among others, the value is actually always a reference. V8 for example, boxes all primitives as JSValue which is a reference type in C.

That said, for general everyday use, your phrasing is perfectly fine when you're just trying to get stuff done :)

Brian Barbour

This is super duper informative and now I know a whole lot more!

Derk-Jan Karrenbeld

Happy to write it!

I did try to look up a bit of the literature since you mentioned it was almost nowhere to be found. If I ever find a more modern or better source, I'll link it for ya!

jmc

Call by sharing is sometimes called call by object too, for the reasons you mentioned.

The term call-by-value is problematic.

To be fair, it made more sense in the 60s. The distinction mattered when (a handful of) languages had call-by-reference semantics, but they're dead now. The hip thing nowadays is to be nominally call-by-value while also providing a way of passing around things that can reasonably be called references, a la carte.

It's similar to how variables are scoped in JS... mostly lexical, but with exceptions. But languages used to be all one way or all the other, and people forget to say "with exceptions."

The reasons feel similar to me as well. I think that in the early days, it was natural to think of variables as not much more than the memory addresses they started out as. But it's easier to reason about values directly, and over the years we've abstracted up towards that. In Clojure even the collections are immutable, and its creator distinguishes between values and place-oriented programming.

Derk-Jan Karrenbeld

To be fair, it made more sense in the 60s.

This is absolutely true! I didn't mean the term itself is a problem, but it's causing confusion, in today's day and age.

The distinction mattered when (a handful of) languages had call-by-reference semantics, but they're dead now

I mean, most languages where it still exists, it's super clear when you use it (C#, PHP, sorta rust etc).

Call by sharing is sometimes called call by object too, for the reasons you mentioned.

Thank you for the link! Some great content in there. I did not know this, but TIL.

jmc • Edited

Yeah, it's funny to see the Python longhairs having the same argument as everyone else. FWIW I'm with Tim Peters:

"Joe, I think our son might be lost in the woods"
"Don't worry, I have his social security number"

i.e. we care about the object, not the references. This is important too:

in Python, the variables in the formal argument list are bound to the actual argument objects. the objects are shared between caller and callee; there are no "fresh locations" or extra "stores" involved.

So not only are the "values" mutable at a distance, the references aren't even copied.

I think you could argue that Clojure, where by default (a) everything is immutable and (b) values are copied*, is properly call-by-value. I might grudgingly toss C in there too, since pointers are first-class there and the abstraction around structs and arrays is thin.

Others might be CbV, in a narrow sense, but their emphasis on stateful/mutable/referenced things violate the spirit of it. Java at least has some primitives which I imagine need to get copied. Python doesn't even do that...

There's a bit in a book called Theoretical Introduction to Programming that applies here, I think:

... to say that int is integer arithmetic with bounds and overflow conditions is to say that it is not integer arithmetic.

Anyway, I agree with you that the OO languages are not CbR w/r/t objects (though I think they can be in spirit), and IMO you did a good job of explaining the history and the nuance.


* Clojure uses persistent data structures for collections, so technically those aren't 😑

Derk-Jan Karrenbeld

i.e. we care about the object, not the references. This is important too:

Yes! This is super cool. Basically from my research I found out that Python, unlike Ruby and JavaScript, doesn't create a new "ref object" in the higher level language but actually directly assigns the reference, like C would.

Anyway, I agree with you that the OO languages are not CbR w/r/t objects (though I think they can be in spirit), and IMO you did a good job of explaining the history and the nuance.

❤❤ I also think you provide very valuable extra information!

I'm not very articulate at the moment but here is a meh response by me 😅 on the Python thang.

jmc

I mean, most languages where it still exists, it's super clear when you use it (C#, PHP, sorta rust etc).

whoops, I meant to write "languages had default call-by-reference semantics"

Iduoad

This actually was a semester long discussion with a professor of mine 😁.

Here is how I think of it:

Functions are borrowed from maths, and in maths functions always copy!

Let's consider the following real function f: R -> R, f: x -> x^2.
With a = 4, if we evaluate f(a) = 16, the variable a will still be equal to 4.

In programming (to be clear, in most programming languages[1]) functions always copy their arguments, then work on the copy, i.e. functions are passing_by_value creatures: they always copy what they got, and it's what they copy ("reference type" or "value type") which defines if they will act on the state or not.

As @Valentin_Baca pointed out, Java behaves in the same way, and Python too:

def by_value(ob):
    ob = []

def by_reference(ob):
    ob.append(11)

li = [0]*4
print(li) #[0,0,0,0]
by_value(li)
print(li) #[0,0,0,0]
by_reference(li)
print(li) #[0,0,0,0,11]

1: python is an exception, it has mutables and immutables and it passes arguments to functions by assigning them.

Please correct me if I'm wrong

Derk-Jan Karrenbeld

Valentin made a good comment and jmc went into more detail.

Python isn't really an exception, but yes, I understand what you're saying. Looking through the python source code, it seems again that it is mostly what we are trying to say when no reference is copied. The re-assignment in python is actually copying a reference, but indeed, no new memory need be allocated, for that reference -- sorta, because the identifier (name) still needs to live... inside the memory.

So even though Python doesn't "copy" the reference like JavaScript does (create a new JSVal that points to the same object), it does so on a waaaay lower level (point directly to the original same object).

Ugh. It's giving me a slight headache 😅😅😅.

However, there are actually quite a few (mostly older) languages that don't copy at all, which would be those languages that are not call by value/sharing/object :).

The most interesting to me are copy-restore languages or those theoretical ones that only copy on write... a topic for another time.

rhymes • Edited

I agree with you wholeheartedly. Great article. The nomenclature doesn't help and this is not the only domain where better words and definitions would make things much easier, even for newcomers (see the concurrency domain for example)

Derk-Jan Karrenbeld

Yes, concurrency/parallelism is still on my list to write about!

Kosumu

At the replaceWithOne function body, did you mean list = [1]?
because if it is [] then I cannot observe the unchanged.

Sorry if I got it wrong I'm super newbie.
Thank you very much for the article!

Derk-Jan Karrenbeld

No need to apologise for asking!

I've changed the example to be more clear. Does this answer your question?

Adrian Matei • Edited

I have trouble grasping the following section:

function reference_assignment(myRefMaybe) {
  myRefMaybe = { key: 42 }
}

var primitiveValue = 1
var someObject = { is: 'changed?' }

reference_assignment(primitiveValue)
primitiveValue
// => 1

reference_assignment(someObject)
// => { is: 'changed?' }

As shown above, someObject has not been changed, because it was not a reference to someObject. In terms of the definitions before: it was not the memory
address of someObject that was passed, but a copy.

Isn't someObject not changed because, in the function, the copied reference myRefMaybe is pointing to a totally new object { key: 42 }, and thus from this point on it will not modify the original? But if you would do myRefMaybe.is = 'indeed changed', you would see the results in someObject, because the copied reference is referencing someObject. Wouldn't this be more consistent with the call by sharing definition...

I have written an article regarding this yesterday - Javascript call by value or by reference, actually by sharing, and would like to know if my mental model is flawed. Thanks

Derk-Jan Karrenbeld

Call by sharing is explicitly mentioned at the bottom of this article and has, as far as my formal education goes, no consensus in definition, which is why I am hesitant in using that nomenclature, but yes. It is call by sharing given the definition you linked. That doesn't really change anything here.

Isn't someObject not changed, because in the function the copy reference myRefMaybe is pointing to a totally new object { key: 42 },

I don't quite understand this sentence. I think the point I was trying to make is that, if myRefMaybe was a reference, then re-assigning the value of that reference, would change the original definition too. But in JavaScript, it's a copy of a reference. Thus, re-assigning it doesn't change the original, but modifying the referenced object does change the original (as both the original and the argument / copy of the reference point to the same memory that stores the object).

peerreynders • Edited

A reference is simply a memory address that is automatically dereferenced by the language/runtime.

From that point of view

// (01) primitiveValue stores address #A
var primitiveValue = 1;
// (02) someObject stores address #B
var someObject = { is: 'changed?' };

// (03) myRefMaybe stores address #A from primitiveValue
reference_assignment(primitiveValue);
// (06) primitiveValue still has address #A
primitiveValue;

// (07) myRefMaybe stores address #B from someObject
reference_assignment(someObject);
// (10) someObject still has address #B
someObject;

function reference_assignment(myRefMaybe) {
  // (04) myRefMaybe still has address #A
  // (08) myRefMaybe still has address #B
  myRefMaybe = { key: 42 };
  // (05) myRefMaybe now has address #C
  // (09) myRefMaybe now has address #D
}

if myRefMaybe was a reference, then re-assigning the value of that reference, would change the original definition too

To be able to change the "original definition" you would need the reference to the someObject reference itself (rather than the reference to the original object):

  • if someObject stores the address #B to the actual object
  • and that address #B is stored on behalf of someObject at address #Z
  • you need to place address #C at address #Z so that someObject now stores the address #C to that other object.

Update:

  • Call by Value: Copy the actual value.
  • Call by Sharing: Copy the address to the value. This makes it possible to share the value but not possible to replace the original value at the source variable.
  • Call by Reference: Copy the address of the "variable". This makes it possible to replace the original value at the source variable.

The way JavaScript behaves it actually never has to store anything at the address that is found in a variable - it only stores new addresses in variables.

  // (1) `myRefMaybe` has address #B as set by caller
  myRefMaybe = { key: 42 };
  // (2) New value `{ key: 42 }` is created
  // (3) `myRefMaybe` stores address #C to new value
 
Adrian Matei

When you put it like this is more clear to me. Thanks