Just recently I heard, once again, that PHP folks still debate single quotes vs. double quotes: using single quotes is just a micro optimisation, the argument goes, but if you get used to using them all the time you'd save a bunch of CPU cycles!
"Everything has already been said, but not yet by everyone" – Karl Valentin
It is in this spirit that I am writing an article about the same topic Nikita Popov already covered 12 years ago (if you have read his article, you can stop reading here).
What is the fuss all about?
PHP performs string interpolation, in which it searches for the use of variables in a string and replaces them with the value of the variable used:
$juice = "apple";
echo "They drank some $juice juice.";
// will output: They drank some apple juice.
This feature is limited to strings in double quotes and heredoc. Using single quotes (or nowdoc) will yield a different result:
$juice = "apple";
echo 'They drank some $juice juice.';
// will output: They drank some $juice juice.
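The same split applies to heredoc and nowdoc. A minimal sketch (the TEXT identifier and variable names are only illustrative):

```php
<?php
$juice = "apple";

// Heredoc behaves like a double-quoted string: variables are interpolated.
$heredoc = <<<TEXT
They drank some $juice juice.
TEXT;

// Nowdoc (note the single-quoted identifier) behaves like a single-quoted string.
$nowdoc = <<<'TEXT'
They drank some $juice juice.
TEXT;

echo $heredoc, PHP_EOL; // They drank some apple juice.
echo $nowdoc, PHP_EOL;  // They drank some $juice juice.
```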
Look at that: PHP will not search for variables in that single quoted string. So we could just start using single quotes everywhere. So people started suggesting changes like this ..
- $juice = "apple";
+ $juice = 'apple';
.. because it'll be faster and it'd save a bunch of CPU cycles with every execution of that code because PHP does not look for variables in single quoted strings (which are non-existent in the example anyway) and everyone is happy, case closed.
Case closed?
Obviously there is a difference in using single quotes vs. double quotes, but in order to understand what is going on we need to dig a bit deeper.
Even though PHP is an interpreted language, it has a compile step in which several parts work together to produce something the virtual machine can actually execute: opcodes. So how do we get from PHP source code to opcodes?
The lexer
The lexer scans the source code file and breaks it down into tokens. A simple example of what this means can be found in the token_get_all() function documentation. A PHP source file containing just <?php echo ""; becomes these tokens:
T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ("")
We can see this in action and play with it in this 3v4l.org snippet.
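To reproduce the token list locally, something along these lines should work. It is a minimal sketch that assumes the tokenizer extension is available (it usually is); describe_tokens() is a hypothetical helper name, not part of PHP:

```php
<?php
// Hypothetical helper: returns one "NAME (text)" line per token.
function describe_tokens(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens are [id, text, line number].
            $lines[] = sprintf('%s (%s)', token_name($token[0]), $token[1]);
        } else {
            // Single-character tokens such as ";" are returned as plain strings.
            $lines[] = $token;
        }
    }
    return $lines;
}

// The source is single quoted so that PHP hands it to the lexer untouched.
echo implode(PHP_EOL, describe_tokens('<?php echo "";')), PHP_EOL;
```

Besides the four tokens listed above, this also prints the trailing ";", which token_get_all() returns as a plain string rather than as a named token.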
The parser
The parser takes these tokens and generates an abstract syntax tree (AST) from them. The AST of the above example looks like this when represented as JSON:
{
"data": [
{
"nodeType": "Stmt_Echo",
"attributes": {
"startLine": 1,
"startTokenPos": 1,
"startFilePos": 6,
"endLine": 1,
"endTokenPos": 4,
"endFilePos": 13
},
"exprs": [
{
"nodeType": "Scalar_String",
"attributes": {
"startLine": 1,
"startTokenPos": 3,
"startFilePos": 11,
"endLine": 1,
"endTokenPos": 3,
"endFilePos": 12,
"kind": 2,
"rawValue": "\"\""
},
"value": ""
}
]
}
]
}
In case you want to play with this as well and see what the AST for other code looks like, I found https://phpast.com/ by Ryan Chandler and https://php-ast-viewer.com/, which both show you the AST of a given piece of PHP code.
The compiler
The compiler takes the AST and creates opcodes. The opcodes are what the virtual machine executes, and they are also what gets stored in the OPcache if you have it set up and enabled (which I highly recommend).
To view the opcodes we have multiple options (maybe more, but I know these three):
- use the Vulcan Logic Dumper extension. It is also baked into 3v4l.org
- use phpdbg -p script.php to dump the opcodes
- or use the opcache.opt_debug_level INI setting for OPcache to make it print out the opcodes
  - a value of 0x10000 outputs opcodes before optimisation
  - a value of 0x20000 outputs opcodes after optimisation
$ echo '<?php echo "";' > foo.php
$ php -dopcache.opt_debug_level=0x10000 foo.php
$_main:
...
0000 ECHO string("")
0001 RETURN int(1)
Hypothesis
Coming back to the initial idea of saving CPU cycles when using single quotes vs. double quotes, I think we all agree that this would only be true if PHP evaluated these strings at runtime for every single request.
What happens at runtime?
So let's see which opcodes PHP creates for the two different versions.
Double quotes:
<?php echo "apple";
0000 ECHO string("apple")
0001 RETURN int(1)
vs. single quotes:
<?php echo 'apple';
0000 ECHO string("apple")
0001 RETURN int(1)
Hey wait, something weird happened. This looks identical! Where did my micro optimisation go?
Well maybe, just maybe the ECHO opcode handler's implementation parses the given string, although there is no marker or something else which tells it to do so ... hmm 🤔
Let's try a different approach and see what the lexer does for those two cases:
Double quotes:
T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ("")
vs. single quotes:
T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ('')
The tokens are still distinguishing between double and single quotes, but checking the AST will give us an identical result for both cases - the only difference is the rawValue in the Scalar_String node attributes, that still has the single/double quotes, but the value uses double quotes in both cases.
New Hypothesis
Could it be that string interpolation is actually done at compile time?
Let's check with a slightly more "sophisticated" example:
<?php
$juice="apple";
echo "juice: $juice";
Tokens for this file are:
T_OPEN_TAG (<?php)
T_VARIABLE ($juice)
T_CONSTANT_ENCAPSED_STRING ("apple")
T_WHITESPACE ()
T_ECHO (echo)
T_WHITESPACE ( )
T_ENCAPSED_AND_WHITESPACE (juice: )
T_VARIABLE ($juice)
Look at the last two tokens! String interpolation is handled in the lexer and as such is a compile time thing and has nothing to do with runtime.
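The difference can be checked directly by tokenising the same string in both quoting styles. This is a minimal sketch; token_names() is a hypothetical helper, and single-character tokens (the quote characters and ";") are skipped for brevity:

```php
<?php
// Hypothetical helper: names of all array-style tokens in a source string.
function token_names(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

// Both sources are single quoted here, so $juice reaches the lexer verbatim.
$double = token_names('<?php echo "juice: $juice";');
$single = token_names('<?php echo \'juice: $juice\';');

echo implode(' ', $double), PHP_EOL;
// T_OPEN_TAG T_ECHO T_WHITESPACE T_ENCAPSED_AND_WHITESPACE T_VARIABLE
echo implode(' ', $single), PHP_EOL;
// T_OPEN_TAG T_ECHO T_WHITESPACE T_CONSTANT_ENCAPSED_STRING
```

The double-quoted version is already split into a literal part and a variable by the lexer, while the single-quoted version stays one constant string.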
For completeness, let's have a look at the opcodes generated by this (after optimisation, using 0x20000):
0000 ASSIGN CV0($juice) string("apple")
0001 T2 = FAST_CONCAT string("juice: ") CV0($juice)
0002 ECHO T2
0003 RETURN int(1)
These are different opcodes than we had in our simple <?php echo ""; example, but this is okay because we are doing something different here.
Get to the point: should I concat or interpolate?
Let's have a look at these three different versions:
<?php
$juice = "apple";
echo "juice: $juice $juice";
echo "juice: ", $juice, " ", $juice;
echo "juice: ".$juice." ".$juice;
- the first version is using string interpolation
- the second is using a comma separation (which AFAIK only works with echo and not with assigning variables or anything else)
- and the third option uses string concatenation
The first opcode assigns the string "apple" to the variable $juice:
0000 ASSIGN CV0($juice) string("apple")
The first version (string interpolation) is using a rope as the underlying data structure, which is optimised to do as few string copies as possible.
0001 T2 = ROPE_INIT 4 string("juice: ")
0002 T2 = ROPE_ADD 1 T2 CV0($juice)
0003 T2 = ROPE_ADD 2 T2 string(" ")
0004 T1 = ROPE_END 3 T2 CV0($juice)
0005 ECHO T1
The second version is the most memory-efficient, as it does not create an intermediate string representation. Instead it makes multiple calls to ECHO, which is a blocking call from an I/O perspective, so depending on your use case this might be a downside.
0006 ECHO string("juice: ")
0007 ECHO CV0($juice)
0008 ECHO string(" ")
0009 ECHO CV0($juice)
The third version uses CONCAT/FAST_CONCAT to create an intermediate string representation and as such might do more memory copies and/or use more memory than the rope version.
0010 T1 = CONCAT string("juice: ") CV0($juice)
0011 T2 = FAST_CONCAT T1 string(" ")
0012 T1 = CONCAT T2 CV0($juice)
0013 ECHO T1
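Whatever the opcodes look like, all three versions produce the same bytes. A minimal sketch to confirm this, using output buffering to capture what the comma-separated echo prints (the variable names are only illustrative):

```php
<?php
$juice = "apple";

// Version 1: string interpolation.
$interpolated = "juice: $juice $juice";

// Version 2: comma-separated echo, captured through an output buffer.
ob_start();
echo "juice: ", $juice, " ", $juice;
$echoed = ob_get_clean();

// Version 3: string concatenation.
$concatenated = "juice: " . $juice . " " . $juice;

var_dump($interpolated === $echoed && $echoed === $concatenated); // bool(true)
```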
So ... what is the right thing to do here and why is it string interpolation?
String interpolation uses either a FAST_CONCAT in the case of echo "juice: $juice"; or highly optimised ROPE_* opcodes in the case of echo "juice: $juice $juice";, but most importantly it communicates the intent clearly. None of this has been a bottleneck in any of the PHP applications I have worked with so far, so none of this actually matters.
TLDR
String interpolation is a compile time thing. Granted, without OPcache the lexer will have to check for variables used in double quoted strings on every request, even if there aren't any, wasting CPU cycles, but honestly: The problem is not the double quoted strings, but not using OPcache!
However, there is one caveat: PHP up to 4 (and I believe even including 5.0 and maybe even 5.1, I don't know) did string interpolation at runtime, so using these versions ... hmm, I guess if anyone really still uses PHP 5, the same as above applies: The problem is not the double quoted strings, but the use of an outdated PHP version.
Final advice
Update to the latest PHP version, enable OPcache and live happily ever after!
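To check whether OPcache is actually active for the current SAPI, a small script along these lines can help. This is only a sketch: opcache_summary() is a hypothetical helper name, and it reads nothing but the opcache.enable and opcache.enable_cli INI settings:

```php
<?php
// Hypothetical helper: one-line summary of the OPcache state.
function opcache_summary(): string
{
    if (!extension_loaded('Zend OPcache')) {
        return 'OPcache extension is not loaded';
    }

    // ini_get() returns strings such as "1", "0", "On" or "Off".
    $enabled = filter_var(ini_get('opcache.enable'), FILTER_VALIDATE_BOOLEAN);
    $enabledCli = filter_var(ini_get('opcache.enable_cli'), FILTER_VALIDATE_BOOLEAN);

    return sprintf(
        'opcache.enable=%s, opcache.enable_cli=%s',
        $enabled ? 'on' : 'off',
        $enabledCli ? 'on' : 'off'
    );
}

echo opcache_summary(), PHP_EOL;
```

Run it through the same SAPI that serves your application, because the CLI and the web server may use different INI files.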
[Edit: August 16th]
What about sprintf()?
What I actually intended to say is that none of this is a performance problem, whether you use string interpolation, single quotes and concatenation, or anything else. Someone stepped up and asked about sprintf() and where it clocks in performance-wise. So, for the sake of completeness, let's have a look at sprintf():
<?php
$juice = "apple";
echo sprintf("juice: %s %s", $juice, $juice);
compiles to the following opcodes:
0000 ASSIGN CV0($juice) string("apple")
0001 INIT_FCALL 3 128 string("sprintf")
0002 SEND_VAL string("juice: %s %s") 1
0003 SEND_VAR CV0($juice) 2
0004 SEND_VAR CV0($juice) 3
0005 V1 = DO_ICALL
0006 ECHO V1
A quick benchmark shows that the sprintf() variant takes 14 to 21 times as long as the string interpolation variant on my local machine.
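A comparison of this kind could be sketched as follows. This is not the benchmark used for the numbers above: time_it() is a hypothetical helper, the iteration count is arbitrary, and the closure call adds overhead to both variants, so the measured ratio will differ from machine to machine and from the figures quoted here.

```php
<?php
// Hypothetical helper: runs $fn repeatedly and returns elapsed milliseconds.
function time_it(callable $fn, int $iterations = 200_000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6;
}

$juice = "apple";

$interpolationMs = time_it(fn() => "juice: $juice $juice");
$sprintfMs = time_it(fn() => sprintf("juice: %s %s", $juice, $juice));

printf("interpolation: %.2f ms\n", $interpolationMs);
printf("sprintf():     %.2f ms\n", $sprintfMs);
```

Arrow functions and the numeric literal separator require PHP 7.4 or later.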
Here comes the catch: this is only true up to PHP 8.3. PHP 8.4 comes with another compile time optimisation that treats sprintf() calls that only use %s and %d in the format string as if you had written string interpolation:
0000 ASSIGN CV0($juice) string("apple")
0001 T2 = ROPE_INIT 4 string("juice: ")
0002 T2 = ROPE_ADD 1 T2 CV0($juice)
0003 T2 = ROPE_ADD 2 T2 string(" ")
0004 T1 = ROPE_END 3 T2 CV0($juice)
0005 ECHO T1
So the final advice still holds: update to the latest PHP version (well, maybe wait with upgrading to PHP 8.4 until there is a stable release).
Top comments (6)
After 20 years using PHP I've settled on double quotes. Too much text has apostrophes & I hate the look of having to escape them and of concatenated strings with vars.
That used to be my reasoning until the frontend developer I worked with at the time told me that I shouldn’t be using single quotes as apostrophes in the first place since they’re different characters, so now I make a point of using typographically correct apostrophes in dictionaries :
I tend to mostly use single quotes for short strings that won't change because I think they look cleaner :
But for full sentences like in logs or dictionaries I'll use double quotes by default so I don't have to change or resort to tedious interpolation when I need to add a variable.
literally nothing to do with php, but as a javascript ho (and never breaking out of my typing in all lowercase to seem ~artsy~ since high-school), i am a proponent of single-quotes and will, in fact, die on that hill 💅
Thanks for sharing your thoughts on Florian's article! You've highlighted some fascinating angles that really got me thinking.
The historical perspective on PHP's evolution is intriguing. I'd love to dive into old changelogs or interview PHP core developers to pinpoint when string interpolation moved to compile-time. That transition must have been a game-changer for performance.
Your points about complex string scenarios are spot-on. I've wrestled with nested variable interpolation in large heredoc blocks before - it can get messy fast! Maybe we could explore some best practices or clever workarounds for those tricky cases?
The team dynamics aspect is crucial and often overlooked. In my experience, aligning on a consistent quoting style can prevent a lot of unnecessary code review debates. Perhaps we could discuss strategies for reaching team consensus on these conventions?
Security is always paramount. I wonder if there are any static analysis tools specifically designed to catch string interpolation vulnerabilities? That could be a valuable addition to many CI/CD pipelines.
Your bigger picture view is refreshing. It reminds me of Donald Knuth's famous quote: "Premature optimization is the root of all evil." Instead of obsessing over quotes, maybe we should talk about how to effectively profile PHP applications to find real bottlenecks?
Thanks for sparking such an engaging discussion! You've inspired me to dig deeper into PHP internals and rethink some of my own coding practices. Keep sharing these insightful perspectives!
You make really good points 💯
Single quotes! As simple as this. Faster interpretation, better consistency with major modern open-source projects, clearer IDE syntax highlighting, PSR compliance, and so on.
Only, and only use double quotes if you need to interpolate a variable (although concatenation might be more appropriate in this scenario).