Discussion on: Why is our source code so boring?

Austin S. Hemmelgarn

Line length limits are there to keep things easy to follow. Most sane people these days go to 120 columns or so rather than 80 (sometimes a bit more), but the point is that the whole line still fits on one screen line in your editor, because wrapped lines are hard to follow and horizontal scroll-off makes it easy to miss things. Those concerns haven't really changed since punch cards died off. Some languages are more entrenched in this than others (some dialects of Forth, for example), but it's generically useful no matter what you're writing, and it has stuck around for reasons other than just punch cards.
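To make the "fit on one line" point concrete, here's a small hypothetical Python sketch: rather than one long expression that the editor has to soft-wrap, the expression is broken deliberately so every physical line stays well inside a 120-column window.

```python
# Hypothetical example: deliberate line breaks instead of editor soft-wrapping,
# so each physical line fits comfortably within the column limit.
items = [("widget", 3, 2.50), ("gadget", 1, 9.99), ("gizmo", 7, 0.75)]
total = sum(
    quantity * unit_price
    for (_name, quantity, unit_price) in items
)
print(total)
```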

Monospaced fonts are similar: they got used originally because that's all you had (little to do with punch cards there; the earliest text terminals were character-cell designs, partly because of punch cards, but also partly because it was just easier to interface with them that way, since you didn't need any kind of font handling and text wrapping was 100% predictable). These days they're still used because some people still code in text-only environments, but also because they make sure everything lines up in a way that's easy to follow. Proportional fonts often have especially narrow spaces, which makes indentation difficult to follow at times, and the ability to line up a sequence of expressions seriously helps readability.
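As a hypothetical Python sketch of that last point: in a monospaced font these assignments read like a small table, while in a proportional font the values drift out of vertical alignment.

```python
# Hypothetical illustration: the padded assignments only stay visually aligned
# when every character occupies the same width.
width   = 1920
height  = 1080
depth   = 32
refresh = 60
print(width, height, depth, refresh)
```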

As for graphical coding environments, the issue there is efficiency. Information density is important, and graphical coding environments are almost always lower in information density than textual source code (this, coincidentally, is part of why they're useful for teaching). I can write a program in Scratch and in Python, and the Python code will take up less than half the space on-screen that the Scratch code will. On top of that, it often gets difficult to construct very large programs (read as 'commercially and industrially important programs') in graphical languages and keep track of them, both because of that low information density and because the flow control gets unwieldy in many cases.
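For a rough, hypothetical comparison: this Python program is three short lines, whereas the equivalent Scratch script (a "when flag clicked" hat, a repeat block, a say block, and so on) is a stack of blocks filling a noticeably larger chunk of the screen for the same behavior.

```python
# Hypothetical countdown used only for the size comparison with Scratch.
for i in range(10, 0, -1):
    print(i)
print("Liftoff!")
```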

As for ligatures, it's hit or miss whether they realistically help. I don't personally like them: they often screw with the perceptual alignment of the code (they tend to have wider spacing on the sides than the equivalent constructs without ligatures, and it's not unusual for different 2- or 3-character ligatures to have different spacing as well), and they make it easier to accidentally misread code (for example, == versus === without surrounding code to compare lengths against).

I'm not particularly fond of using fonts to encode keyword differences, for a lot of the same reasons that I don't like ligatures. It's also hard to find fonts that are distinctive enough relative to each other while still being clearly readable (that sample picture fails the second requirement, even if it is otherwise a good demonstration).

You run into the same types of problems when you start looking at custom symbols instead of plain ASCII text. APL has issues with this because of its excessive use of symbols, but I run into it regularly just using non-ASCII glyphs in string literals (if most people have to ask 'How do I type that on a keyboard?', you shouldn't be using it in your code).
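A hypothetical Python illustration of the string-literal case: both spellings produce the same string, but only the escaped one is plain ASCII that anyone can type and read without the right font installed.

```python
# Hypothetical example: identical runtime values, very different typing/reading cost.
city_glyph  = "Zürich"        # requires typing or pasting the ü glyph directly
city_escape = "Z\u00fcrich"   # pure-ASCII source, same string value
assert city_glyph == city_escape
```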

Andrew (he/him)

The funny thing is, APL probably has a higher "information density" than just about any other programming language that has ever been created, and it was one of the very first languages.
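For a concrete, hypothetical illustration of that density: the APL expression +/⍳10 sums the integers 1 through 10, while the Python spelling below says the same thing in a longer but more immediately readable form.

```python
# The APL one-liner +/⍳10 sums 1..10; Python spells it out with named functions.
print(sum(range(1, 11)))  # 55
```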

But people don't like entirely symbolic code, it seems. We want something in between a natural, written language and bare strings of symbols.

Will we be stuck with ASCII forever?

Austin S. Hemmelgarn

There's a 'sweet spot' for information density in written languages (natural or otherwise). Too high, and the learning curve becomes too steep for the language to be useful. Too low, and it quickly becomes unwieldy to actually use. You'll notice, if you look at linguistic history, that it's not unusual for logographic, ideographic, and pictographic writing systems to evolve toward segmental systems over time (for example, Egyptian hieroglyphs eventually being replaced by Coptic, or the Classical Yi script giving way to a syllabary), and the same principle is at work there.

Most modern textual programming languages sit right around that spot. There's some variance one way or the other for particular linguistic constructs, but they're largely pretty consistent once you normalize the symbology (that is, once you ignore variance due to different choices of keywords or operators for the same semantic meaning).


The problem with using natural language for this is a bit different, though. The information density is right around that sweet spot, and there's even a rather good amount of erasure coding built in, but natural language is quite simply not precise enough for most coding usage. I mean, if we all wanted to start speaking Lojban (never going to happen), or possibly Esperanto (still unlikely, but much more realistic than Lojban), maybe we could use 'natural' language to code, but even then it's a stretch. There's no room in programming for things like metaphors or analogies, and stuff like hyperbole or sarcasm could be seriously dangerous if not properly inferred by a parser (hell, that's true of regular communication too).


As for ASCII, it's largely practicality at this point. Almost any sane modern language supports Unicode in the source itself (not just in literals), and I've personally seen code using extended Latin (things like å, ü, or é), Greek, or Cyrillic characters in variable names. The problem is that you then have to rely on everyone else who might use the code having the appropriate fonts to read it correctly, and you have to contend with multi-byte encodings whenever you do any kind of text processing on it. It's just so much simpler to use plain ASCII, which works everywhere without issue (provided you don't have to interface with old IBM mainframes).
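As a hypothetical sketch of the identifier case: Python 3 happily accepts these names as legal source, but the code now assumes every reader has fonts for the glyphs and a practical way to type them.

```python
# Hypothetical example: legal Python 3 identifiers, questionable portability.
påverkan = 0.75   # extended Latin letter in a variable name
σ = 2.5           # Greek letter as a variable name
print(påverkan * σ)
```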

Thomas H Jones II

How could you not love EBCDIC? :p

Thomas H Jones II

Humans are funny critters. There was a recentish study about information density in various human languages. Basically, whether a language was oriented towards a lower or higher number of phonemes per minute, the amount of information conveyed over a given time span was nearly the same.

One of the values of ASCII versus even Unicode is the greater distinctness of the available tokens. I mean, adding support for Unicode in DNS has been a boon for phishers. Further, the simplicity of ASCII means I have fewer letters that I need to pay close attention to. It also means fewer keys I have to build finger-memory for when I want to type at high speed... which brings us back to the phonemes-per-minute versus effective-information-density question.
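A hypothetical Python demonstration of the phishing angle: the Cyrillic 'а' (U+0430) renders almost identically to the Latin 'a' (U+0061), yet the strings compare unequal, which is exactly what homoglyph domain names exploit.

```python
# Hypothetical homoglyph example: visually near-identical, programmatically different.
latin_name = "paypal.com"
lookalike  = "p\u0430yp\u0430l.com"   # the 'а' characters here are Cyrillic U+0430
print(latin_name == lookalike)        # False
print(lookalike)                      # displays almost exactly like the Latin name
```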