DEV Community

Discussion on: Should a modern programming language assume a byte is 8-bits in size?

edA‑qa mort‑ora‑y

Yes, for something like integer, Leaf defines a default size based on the natural word size. You can request specific sizes as well, such as integer 16bit, or even integer 23bit if you want.

float also has a default (64-bit usually), but allows the high or low modifiers.

There's also an octet for a binary 8bit -- note that binary and integer are handled differently; it's not like the signed vs. unsigned split in other languages.

So the question for me is whether byte is an alias for octet, or an alias for binary Nbit where N is platform dependent.
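
For comparison, C++17 draws a similar line between raw binary storage and arithmetic integers: std::byte only allows bitwise operations, while int8_t behaves as a normal integer. A rough sketch of that split (C++17, not Leaf syntax):

```cpp
#include <cstddef>   // std::byte, std::to_integer
#include <cstdint>   // std::int8_t
#include <iostream>

int main() {
    std::byte b{0b1010'0001};        // raw binary: no +, -, * defined on it
    b = b & std::byte{0x0F};         // only bitwise ops and shifts are allowed
    // b = b + std::byte{1};         // would not compile: bytes aren't integers

    std::int8_t n = 100;             // arithmetic 8-bit integer
    n = n + 1;                       // ordinary integer arithmetic

    std::cout << std::to_integer<int>(b) << ' ' << int{n} << '\n';  // prints: 1 101
}
```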

Sam Ferree

It's an alias for octet.

binary Nbit where N is platform dependent is called a WORD.

edA‑qa mort‑ora‑y

No, a word typically refers to the natural integer or instruction size, so 32-bit or 64-bit.

The term byte as I'm using it comes from the C/C++ standards, which define it as platform dependent -- and that is its traditional definition. It was only after hardware standardized on 8 bits that byte acquired that fixed meaning.
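
You can still see that platform-dependent definition in C and C++ today: a byte is CHAR_BIT bits, which the standards only require to be at least 8. A quick C++ check:

```cpp
#include <climits>   // CHAR_BIT
#include <iostream>

// The C and C++ standards define a byte as the size of char: CHAR_BIT bits,
// required to be >= 8 but not required to be exactly 8.
static_assert(CHAR_BIT >= 8, "guaranteed by the standard");

int main() {
    std::cout << "bits per byte on this platform: " << CHAR_BIT << '\n';
    // On some DSPs CHAR_BIT has been 16 or 32, and some historical machines used 9.
}
```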

Erebos Manannán

Random thought along these lines: I really prefer my numbers not to have a fixed size, as you're almost guaranteed to run into issues with them eventually.

Whenever possible I'd like the default common number type to be limitless, automatically growing to contain whatever you want to store in it. Then on top of that you can have optimized types for those special cases where you really do want an int8.

The vast majority of uses don't have performance constraints tight enough to justify micro-optimizing such things, but there are plenty of cases where, even after careful thought, you end up with bugs caused by a fixed bit length, especially given a decade or two of progress.
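
A classic illustration of that failure mode, as a hypothetical C++ sketch: a 32-bit millisecond counter looks generous when it's written, yet it silently wraps after roughly 49.7 days.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    // 2^32 milliseconds is only ~49.7 days, so a 32-bit uptime counter
    // quietly wraps back to a small number after that.
    std::uint64_t ms_after_49_7_days = 4'294'967'296ULL;    // 2^32
    std::uint32_t uptime_ms = static_cast<std::uint32_t>(ms_after_49_7_days);

    std::cout << "64-bit counter: " << ms_after_49_7_days << " ms\n";  // 4294967296
    std::cout << "32-bit counter: " << uptime_ms << " ms\n";           // 0 -- wrapped
    std::cout << "days before wrap: "
              << 4'294'967'296.0 / (1000.0 * 60 * 60 * 24) << '\n';    // ~49.71
}
```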

Vinay Pai

I have exactly the opposite preference. At least for the kind of code I normally write, it's exceptionally rare for numbers to grow without bound and come remotely close to overflowing... especially 64-bit numbers. It would be a huge waste for every counter, every database ID, every number everywhere to be handled with arbitrary precision rather than native types.

Erebos Manannán

"exceptionally rare" sounds like you'd practically never take it into consideration and thus can end up with related bugs -> unlimited would be better default to avoid accidents.

When you KNOW you're fine with an int64, like for auto-increment database IDs, loop counters, etc., then you can still use that. How much is the typical application going to be slowed down by using an arbitrary-precision integer instead of native types? Zilch -- you will never even notice a few extra CPU cycles.

For people who do performance-critical work and real optimization, it's fine to offer the optimized types, but they should be opt-in where appropriate, not a source of the occasional "gotcha, you didn't think this would ever wrap, now did you?"

Erebos Manannán

In short I guess you could summarize my stance as: "programming languages should by default empower conveniently getting things done with minimal chance for nasty surprises"

Computers are incredibly powerful nowadays compared to the 80286 days, and it would be better for programming languages to make it as easy as possible to write programs that do what you intended, rather than programs that save a few CPU cycles through micro-optimizations nobody will notice until they hit a rare bug because of them.

Again, there's no need to remove the option of optimized types for those who need them, but the vast majority of programming nowadays is about making the correct thing happen without bugs, not about shaving off CPU cycles.

Vinay Pai

I can't think of the last time I had a bug caused by integer overflow. Does that happen to you a lot?

The cost of using arbitrary precision everywhere is way more than "a few cycles". You're adding overhead all over the place. Incrementing a number goes from a simple machine instruction that can be pipelined to a loop which will likely result in a pipeline stall.

You can't just allocate an array of integers, because you don't know how much memory that will need. Increment the nth element of an array? That is potentially an O(n) operation now, because the element might grow and force the whole array to be reallocated. Or your array of integers could actually be an array of pointers to the structs holding your arbitrary-precision integers. That DRASTICALLY slows things down on modern processors, because your memory references have worse locality, so your all-important L1 cache hit ratio goes down the tubes.
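
Roughly, the two layouts being compared look like this (BigInt here is a hypothetical stand-in for an arbitrary-precision type):

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical arbitrary-precision integer: a growable array of 64-bit "limbs".
// Each value owns a heap allocation that can grow as the number gets bigger.
struct BigInt {
    std::vector<std::uint64_t> limbs{0};  // least-significant limb first
    bool negative = false;
};

int main() {
    constexpr std::size_t n = 1'000'000;

    // Fixed-width layout: one contiguous block of 8-byte integers.
    // Incrementing an element is a single add on memory that is likely in cache.
    std::vector<std::int64_t> fixed(n, 0);
    fixed[42] += 1;

    // Arbitrary-precision layout: an array of pointers to separately allocated
    // objects. Every access chases a pointer, and an increment may have to grow
    // the limb vector on carry -- the locality/reallocation cost described above.
    std::vector<std::unique_ptr<BigInt>> arbitrary(n);
    for (auto& p : arbitrary) p = std::make_unique<BigInt>();
    arbitrary[42]->limbs[0] += 1;
}
```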

It's like flying airliners at 10,000 feet instead of 30,000 feet to avoid the risk of cabin depressurization.

Erebos Manannán

When you say "drastically" you mean "has literally no perceptible impact at all in most cases". The O(n) timing etc. means literally nothing in the general case. There are places where speed matters, and those places are getting increasingly rare.

I follow the security scene a fair bit and especially there I keep reading about random pieces of software constantly running into integer overflow/underflow issues.

They ARE a cause of bugs when, e.g., a developer thinks "well, I'm just asking the users to input an age, and a normal human lives to be at most 100 years old, so I'll just use an int8" and then the user, who doesn't know or care what constraints the programmer had in mind, tries to use the same application to catalog the ages of antique items, species, or planets.
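
Concretely, with a hypothetical C++ version of that age field: an int8 tops out at 127, so anything older than that wraps into nonsense.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    // "A normal human lives to be at most 100 years old, so int8 is plenty."
    std::int8_t human_age = 100;                        // fine

    // Then someone catalogs an antique clock that is 250 years old.
    int entered_age = 250;
    std::int8_t stored_age = static_cast<std::int8_t>(entered_age);

    // On two's-complement targets (guaranteed since C++20), 250 wraps to -6.
    std::cout << int{human_age} << '\n';   // 100
    std::cout << int{stored_age} << '\n';  // -6
}
```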

"Premature optimization is the root of all evil" is a fitting quote for this discussion. Optimize where you need it, don't worry and micro-optimize your CPU cycles everywhere because some school teacher taught you about O(...) notation. YOUR time is often much more valuable (i.e. you getting things done, without nasty surprises that can lead to unhappy users, security issues, or anything else) than the CPU cycles.

How often do you care about, or even look at, your L1 cache hit ratio when you write a typical desktop or mobile app, or any web frontend/backend? Much less often than you care about having code that just works regardless of what size of number the user (malicious or not) decided to give you.

And AGAIN, when you DO need to care, the option can be there to be explicit.

Vinay Pai

People mindlessly repeating mantras like "premature optimization is the root of all evil" are the root of all evil.

Erebos Manannán

I think my comment had quite a bit more content to it than that.

cvedetails.com/google-search-resul... (About 16,500 results)

cvedetails.com/google-search-resul... (About 3,150 results)

And those are just the reported security issues, not counting the ordinary bugs caused by choosing the wrong integer size.

Here's a new quote -- me, saying it right here: "People quoting O(...) notation and talking about L1 cache as if any of it mattered at all in most cases are the root of all evil" ;)

Vinay Pai

Okay, let's say you replaced them with arbitrary-precision arithmetic. How many new bugs would then be caused by malicious input triggering huge memory allocations and blowing up the server?

Erebos Manannán

Quick estimate: probably fewer. For one, it'd be easier to do an if (length > MAX_LENGTH)-type check.

Also, if you use user input to determine how much memory you allocate, you're probably doing something wrong anyway, regardless of what kind of arithmetic you're using. Take a file upload: do you trust the client when it tells you "I'm sending you a file that is 200kB in size, and here it comes", or do you just take in an arbitrary file stream and, if it gets too big, say "ok, enough" at some point and disconnect?
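
A sketch of that capped-stream approach (the limit, function name, and use of std::cin are just placeholders for illustration):

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>

// Read at most `max_bytes` from the stream; never trust a client-declared size.
std::string read_capped(std::istream& in, std::size_t max_bytes) {
    std::string data;
    char chunk[4096];
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0) {
        data.append(chunk, static_cast<std::size_t>(in.gcount()));
        if (data.size() > max_bytes)
            throw std::runtime_error("upload too large, disconnecting");
    }
    return data;
}

int main() {
    try {
        // e.g. cap uploads at 10 MB regardless of what the client claims.
        std::string body = read_capped(std::cin, 10 * 1024 * 1024);
        std::cout << "received " << body.size() << " bytes\n";
    } catch (const std::exception& e) {
        std::cerr << e.what() << '\n';
    }
}
```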

Anyway I tire of this mindless banter. I've made my point.

edA‑qa mort‑ora‑y

A few notes, related to Leaf, for this discussion:

  • I intend to do overflow/underflow checks by default (unless turned off for optimization), so an overflow will result in an error rather than a silent wrap. (A C++ analogue of such checks is sketched after this list.)
  • I will provide logical ranges for values, like integer range(0,1000), so you can give real-world limits to numbers and let an appropriate type be picked.
  • Arbitrary precision is extremely costly compared to native precision. A fixed but very high precision is not as costly, but it doesn't really solve anything. On that note, you can do integer 1024bit in Leaf if you want.
  • Leaf constants are arbitrary rationals and high-precision floating points during compilation. Conversions that lose precision (like float -> integer) are also disallowed. This helps in several situations.
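
For comparison, this is roughly what opt-in overflow checking looks like when you bolt it onto C++ yourself, using the GCC/Clang __builtin_add_overflow intrinsic (an analogy only, not Leaf code):

```cpp
#include <cstdint>
#include <iostream>
#include <stdexcept>

// Add two 32-bit integers, raising an error on overflow instead of wrapping.
// __builtin_add_overflow is a GCC/Clang intrinsic; MSVC needs a different approach.
std::int32_t checked_add(std::int32_t a, std::int32_t b) {
    std::int32_t result;
    if (__builtin_add_overflow(a, b, &result))
        throw std::overflow_error("integer overflow");
    return result;
}

int main() {
    std::cout << checked_add(2'000'000'000, 100'000'000) << '\n';  // fine: 2100000000
    try {
        checked_add(2'000'000'000, 2'000'000'000);                 // exceeds INT32_MAX
    } catch (const std::overflow_error& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
}
```
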
Vinay Pai

So you pointed to a bunch of bugs caused by a lack of range checks. Your solution to avoid creating another bug is to... add a range check. Brilliant! You have indeed made your point.