Coding since 11yo, that makes it over 30 years now ~~~
Have a PhD in Comp Sci ~~~
Love to go on bike tours ~~~
I try to stay as generalist as I can in this crazy wide place coding is at now.
Nice one!
This can be made a little cleaner, though, by using zero-width assertions to divide up the camelCased words. That way you don't need to reinsert any captures and can do it all in one go...
I'd do something like:
The /(?<=[A-Za-z])(?=[A-Z])/ part matches the gap between words, i.e. the position between a letter and a following upper-case letter, but not the letters themselves.
So the whole regex matches either one of those gaps or any run of non-letters.
Of course, this screws up numbers completely, since digits count as non-letters and get replaced with hyphens 🤷‍♂️
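A minimal sketch of the one-pass idea described above, assuming the full pattern is the zero-width gap alternated with a run of non-letters. The helper name `toKebab` and the exact alternation are assumptions for illustration, not necessarily the code as originally posted:

```javascript
// Hypothetical helper: the name and the exact pattern are assumptions
// reconstructed from the explanation above.
const toKebab = (s) =>
  s
    // Either the zero-width gap between a letter and a following upper-case
    // letter, or any run of non-letters, becomes a single hyphen.
    .replace(/(?<=[A-Za-z])(?=[A-Z])|[^A-Za-z]+/g, "-")
    .toLowerCase();

console.log(toKebab("someCamelCasedName")); // "some-camel-cased-name"
console.log(toKebab("some_snake cased-name")); // "some-snake-cased-name"
console.log(toKebab("version2Beta")); // "version-beta" (the digit is treated as a separator)
```

Because the gap is matched with lookarounds only, nothing is consumed there, so there is no capture to put back in the replacement string.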
BTW I've updated the main line of code from /(?<=[a-z])(?=[A-Z])... to /(?<=[A-Za-z])(?=[A-Z])... matching the explanation below (sort of changed my mind on that halfway through writing).
Using [A-Za-z] just makes sure that SentencesContainingAWordWithOneLetter get translated correctly.
Of course it breaks support for CONSTANT_CASE, but what can you do?
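To make the effect of that change concrete, here is a small comparison of the two lookbehinds. Both helpers are illustrative sketches: the names `lowerOnly` and `anyLetter` are made up, and the `[^A-Za-z]+` alternative is an assumption about the rest of the pattern.

```javascript
// Both helpers are illustrative; only the lookbehind character class differs.
const lowerOnly = (s) =>
  s.replace(/(?<=[a-z])(?=[A-Z])|[^A-Za-z]+/g, "-").toLowerCase();
const anyLetter = (s) =>
  s.replace(/(?<=[A-Za-z])(?=[A-Z])|[^A-Za-z]+/g, "-").toLowerCase();

const sentence = "SentencesContainingAWordWithOneLetter";
console.log(lowerOnly(sentence)); // "sentences-containing-aword-with-one-letter"
console.log(anyLetter(sentence)); // "sentences-containing-a-word-with-one-letter"

// The price: every letter of an all-caps word is now treated as its own word.
console.log(lowerOnly("CONSTANT_CASE")); // "constant-case"
console.log(anyLetter("CONSTANT_CASE")); // "c-o-n-s-t-a-n-t-c-a-s-e"
```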
Well, it's a trade-off. On the one hand, "AAAA" is the word "A" four times in UpperCamelCase; on the other hand, it's the word "AAAA" in CONSTANT_CASE. In the end the problem is just ambiguous.
You've resolved the trade-off by only breaking camel case words where the next word has more than one character.
That may be a fair strategy, since it means that constant case is well covered, but it does mean some pretty reasonable camel case sentences won't work ("ThisIsASpinalTap" -> "this-isa-spinal-tap").
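One hypothetical way to express that "only break where the next word has more than one character" rule, just to show the trade-off. This is not the code from the post being replied to; the helper name `splitLongWords` and the pattern are assumptions:

```javascript
// Hypothetical rule: split only before an upper-case letter that is followed
// by a lower-case one, i.e. before a word of at least two characters.
const splitLongWords = (s) =>
  s.replace(/(?<=[A-Za-z])(?=[A-Z][a-z])|[^A-Za-z]+/g, "-").toLowerCase();

console.log(splitLongWords("THIS_IS_A_SPINAL_TAP")); // "this-is-a-spinal-tap"
console.log(splitLongWords("ThisIsASpinalTap")); // "this-isa-spinal-tap"
```

Constant case survives because no upper-case letter is ever followed by a lower-case one, while the one-letter word "A" gets glued onto "Is".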
In the end, it's about figuring out which cases are important and what rules you want to cover.
I think a happy medium might be to take blocks of upper-case letters surrounded by non-letters and lowercase them first, which stops them from being clobbered as camel case words.
    const f = s => s
      // lowercase blocks of upper-case letters that are surrounded by non-letters
      .replace(/(?<=[^A-Za-z]|^)[A-Z]+(?=[^A-Za-z]|$)/g, w => w.toLowerCase())
      // edited to handle numbers a bit
      .replace(/(?<=[A-Za-z])(?=[A-Z0-9])|(?<=[0-9])(?=[A-Za-z])|[^A-Za-z0-9]+/g, '-')
      .toLowerCase()

    const a = [
      "ThisIsASpinalTapAAAA",
      "THIS_IS_A_SPINAL_TAP_A_A_A_A",
      "thisIsASpinalTapAAAA",
      "this-is-a-spinal-tap-a-a-a-a",
      "this is a spinal tap a a a a"
    ]

    a.map(f)
    // -> ["this-is-a-spinal-tap-a-a-a-a",
    //     "this-is-a-spinal-tap-a-a-a-a",
    //     "this-is-a-spinal-tap-a-a-a-a",
    //     "this-is-a-spinal-tap-a-a-a-a",
    //     "this-is-a-spinal-tap-a-a-a-a"]
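As a rough illustration of what the number handling does, here is the same two-step function (copied under the made-up name `kebab` so the snippet runs on its own) applied to a few inputs with digits. The sample inputs are invented for this sketch:

```javascript
// Same two-step approach as above, restated so this snippet is self-contained.
const kebab = (s) =>
  s
    .replace(/(?<=[^A-Za-z]|^)[A-Z]+(?=[^A-Za-z]|$)/g, (w) => w.toLowerCase())
    .replace(/(?<=[A-Za-z])(?=[A-Z0-9])|(?<=[0-9])(?=[A-Za-z])|[^A-Za-z0-9]+/g, "-")
    .toLowerCase();

console.log(kebab("item42Count")); // "item-42-count"
console.log(kebab("HTML5Parser")); // "html-5-parser"
// An acronym glued to a preceding word is still split letter by letter:
console.log(kebab("parseHTML5")); // "parse-h-t-m-l-5"
```

Runs of digits stay together as one word, but an upper-case block only gets protected when it has non-letters (or the string boundary) on both sides.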
thanks for the thorough explanation on that! 🤯🤯
No worries, hope it helps!
How do you like this?