Lately I was trying to get the FTP paths for a couple of directories. I implemented matching of the needed directories with regular expressions, but that isn't a general-purpose solution.
Although I'm a big fan of regex in general, I mainly use it to find things in large documents. Most of the time I use a web tool like regex101 (I'm not its author, but I find it very useful).
Thumbs up for regex101. I don't think I could write regular expressions without it... lol
For the same purpose there's debuggex.com/, which I find friendlier and more readable.
Wow! debuggex does seem better. Visualization is really important (not just for regex).
Thanks Pawel
Thanks for the link. Definitely a lifesaver.
Regular expressions can be really quick solutions to complex parsing problems.
But at a certain level of complexity or performance requirements a custom solution (usually some sort of state machine) is better suited.
If you decide a regular expression is a good fit, please always leave a detailed comment above it.
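One way to leave that detailed comment is to put it inside the pattern itself. Python's `re.VERBOSE` flag ignores unescaped whitespace and allows `#` comments in the pattern. The version format and the names `VERSION` and `parse_version` below are hypothetical, chosen only to illustrate the idea:

```python
import re
from typing import Optional, Tuple

# Matches a bare MAJOR.MINOR.PATCH version such as "1.4.2" or "10.0.3".
# Pre-release and build suffixes ("1.4.2-rc.1", "1.4.2+build5") are NOT handled.
# re.VERBOSE ignores unescaped whitespace and allows "#" comments in the pattern.
VERSION = re.compile(
    r"""
    (\d+)   # group 1: major version
    \.      # literal dot
    (\d+)   # group 2: minor version
    \.      # literal dot
    (\d+)   # group 3: patch version
    """,
    re.VERBOSE,
)


def parse_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch), or None if text is not exactly a version."""
    match = VERSION.fullmatch(text)  # fullmatch: the whole string must match
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


print(parse_version("1.4.2"))  # (1, 4, 2)
print(parse_version("1.4"))    # None
```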
In case someone is trying to figure out whether a regex is the right fit for their HTML parsing problem, here is my favorite post on that topic.
For code smells, mine is 'Inappropriate intimacy': I usually end up with functions/methods that belong in different classes sitting in the same class. That in turn tightly couples areas that shouldn't interact, and I often end up refactoring.
Also, I forget the name of this code smell, but: having code for a feature you think will exist in the future but doesn't yet. It makes the code hard to debug after a long time away from it. I bet most of you have this too?
As for regex, regex makes me cry. Still practicing. I'm curious what resources you all used to learn and reference regex. I use RegExr for reference and testing.
Edit: Saw @Pawel's link to regex101 ~ definitely going to be fiddling around with that more.
Use named capture groups.
It makes it so much easier to correlate results with the matching pattern.
I have no idea how I didn't know about those before. You have changed my life.
This. Amen. I was about to say something similar.
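To make the named-capture-group tip concrete, here is a small sketch in Python, which spells a named group `(?P<name>...)` (JavaScript, for example, uses `(?<name>...)`). The log format and the helper `parse_log_line` are hypothetical:

```python
import re
from typing import Dict, Optional

# Hypothetical log format: "<date> <LEVEL> <message>".
# Each capture group is named, so results are looked up by meaning, not position.
LOG_LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.*)"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named groups as a dict, or None if the line doesn't match."""
    match = LOG_LINE.fullmatch(line)
    return match.groupdict() if match else None


fields = parse_log_line("2024-03-15 ERROR disk full")
print(fields["level"])  # ERROR
print(fields)  # {'date': '2024-03-15', 'level': 'ERROR', 'message': 'disk full'}
```

Reordering or adding groups later doesn't silently shift `fields["level"]`, which is exactly the correlation problem numbered groups have.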
Quick story: when I started with my current employer, one of the first things I did with my mentor was strip out over 900 regular expressions and replace them with an HTML parser. One lesson that hit home during that time is that regular expressions are for "regular" languages (meaning they follow a predictable pattern). HTML is not a regular language, so regular expressions were a poor choice that caused many, many bugs. Things got much better once we replaced them with an HTML parser.
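The story doesn't say which parser they switched to, so this is only an illustration using Python's standard-library `html.parser`. The class name `LinkCollector` is hypothetical. It collects `href` values from `<a>` tags and copes with things that trip up typical regexes: mixed-case tags, single-quoted attributes, and links inside comments:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []  # list of href strings, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


html = """
<p>See <a class="x" href="/docs">the docs</a> or
<A HREF='https://example.com'>example</A>.</p>
<!-- <a href="/commented-out">ignored</a> -->
"""
collector = LinkCollector()
collector.feed(html)
collector.close()
print(collector.links)  # ['/docs', 'https://example.com']
```

The commented-out link is delivered to `handle_comment`, not `handle_starttag`, so it is skipped without any special-casing.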
I usually add a comment with two key things: a sample of the input it's supposed to parse, and a list of what's getting pulled out as a capture group.
But even then, I use them with reluctance, and only with rigorous unit tests to make sure they do exactly what I want.
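Here is a sketch of that commenting style (sample input plus a list of capture groups) paired with unit tests. The `user=...; id=...` format and the names `USER_ID` and `parse_user` are hypothetical:

```python
import re
import unittest

# Sample input:  "user=alice; id=42"
# Captures:
#   1. user name  ("alice")
#   2. numeric id ("42")
USER_ID = re.compile(r"user=(\w+); id=(\d+)")


def parse_user(text):
    """Return (name, id) as strings, or None if text isn't exactly that shape."""
    match = USER_ID.fullmatch(text)
    return match.groups() if match else None


class ParseUserTest(unittest.TestCase):
    def test_sample_input(self):
        self.assertEqual(parse_user("user=alice; id=42"), ("alice", "42"))

    def test_rejects_non_numeric_id(self):
        self.assertIsNone(parse_user("user=alice; id=forty-two"))

    def test_rejects_extra_text(self):
        self.assertIsNone(parse_user("user=alice; id=42; admin=true"))


if __name__ == "__main__":
    # exit=False so the script keeps running after the tests finish.
    unittest.main(argv=["parse_user_tests"], exit=False)
```

The tests double as executable documentation: the sample input from the comment is literally the first test case.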
Use them all the time outside of "programming", like in shell commands or quick one-off hacks. Only use them in code when they're run against predictable input.
That's it. Just validate that they match something before assuming that they're right, and if your predictable input fails to match, log the context.
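A minimal sketch of "check the match, and log the context when predictable input doesn't match", assuming a hypothetical `build <number> <passed|failed>` line format:

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical "predictable" input: lines like "build 1234 passed".
BUILD_LINE = re.compile(r"build (\d+) (passed|failed)")


def parse_build_line(line):
    """Return (build_number, status), or None if the line doesn't match."""
    match = BUILD_LINE.fullmatch(line.strip())
    if match is None:
        # Don't guess: record the offending input so it can be investigated.
        logger.warning("unexpected build line: %r", line)
        return None
    number, status = match.groups()
    return int(number), status
```

Callers must handle `None` explicitly, so a format change shows up in the logs instead of as a confusing `AttributeError` on a missing match.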
When I finally really understood regular expressions, I overused them for everything for a short time. My colleagues joked that my code was becoming unreadable because it was riddled with RegExps, and that I was writing more "strange emojis" than JavaScript.
My current best practice: if the comment needed to explain a RegExp to my colleagues is longer than the RegExp itself, it's better to split it into multiple parts or rewrite the code as an understandable parser. Put the other way around, the code smell is any RegExp that needs a comment longer than itself to be understood; in 99% of cases it would be better rewritten without (overly long) regular expressions.
The other (obvious) code smell is the attempt to use regular expressions to solve irregular problems (like HTML).
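One way to "split it into multiple parts" is to build the full pattern from small, individually named pieces. This sketch assumes a loose ISO-8601-like timestamp shape; it checks the shape only (a month of `13` would still pass), and the names `DATE`, `TIME`, `ZONE` and `split_timestamp` are hypothetical:

```python
import re

# Each piece is small enough to understand at a glance.
DATE = r"\d{4}-\d{2}-\d{2}"        # e.g. 2024-03-15
TIME = r"\d{2}:\d{2}(?::\d{2})?"   # e.g. 09:30 or 09:30:59
ZONE = r"(?:Z|[+-]\d{2}:\d{2})"    # e.g. Z or +01:00

# The full pattern is assembled from the named parts above.
TIMESTAMP = re.compile(rf"(?P<date>{DATE})T(?P<time>{TIME})(?P<zone>{ZONE})?")


def split_timestamp(text):
    """Return {'date', 'time', 'zone'} (zone may be None), or None on no match."""
    match = TIMESTAMP.fullmatch(text)
    return match.groupdict() if match else None
```

Each comment stays shorter than the fragment it describes, which keeps the whole thing on the right side of the rule of thumb above.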
Definitely multiple simple expressions over one big one. The benefits of simplicity far outstrip any performance that might be gained, and greatly improve correctness.
Yes!
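To illustrate "multiple simple expressions over one big one": a single lookahead-heavy pattern such as `^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{12,}$` can be replaced by a few trivial checks. The policy below is hypothetical and not a recommendation; `RULES`, `MIN_LENGTH` and `password_problems` are illustrative names:

```python
import re

# Each rule is a (description, pattern) pair; every pattern is trivial on its own.
RULES = [
    ("at least one lowercase letter", re.compile(r"[a-z]")),
    ("at least one uppercase letter", re.compile(r"[A-Z]")),
    ("at least one digit", re.compile(r"\d")),
]
MIN_LENGTH = 12  # a length check doesn't need a regex at all


def password_problems(password):
    """Return the descriptions of the rules the password breaks (empty = OK)."""
    problems = [desc for desc, rule in RULES if not rule.search(password)]
    if len(password) < MIN_LENGTH:
        problems.append(f"at least {MIN_LENGTH} characters")
    return problems
```

Besides being easier to read, the split version can tell the user *which* rule failed, something the single combined pattern cannot do.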
I personally see regular expressions as no better than a magic string. They stand out in code like a sore spot in your mouth: the more a regex tries to do, the more that sore hurts, and paradoxically the more you keep agitating it.
My approach with regexes, if I truly decide they're needed, is to build progressive filters: a series of regexes that progressively home in on exactly what I'm looking for.
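A minimal sketch of such a progressive filter, assuming hypothetical log lines where an `ERROR` line may carry a three-digit `code=` field. The first stage is deliberately broad and cheap; the second only runs on lines that survived the first:

```python
import re

# Stage 1: a cheap, broad filter that discards most lines quickly.
MIGHT_BE_ERROR = re.compile(r"ERROR")
# Stage 2: a narrower pattern, only run on lines that survived stage 1.
ERROR_CODE = re.compile(r"\bcode=(\d{3})\b")


def error_codes(lines):
    """Yield the numeric code from each ERROR line that carries one."""
    for line in lines:
        if not MIGHT_BE_ERROR.search(line):
            continue
        match = ERROR_CODE.search(line)
        if match:
            yield int(match.group(1))


log = ["INFO ok", "ERROR code=503 upstream", "ERROR no code here", "WARN code=404"]
print(list(error_codes(log)))  # [503]
```

Each stage stays simple enough to read on its own, and the `WARN code=404` line shows why order matters: the second pattern alone would have matched it.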
I don't use regexes for validation. Creating a correct regex that validates anything is way harder than it seems, and the result is confusing and unmaintainable.
I once saw a regex for validating an SMTP address, and it was a glorious, ridiculous piece of work: something like 600 or 700 characters. Pretty sure that represents an anti-pattern! :-P
I mainly use regexes for searching/replacing, or for testing for presence of specific fragments. A common one for me is checking to see if a link URL is an absolute address, a mail address or a relative path, for example. Simple, and reasonable.
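The link check described above might look something like this rough heuristic (the function name `classify_link` is hypothetical). The scheme pattern follows the URI scheme syntax (a letter, then letters, digits, `+`, `-` or `.`), and `re.match` only matches at the start of the string:

```python
import re

# Scheme followed by "://", e.g. "https://", "ftp://". Case-insensitive by construction.
ABSOLUTE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*://")
MAILTO = re.compile(r"mailto:", re.IGNORECASE)


def classify_link(href):
    """Classify a link URL as 'mail', 'absolute' or 'relative'."""
    if MAILTO.match(href):
        return "mail"
    if ABSOLUTE.match(href):
        return "absolute"
    return "relative"
```

This is deliberately simple: protocol-relative URLs like `//cdn.example.com/x` and schemes without `//` such as `tel:` fall through to `"relative"`, which may or may not be what a given page needs.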
Others have covered most of the code smells I've dealt with. Whenever we have to use a regex, we make sure to use the libraries from verbalexpressions.github.io/. It's an agreeable balance between readable code and using regexes when necessary.