Regular expressions you can read: a visual syntax and UI

172 points by secure about 9 years ago

17 comments

kileywm about 9 years ago
As someone who has crafted thousands of complex regular expression rules for data capture, here is my take:

1. This is a fine idea to aid regex newbies in crafting their expressions. I see this as a gateway rather than a long-term tool. The expressions won't be optimal (through no fault of the tool), nor will they likely be complete, but that's not the point. If it helps lower the barrier(s) to adoption of regular expressions, then I can heartily support it.

2. To the people who say they use regular expressions only a handful of times a year, and so it's not worthwhile to invest time in learning the syntax, I offer this: once you know it, you will use it far more often than you ever expected. Find & replace text, piping output, Nginx.conf editing, or even the REGEXP() function in MySQL. It's a valuable skill set in so many environments that I expect you will use it weekly, if not daily.

3. Ultimately regular expressions, like everything, are extra difficult until you know all of the available tools in the toolbox. At that point, you may realize you wrote an unnecessarily complex expression simply because you didn't know better.
bartkappenburg about 9 years ago
Our tool[0] for using persuasion principles on your site to increase conversion had a UX problem when setting things up. We'd like to have a generic way to detect what type of page a certain URL is. The most obvious way was to go with regular expressions (/.*\-.*\-d+\.html for product pages, for example).

It turned out this was by far the most misunderstood setting, even though it was one of the most important ones. The target audience had something to do with it (marketeers), but even though Google Analytics and Google Tag Manager are widely used by them, setting up these expressions is really hard.

We decided to build an internal tool that generates a regular expression based on examples for which the regex must hold. We called it the regexhelper. It was so successful that we made it into an external tool[1].

It's not perfect (in terms of generating the most efficient regexes), but it works fantastically for our audience of marketeers. Planning to open source this as well!

A visual UI for dealing with the regexes that our helper produces, using this idea, could be beneficial.

[0] https://www.conversify.com

[1] http://regexhelper.conversify.com/
Drup about 9 years ago
If you want readable regexp, just use combinators and your language's variable declaration facilities. No need for more.

I don't understand why people still insist on using insane syntax for regexps instead of just ... functions (`rep` for repetition, `seq` for sequences, `opt` for optional ..).
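
A minimal sketch of the combinator style described above, in Python. The helpers `lit`, `alt`, `seq`, `rep` and `opt` are hypothetical names written for this example (only `rep`, `seq` and `opt` are mentioned in the comment); they simply build pattern strings for the standard `re` module, and ordinary variables name the pieces.

```python
import re


def lit(text: str) -> str:
    """Match the given text literally (hypothetical helper)."""
    return re.escape(text)


def seq(*parts: str) -> str:
    """Match each part in order."""
    return "".join(parts)


def alt(*parts: str) -> str:
    """Match any one of the parts."""
    return "(?:" + "|".join(parts) + ")"


def rep(part: str, at_least: int = 0) -> str:
    """Match the part repeated at_least or more times."""
    return f"(?:{part}){{{at_least},}}"


def opt(part: str) -> str:
    """Match the part zero or one time."""
    return f"(?:{part})?"


# Named sub-patterns play the role of documentation.
digit = "[0-9]"
sign = opt(alt(lit("+"), lit("-")))
integer = rep(digit, at_least=1)
fraction = opt(seq(lit("."), integer))
number = re.compile(seq(sign, integer, fraction))

print(bool(number.fullmatch("-12.5")))  # True
print(bool(number.fullmatch("12.")))    # False
```

The generated pattern is uglier than a hand-written one, but nobody has to read it: the definitions of `sign`, `integer` and `fraction` carry the meaning.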
dottrap about 9 years ago
The problem with "regex" is that it left the pure computer science realm of true regular expressions, and thus lost many of the mathematical properties of regular expressions.

Regexes are then further abused to do things far beyond what true regular expressions can do, which results in cryptic expressions whose behaviors are implementation-dependent instead of bounded by computer science principles.

Lua creator Roberto Ierusalimschy resurfaced and explored the idea of PEGs (Parsing Expression Grammars) as a better way to do the things that people have abused regex to do, while keeping it grounded in pure CS principles. This allows better syntax that makes things easier to express, more powerful behavior, mathematically grounded complexity (for performance), and more clarity about what can and cannot be accomplished.

This video presentation from the Lua Workshop explains all of this and more about why PEGs: https://vimeo.com/1485123
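
To make the PEG idea concrete, here is a small hand-written sketch in Python, not LPeg and not taken from the talk. It recognizes nested parentheses, a language that true regular expressions cannot describe, by turning one PEG-style rule into one recursive function. The rule and the function names are assumptions made for this illustration.

```python
def match_balanced(text: str, pos: int = 0) -> int:
    """PEG-style rule:  Balanced <- ( '(' Balanced ')' )*

    Returns the position just after the longest prefix of text[pos:]
    that the rule matches (possibly pos itself, for an empty match).
    """
    while pos < len(text) and text[pos] == "(":
        inner_end = match_balanced(text, pos + 1)
        if inner_end >= len(text) or text[inner_end] != ")":
            break  # this repetition failed; keep what matched so far
        pos = inner_end + 1
    return pos


def is_balanced(text: str) -> bool:
    """True if the whole string consists of properly nested parentheses."""
    return match_balanced(text) == len(text)


print(is_balanced("(()())"))  # True
print(is_balanced("(()"))     # False
```

Each call either consumes input or stops, so the running time here is linear in the input length, which is the kind of predictable cost the comment refers to.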
eganist about 9 years ago
I can see the claimed advantages to what's proposed, but I feel like if the railroad diagram by RegExper could be reversed, that would be a far more successful visual syntax for regular expressions. Then again, most of my regex-fu entails building a regex relatively close to what I want and then repeatedly throwing it at a local instance of RegExper and test strings until I have something which accomplishes what I'm looking for it to do. I'd definitely fall outside the "true regex superheroes" category.

Anyway, to simplify what I have in mind for us less-than-experts, it'd be neat if someone could put together a railroad diagram of a regular expression that would then be compiled into the regex itself.

That being said, I don't have the presence of mind right now to determine if two different regexes can result in the same diagram in RegExper. If so, that kinda thoroughly breaks my idea.
consto about 9 years ago
I understand that the first email regex is simplified and as a result doesn't handle oddities such as weird symbols, quotations and IP addresses, but it should be able to handle modern TLDs. Not only do you have names longer than 4 characters, but also internationalised domain names starting with xn--.

https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains

Depending on how simple you want it, either:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(xn--[A-Z0-9]|[A-Z]+)\b

or simpler:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z0-9-]+\b

However, you could argue that validating email via regex misses the point entirely. A simple, permissive regex is all you really need, assuming you are actually sending an email to check that the account exists.
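
A quick way to try the simpler pattern, sketched in Python. The addresses are made up for the test, and `EMAIL` and `find_emails` are hypothetical names. Note that the character classes list only uppercase letters, so the sketch compiles the pattern with `re.IGNORECASE`; without that flag, lowercase addresses would not match.

```python
import re

# The simpler pattern from the comment above. Its classes contain only
# A-Z, so IGNORECASE is needed for everyday lowercase addresses.
EMAIL = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z0-9-]+\b",
    re.IGNORECASE,
)


def find_emails(text: str) -> list:
    """Return every substring of text that looks like an email address."""
    return EMAIL.findall(text)


sample = "Write to alice@example.photography or bob@example.xn--p1ai today."
print(find_emails(sample))

# The same pattern without the flag finds nothing in lowercase input.
print(re.search(EMAIL.pattern, "alice@example.com"))  # None
```

This stays deliberately permissive, in line with the point that the real check is whether a message sent to the address arrives.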
zwischenzug about 9 years ago
In the past I've used kodos, but as time has gone on I've needed it less and less:

http://kodos.sourceforge.net/

As a result I'm not into the idea of such a visualisation; you should be using regexps all the time, and internalising the rules. When that's not enough you have to go and read up. I'm not sure such a visualisation will help that much in those non-regular cases, simply because they won't always be available to hand.
Cozumel about 9 years ago
Stuff like this, while well intentioned, is ultimately harmful. Regex always looked like total gibberish to me; then one weekend I sat myself down and actually learnt it, and I've had no more issues. It's really simpler than it seems and worth the effort to learn; programs like this just work as a crutch.
callesgg about 9 years ago
When I read the "graphical" version I missed a major issue with that email verifier: it only allows emails in UPPER CASE.

Spotted it directly in the normal one.

That says something.

PS: I do think it looks rather nice.
markbnj about 9 years ago
I'll be really interested to see others' reactions to this. My first impression when I glanced over the example construction was not good. I felt like it really didn't improve comprehension, but just forced me to try to learn a new way of seeing those symbols. Perhaps a visual regex "IDE" that completely abstracted the syntax would be a better approach.
kolapuriya about 9 years ago
Depends on what you mean by "parse". If all you want is to search a document that is known to be well-formed, find an element that meets a few criteria, and grab a value out of that element, you can sometimes get away with using regex to find a substring that "looks right" without actually parsing the document. Running your document through an actual parser gives you access to more information about the structure of the document and the context of the elements of interest. Actually parsing your input is therefore more robust to unexpected variations than any of the superficially-cheaper alternatives that people try.
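
A small Python illustration of that difference, assuming the document is HTML (the comment does not name a format). The sample markup, the `price` class and the `PriceParser` name are invented for this sketch. The regex grabs anything that "looks right", including an element that is commented out; the standard-library `html.parser` knows the context and skips it.

```python
import re
from html.parser import HTMLParser

# Invented sample: the second price sits inside an HTML comment.
DOC = '<ul><li class="price">12</li><!-- <li class="price">99</li> --></ul>'

# Regex shortcut: matches substrings by shape, with no notion of context.
regex_prices = re.findall(r'<li class="price">(\d+)</li>', DOC)


class PriceParser(HTMLParser):
    """Collects the text of <li class="price"> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_price = False


parser = PriceParser()
parser.feed(DOC)
parser.close()

print(regex_prices)   # ['12', '99']
print(parser.prices)  # ['12']
```

The parser version is longer, but it is the one that keeps working when the input contains comments, reordered attributes, or other variations the regex author did not anticipate.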
forrestthewoods about 9 years ago
My #1 issue with regex is just knowing the damn syntax. Every implementation is a little bit different.

Is there a good website that lets me select a language/platform/IDE/etc and cleanly shows all the tools in that particular toolbox?
ZenoArrow about 9 years ago
What about using this pattern matching visualisation with SNOBOL? I'd suggest it could be better for this than RegEx.

http://langexplr.blogspot.co.uk/2007/12/quick-look-at-snobol.html?m=1

"The most interesting thing about the language is the string pattern matching capabilities. Here's a small (and very incomplete) example that extracts the parts of a simplified URL string:

        LETTER = "abcdefghijklmnopqrstuvwxyz"
        LETTERORDOT = "." LETTER
        LETTERORSLASH = "/" LETTER

        LINE = INPUT
        LINE SPAN(LETTER) . PROTO "://" SPAN(LETTERORDOT) . HOST "/" SPAN(LETTERORSLASH) . RES
        OUTPUT = PROTO
        OUTPUT = HOST
        OUTPUT = RES
    END

In line 6, the contents of the LINE variable are matched against a pattern. The pattern contains the following elements:

1. The SPAN(LETTER) . PROTO "://" section says identify a sequence of letters followed by "://" and assign them to the variable called PROTO

2. The SPAN(LETTERORDOT) . HOST "/" section says take a sequence of letters and dots followed by "/" and assign them to the variable called HOST

3. Finally the last section takes the remaining letters and slash characters and assigns them to the RES variable"
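
For comparison, a rough Python counterpart of the quoted SNOBOL pattern, written for this thread rather than taken from the linked post. Named groups stand in for the `. PROTO`, `. HOST` and `. RES` assignments; the `split_url` name and the sample URL are assumptions, and the match is anchored at the start of the line for simplicity.

```python
import re

# Each SPAN(...) . NAME becomes a named group over the same character set.
URL = re.compile(
    r"(?P<proto>[a-z]+)"   # SPAN(LETTER) . PROTO
    r"://"
    r"(?P<host>[a-z.]+)"   # SPAN(LETTERORDOT) . HOST
    r"/"
    r"(?P<res>[a-z/]+)"    # SPAN(LETTERORSLASH) . RES
)


def split_url(line: str):
    """Return a dict with proto, host and res, or None if the line does not match."""
    match = URL.match(line)
    return match.groupdict() if match else None


print(split_url("http://www.example.com/docs/intro"))
```

Like the SNOBOL original, this handles only lowercase letters, dots and slashes, so it is just as incomplete; the point is how closely the two notations line up.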
JelteF about 9 years ago
Any time I want to write some non-trivial regex I use https://debuggex.com/ to check/write it. It is also great for quickly finding out what a regular expression that someone else wrote actually does.
Annatar about 9 years ago
This helped me master regular expressions:

http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/

Once I read that, it was AWK forever.
vatotemking about 9 years ago
I use http://regexr.com/ for this purpose
tacos about 9 years ago
I like when he got to the hard part and then just stopped writing instead of doing some actual specification or design.