It is trivial to check that a URL is syntactically valid using a regex, but what if you wanted to take this one step further and make sure it points at a domain name that is registered and actually in use? How would you do this? I have a few ideas for how to achieve this with a microservice, but I feel like others may have solved this problem before and that there may be better solutions out there.
First, you need to be precise about what you mean by "valid".
"Valid" can encompass at least these four possibilities:
1) the URL follows the correct syntax for URLs;
2) the URL is valid as per #1, and the "host" portion of the URL (when it contains a name) can be resolved to an IP address;
3) the URL is valid as per #2, and there is a server at the host (and optional port) encoded in the URL that responds to requests;
4) the URL is valid as per #3, and the path and/or query and fragment parts define a valid path on the server running at the host:port encoded in the URL.
You can do #1 yourself, as it is just a check that the syntax is correct.
Numbers 2-4 all require some form of lookup against another system in order to verify validity.
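As a rough illustration of the first two levels, here is a minimal Python sketch using only the standard library. The function names `has_valid_syntax` and `host_resolves` are made up for this example, and restricting the scheme to http/https is an assumption about what the original question cares about. The syntax check is deliberately loose: it only confirms that the URL parses and has a scheme, a host, and a sane port.

```python
import socket
from urllib.parse import urlsplit


def has_valid_syntax(url: str) -> bool:
    """Level 1: purely syntactic check, no network access."""
    try:
        parts = urlsplit(url)
        parts.port  # accessing .port raises ValueError for a malformed port
    except ValueError:
        return False
    # Assumption: only http(s) URLs with a non-empty host are of interest.
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def host_resolves(url: str) -> bool:
    """Level 2: ask the system resolver for an address for the host."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return False
    if not host:
        return False
    try:
        socket.getaddrinfo(host, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve; UnicodeError: invalid label
        return False
    return True
```

Note that `host_resolves` depends on a working resolver, so a `False` result can mean "no such name" or simply "DNS is unavailable right now".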
So I think in this instance I would deem "valid" to mean:
1) the URL is in the correct format;
2) the host in the URL resolves to an IP address;
3) the domain is registered and is in use. By this I mean it's not one of the "this domain name is for sale" pages.
Number 3 is the novel and challenging piece of this.
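One crude way to approach number 3 is to fetch the page and look for tell-tale wording. The sketch below is only a heuristic: the phrase list and the name `looks_parked` are hypothetical, real parking pages vary widely, and a legitimate site could match by accident.

```python
import re

# Hypothetical phrase list for illustration only; tune it against real pages.
PARKED_PHRASES = (
    "domain is for sale",
    "domain name is for sale",
    "buy this domain",
    "this domain may be for sale",
)


def looks_parked(html: str) -> bool:
    """Guess whether a page body is a 'domain for sale' placeholder."""
    text = re.sub(r"<[^>]+>", " ", html)      # crudely drop HTML tags
    text = re.sub(r"\s+", " ", text).lower()  # collapse whitespace, lowercase
    return any(phrase in text for phrase in PARKED_PHRASES)
```

For example, `looks_parked("<h1>This domain is for sale</h1>")` returns `True`, while an ordinary page body that never mentions a sale returns `False`.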
An HTTP client library such as libcurl will tell you whether there's something there, presuming the network is available.
In fact, a simple HEAD request will suffice for that.
A response would also prove that the domain is registered, presuming DNS is working.
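A HEAD request along those lines might look like the following sketch, written with Python's standard `urllib.request` rather than curl. The helper name `head_status` and the 5-second default timeout are assumptions made for this example. It returns the HTTP status code if any server answered, and `None` if nothing could be reached.

```python
import http.client
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 5.0):
    """Send a HEAD request; return the status code, or None if unreachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # A server answered, just not with a success status (e.g. 404).
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, or an unparseable URL.
        return None
```

Any non-`None` result means a server responded, which in turn means the name resolved. Some servers reject HEAD with 405, so a caller may want to treat that as "something is there" as well.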