Three bugs in the Go MySQL driver

303 points by farslan about 5 years ago

20 comments

bgrainger about 5 years ago

While the second and third problems sound specific to Go's `database/sql` API, the first problem (server-closed connections) is an issue any MySQL client library (that implements connection pooling) has to deal with.

My .NET MySQL connector (https://github.com/mysql-net/MySqlConnector) "solves" it at the MySQL level by sending a PING packet, but as the post points out, this adds extra latency.

The TCP-based approach of performing a non-blocking read sounds much better; I'm glad the author shared this, and I want to see if that technique can also be implemented in MySqlConnector: https://github.com/mysql-net/MySqlConnector/issues/821.

JulienSchmidt about 5 years ago

Bug #3 (the race) was silently introduced by a semantic change in Go's database/sql in January 2018: https://github.com/golang/go/commit/651ddbdb5056ded455f47f9c494c67b389622a47

It took until December of the same year before we got the first bug report and figured out what was going on. While the semantic change might look subtle, it was certainly not subtle from our (the driver maintainers') perspective. We were quite disappointed that such a change was made 1) without informing the driver maintainers and 2) without making sure the corresponding driver changes were in place before it made it into a Go release.

We regularly test against Go's master branch (now using a Travis CI cron job), but that only helps if the existing tests fail. We don't have the time to constantly monitor all changes in the Go repo.

If there is a need to make such changes (not just in database/sql and not just in Go), PLEASE actively communicate early with the community / the direct users.

kardianos about 5 years ago

FYI: Go 1.15 will have an idle timeout: https://tip.golang.org/pkg/database/sql/#DB.SetConnMaxIdleTime

The driver interface also separates out the session resetter and the connection validator concepts: https://tip.golang.org/pkg/database/sql/driver/

Please do have drivers implement the Connector interface.

EmielMols about 5 years ago

The workaround for the first issue might work well in practice, but due to the asymmetry of TCP streams, it is still open to a race where the server idle-closes a connection while a new (non-idempotent!) query is in flight.

The more correct solution would be to let the client manage the idle timeout and disconnect once it is reached. Depending on how much control you have over the client, this might be a good or a bad idea.

Note that this problem generalizes readily to HTTP/1 upstream servers. If you need to support non-idempotent requests and want persistent connections to your upstream, it's not a good idea to have the upstream manage the idle timeout (and disconnect when it is reached).

In practice, I would have the client manage an idle timeout of 30s, and the server one of 40s as extra protection against misbehaving clients.

user5994461 about 5 years ago

>>> Keep in mind, we run our production MySQL clusters with a pretty aggressive idle timeout (30s)

Number one source of connectivity issues with MySQL. That setting alone is the source of every bug they encountered.

It's marvelous how many applications have a low timeout and wonder why their (idle) connections are dropped all the time.

The timeout works fantastically when software prepares a connection to the database, spends maybe 30 seconds initializing and doing other stuff, then sends the query: the carefully prepared connection is guaranteed to be broken, consistently. But the most impressive use case is connection pools, whose whole purpose is to maintain idle connections, which get butchered every 30 seconds.

MySQL should remove this fatal setting and enable TCP keepalive by default.

At least have mercy on developers: set a sane default and cap the minimum value at 5 minutes.

kccqzy about 5 years ago

Cancellation is always such a big issue regardless of language or environment, and whether it's due to a timeout or some other action.

In C, when using threads and thread cancellation, there are a lot of subtleties involved: https://ewontfix.com/2/

Even in Haskell, where there's generally a focus on well-designed abstractions, asynchronous cancellation can fail due to badly written code that wants to catch all exceptions.

sargun about 5 years ago

These bugs are not specific to MySQL. The bug where the context closes the connection, leaving the transaction without an explicitly successful abort/rollback, is a problem as well, because conn.Close gets called in a goroutine when the context expires.

We've experienced this in pq as well.

kevinherron about 5 years ago

Bug #1 doesn't seem solved. They just narrowed the window for the race condition.

I don't see how you can fix this without the client first calling db.Ping() whenever a connection is close to the client's estimate of its expiration.

nitwit005 about 5 years ago

> A quick Code Search for rows.Close() on GitHub shows that pretty much nobody is explicitly checking the return value of rows.Close() in their Go code.

Seems to be a pattern in all languages. Plenty of C code ignores the result of the close system call, and plenty of Java code just wraps exceptions from close calls in try/catch blocks that do nothing with the exception.

sudhirj about 5 years ago

Why aren’t these problems applicable to Postgres? Is the connection model different or do the libraries already do all this?

nerdbaggy about 5 years ago

I’m sure there are a lot of reasons to hate it, but I like how Go handles the different database types and the base type.

jijji about 5 years ago

I encountered this same issue a few years ago when using the Go mysql driver, and the simple fix was calling db.Ping() before doing a query, which checks the connection and allows the query to proceed.

yuribro about 5 years ago

I really don't understand the discussion around the first bug. Either something is oversimplified, or is it just an issue of a bad abstraction?

This issue (the other side closes the connection) is so fundamental to all networking code that I don't see how this specific use case (MySQL idle timeout) is special. The connection pool also doesn't sound relevant: the same would happen if the caller kept the same connection (socket) and used it.

For this class of issues, this is the easiest case to deal with: the other side sent a FIN and we got it! Our kernel knows that this socket is closed (whether fully or only half closed doesn't matter here).

So if you do a select (or equivalent) call on the socket, it will not return as writable (and will probably return in the exception case), and you'll reconnect.

If you don't use select, the write will fail immediately, and I would guess that in the worst case you can see that 0 bytes were written?

So the problem here was just that the wrong error was returned through the multiple layers of abstraction? Why not propagate the error correctly? Or handle the reconnection at a lower level? Trying to "Ping" at the application level is such overkill and a misuse of resources. Do we really want an extra round trip for every action we do on a DB?

Of course, as someone else pointed out, there are much harder issues to consider: silent disconnects, proxied connections where the proxy didn't propagate the error, failures at the DB level, and so on.

EDIT: I see now that another comment mentions that there is a specific error code in MySQL clients for a lost server.

zkirill about 5 years ago

>> Instead, make sure that every SQL operation in your app is using the QueryContext / ExecContext interfaces

I've been wondering if this is indeed a best practice or something that should be used only when necessary. Should every database query really abort if the connection to the client fails?

In addition, if you follow this advice, you now need to check whether the returned error is context.Canceled, which warrants an HTTP 400 response.

markdog12 about 5 years ago

Dumb question, but you can't listen for a close socket event in Go?

Traubenfuchs about 5 years ago

One consequence of reinventing the wheel (Go) is going through the same problems others stumbled over years ago.

frobisher about 5 years ago

Just curious: are any of these bugs possible in Rust?

The_rationalist about 5 years ago

The justification given here for moving to Go is very weak. First, if they really cared about performance, they would move to JRuby or TruffleRuby. Second, most of a language's performance lies in its library ecosystem rather than in the language itself. Ruby has had decades of server-side optimisation that the nascent Go has to reinvent.

PunksATawnyFill about 5 years ago

Go: Yet another Google language. What's it going to be next week?

erikrothoff about 5 years ago

As someone running MySQL with Go in production and seeing random MySQL connection-loss exceptions: is there a summary of how they fixed this? It’s an insanely long article, and I appreciate the detail and effort, but if someone has read it all and has a gist, that would be insanely helpful.
