SQLAlchemy and "Lost connection to MySQL server during query"

Published at July 24, 2016 | Tagged with:

... a weird behaviour when pool_recycle is ignored

Preface: This is quite a standard problem for apps/websites with low traffic or those using heavy caching and hitting the database quite seldom. Most of the articles you will find on the topic will tell you one thing - change the wait_timeout setting in the database. Unfortunately in some of the cases this disconnect occurs much earlier than the expected wait_timeout (default ot 8 hours). If you are in one of those cases keep reading.

This issue haunted our team for weeks. When we first faced it the project that was raising it was still in dev phase so it wasn't critical but with getting closer to the release data we started to search for solution. We have read several articles and decided that pool_recycle is our solution.

Adding pool_recycle: According to SQL Alchemy's documentation pool_recycle "causes the pool to recycle connections after the given number of seconds has passed". Nice, so if you recycle the connection in intervals smaller that the await_timeout the error above should not appear. Let's try it out:

import time

from sqlalchemy.engine import create_engine


url = 'mysql+pymysql://user:pass@127.0.0.1:3306/db'
engine = create_engine(url, pool_recycle=1).connect()

query = 'SELECT NOW();'

while True:
    print('Q1', engine.execute(query).fetchall())
    engine.execute('SET wait_timeout=2')
    time.sleep(3)
    print('Q2', engine.execute(query).fetchall())

So what does the code do - we create a connection to a local MySQL server and state that it should be recycled every second(line 7). Then we execute a simple query (line 12) just to verify that the connection is working.
We set the wait_timeout to 2 second and wait for 3. At this stage the connection to the server will timeout, but SQL Alchemy should recycle it, so the last query should be executed successfully and the loop should continue.
Unfortunately the results looks like:

sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError)
(2013, 'Lost connection to MySQL server during query') [SQL: 'SELECT NOW();']

Wait, what happened, why is not the connection recycled?

Solution: Well, as with all such problems the solution was much simpler compared to the time it took us to find it (we fought with this for days). The only change that solved it was on line 7:

engine = create_engine(url, pool_recycle=1)

# the result
Q1 [(datetime.datetime(2016, 7, 24, 20, 51, 41),)]
Q2 [(datetime.datetime(2016, 7, 24, 20, 51, 44),)]
Q1 [(datetime.datetime(2016, 7, 24, 20, 51, 44),)]
Q2 [(datetime.datetime(2016, 7, 24, 20, 51, 47),)]

Have you spot the difference? We are not calling the connect() method any more.

Final words: To keep it honest, I don't know why this solved the issue. Hopefully someone more familiar with SQL Alchemy will come with a reasonable explanation for it. The bad part is that the examples in the official docs are using "connect". So it is either a bug or a bad documentation. I will send this article to SQL Alchemy's Twitter account so hopefully we will see some input from them. Till then, if any of you have an idea explanation about the behaviour I'll be happy to hear it.