Pooling #2345
Added connection pool options for the SQLAlchemy engine.
Change pool_recycle to -1 to preserve current behavior.
Added SQLAlchemy connection-pool tuning options to configuration.
test_sql_pool_options.py exercises `store_db_parameters()` directly, requires no database, and runs in standard CI. It asserts the zero-behaviour-change defaults, override + typing, no DBAPI leakage, the existing dict-filtering, hashable/deterministic cache keys, and coexistence with search_path.
Is there a reason not to pop the attributes from the connect_args inside get_engine? This would consolidate some of the complications noted in the PR between hashing and the manager using get_engine. Maybe I am missing something.
ricardogsilva
Just leaving my two cents here - I'm not a core committer so take these with a grain of salt.
Overall I agree with the PR, as adding these connection-related options seems relevant - thanks for your work and I look forward to having it merged!
Personally, I would simplify the implementation a bit, by relying on pygeoapi's JSON Schema document for the validation of the config.
And I would not include most of these tests, which I see as not being relevant.
That's a good shout, I'll refactor.
@KoalaGeo any update on this PR?
Pool configuration parameters are now passed directly within the database 'options' block and parsed by get_engine() rather than store_db_parameters(). This streamlines internal processing and formalizes connection pool settings within the pygeoapi configuration schema. Updated tests reflect this shift in responsibility.
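A minimal sketch of the revised split, assuming `get_engine()` receives the `options` entries as keyword arguments. `fake_create_engine` is a hypothetical stand-in for `sqlalchemy.create_engine` so the example runs without a database, and the simplified signature is an assumption rather than pygeoapi's actual one.

```python
import functools

POOL_KEYS = (
    "pool_size", "max_overflow", "pool_recycle",
    "pool_timeout", "pool_pre_ping",
)


def fake_create_engine(url, connect_args, **pool_kwargs):
    """Hypothetical stand-in for sqlalchemy.create_engine."""
    return {"url": url, "connect_args": connect_args, "pool": pool_kwargs}


@functools.cache
def get_engine(url, **options):
    # Separate pool settings from the arguments meant for the DBAPI driver.
    pool_kwargs = {k: options.pop(k) for k in POOL_KEYS if k in options}
    # Keep the pre-existing behaviour when the key is not configured.
    pool_kwargs.setdefault("pool_pre_ping", True)
    return fake_create_engine(url, connect_args=options, **pool_kwargs)


URL = "postgresql://user@localhost/example"  # hypothetical connection URL
engine = get_engine(URL, connect_timeout=10, pool_recycle=1800)
again = get_engine(URL, connect_timeout=10, pool_recycle=1800)
```

Because `functools.cache` hashes every keyword value, this only works while the option values are hashable (integers, booleans, strings).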
This reverts commit 9be0bf7, ruff used line length 88 instead of 79
@tomkralidis think that's addressed everything raised by @webb-ben and @ricardogsilva — let me know if I've missed anything. Quick summary of what changed:
Defaults are unchanged, so existing deployments behave exactly as before.
Thanks @KoalaGeo! See last minor change request on source code header for
Done :-)
Overview
Makes the SQLAlchemy connection pool of the SQL provider configurable per provider via the existing `options:` block, exposing `pool_size`, `max_overflow`, `pool_recycle`, `pool_timeout` and `pool_pre_ping`.
Previously `get_engine()` called `create_engine(conn_str, connect_args=connect_args, pool_pre_ping=True)` with no pool sizing or recycle, so the default `QueuePool` held `pool_size` connections open for the life of each worker process and never recycled them. In multi-process deployments this produces a large number of permanently-IDLE server-side connections (we saw connections idle for days, eventually exhausting `max_connections`). There was no way to bound or recycle the pool from configuration.
Changes:
- `store_db_parameters()` now extracts the five pool keys from `options`, coerces them to their declared types, and stores them as a sorted, hashable `tuple` (`self.db_pool_options`). They are popped out of `options` so they are not forwarded to the DBAPI as `connect_args`.
- `get_engine()` takes a `pool_options` tuple parameter and applies `**dict(pool_options)` to `create_engine()`. It stays `@functools.cache`-able because the parameter is a hashable tuple, so engine sharing per process is preserved; providers with differing pool config correctly get distinct engines.
- `pygeoapi/process/manager/postgresql.py` also calls `get_engine()`; its call site is updated to pass `self.db_pool_options` so the manager does not lose `pool_pre_ping` or skip recycling.
Backward compatibility: defaults preserve current behaviour exactly: `pool_size=5`, `max_overflow=10`, `pool_pre_ping=True`, and `pool_recycle=-1` (SQLAlchemy's default, i.e. the current effective behaviour).
This PR is therefore a pure, opt-in feature add with no behaviour change for existing users. (See the issue for discussion of whether a finite default `pool_recycle` should be adopted as a separate follow-up.)
New tests and documentation are included.
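The extraction and caching described under Changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `split_pool_options`, `to_bool` and the simplified `get_engine` signature are hypothetical, the engine is a plain dict standing in for `create_engine()`, and `pool_timeout=30` is SQLAlchemy's own default, assumed here because the description does not state it.

```python
import functools


def to_bool(value):
    # bool("false") would be True, so strings are handled explicitly.
    if isinstance(value, str):
        return value.strip().lower() in ("1", "true", "yes", "on")
    return bool(value)


# (cast, default) per pool key; the types are assumptions for this sketch.
POOL_DEFAULTS = {
    "pool_size": (int, 5),
    "max_overflow": (int, 10),
    "pool_recycle": (int, -1),
    "pool_timeout": (float, 30),
    "pool_pre_ping": (to_bool, True),
}


def split_pool_options(options):
    """Pop pool keys out of ``options``; return a sorted, hashable tuple."""
    pool = {}
    for key, (cast, default) in POOL_DEFAULTS.items():
        pool[key] = cast(options.pop(key, default))
    return tuple(sorted(pool.items()))


@functools.cache
def get_engine(conn_str, pool_options=()):
    # Stand-in for create_engine(conn_str, **dict(pool_options)).
    return {"url": conn_str, **dict(pool_options)}


URL = "postgresql://user@localhost/example"  # hypothetical connection URL
opts_a = {"connect_timeout": 10, "pool_recycle": "1800"}
pool_a = split_pool_options(opts_a)  # opts_a keeps only connect_timeout
pool_b = split_pool_options({"pool_recycle": 1800})
shared = get_engine(URL, pool_a) is get_engine(URL, pool_b)
separate = get_engine(URL, split_pool_options({"pool_size": 2}))
```

Sorting the items makes the cache key independent of the order in which the keys appear in the configuration, so two providers with the same settings resolve to the same cached engine.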
Related Issue / discussion
Closes #2344.
Additional information
Example configuration:
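As an illustration only, a provider entry using these keys might look like the following, written as the Python dict that the YAML configuration would load into. The surrounding provider fields and every value are hypothetical placeholders, not taken from the PR; only the five `pool_*` key names come from the description above.

```python
# Hypothetical provider entry; all values are illustrative assumptions.
provider = {
    "type": "feature",
    "name": "PostgreSQL",
    "data": {
        "host": "localhost",
        "dbname": "example",
        "user": "example",
        "password": "example",
        "search_path": ["public"],
    },
    "id_field": "id",
    "table": "example_table",
    "geom_field": "geom",
    "options": {
        # Non-pool entries are still forwarded to the DBAPI driver.
        "connect_timeout": 10,
        # Pool tuning keys exposed by this PR.
        "pool_size": 2,
        "max_overflow": 3,
        "pool_recycle": 1800,  # seconds; -1 disables recycling
        "pool_timeout": 30,
        "pool_pre_ping": True,
    },
}

pool_keys = sorted(k for k in provider["options"] if k.startswith("pool_"))
```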
Note (documented): because `get_engine()` is `@functools.cache`-d on its full argument set, providers that share a database must use identical pool options to continue sharing a single engine per worker; differing options intentionally yield separate engines.
Dependency policy (RFC2)
No new dependencies are introduced; only the standard library and the already-required SQLAlchemy are used.
Updates to public demo
`local.config.yml` does not need to change.
Contributions and licensing