Pooling by KoalaGeo · Pull Request #2345 · geopython/pygeoapi

KoalaGeo · 2026-05-19T14:18:41Z

Overview

Makes the SQLAlchemy connection pool of the SQL provider configurable per provider via the existing options: block, exposing pool_size, max_overflow, pool_recycle, pool_timeout and pool_pre_ping.

Previously get_engine() called create_engine(conn_str, connect_args=connect_args, pool_pre_ping=True) with no pool sizing or recycle settings, so the default QueuePool held up to pool_size connections open for the life of each worker process and never recycled them. In multi-process deployments this produces a large number of permanently idle server-side connections (we saw connections idle for days, eventually exhausting max_connections). There was no way to bound or recycle the pool from configuration.

Changes:

store_db_parameters() now extracts the five pool keys from options, coerces them to their declared types, and stores them as a sorted, hashable tuple (self.db_pool_options). They are popped out of options so they are not forwarded to the DBAPI as connect_args.
get_engine() takes a pool_options tuple parameter and applies **dict(pool_options) to create_engine(). It stays @functools.cache-able because the parameter is a hashable tuple, so engine sharing per process is preserved; providers with differing pool config correctly get distinct engines.
pygeoapi/process/manager/postgresql.py also calls get_engine(); its call site is updated to pass self.db_pool_options so the manager does not lose pool_pre_ping or skip recycling.
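
The extraction and caching steps listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `extract_pool_options` is a hypothetical helper standing in for the relevant part of store_db_parameters(), `get_engine` here takes a simplified signature and returns a plain dict where the real function would call create_engine(), and `sslmode` is just an example of a non-pool option. The `pool_timeout` default of 30 is SQLAlchemy's own default rather than a value stated in this PR.

```python
import functools

# Defaults named in this PR; pool_timeout=30 is SQLAlchemy's own default.
POOL_DEFAULTS = {
    'pool_size': 5,
    'max_overflow': 10,
    'pool_recycle': -1,
    'pool_timeout': 30,
    'pool_pre_ping': True,
}


def extract_pool_options(options):
    """Pop the pool keys out of options and return a sorted tuple.

    Hypothetical helper: it mirrors the behaviour described for
    store_db_parameters(), not its actual code.
    """
    pool = {}
    for key, default in POOL_DEFAULTS.items():
        value = options.pop(key, default)
        # Coerce to the type of the default (int or bool).
        pool[key] = type(default)(value)
    # Sorting makes the tuple deterministic, so it can act as a cache key.
    return tuple(sorted(pool.items()))


@functools.cache
def get_engine(conn_str, pool_options):
    """Stand-in for get_engine(): the real code would call
    create_engine(conn_str, **dict(pool_options))."""
    return {'url': conn_str, **dict(pool_options)}


options = {'pool_size': '2', 'pool_recycle': 300, 'sslmode': 'require'}
pool_options = extract_pool_options(options)
engine = get_engine('postgresql://example/test', pool_options)
```

After the call, `options` only holds the keys meant for the DBAPI, and `pool_options` is a hashable tuple, which is what lets the function stay cached.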

Backward compatibility: defaults preserve current behaviour exactly — pool_size=5, max_overflow=10, pool_pre_ping=True, and pool_recycle=-1 (SQLAlchemy's default, i.e. the current effective behaviour).

This PR is therefore a pure, opt-in feature add with no behaviour change for existing users. (See the issue for discussion of whether a finite default pool_recycle should be adopted as a separate follow-up.)

New tests and documentation are included.

Related Issue / discussion

Closes #2344.

Additional information

Example configuration:

providers:
  - type: feature
    name: PostgreSQL
    data:
      host: 127.0.0.1
      port: 5432
      dbname: test
      user: postgres
      password: postgres
      search_path: [osm, public]
    options:
      pool_size: 2          # persistent connections per worker process
      max_overflow: 3       # short-lived burst capacity
      pool_recycle: 300     # recycle connections older than 5 minutes
      pool_timeout: 30
    id_field: osm_id
    table: hotosm_bdi_waterways
    geom_field: foo_geom
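
As a rough guide to what these settings mean on the database side, the upper bounds on connections can be estimated with simple arithmetic. The sketch below assumes 8 worker processes and one engine per worker; both numbers are assumptions for illustration, not values from this PR. QueuePool opens connections lazily, so these are ceilings rather than guaranteed counts.

```python
def connection_bounds(workers, engines_per_worker, pool_size, max_overflow):
    """Return (persistent, peak) upper bounds on server-side connections."""
    engines = workers * engines_per_worker
    # Connections the pools may keep open between requests.
    persistent = engines * pool_size
    # Connections that may exist briefly under load.
    peak = engines * (pool_size + max_overflow)
    return persistent, peak


# Example configuration above: pool_size=2, max_overflow=3
example = connection_bounds(8, 1, pool_size=2, max_overflow=3)  # (16, 40)

# Defaults: pool_size=5, max_overflow=10
defaults = connection_bounds(8, 1, pool_size=5, max_overflow=10)  # (40, 120)
```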

Note (documented): because get_engine() is @functools.cache-d on its full argument set, providers that share a database must use identical pool options to continue sharing a single engine per worker; differing options intentionally yield separate engines.
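
The sharing rule in this note follows directly from how functools.cache keys its entries. A minimal sketch, with a stand-in `get_engine` that returns a bare object in place of a real engine and a hypothetical `as_key` helper that builds the sorted tuple:

```python
import functools


@functools.cache
def get_engine(conn_str, pool_options):
    # Stand-in: each distinct object represents one engine and its pool.
    return object()


def as_key(**pool):
    """Hypothetical helper: build a sorted, hashable tuple of pool options."""
    return tuple(sorted(pool.items()))


url = 'postgresql://example/test'
a = get_engine(url, as_key(pool_size=2, max_overflow=3))
b = get_engine(url, as_key(max_overflow=3, pool_size=2))  # same options
c = get_engine(url, as_key(pool_size=5, max_overflow=3))  # differing options

shared = a is b      # identical options share one engine
separate = a is not c  # differing options get their own engine
```

Because the tuple is sorted, the order in which the options are written does not matter; only their values do.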

Dependency policy (RFC2)

I have ensured that this PR meets RFC2 requirements

No new dependencies are introduced; only the standard library and the already-required SQLAlchemy are used.

Updates to public demo

I have ensured that breaking changes to the pygeoapi master demo server have been addressed
No changes required: defaults preserve existing behaviour, so the demo local.config.yml does not need to change.

Contributions and licensing

I'd like to contribute a bugfix/feature (configurable SQL connection pool) to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution
I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

test_sql_pool_options.py exercises `store_db_parameters()` directly, requires no database, and runs in standard CI. It asserts the zero-behaviour-change defaults, override + typing, no DBAPI leakage, the existing dict-filtering, hashable/deterministic cache keys, and coexistence with search_path.

webb-ben · 2026-05-20T22:37:20Z

Is there a reason to not pop the attributes from the connect_args inside of get_engine? This would consolidate a bit of the complications noted in the PR between hashing and the manager using get_engine. Maybe I am missing something

ricardogsilva

Just leaving my two cents here - I'm not a core committer so take these with a grain of salt.

Overall I agree with the PR, as adding these connection-related options seems relevant - thanks for your work and I look forward to having it merged!

Personally, I would simplify the implementation a bit, by relying on pygeoapi's JSON Schema document for the validation of the config.

And I would not include most of these tests, which I see as not being relevant.

KoalaGeo · 2026-05-24T04:45:34Z

> Is there a reason to not pop the attributes from the connect_args inside of get_engine? This would consolidate a bit of the complications noted in the PR between hashing and the manager using get_engine. Maybe I am missing something

That's a good shout, I'll refactor

tomkralidis · 2026-06-06T12:45:13Z

@KoalaGeo any update on this PR?

Pool configuration parameters are now passed directly within the database 'options' block and parsed by get_engine() rather than store_db_parameters(). This streamlines internal processing and formalizes connection pool settings within the pygeoapi configuration schema. Updated tests reflect this shift in responsibility.

KoalaGeo · 2026-06-08T22:10:50Z

@tomkralidis think that's addressed everything raised by @webb-ben and @ricardogsilva — let me know if I've missed anything.

Quick summary of what changed:

get_engine() now pops the pool keys out of connect_args itself (@webb-ben's suggestion). That removes the separate db_pool_options tuple entirely, and the manager call site no longer needs any special handling — it just passes **self.db_options like the provider does. It stays functools.cache-able since the kwargs are hashable scalars.
Added pool_size / max_overflow / pool_recycle / pool_timeout / pool_pre_ping to the config JSON Schema (@ricardogsilva), so a config can be validated before the server starts. That's now what enforces the types, so I've dropped the manual type(default)(...) coercion — which also removes the bool("False") is True gotcha you spotted.
Removed the outdated pool_recycle comment, and trimmed the tests down to a single behavioural test covering the pool/connect_args split in get_engine().
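
The revised split described in this summary might look roughly like the sketch below. It is an illustration under assumptions: `split_options` is a hypothetical helper, `get_engine` uses a simplified signature and returns a dict where the real function would call create_engine(), and `sslmode` is just an example of an option that should still reach the DBAPI.

```python
import functools

# Defaults named in this PR; pool_timeout=30 is SQLAlchemy's own default.
POOL_DEFAULTS = {
    'pool_size': 5,
    'max_overflow': 10,
    'pool_recycle': -1,
    'pool_timeout': 30,
    'pool_pre_ping': True,
}


def split_options(options):
    """Separate pool settings from the remaining DBAPI connect_args."""
    connect_args = dict(options)
    pool = {
        key: connect_args.pop(key, default)
        for key, default in POOL_DEFAULTS.items()
    }
    return pool, connect_args


@functools.cache
def get_engine(conn_str, **options):
    pool, connect_args = split_options(options)
    # The real code would call something like:
    # create_engine(conn_str, connect_args=connect_args, **pool)
    return {'url': conn_str, 'connect_args': connect_args, 'pool': pool}


db_options = {'pool_size': 2, 'pool_recycle': 300, 'sslmode': 'require'}
engine = get_engine('postgresql://example/test', **db_options)
```

One caveat from the Python documentation: a cached function may treat calls that differ only in keyword argument order as distinct entries, so callers that want to share an engine should pass the options in a consistent order.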

Defaults are unchanged, so existing deployments behave exactly as before.
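
For readers unfamiliar with the coercion gotcha mentioned in the summary: any non-empty string is truthy in Python, so coercing a quoted "False" through bool() silently turns the option on. The sketch below contrasts that with a strict type check, which is similar in spirit to declaring the key as a boolean in a schema. Both function names are hypothetical.

```python
def coerce_like_default(value, default):
    """The dropped approach: cast the value to the default's type."""
    return type(default)(value)


def check_boolean(value):
    """Strict alternative: reject anything that is not a real bool."""
    if not isinstance(value, bool):
        raise TypeError(f'expected a boolean, got {value!r}')
    return value


# A quoted "False" in the config arrives as a string.
coerced = coerce_like_default('False', True)  # bool('False') is True

try:
    check_boolean('False')
    rejected = False
except TypeError:
    rejected = True
```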

tomkralidis · 2026-06-08T23:32:46Z

Thanks @KoalaGeo! See my last minor change request, on the source code header of tests/provider/test_sql_pool_options.py.

KoalaGeo · 2026-06-09T08:59:21Z

Done :-)

KoalaGeo added 5 commits May 19, 2026 14:56

Enhance SQL Alchemy engine with connection pool options

37f428c

Added connection pool options for SQL Alchemy engine.

Add db_pool_options to PostgreSQL connection

5841ed9

Update pool_recycle to SQLAlchemy default value

bc68af4

Change pool_recycle to -1 to preserve current behavior.

Enhance SQLAlchemy connection pooling settings

cd9c836

Added SQLAlchemy connection-pool tuning options to configuration.

tomkralidis requested review from francbartoli, tomkralidis and webb-ben May 20, 2026 12:01

tomkralidis added this to the 0.24.0 milestone May 20, 2026

ricardogsilva reviewed May 21, 2026

KoalaGeo added 5 commits June 8, 2026 22:44

Ruff format

9be0bf7

Revert "Ruff format"

04c0099

This reverts commit 9be0bf7, ruff used line length 88 instead of 79

Ensure files end with a newline

58c0d96

Normalize file endings

60f3c4a

tomkralidis requested changes Jun 8, 2026

Comment thread tests/provider/test_sql_pool_options.py

Add copyright and license to SQL pooling test

0b0f684

tomkralidis approved these changes Jun 9, 2026

tomkralidis merged commit 4eaef8e into geopython:master Jun 9, 2026
4 checks passed
