Pooling #2345

Merged
tomkralidis merged 11 commits into geopython:master from KoalaGeo:pooling
Jun 9, 2026

@KoalaGeo

Contributor

Overview

Makes the SQLAlchemy connection pool of the SQL provider configurable per provider via the existing options: block, exposing pool_size, max_overflow, pool_recycle, pool_timeout and pool_pre_ping.

Previously, get_engine() called create_engine(conn_str, connect_args=connect_args, pool_pre_ping=True) with no pool sizing or recycling, so the default QueuePool kept up to pool_size connections open for the life of each worker process and never recycled them. In multi-process deployments this produces a large number of permanently idle server-side connections (we saw connections idle for days, eventually exhausting max_connections), and there was no way to bound or recycle the pool from configuration.
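To put rough numbers on the problem, here is a back-of-the-envelope sketch. The worker count and the number of distinct engines per worker are hypothetical; only the pool_size and max_overflow values come from this PR. The figures are upper bounds, since QueuePool opens connections lazily.

```python
def pooled_connections(workers, engines_per_worker, pool_size=5, max_overflow=10):
    """Return (persistent, peak) upper bounds on server-side connections."""
    per_engine_persistent = pool_size                # kept open once created
    per_engine_peak = pool_size + max_overflow       # burst ceiling
    engines = workers * engines_per_worker
    return engines * per_engine_persistent, engines * per_engine_peak


# Hypothetical deployment: 8 worker processes, 3 distinct engines in each.
default_idle, default_peak = pooled_connections(workers=8, engines_per_worker=3)

# Same deployment with the values from the example configuration in this PR.
tuned_idle, tuned_peak = pooled_connections(8, 3, pool_size=2, max_overflow=3)
```

With the defaults, the hypothetical deployment can hold up to 120 persistent connections, which is already above PostgreSQL's stock max_connections of 100. The tuned values bring that down to 48.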

Changes:

  • store_db_parameters() now extracts the five pool keys from options, coerces them to their declared types, and stores them as a sorted, hashable tuple (self.db_pool_options). They are popped out of options so they are not forwarded to the DBAPI as connect_args.
  • get_engine() takes a pool_options tuple parameter and applies **dict(pool_options) to create_engine(). It stays @functools.cache-able because the parameter is a hashable tuple, so engine sharing per process is preserved; providers with differing pool config correctly get distinct engines.
  • pygeoapi/process/manager/postgresql.py also calls get_engine(); its call site is updated to pass self.db_pool_options so the manager does not lose pool_pre_ping or skip recycling.
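The split described in the first two bullets can be sketched roughly as follows. This is an illustration, not the code in this PR: split_pool_options is a hypothetical helper name, type coercion is left out, connect_timeout is only an example of a DBAPI argument, and the pool_timeout default of 30 is SQLAlchemy's own default rather than a value stated here.

```python
# Defaults as stated in this PR description. pool_timeout=30 is an
# assumption (SQLAlchemy's own default); the PR text does not list it.
POOL_DEFAULTS = {
    "pool_size": 5,
    "max_overflow": 10,
    "pool_recycle": -1,
    "pool_timeout": 30,
    "pool_pre_ping": True,
}


def split_pool_options(options):
    """Return (pool_options, connect_args) without mutating `options`.

    pool_options is a sorted tuple of (key, value) pairs, so it is
    hashable and independent of the key order in the YAML file.
    """
    connect_args = dict(options)  # copy, so the caller's dict is untouched
    pool = {
        key: connect_args.pop(key, default)
        for key, default in POOL_DEFAULTS.items()
    }
    return tuple(sorted(pool.items())), connect_args


pool_options, connect_args = split_pool_options(
    {"pool_size": 2, "pool_recycle": 300, "connect_timeout": 10}
)
```

Sorting the pairs means two providers that list the same pool options in a different order still produce the same tuple, which matters once the tuple is used as part of a cache key.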

Backward compatibility: defaults preserve current behaviour exactly — pool_size=5, max_overflow=10, pool_pre_ping=True, and pool_recycle=-1 (SQLAlchemy's default, i.e. the current effective behaviour).

This PR is therefore a pure, opt-in feature add with no behaviour change for existing users. (See the issue for discussion of whether a finite default pool_recycle should be adopted as a separate follow-up.)

New tests and documentation are included.

Related Issue / discussion

Closes #2344.

Additional information

Example configuration:

providers:
  - type: feature
    name: PostgreSQL
    data:
      host: 127.0.0.1
      port: 5432
      dbname: test
      user: postgres
      password: postgres
      search_path: [osm, public]
    options:
      pool_size: 2          # persistent connections per worker process
      max_overflow: 3       # short-lived burst capacity
      pool_recycle: 300     # recycle connections older than 5 minutes
      pool_timeout: 30
    id_field: osm_id
    table: hotosm_bdi_waterways
    geom_field: foo_geom

Note (documented): because get_engine() is decorated with @functools.cache and keyed on its full argument set, providers that share a database must use identical pool options to continue sharing a single engine per worker; differing options intentionally yield separate engines.
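The sharing rule in the note can be illustrated with a small stand-alone sketch. FakeEngine and get_engine_sketch are hypothetical stand-ins, not pygeoapi or SQLAlchemy code; only the functools.cache behaviour is real.

```python
import functools


class FakeEngine:
    """Stand-in for a SQLAlchemy Engine; it only records its arguments."""

    def __init__(self, conn_str, pool_options):
        self.conn_str = conn_str
        self.pool_options = dict(pool_options)


@functools.cache
def get_engine_sketch(conn_str, pool_options=()):
    # One FakeEngine per distinct (conn_str, pool_options) pair.
    return FakeEngine(conn_str, pool_options)


url = "postgresql://postgres@127.0.0.1:5432/test"
same_options = (("max_overflow", 3), ("pool_size", 2))
other_options = (("max_overflow", 3), ("pool_size", 4))

provider_a = get_engine_sketch(url, same_options)
provider_b = get_engine_sketch(url, same_options)   # cache hit: shared engine
provider_c = get_engine_sketch(url, other_options)  # cache miss: new engine
```

Here provider_a and provider_b end up with the same object, while provider_c gets its own engine and therefore its own pool.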

Dependency policy (RFC2)

  • I have ensured that this PR meets RFC2 requirements

No new dependencies are introduced; only the standard library and the already-required SQLAlchemy are used.

Updates to public demo

  • I have ensured that breaking changes to the pygeoapi master demo server have been addressed
  • No changes required: defaults preserve existing behaviour, so the demo local.config.yml does not need to change.

Contributions and licensing

  • I'd like to contribute a bugfix/feature (configurable SQL connection pool) to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution
  • I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

KoalaGeo added 5 commits May 19, 2026 14:56
Added connection pool options for SQLAlchemy engine.
Change pool_recycle to -1 to preserve current behavior.
Added SQLAlchemy connection-pool tuning options to configuration.
test_sql_pool_options.py exercises `store_db_parameters()` directly, requires no database, and runs in standard CI. It asserts the zero-behaviour-change defaults, override + typing, no DBAPI leakage, the existing dict-filtering, hashable/deterministic cache keys, and coexistence with search_path.
@tomkralidis tomkralidis added this to the 0.24.0 milestone May 20, 2026
@webb-ben

Member

Is there a reason not to pop the attributes from connect_args inside get_engine? That would remove some of the complications noted in the PR around hashing and the manager's use of get_engine. Maybe I am missing something.

@ricardogsilva ricardogsilva left a comment

Member

Just leaving my two cents here - I'm not a core committer so take these with a grain of salt.

Overall I agree with the PR, as adding these connection-related options seems relevant - thanks for your work and I look forward to having it merged!

Personally, I would simplify the implementation a bit, by relying on pygeoapi's JSON Schema document for the validation of the config.

And I would not include most of these tests, which I see as not being relevant.

Comment thread pygeoapi/provider/sql.py Outdated
Comment thread docs/source/publishing/ogcapi-features.rst
Comment thread pygeoapi/provider/sql.py Outdated
Comment thread pygeoapi/provider/sql.py Outdated
Comment thread tests/provider/test_sql_pool_options.py Outdated
Comment thread tests/provider/test_sql_pool_options.py Outdated
Comment thread tests/provider/test_sql_pool_options.py Outdated
Comment thread tests/provider/test_sql_pool_options.py Outdated
Comment thread tests/provider/test_sql_pool_options.py Outdated
@KoalaGeo

Contributor Author

Is there a reason not to pop the attributes from connect_args inside get_engine? That would remove some of the complications noted in the PR around hashing and the manager's use of get_engine. Maybe I am missing something.

That's a good shout, I'll refactor

@tomkralidis

Member

@KoalaGeo any update on this PR?

KoalaGeo added 5 commits June 8, 2026 22:44
Pool configuration parameters are now passed directly within the database 'options' block and parsed by get_engine() rather than store_db_parameters().

This streamlines internal processing and formalizes connection pool settings within the pygeoapi configuration schema. Updated tests reflect this shift in responsibility.
This reverts commit 9be0bf7; ruff used a line length of 88 instead of 79.
@KoalaGeo

KoalaGeo commented Jun 8, 2026

Contributor Author

@tomkralidis think that's addressed everything raised by @webb-ben and @ricardogsilva — let me know if I've missed anything.

Quick summary of what changed:

  • get_engine() now pops the pool keys out of connect_args itself (@webb-ben's suggestion). That removes the separate db_pool_options tuple entirely, and the manager call site no longer needs any special handling — it just passes **self.db_options like the provider does. It stays functools.cache-able since the kwargs are hashable scalars.
  • Added pool_size / max_overflow / pool_recycle / pool_timeout / pool_pre_ping to the config JSON Schema (@ricardogsilva), so a config can be validated before the server starts. That's now what enforces the types, so I've dropped the manual type(default)(...) coercion — which also removes the bool("False") is True gotcha you spotted.
  • Removed the outdated pool_recycle comment, and trimmed the tests down to a single behavioural test covering the pool/connect_args split in get_engine().
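For readers following along, the refactored split might look roughly like this. It is a sketch under assumptions, not the merged code: fake_create_engine stands in for sqlalchemy.create_engine, get_engine_sketch is a hypothetical name, and connect_timeout is only an example of a DBAPI argument.

```python
import functools

POOL_KEYS = (
    "pool_size", "max_overflow", "pool_recycle", "pool_timeout", "pool_pre_ping",
)


def fake_create_engine(conn_str, connect_args, **pool_kwargs):
    """Stand-in for sqlalchemy.create_engine; it only records its arguments."""
    return {"url": conn_str, "connect_args": connect_args, "pool": pool_kwargs}


@functools.cache
def get_engine_sketch(conn_str, **options):
    # pool_pre_ping=True mirrors the previous hard-coded behaviour; other
    # pool keys fall back to the library defaults unless configured.
    pool_kwargs = {"pool_pre_ping": True}
    connect_args = {}
    for key, value in options.items():
        if key in POOL_KEYS:
            pool_kwargs[key] = value    # goes to the engine / pool
        else:
            connect_args[key] = value   # forwarded to the DBAPI driver
    return fake_create_engine(conn_str, connect_args, **pool_kwargs)


url = "postgresql://postgres@127.0.0.1:5432/test"
engine = get_engine_sketch(url, pool_size=2, pool_recycle=300, connect_timeout=10)
again = get_engine_sketch(url, pool_size=2, pool_recycle=300, connect_timeout=10)
```

One detail worth keeping in mind with this shape: functools.cache may treat the same keyword arguments passed in a different order as a separate cache entry, so callers that want to share an engine should pass options consistently.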

Defaults are unchanged, so existing deployments behave exactly as before.
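The bool("False") gotcha mentioned in the summary is easy to reproduce. The helper below is a hypothetical reconstruction of the type(default)(...) pattern, not the removed code itself.

```python
def coerce_like_default(value, default):
    """Naive coercion: cast `value` to the type of `default`."""
    return type(default)(value)


recycle = coerce_like_default("300", -1)         # int("300") works as hoped
pre_ping = coerce_like_default("False", True)    # bool("False"): non-empty str
```

The integer case behaves, but any non-empty string is truthy, so a quoted "False" silently turns pool_pre_ping on. Leaving type enforcement to the JSON Schema avoids that.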

Comment thread tests/provider/test_sql_pool_options.py
@tomkralidis

Member

Thanks @KoalaGeo! See the last minor change request, on the source code header of tests/provider/test_sql_pool_options.py.

@KoalaGeo

KoalaGeo commented Jun 9, 2026

Contributor Author

Done :-)

@tomkralidis tomkralidis merged commit 4eaef8e into geopython:master Jun 9, 2026
4 checks passed
Development

Successfully merging this pull request may close these issues.

PostgreSQL/SQL provider connection pool is not configurable and never recycles

4 participants