#2168 - proposal replace meilisearch elasticsearch with seal #2408

Open
MrHDOLEK wants to merge 10 commits into flow-php:1.x from MrHDOLEK:2168-proposal-replace-meilisearch-elasticsearch-with-seal

@MrHDOLEK MrHDOLEK commented May 31, 2026

Contributor

The Elasticsearch and Meilisearch adapters are replaced by a single SEAL adapter
(flow-php/etl-adapter-seal) built on top of SEAL
— a PHP Search Engine Abstraction Layer. One adapter now covers Elasticsearch, OpenSearch,
Meilisearch, Algolia, Solr, Typesense, RediSearch and Loupe: the user builds a
CmsIg\Seal\EngineInterface and passes it to the DSL, exactly like the PostgreSQL/Doctrine
adapters take a client. Tests are backend-agnostic (Memory adapter) and prove every Flow
Entry type survives a round-trip through the adapter.

Resolves: #2168

Change Log


Added

  • SEAL adapter (flow-php/etl-adapter-seal) — a single, strongly typed integration for SEAL search engines (Elasticsearch, OpenSearch, Meilisearch, Algolia, Solr, Typesense, RediSearch, Loupe)
  • from_seal() extractor and to_seal() loader, working with any SEAL EngineInterface
  • to_seal_schema() and seal_schema_to_flow() DSL — recursive, bi-directional Flow Schema ↔ SEAL Schema conversion (nested structures, lists and maps)
  • seal_create_index(), seal_drop_index(), seal_create_schema() and seal_drop_schema() DSL index-lifecycle helpers

Fixed

Changed

Removed

  • Elasticsearch adapter (flow-php/etl-adapter-elasticsearch) — superseded by the SEAL adapter; use SEAL with the Elasticsearch backend (cmsig/seal-elasticsearch-adapter)

Deprecated

Security

codecov Bot commented May 31, 2026


Codecov Report

❌ Patch coverage is 93.80952% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.19%. Comparing base (6bfba82) to head (ce67e39).
⚠️ Report is 30 commits behind head on 1.x.

Additional details and impacted files
@@             Coverage Diff              @@
##                1.x    #2408      +/-   ##
============================================
+ Coverage     84.94%   85.19%   +0.24%     
- Complexity    20648    20971     +323     
============================================
  Files          1570     1586      +16     
  Lines         63532    64537    +1005     
============================================
+ Hits          53968    54983    +1015     
+ Misses         9564     9554      -10     
Components Coverage Δ
etl 89.35% <ø> (ø)
cli 89.40% <ø> (ø)
lib-array-dot 81.44% <ø> (ø)
lib-azure-sdk 64.44% <ø> (ø)
lib-doctrine-dbal-bulk 93.61% <ø> (ø)
lib-filesystem 85.03% <ø> (ø)
lib-types 91.98% <ø> (ø)
lib-parquet 68.89% <14.28%> (ø)
lib-parquet-viewer 82.26% <ø> (ø)
lib-snappy 89.82% <ø> (-0.45%) ⬇️
lib-dremel 0.00% <ø> (ø)
lib-postgresql 88.59% <33.33%> (+0.01%) ⬆️
lib-telemetry 86.28% <100.00%> (+1.96%) ⬆️
bridge-filesystem-async-aws 92.74% <ø> (ø)
bridge-filesystem-azure 90.45% <ø> (ø)
bridge-monolog-http 97.86% <ø> (ø)
bridge-monolog-telemetry 94.11% <ø> (ø)
bridge-openapi-specification 92.07% <ø> (ø)
symfony-http-foundation 78.57% <ø> (ø)
bridge-psr18-telemetry 100.00% <ø> (ø)
bridge-psr3-telemetry 97.84% <ø> (ø)
bridge-psr7-telemetry 100.00% <ø> (ø)
bridge-telemetry-otlp 90.50% <ø> (ø)
bridge-symfony-http-foundation-telemetry 89.47% <ø> (ø)
bridge-symfony-filesystem-bundle 91.54% <100.00%> (ø)
bridge-symfony-filesystem-cache 98.14% <ø> (ø)
bridge-symfony-postgresql-bundle 94.55% <ø> (ø)
bridge-symfony-postgresql-cache 94.41% <ø> (ø)
bridge-symfony-postgresql-messenger 98.80% <ø> (ø)
bridge-symfony-postgresql-session 93.65% <ø> (ø)
bridge-symfony-telemetry-bundle 79.76% <95.34%> (+3.75%) ⬆️
adapter-chartjs 84.05% <ø> (ø)
adapter-csv 91.16% <ø> (ø)
adapter-doctrine 90.79% <ø> (ø)
adapter-google-sheet 99.18% <ø> (ø)
adapter-http 73.04% <ø> (ø)
adapter-json 88.63% <ø> (ø)
adapter-logger 50.00% <ø> (ø)
adapter-parquet 81.75% <ø> (ø)
adapter-text 74.13% <ø> (ø)
adapter-xml 83.40% <ø> (ø)
adapter-avro 0.00% <ø> (ø)
adapter-excel 94.21% <ø> (ø)
adapter-postgresql 91.42% <95.74%> (+0.53%) ⬆️
adapter-seal 93.80% <93.80%> (∅)
bridge-phpunit-postgresql 75.30% <ø> (ø)
bridge-phpunit-telemetry 80.09% <ø> (ø)
bridge-phpstan-types 0.00% <ø> (ø)
bridge-postgresql-valinor 100.00% <ø> (ø)

@norberttech

Member

@MrHDOLEK as we discussed offline, please find some feedback and guidance below.

Extractor / Loader

In general I like the idea of having one extractor that accepts a Seal EngineInterface, which is aligned with how the Doctrine adapter works right now. The same goes for the loaders.

It would also work very well with frameworks, as Seal provides a bundle for Symfony that configures the engine, so using it with Flow would be a simple matter of dependency injection.

What to test

I had to think about it a bit more, but I don't think we should retest every backend that Seal is testing. The Doctrine adapter tests MySQL, PostgreSQL and SQLite because flow-php/doctrine-dbal-bulk extends Doctrine's default behaviors, and those extensions need to be tested.

At this time Seal provides the following adapters:

I would choose one of them (the one that requires the least configuration and is the easiest to set up in Docker) and cover it with proper integration tests that confirm that moving data from Flow to Seal works as expected.
Heck, you can even use MemoryAdapter in tests, unless other adapters have features that require custom configuration.
The only thing you might want to research first is whether there are any differences between Seal adapters. Flow needs to be flexible enough to use all of them, so if one requires something custom, the Flow extractor/loader needs to be able to pass it through (but I don't think that's the case).

How to test

So in general we want the tests to cover that the Seal adapter can handle all types of Flow Entries. You can achieve that by using one of the two extractors:

They both expose a static method schema() : Schema, but here is where the tricky part starts. I believe you first need to be able to create an index in tests.
And that brings us to a missing piece of this PR: SchemaConverter.

SchemaConverter

Schema converters are recursive algorithms that can convert a schema in both directions, Flow to Seal and Seal to Flow. You can find some inspiration here:

It might be easiest to use an LLM to help you create one for Seal based on those 3 examples (it's a recursive, brain-damaging exercise that might not be worth spending time on).

Of course schema converters would need a DSL method.
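
To illustrate the recursive shape such a converter could take, here is a minimal sketch in plain PHP. It assumes toy array-based schema shapes: the function names flowToSeal() and sealToFlow() and the type names used below are hypothetical and are not Flow's or Seal's real classes. The only point is that nested structures and lists are handled by the functions calling themselves. Lists of lists are out of scope for this sketch.

```php
<?php

declare(strict_types=1);

// Hypothetical toy shapes, NOT the real Flow or Seal schema classes:
//   Flow-like: ['type' => 'string'|'integer'|'list'|'structure', 'element' => [...], 'fields' => [...]]
//   Seal-like: ['type' => 'text'|'integer'|'object', 'multiple' => bool, 'fields' => [...]]

function flowToSeal(array $flow): array
{
    return match ($flow['type']) {
        'string' => ['type' => 'text', 'multiple' => false],
        'integer' => ['type' => 'integer', 'multiple' => false],
        // A list becomes its element type flagged as "multiple" (recursion on the element).
        'list' => ['multiple' => true] + flowToSeal($flow['element']),
        // A structure becomes an object whose fields are converted recursively.
        'structure' => [
            'type' => 'object',
            'multiple' => false,
            'fields' => array_map(flowToSeal(...), $flow['fields']),
        ],
        default => throw new InvalidArgumentException("Unsupported type: {$flow['type']}"),
    };
}

function sealToFlow(array $seal): array
{
    if ($seal['multiple']) {
        // Unwrap the "multiple" flag into a list and recurse on the single-valued field.
        return ['type' => 'list', 'element' => sealToFlow(['multiple' => false] + $seal)];
    }

    return match ($seal['type']) {
        'text' => ['type' => 'string'],
        'integer' => ['type' => 'integer'],
        'object' => ['type' => 'structure', 'fields' => array_map(sealToFlow(...), $seal['fields'])],
        default => throw new InvalidArgumentException("Unsupported type: {$seal['type']}"),
    };
}

$flow = [
    'type' => 'structure',
    'fields' => [
        'id' => ['type' => 'integer'],
        'tags' => ['type' => 'list', 'element' => ['type' => 'string']],
        'address' => ['type' => 'structure', 'fields' => ['city' => ['type' => 'string']]],
    ],
];

// Converting there and back should return an equal schema.
$roundTripped = sealToFlow(flowToSeal($flow));
```

A round-trip assertion like this is also a cheap way to test a real converter: every type that survives Flow to Seal to Flow unchanged is covered in both directions.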

Search Engine in Tests

In this PR you are using traits to configure backends in tests, which is fine, but there is a different pattern that I found cleaner and easier to maintain.

Contexts.

Here is a good example of DatabaseContext that is used in DatabaseTableListCommandTest

If there is more than one integration test, you can extract an abstract SealTestCase extends FlowTestCase and set up SealContext in the setUp method, making it available through a sealContext() : SealContext method.

In case of any questions, you know where to find me 😁

@MrHDOLEK MrHDOLEK marked this pull request as ready for review June 9, 2026 21:00
@MrHDOLEK MrHDOLEK requested a review from norberttech as a code owner June 9, 2026 21:00
Comment thread documentation/upgrading.md Outdated
Comment thread phpunit.xml.dist Outdated
Comment thread src/adapter/etl-adapter-seal/.github/workflows/readonly.yaml Outdated
Comment thread src/adapter/etl-adapter-seal/src/Flow/ETL/Adapter/Seal/SealExtractor.php Outdated

@norberttech norberttech left a comment

Member

Looks much better! I left some comments, nothing critical, but there is one gap related to Delete operation.

Please let me know if you have any questions about those comments!

$this->sealContext = null;
}

protected function schema(): Schema

Member

I would avoid any kind of central schema at the test suite level. It creates a problem similar to tests that depend on fixtures loaded into a database. All tests using it now fully depend on that schema, which makes them fragile, as any adjustment to this schema might break them.

Each test should define its own schema and SealTestCase should only expose the Contexts, in this case SealContext

seal_drop_index($this->engine, $this->indexName);
}

seal_create_index($this->engine, $this->indexName);

Member

In general I don't think that SealContext should expect a Schema. On the contrary, I think it should just expose methods that simplify creating the engine for a specific test.

For example:

SealContext::engine(Seal\Schema $schema)

and then, in each test that requires a specific schema, you should define a FlowSchema and use to_seal_schema to convert it automatically to a Seal Schema

static::assertSame(1, $items[0]['quantity']);
}

protected function schema(): Schema

Member

We should avoid protected/private methods in tests at all costs.


final class SealAllEntryTypesTest extends SealTestCase
{
public function test_round_trip_of_all_flow_entry_types(): void

Member

I would make this test heavier and build proper ETL pipelines here:

Indexing Pipeline

  1. Extract from StaticOrdersExtractor (around 100 rows)
  2. Load to Seal
  3. Assert the inserted rows (passing analyze() makes the DataFrame return a report)

Extracting Pipeline

  1. Extract from Seal
  2. fetchToArray
  3. Assert

{
public function test_converting_flow_schema_to_a_seal_index(): void
{
$sealSchema = (new SchemaConverter())->toSealSchema(

Member

In tests we should use DSL methods.

use Flow\ETL\Rows;
use Generator;

final readonly class RowsNormalizer

Member

This one is missing unit tests but, even more importantly, EntryNormalizer is missing them too.


## Description

ETL Adapter that provides Loaders and Extractors that work with any search engine supported by Seal.

Member

We gonna need more than this :D

}

#[DocumentationDSL(module: Module::SEAL, type: Type::HELPER)]
function seal_create_index(EngineInterface $engine, string $index): void

Member

I don't think we should be exposing those seal_ functions; they are related to Seal rather than to Flow. The purpose of the DSL is to cover Flow features.

$this->engine->bulk(
$this->index,
(new RowsNormalizer(new EntryNormalizer()))->normalize($rows),
[],

Member

This argument stands for deleteDocumentIdentifiers. Similarly to how the Doctrine and Flow PostgreSQL adapters work, SealLoader should also allow deleting documents, not only indexing them.
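
One way this could look, as a rough sketch rather than the PR's actual implementation: the loader splits its normalized documents into the two bulk arguments depending on the requested operation. The INDEX case, the partitionForBulk() helper and the 'id' identifier field are assumptions made for illustration; only the DELETE case and the bulk() argument order appear in this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical operation flag; the PR's own Operation enum may define other cases than DELETE.
enum Operation: string
{
    case INDEX = 'index';
    case DELETE = 'delete';
}

/**
 * Splits normalized documents into the two arguments a bulk call expects:
 * documents to save and identifiers of documents to delete.
 *
 * @param list<array<string, mixed>> $documents
 *
 * @return array{0: list<array<string, mixed>>, 1: list<string>}
 */
function partitionForBulk(array $documents, Operation $operation, string $identifierField = 'id'): array
{
    if ($operation === Operation::DELETE) {
        // Only identifiers are needed to delete; cast so ints and strings behave alike.
        $ids = array_map(
            static fn (array $document): string => (string) $document[$identifierField],
            $documents,
        );

        return [[], $ids];
    }

    return [$documents, []];
}

$documents = [['id' => 1, 'title' => 'a'], ['id' => 2, 'title' => 'b']];

[$save, $delete] = partitionForBulk($documents, Operation::DELETE);

// A loader could then pass both lists through, for example:
// $this->engine->bulk($this->index, $save, $delete);
```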

Comment thread compose.yml.dist
MYSQL_ROOT_PASSWORD: root
networks:
- flow-php
elasticsearch:

Member

Are we sure we don't want to keep at least one real search engine in the repo? I know we should be able to trust that Seal works exactly the same way, but maybe keeping Elasticsearch wouldn't be a bad idea: use it in integration tests and keep the memory adapter for unit tests. (No strong opinion about this one yet. I have never used Seal, so I don't know how much the in-memory implementation differs from the actual one.)


enum Operation: string
{
case DELETE = 'delete';

Member

It's more replace than delete cause data will be available, no?

Member

no, it's actual delete

return;
}

if (count($documents) < $this->pageSize) {

Member

Instead of counting, add counter variable and increment it in loop?
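
A minimal sketch of that idea, assuming a hypothetical $fetchPage callable that returns one page of documents for a given offset and limit (the real extractor reads from the Seal engine instead):

```php
<?php

declare(strict_types=1);

/**
 * @param callable(int, int): iterable<array<string, mixed>> $fetchPage hypothetical page fetcher (offset, limit)
 *
 * @return Generator<int, array<string, mixed>>
 */
function paginate(callable $fetchPage, int $pageSize): Generator
{
    $offset = 0;

    while (true) {
        $fetched = 0;

        foreach ($fetchPage($offset, $pageSize) as $document) {
            // Count while iterating, so the page never has to be materialized as an array.
            ++$fetched;

            yield $document;
        }

        if ($fetched < $pageSize) {
            // A short (or empty) page means there is nothing left to fetch.
            return;
        }

        $offset += $pageSize;
    }
}

// Usage with an in-memory stand-in for the search engine.
$data = range(1, 5);
$fetchPage = static fn (int $offset, int $limit): array => array_map(
    static fn (int $id): array => ['id' => $id],
    array_slice($data, $offset, $limit),
);

$ids = [];

foreach (paginate($fetchPage, 2) as $document) {
    $ids[] = $document['id'];
}
```

Because the counter works with any iterable, the page can stay a lazy generator instead of an array that count() would require.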

{
$engine = new Engine(new MemoryAdapter(), $schema);

foreach (array_keys($schema->indexes) as $index) {

Member

Suggested change
- foreach (array_keys($schema->indexes) as $index) {
+ foreach ($schema->indexes as $index => $value) {

Development

Successfully merging this pull request may close these issues.

[Proposal]: Replace Meilisearch & Elasticsearch with SEAL

3 participants