[improvement](regression) Use Spark thrift JDBC for external SQL helpers #64886
Open
zgxme wants to merge 11 commits into
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: apache#63719
Problem Summary: The regression Spark Iceberg and Paimon helpers executed SQL through docker exec and spark-sql, which required local Docker access and repeatedly started Spark SQL clients. This change follows the Spark Iceberg JDBC helper approach from PR apache#63719 and routes Spark Iceberg/Paimon helper execution through Spark ThriftServer with Hive JDBC. Multi-statement execution now reuses one JDBC connection.
### Release note
None
### Check List (For Author)
- Test: Manual test
- mvn -q -DskipTests compile under regression-test/framework
- git diff --check -- framework/src/main/groovy/org/apache/doris/regression/suite/Suite.groovy
- Behavior changed: Yes. spark_iceberg, spark_iceberg_multi, and spark_paimon now execute through Spark ThriftServer JDBC instead of docker exec spark-sql.
- Does this need documentation: No
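The multi-statement behavior described in this commit can be sketched in Java against the standard `java.sql` interfaces. This is only an illustration of running several statements over one already-open connection: the names `MultiStatementRunner` and `runAll` are hypothetical, and the real helper lives in `Suite.groovy` and may be structured differently.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical sketch: class and method names are illustrative,
// not the regression framework's actual API.
final class MultiStatementRunner {
    private MultiStatementRunner() {}

    /**
     * Runs every non-blank statement on the same, already-open connection
     * and returns how many statements were executed.
     */
    static int runAll(Connection conn, List<String> statements) throws SQLException {
        int executed = 0;
        // The caller opens the connection once; one Statement is reused for all SQL.
        try (Statement stmt = conn.createStatement()) {
            for (String sql : statements) {
                String trimmed = sql.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank entries
                }
                stmt.execute(trimmed);
                executed++;
            }
        }
        return executed;
    }
}
```

In a real suite the connection would presumably come from `DriverManager.getConnection` with a `jdbc:hive2://` URL pointing at the Spark ThriftServer; the host and port depend on the docker environment and are not stated here.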
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Spark Iceberg helpers opened a new Hive JDBC connection for every spark_iceberg/spark_paimon call. This added repeated Spark ThriftServer session setup overhead in suites that issue many Spark SQL statements. The framework now keeps a Spark Iceberg JDBC connection in SuiteContext thread-local state, creates it on first use, reuses it for later calls in the same suite context thread, and closes it with other context thread-local resources.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Manual test: mvn package -B -DskipTests=true -Dmaven.javadoc.skip=true in regression-test/framework; git diff --check
- Behavior changed: Yes. Spark Iceberg/Paimon helper SQL reuses a SuiteContext-local Spark JDBC connection instead of opening one per call.
- Does this need documentation: No
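The create-on-first-use, reuse, and close-with-context lifecycle described in this commit can be sketched as a small per-thread holder. This is a minimal sketch under assumptions, not the `SuiteContext` code: `ThreadLocalResource` is a hypothetical name, and the resource is typed as any `AutoCloseable` so the example does not need a JDBC driver (a `java.sql.Connection` is an `AutoCloseable`).

```java
import java.util.function.Supplier;

// Hypothetical sketch of per-thread lazy reuse; not the SuiteContext implementation.
final class ThreadLocalResource<T extends AutoCloseable> {
    private final ThreadLocal<T> holder = new ThreadLocal<>();
    private final Supplier<T> factory;

    ThreadLocalResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the calling thread's resource, creating it on first use. */
    T get() {
        T current = holder.get();
        if (current == null) {
            current = factory.get();
            holder.set(current);
        }
        return current;
    }

    /** Closes and forgets the calling thread's resource, if one was created. */
    void closeCurrent() throws Exception {
        T current = holder.get();
        if (current != null) {
            holder.remove();
            current.close();
        }
    }
}
```

A production version would likely also check whether a cached connection is still valid before handing it out; that detail is omitted here because the commit message does not describe it.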
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: The Iceberg docker entrypoint started Spark master and worker before Spark ThriftServer, but the thriftserver command did not specify a Spark master. Without an explicit master, Spark can fall back to local execution, so the standalone master and worker may not be used by Hive JDBC queries. This change starts Spark ThriftServer with --master spark://doris--spark-iceberg:7077 while keeping the Derby system home JVM option unchanged.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Manual test: bash -n docker/thirdparties/docker-compose/iceberg/entrypoint.sh.tpl
- Behavior changed: Yes. Iceberg Spark ThriftServer now explicitly runs against the standalone Spark master in the docker environment.
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: The Iceberg Spark docker environment relied on Spark defaults for ThriftServer and spark-sql resource sizing. Those defaults can use too many CPU cores while leaving executor and driver heap at small defaults, and the default shuffle partition count is high for local regression data. This change caps the Spark app at 8 cores, uses 4-core executors with 8g heap, gives the driver 4g heap, disables dynamic allocation explicitly, and reduces default shuffle/parallelism settings for local regression stability.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Manual test: git diff --check -- docker/thirdparties/docker-compose/iceberg/spark-defaults.conf
- Behavior changed: Yes. Iceberg Spark docker jobs now use explicit resource and parallelism defaults.
- Does this need documentation: No
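As a quick sanity check on the sizing in this commit, the arithmetic can be written out. The sketch assumes Spark standalone scheduling, where the number of executors one application can hold is bounded by its core cap divided by the cores per executor; `SparkSizing` is a hypothetical helper, and the heap totals ignore any off-heap or overhead memory.

```java
// Illustrative arithmetic only; the class name is hypothetical.
final class SparkSizing {
    private SparkSizing() {}

    /** Upper bound on executors for one app: core cap divided by cores per executor. */
    static int maxExecutors(int coresMax, int executorCores) {
        if (executorCores <= 0) {
            throw new IllegalArgumentException("executorCores must be positive");
        }
        return coresMax / executorCores;
    }

    /** Total JVM heap requested by the app: all executors plus the driver, in GB. */
    static int totalHeapGb(int executors, int executorHeapGb, int driverHeapGb) {
        return executors * executorHeapGb + driverHeapGb;
    }

    public static void main(String[] args) {
        int executors = maxExecutors(8, 4);        // 8-core cap, 4-core executors
        int heapGb = totalHeapGb(executors, 8, 4); // 8g per executor, 4g driver
        System.out.println(executors + " executors, " + heapGb + " GB heap");
    }
}
```

With the values from the commit message this gives at most 2 executors and 20 GB of requested heap, which is the budget the docker host needs to accommodate.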
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: The Iceberg docker entrypoint started Spark ThriftServer before running the preinstalled Spark SQL setup scripts. After moving ThriftServer onto the standalone master, that idle ThriftServer app can reserve executor resources while setup scripts are still running. The ThriftServer also did not receive Iceberg/Paimon SQL extensions, while regression helpers execute Spark SQL through Hive JDBC. This change runs the setup scripts first, then starts ThriftServer with Iceberg and Paimon extensions, and waits for Hive JDBC readiness before marking the container healthy.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Manual test: bash -n docker/thirdparties/docker-compose/iceberg/entrypoint.sh.tpl; /bin/sh -n docker/thirdparties/docker-compose/iceberg/entrypoint.sh.tpl; git diff --check -- docker/thirdparties/docker-compose/iceberg/entrypoint.sh.tpl
- Behavior changed: Yes. Iceberg Spark ThriftServer starts after preinstalled data setup and waits for JDBC readiness before /mnt/SUCCESS.
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Spark 4 thriftserver rejects the previous noSasl JDBC URL and then fails to open sessions against the default Iceberg namespace because demo.default is not created. This makes the Iceberg docker startup loop on the thriftserver readiness check and prevents regression Spark Iceberg JDBC helpers from connecting. Create the default Iceberg namespace before starting thriftserver, use the normal HiveServer2 JDBC URL without auth=noSasl, and fail readiness with useful logs instead of looping forever.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Ran bash -n and /bin/sh -n for docker/thirdparties/docker-compose/iceberg/entrypoint.sh.tpl
- Ran git diff --check for modified files
- Ran mvn package -B -DskipTests=true -Dmaven.javadoc.skip=true in regression-test/framework
- Behavior changed: No
- Does this need documentation: No
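The "fail readiness instead of looping forever" idea can be sketched as a bounded polling loop. The actual change is in the shell entrypoint template; this Java version only illustrates the logic, and `ReadinessWaiter`, `waitUntilReady`, and the probe abstraction are hypothetical names. In practice the probe would attempt a Hive JDBC connection using the plain HiveServer2 URL form without `auth=noSasl`.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of bounded readiness polling; not the entrypoint script itself.
final class ReadinessWaiter {
    private ReadinessWaiter() {}

    /**
     * Polls the probe until it reports ready or the timeout elapses.
     * Returns true when ready, false on timeout so the caller can
     * print logs and exit with an error instead of retrying forever.
     */
    static boolean waitUntilReady(BooleanSupplier probe, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

Returning a boolean rather than throwing keeps the decision about what to log on failure with the caller, which matches the commit's goal of failing with useful logs.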
Contributor
Author
|
run buildall |
yiguolei previously approved these changes on Jun 26, 2026
Contributor
|
PR approved by at least one committer and no changes requested. |
Contributor
|
PR approved by anyone and no changes requested. |
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add P2 demo regression cases for Iceberg and Paimon. The cases write data through Spark SQL first, then query the same external table through both Doris and Spark, normalizing JDBC result values before comparison to avoid false failures caused by different Java number classes returned by the two JDBC drivers.
### Release note
None
### Check List (For Author)
- Test: Regression test
- ./run-regression-test.sh --run -d external_table_p2/iceberg -s test_iceberg_spark_doris_consistency_demo
- ./run-regression-test.sh --run -d external_table_p2/paimon -s test_paimon_spark_doris_consistency_demo
- Behavior changed: No
- Does this need documentation: No
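The normalization step mentioned in this commit can be illustrated with a small sketch. The regression cases themselves are not shown here, so this is one possible approach rather than the PR's code: `JdbcValueNormalizer` is a hypothetical name, and the sketch assumes that mapping every numeric value to a `BigDecimal` with trailing zeros stripped is an acceptable canonical form for comparison.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one way to make numeric JDBC values comparable across drivers.
final class JdbcValueNormalizer {
    private JdbcValueNormalizer() {}

    /** Maps numeric values to a canonical BigDecimal; other values pass through unchanged. */
    static Object normalize(Object value) {
        if (value instanceof BigDecimal) {
            return canonical((BigDecimal) value);
        }
        if (value instanceof BigInteger) {
            return canonical(new BigDecimal((BigInteger) value));
        }
        if (value instanceof Float || value instanceof Double) {
            double d = ((Number) value).doubleValue();
            if (Double.isNaN(d) || Double.isInfinite(d)) {
                return value; // NaN and infinities have no BigDecimal form
            }
            // Use the shortest decimal text so 1.5f and 1.5d agree.
            return canonical(new BigDecimal(value.toString()));
        }
        if (value instanceof Byte || value instanceof Short
                || value instanceof Integer || value instanceof Long) {
            return canonical(BigDecimal.valueOf(((Number) value).longValue()));
        }
        return value; // strings, dates, nulls, and so on
    }

    /** Normalizes every column of one result row. */
    static List<Object> normalizeRow(List<?> row) {
        List<Object> out = new ArrayList<>(row.size());
        for (Object column : row) {
            out.add(normalize(column));
        }
        return out;
    }

    private static BigDecimal canonical(BigDecimal v) {
        // Strip trailing zeros so 1.00 and 1 compare equal with equals().
        return v.signum() == 0 ? BigDecimal.ZERO : v.stripTrailingZeros();
    }
}
```

After normalization, a row read as `Integer` by one driver and as `Long` or `BigDecimal` by the other compares equal with plain `List.equals`, which is the kind of false failure the commit describes avoiding.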
Contributor
|
run buildall |
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Paimon preinstalled SQL scripts are executed in a shared Spark SQL session. run06.sql changes the session time zone to +08:00 for timestamp partition coverage, but did not restore it before subsequent scripts. This can make later Paimon bootstrap data depend on session state and change physical file metadata such as partition file size. Restore the session time zone to UTC at the end of run06.sql so later scripts start from the default time zone.
### Release note
None
### Check List (For Author)
- Test: Manual test
- git diff --check -- docker/thirdparties/docker-compose/iceberg/scripts/create_preinstalled_scripts/paimon/run06.sql
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: The Iceberg docker bootstrap was changed to sort preinstalled SQL script paths before generating the Spark SQL source files, and run06.sql restored the session time zone after its timestamp partition setup. Revert those changes so the bootstrap ordering and Paimon setup SQL match the previous behavior while investigating Paimon partition file size differences.
### Release note
None
### Check List (For Author)
- Test: Manual test
- git diff --check -- docker/thirdparties/docker-compose/iceberg/entrypoint.sh.tpl docker/thirdparties/docker-compose/iceberg/scripts/create_preinstalled_scripts/paimon/run06.sql
- Behavior changed: Yes. Iceberg docker preinstalled SQL path handling returns to the prior unsorted find output behavior, and run06.sql no longer restores session time zone.
- Does this need documentation: No
### What problem does this PR solve?
This PR speeds up regression cases that use the external SQL helpers by running their Spark SQL through Spark Thrift JDBC.
Local validation shows the following cache-related cases are significantly faster after this change:
- test_iceberg_table_cache: 3m20s -> 30s
- test_paimon_table_meta_cache: 14m59s -> 40s
### Release note
None
### Check List (For Author)
- Test
- Behavior changed:
- Does this need documentation?
### Check List (For Reviewer who merge this PR)