Skip to content

[dmftproj] eliminate single-precision loss in the projector construction#14

Open
krystophny wants to merge 1 commit into
TRIQS:unstablefrom
krystophny:fix/dmftproj-precision
Open

[dmftproj] eliminate single-precision loss in the projector construction#14
krystophny wants to merge 1 commit into
TRIQS:unstablefrom
krystophny:fix/dmftproj-precision

Conversation

@krystophny

Copy link
Copy Markdown

Separate, reproducible Fortran precision fix; the Python port stack builds on top.

dmftproj built many COMPLEX(KIND=8) values from REAL(KIND=8) arguments via
CMPLX(...) without KIND=8, so the constructor evaluated in single precision
(~1e-7 loss). This adds KIND=8 (or DCMPLX) at every such site:

  • outputqmc.f (16x) the spinrot phase EXP(CMPLX(0,factor)); same in
    rot_dens.f, setsym.f, set_rotloc.f, symmetrize_mat.f.
  • dmftproj.f k-weight CMPLX(rtetr,0) and orthogonal.f Loewdin eigenvalue
    CMPLX(W,0) widened to DCMPLX.
  • the shipped cubic/f templates stored at full double precision (the buffer
    widening in set_ang_trans.f reads them).

Supersedes the fromfile-only partial fix.

Verification

With this fix the pure-Python generators reproduce dmftproj to:
oubwin byte-identical; symqmc 1.5e-14, sympar 1.9e-14, parproj 1.5e-14
(machine precision). ctqmcout stays at ~1e-6: its overlap is rank-deficient
(more correlated spin-orbitals than bands), so O^{-1/2} amplifies the last-ULP
libm difference between gfortran and numpy by ~1e8. That is a numerical
property of the algorithm (reproducible across BLAS, but not across languages),
not a precision bug.

dmftproj built many COMPLEX(KIND=8) values from REAL(KIND=8) arguments via
CMPLX(...) without KIND=8, evaluating the constructor in single precision (a
~1e-7 loss). Add KIND=8 to every such site: the spinrot phase EXP(CMPLX(0,
factor)) in outputqmc.f (16x), rot_dens.f, setsym.f, set_rotloc.f and the
EXP(CMPLX(0,phase)) in symmetrize_mat.f; widen the two real->complex sites
CMPLX(rtetr,0)/CMPLX(W,0) to DCMPLX (dmftproj.f, orthogonal.f). Also store the
shipped cubic/f templates at full double precision (the buffer widening already
in set_ang_trans.f reads them). Supersedes the partial fromfile-only fix.

With this, the pure-Python port reproduces symqmc/sympar/parproj to machine
precision (1.5e-14) and oubwin byte-identically. ctqmcout retains a ~1e-6
residual: its overlap is rank-deficient (more correlated spin-orbitals than
bands) so O^{-1/2} amplifies the last-ULP libm difference between gfortran and
numpy; that is a numerical property of the algorithm, not a precision bug.
krystophny added a commit to krystophny/dftkit that referenced this pull request Jun 16, 2026
…sion fix)

Use the exact analytic cubic harmonics (reptrans no longer applies the legacy
float32 cast) and call the same LAPACK routine the Fortran does for the Loewdin
inverse (scipy ZHEEV('V','U'), not numpy's ZHEEVD). Regenerate the references
from the precision-fixed dmftproj (PR TRIQS#14) with full-precision templates.

Against that reference the port is now exact: symqmc 1.5e-14, sympar 1.9e-14,
parproj 1.5e-14 (machine precision), oubwin byte-identical. ctqmcout stays at
~6.6e-7: its overlap is rank-deficient (20 correlated spin-orbitals, 17 bands),
so O^{-1/2} amplifies the last-ULP libm difference between gfortran and numpy by
~1e8; dmftproj reproduces itself across BLAS but no cross-language port can match
it below that floor without changing the (preserved) algorithm. Tolerances
tightened to 1e-11 accordingly; ctqmcout documented at 1e-6. Part of TRIQS#7.
krystophny added a commit to krystophny/dftkit that referenced this pull request Jun 16, 2026
…sion fix)

Use the exact analytic cubic harmonics (no float32 cast) and call the same
LAPACK/BLAS routines the Fortran does for the Loewdin step (scipy ZHEEV('V','U')
and ZGEMM with the matching trans flags, not numpy's ZHEEVD / conj-transpose
copies). Regenerate references from the precision-fixed dmftproj (PR TRIQS#14) with
full-precision templates.

The port is now exact to machine precision on every output:
  oubwin   byte-identical
  symqmc   1.5e-14
  sympar   1.9e-14
  parproj  1.5e-14
  ctqmcout 1.5e-14   (full-rank window: 50 bands > 20 correlated spin-orbitals,
                      so the Loewdin overlap O is non-singular and its
                      eigenvectors are unique. The physical narrow-window CaOs2
                      makes O rank-deficient; O^{-1/2} then amplifies the
                      last-ULP libm difference, a property of that degenerate
                      case, not the port.)
Tolerances tightened to 1e-11. Part of TRIQS#7.
krystophny added a commit to krystophny/dftkit that referenced this pull request Jun 16, 2026
…fixtures

The committed CaOs2 dmftproj outputs are now full double precision (precision
fix TRIQS#14 + exact Python TRIQS#12), so the converter golden h5 is regenerated to match.
krystophny added a commit to krystophny/dftkit that referenced this pull request Jun 16, 2026
run_dmftproj writes every case.* file the converter reads (band mode writes
case.outband), a drop-in for the dmftproj Fortran binary. Delete fortran/dmftproj
and its build. End-to-end: the driver feeds the converter and the resulting HDF5
matches the reference (h5diff). The Fortran precision fix lives separately in
TRIQS#14; this stack reproduces dmftproj to machine precision. Closes the port. Part of TRIQS#7.
krystophny added a commit to krystophny/dftkit that referenced this pull request Jun 16, 2026
run_dmftproj writes every case.* file the converter reads (band mode writes
case.outband), a drop-in for the dmftproj Fortran binary. Delete fortran/dmftproj
and its build. End-to-end: the driver feeds the converter and the resulting HDF5
matches the reference (h5diff). The Fortran precision fix lives separately in TRIQS#14.
Closes the port. Part of TRIQS#7.
krystophny added a commit to krystophny/dftkit that referenced this pull request Jun 16, 2026
run_dmftproj writes every case.* file the converter reads (band mode writes
case.outband), a drop-in for the dmftproj Fortran binary. Delete fortran/dmftproj
and its build. End-to-end: driver -> converter -> h5diff. The Fortran precision
fix is TRIQS#14; non-SO support is TRIQS#15. Closes the port. Part of TRIQS#7.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant