feat: criticality worker init (wip) #4161

Draft
mbani01 wants to merge 1 commit into main from feat/criticality_worker
Conversation

@mbani01
Contributor

@mbani01 mbani01 commented Jun 3, 2026

Summary

Changes

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Performance improvement
  • Chore / dependency update
  • Documentation

JIRA ticket

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings June 3, 2026 10:15
@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Contributor

Copilot AI left a comment

Pull request overview

Introduces initial “criticality” groundwork in packages_worker by adding a PageRank-based centrality computation (written into packages_universe) and updating the database ranking function/migrations to support an ADR-based criticality scoring formula.

Changes:

  • Add CSR graph building + PageRank computation and a standalone runner for validating graph correctness.
  • Add DB queries to load direct dependency edges and merge computed centrality scores back into packages_universe.
  • Add migrations for graph-derived signals and a v2 rank_packages_universe() implementation using weighted percentile ranks.
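
For readers unfamiliar with the CSR (compressed sparse row) layout named in the first bullet, the following is a minimal sketch of how such a graph could be built from an edge list. The field names `numDeps`, `rowPtr`, `colData`, `N`, and `nodeIndex` appear in the snippets quoted in this review; the `Edge` shape, the function name `buildCsrGraph`, the edge direction (dependent to dependency), and the choice to store incoming edges per row are assumptions for illustration and may not match graph.ts.

```typescript
// Hypothetical edge shape: package `from` directly depends on package `to`.
interface Edge {
  from: string
  to: string
}

interface CsrGraph {
  N: number                      // number of nodes
  nodeIndex: Map<string, number> // package id -> dense index in [0, N)
  numDeps: Uint32Array           // out-degree: how many packages each node depends on
  rowPtr: Uint32Array            // length N + 1; row v spans colData[rowPtr[v] .. rowPtr[v + 1])
  colData: Uint32Array           // for each node v, the indices of packages that depend on v
}

function buildCsrGraph(edges: Edge[]): CsrGraph {
  // Assign dense integer indices in first-seen order.
  const nodeIndex = new Map<string, number>()
  const idOf = (id: string): number => {
    let idx = nodeIndex.get(id)
    if (idx === undefined) {
      idx = nodeIndex.size
      nodeIndex.set(id, idx)
    }
    return idx
  }

  const src = new Uint32Array(edges.length)
  const dst = new Uint32Array(edges.length)
  edges.forEach((e, i) => {
    src[i] = idOf(e.from)
    dst[i] = idOf(e.to)
  })

  const N = nodeIndex.size
  const numDeps = new Uint32Array(N)
  const rowPtr = new Uint32Array(N + 1)

  // Pass 1: count out-degree per source and in-degree per target.
  for (let i = 0; i < edges.length; i++) {
    numDeps[src[i]]++
    rowPtr[dst[i] + 1]++
  }
  // Prefix sum turns in-degree counts into row start offsets.
  for (let v = 0; v < N; v++) rowPtr[v + 1] += rowPtr[v]

  // Pass 2: scatter each source into its target's row.
  const cursor = rowPtr.slice(0, N)
  const colData = new Uint32Array(edges.length)
  for (let i = 0; i < edges.length; i++) {
    colData[cursor[dst[i]]++] = src[i]
  }

  return { N, nodeIndex, numDeps, rowPtr, colData }
}
```

Storing incoming edges per row lets a PageRank iteration "pull" score into each node with sequential reads, which is one common reason to prefer CSR over an adjacency map for large graphs.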

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Summary per file

| File | Description |
| --- | --- |
| services/apps/packages_worker/src/criticality/types.ts | Adds types for centrality input/output and criticality weight definitions. |
| services/apps/packages_worker/src/criticality/run-pagerank.ts | Adds a standalone CLI-like script to build/validate the graph and optionally run full PageRank. |
| services/apps/packages_worker/src/criticality/queries.ts | Adds SQL helpers to load direct dependency edges and bulk-merge centrality scores. |
| services/apps/packages_worker/src/criticality/graph.ts | Implements CSR graph construction and PageRank iteration utilities. |
| services/apps/packages_worker/src/criticality/activities.ts | Implements the Temporal activity to compute PageRank and persist centrality scores in chunks. |
| backend/src/osspckgs/migrations/V1780416481__rank_packages_universe_v2.sql | Replaces/updates rank_packages_universe() scoring + ranking logic to match the ADR methodology. |
| backend/src/osspckgs/migrations/V1780394591__packages_universe_graph_signals.sql | Adds transitive_dependent_count and centrality_score columns to packages_universe. |

Comment on lines +47 to +49
+ weight_downloads * PERCENT_RANK() OVER (
PARTITION BY ecosystem ORDER BY LN(1 + COALESCE(downloads_30d, 0)))
)::numeric(10, 4) AS new_score
Comment on lines +4 to +8
-- score = w_centrality * pct_rank( centrality_score ) within ecosystem
-- + w_transitive * pct_rank( LN(1 + transitive_dependent_count) ) within ecosystem
-- + w_dep_pkgs * pct_rank( LN(1 + dependent_packages_count) ) within ecosystem
-- + w_dep_repos * pct_rank( LN(1 + dependent_repos_count) ) within ecosystem
-- + w_downloads * pct_rank( LN(1 + downloads_30d) ) within ecosystem
n_ranked int;
n_propagated int;
BEGIN
-- ── Step 1: score ──────────────────────────────────────────────────────────
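
To make the scoring formula quoted in the migration's header comment easier to check by hand, here is a small TypeScript sketch of the same weighted percentile-rank sum for a single ecosystem. It mirrors SQL `PERCENT_RANK()` semantics, (rank - 1) / (rows - 1) with ties sharing the lowest rank. The `Signals` and `Weights` shapes, the function names, and the example weight values are hypothetical; the PR's actual weights are not shown in this conversation.

```typescript
// Mirrors SQL PERCENT_RANK(): (rank - 1) / (rows - 1), ties share the lowest rank.
// Assumes all values are finite numbers.
function percentRank(values: number[]): number[] {
  const n = values.length
  if (n <= 1) return values.map(() => 0) // a single row ranks at 0, as in SQL
  const sorted = [...values].sort((a, b) => a - b)
  // The first position of each distinct value in sorted order equals rank - 1.
  const firstIndex = new Map<number, number>()
  sorted.forEach((v, i) => {
    if (!firstIndex.has(v)) firstIndex.set(v, i)
  })
  return values.map((v) => (firstIndex.get(v) as number) / (n - 1))
}

// Hypothetical row shape for one ecosystem; missing values are assumed to be 0 already.
interface Signals {
  centralityScore: number
  transitiveDependentCount: number
  dependentPackagesCount: number
  dependentReposCount: number
  downloads30d: number
}

// Hypothetical weight names, one per term of the formula.
interface Weights {
  centrality: number
  transitive: number
  depPkgs: number
  depRepos: number
  downloads: number
}

function scoreEcosystem(rows: Signals[], w: Weights): number[] {
  // Count signals go through ln(1 + x) as in the formula; centrality is ranked raw.
  const centrality = percentRank(rows.map((r) => r.centralityScore))
  const transitive = percentRank(rows.map((r) => Math.log1p(r.transitiveDependentCount)))
  const depPkgs = percentRank(rows.map((r) => Math.log1p(r.dependentPackagesCount)))
  const depRepos = percentRank(rows.map((r) => Math.log1p(r.dependentReposCount)))
  const downloads = percentRank(rows.map((r) => Math.log1p(r.downloads30d)))
  return rows.map(
    (_, i) =>
      w.centrality * centrality[i] +
      w.transitive * transitive[i] +
      w.depPkgs * depPkgs[i] +
      w.depRepos * depRepos[i] +
      w.downloads * downloads[i],
  )
}
```

Because ln(1 + x) is strictly increasing for non-negative x, it does not change the percentile rank of a value; in this sketch the transform is kept only to match the formula as written.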
Comment on lines +94 to +104
UPDATE packages p
SET criticality_score = pu.criticality_score,
    is_critical       = pu.is_critical,
    last_rank_pass_at = NOW()
FROM packages_universe pu
WHERE p.purl = pu.purl
  AND p.ecosystem = pu.ecosystem
  AND (
    p.criticality_score IS DISTINCT FROM pu.criticality_score
    OR p.is_critical IS DISTINCT FROM pu.is_critical
  );
Comment on lines +55 to +58
): { scores: Float64Array; iterations: number } {
const teleportation = (1 - damping) / N // base score every node gets regardless of links
let scores = new Float64Array(N).fill(1 / N) // start: equal weight for all nodes
let next = new Float64Array(N) // scratch buffer, swapped each iteration
Comment on lines +42 to +51
// ── Step 4: merge centrality_score into packages_universe
const batch = Array.from(graph.nodeIndex.entries()).map(([packageId, idx]) => ({
  packageId,
  centralityScore: scores[idx],
}))

const CHUNK = 10_000
for (let i = 0; i < batch.length; i += CHUNK) {
  await mergeCentralityScores(qx, batch.slice(i, i + CHUNK))
}
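
The chunked write above can also be expressed with a small reusable helper. This is only a sketch: `chunk` is a hypothetical name, and it assumes the goal is to bound the number of rows sent per statement rather than to reduce memory, since the full batch is already materialized.

```typescript
// Splits `items` into consecutive slices of at most `size` elements.
// Hypothetical helper, not part of the PR.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer')
  }
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}
```

With such a helper, the loop in the quoted snippet would read `for (const part of chunk(batch, CHUNK)) await mergeCentralityScores(qx, part)`.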
Comment on lines +5 to +8
* Usage (from services/apps/packages_worker):
* pnpm dev:pagerank [ecosystem] ← full run
* pnpm dev:pagerank cargo --graph-only ← build + validate graph, skip iterations
*/
console.log(`\n nodes=${result.nodeCount.toLocaleString()} edges=${result.edgeCount.toLocaleString()} iters=${result.iterations} duration=${(result.durationMs/1000).toFixed(1)}s`)

// ── Save report ───────────────────────────────────────────────────────────
const outFile = `pagerank-report-${ecosystem}-${Date.now()}.json`
Comment on lines +6 to +10
ecosystem: string
nodeCount: number
edgeCount: number
iterations: number
durationMs: number
Comment on lines +49 to +55
export function computePageRank(
  { numDeps, rowPtr, colData, N }: Graph,
  damping = 0.85,
  maxIter = 100,
  convergence = 1e-6,
  onIteration?: (iter: number, delta: number) => void,
): { scores: Float64Array; iterations: number } {
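
For context on what a function with this signature typically does, here is a minimal power-iteration sketch over a CSR graph. It is not the PR's implementation: the function is named `pageRankSketch` to keep that clear, the `Graph` shape is reduced to the four destructured fields, rows are assumed to list the packages that depend on each node, and the handling of dangling nodes (packages with no dependencies, whose score is spread evenly over all nodes) is an assumption that the quoted snippets do not confirm.

```typescript
// Assumed layout: row v of (rowPtr, colData) lists the nodes u that depend on v;
// numDeps[u] is the number of dependencies (out-degree) of u.
interface Graph {
  N: number
  numDeps: Uint32Array
  rowPtr: Uint32Array
  colData: Uint32Array
}

function pageRankSketch(
  { numDeps, rowPtr, colData, N }: Graph,
  damping = 0.85,
  maxIter = 100,
  convergence = 1e-6,
): { scores: Float64Array; iterations: number } {
  if (N === 0) return { scores: new Float64Array(0), iterations: 0 }

  const teleportation = (1 - damping) / N
  let scores = new Float64Array(N).fill(1 / N)
  let next = new Float64Array(N)
  let iterations = 0

  while (iterations < maxIter) {
    // Score held by nodes with no outgoing edges would otherwise leak away;
    // here it is redistributed uniformly (an assumption of this sketch).
    let dangling = 0
    for (let u = 0; u < N; u++) {
      if (numDeps[u] === 0) dangling += scores[u]
    }
    const base = teleportation + (damping * dangling) / N

    let delta = 0 // L1 distance between successive score vectors
    for (let v = 0; v < N; v++) {
      let sum = 0
      for (let k = rowPtr[v]; k < rowPtr[v + 1]; k++) {
        const u = colData[k]
        sum += scores[u] / numDeps[u] // u splits its score across its dependencies
      }
      next[v] = base + damping * sum
      delta += Math.abs(next[v] - scores[v])
    }

    // Swap buffers instead of allocating a new array each iteration.
    const tmp = scores
    scores = next
    next = tmp
    iterations++
    if (delta < convergence) break
  }

  return { scores, iterations }
}
```

With the uniform redistribution of dangling mass, the scores sum to 1 after every iteration, which gives a cheap sanity check when validating a run.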
3 participants