feat(build): add libreoffice() build extension for docx/pptx to PDF conversion #3649
Closed. CuboYe wants to merge 1 commit into triggerdotdev:main from CuboYe:bounty/fix-1361-libreoffice-extension (+226 −7)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| --- | ||
| "@trigger.dev/build": patch | ||
| --- | ||
|
|
||
| feat(build): add libreoffice build extension for headless docx/pptx to PDF conversion |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| import { BuildManifest } from "@trigger.dev/core/v3"; | ||
| import { BuildContext, BuildExtension } from "@trigger.dev/core/v3/build"; | ||
|
|
||
| export type LibreOfficeOptions = { | ||
| /** | ||
| * Which LibreOffice component packages to install. | ||
| * Defaults to ["writer", "impress"] for docx and pptx support. | ||
| * - "writer" → libreoffice-writer (handles .doc/.docx) | ||
| * - "impress" → libreoffice-impress (handles .ppt/.pptx) | ||
| * - "calc" → libreoffice-calc (handles .xls/.xlsx) | ||
| * - "draw" → libreoffice-draw (handles .odg) | ||
| * - "math" → libreoffice-math (formula editor) | ||
| */ | ||
| components?: Array<"writer" | "impress" | "calc" | "draw" | "math">; | ||
| /** | ||
| * Additional font packages to install beyond the built-in defaults. | ||
| * Built-in defaults: fonts-liberation, fonts-dejavu-core. | ||
| * Example: ["fonts-noto", "fonts-freefont-ttf"] | ||
| */ | ||
| extraFonts?: string[]; | ||
| }; | ||
|
|
||
| export function libreoffice(options: LibreOfficeOptions = {}): BuildExtension { | ||
| return new LibreOfficeExtension(options); | ||
| } | ||
|
|
||
| class LibreOfficeExtension implements BuildExtension { | ||
| public readonly name = "LibreOfficeExtension"; | ||
|
|
||
| constructor(private readonly options: LibreOfficeOptions = {}) {} | ||
|
|
||
| async onBuildComplete(context: BuildContext, manifest: BuildManifest) { | ||
| if (context.target === "dev") { | ||
| return; | ||
| } | ||
|
|
||
| const components = this.options.components ?? ["writer", "impress"]; | ||
| const componentPkgs = components.map((c) => `libreoffice-${c}`); | ||
|
|
||
| // fonts-liberation: free equivalents of Times New Roman, Arial, Courier New – | ||
| // essential for accurate rendering of most Office documents. | ||
| // fonts-dejavu-core: broad Unicode coverage for international content. | ||
| const fontPkgs = ["fonts-liberation", "fonts-dejavu-core", ...(this.options.extraFonts ?? [])]; | ||
|
|
||
| const allPkgs = [...componentPkgs, ...fontPkgs].join(" \\\n "); | ||
|
|
||
| context.logger.debug(`Adding ${this.name} to the build`, { components }); | ||
|
|
||
| context.addLayer({ | ||
| id: "libreoffice", | ||
| image: { | ||
| // Use --no-install-recommends to avoid pulling in X11 desktop packages. | ||
| // LibreOffice's --headless flag handles PDF conversion without a display. | ||
| instructions: [ | ||
| `RUN apt-get update && apt-get install -y --no-install-recommends \\\n ${allPkgs} \\\n && rm -rf /var/lib/apt/lists/*`, | ||
| ], | ||
| }, | ||
| deploy: { | ||
| env: { | ||
| LIBREOFFICE_PATH: "/usr/bin/libreoffice", | ||
| }, | ||
| override: true, | ||
| }, | ||
| }); | ||
| } | ||
| } |
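
The layer added in `onBuildComplete` comes down to a single `RUN` instruction assembled from the extension options. As a rough sketch of that string assembly, pulled out into a standalone function purely for illustration (`buildInstallInstruction` is a hypothetical name and is not part of this PR), it might look like this:

```typescript
// Hypothetical helper, not part of the PR: mirrors how the extension
// combines component packages and font packages into one apt-get line.
type Component = "writer" | "impress" | "calc" | "draw" | "math";

function buildInstallInstruction(
  components: Component[] = ["writer", "impress"],
  extraFonts: string[] = []
): string {
  // "writer" -> "libreoffice-writer", and so on.
  const componentPkgs = components.map((c) => `libreoffice-${c}`);
  // The two default font packages always come first, extras are appended.
  const fontPkgs = ["fonts-liberation", "fonts-dejavu-core", ...extraFonts];
  // One package per line, each line ending in a shell line continuation.
  const allPkgs = [...componentPkgs, ...fontPkgs].join(" \\\n    ");

  return `RUN apt-get update && apt-get install -y --no-install-recommends \\\n    ${allPkgs} \\\n    && rm -rf /var/lib/apt/lists/*`;
}

// Print the instruction generated for the default options.
console.log(buildInstallInstruction());
```

Note that `extraFonts` entries are interpolated into the shell command as-is, so a reviewer may want to check that they are only ever supplied from trusted configuration.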
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| { | ||
| "name": "references-libreoffice", | ||
| "private": true, | ||
| "type": "module", | ||
| "devDependencies": { | ||
| "trigger.dev": "workspace:*" | ||
| }, | ||
| "dependencies": { | ||
| "@trigger.dev/build": "workspace:*", | ||
| "@trigger.dev/sdk": "workspace:*" | ||
| }, | ||
| "scripts": { | ||
| "dev": "trigger dev", | ||
| "deploy": "trigger deploy" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| import { task } from "@trigger.dev/sdk"; | ||
| import { execFile } from "node:child_process"; | ||
| import { mkdirSync, readFileSync, unlinkSync, writeFileSync } from "node:fs"; | ||
| import { tmpdir } from "node:os"; | ||
| import { join } from "node:path"; | ||
| import { promisify } from "node:util"; | ||
|
|
||
| const execFileAsync = promisify(execFile); | ||
|
|
||
| /** | ||
| * Convert a .docx or .pptx file (supplied as a URL) to PDF using LibreOffice | ||
| * running in headless mode — no X11 display required. | ||
| * | ||
| * Requires the `libreoffice()` build extension in trigger.config.ts so that | ||
| * LibreOffice is available inside the deployed container. | ||
| */ | ||
| export const libreofficeConvert = task({ | ||
| id: "libreoffice-convert", | ||
| run: async (payload: { | ||
| /** Public URL of the .docx or .pptx file to convert. */ | ||
| documentUrl: string; | ||
| /** Optional output filename (without extension). Defaults to "output". */ | ||
| outputName?: string; | ||
| }) => { | ||
| const { documentUrl, outputName = "output" } = payload; | ||
|
|
||
| // Use a unique temp directory so concurrent runs don't collide. | ||
| const workDir = join(tmpdir(), `lo-${Date.now()}`); | ||
| mkdirSync(workDir, { recursive: true }); | ||
|
|
||
| // Derive a safe input filename from the URL. | ||
| const urlPath = new URL(documentUrl).pathname; | ||
| const ext = urlPath.split(".").pop() ?? "docx"; | ||
| const inputPath = join(workDir, `input.${ext}`); | ||
| // LibreOffice names the output after the input file stem. | ||
| const outputPath = join(workDir, `input.pdf`); | ||
|
|
||
| try { | ||
| // 1. Download the source document. | ||
| const response = await fetch(documentUrl); | ||
| if (!response.ok) { | ||
| throw new Error(`Failed to fetch document: ${response.status} ${response.statusText}`); | ||
| } | ||
| const arrayBuffer = await response.arrayBuffer(); | ||
| writeFileSync(inputPath, Buffer.from(arrayBuffer)); | ||
|
|
||
| // 2. Convert to PDF using LibreOffice headless. | ||
| // --norestore prevents LibreOffice from showing a recovery dialog. | ||
| // --outdir directs the output file to our working directory. | ||
| const libreofficeBin = process.env.LIBREOFFICE_PATH ?? "libreoffice"; | ||
| await execFileAsync(libreofficeBin, [ | ||
| "--headless", | ||
| "--norestore", | ||
| "--convert-to", | ||
| "pdf", | ||
| "--outdir", | ||
| workDir, | ||
| inputPath, | ||
| ]); | ||
|
|
||
| // 3. Read the resulting PDF. | ||
| const pdfBuffer = readFileSync(outputPath); | ||
|
|
||
| return { | ||
| outputName: `${outputName}.pdf`, | ||
| sizeBytes: pdfBuffer.byteLength, | ||
| // Return base64 so the result is JSON-serialisable. | ||
| // In production you would upload pdfBuffer to S3 / R2 instead. | ||
| base64: pdfBuffer.toString("base64"), | ||
| }; | ||
| } finally { | ||
| // Clean up temp files. | ||
| try { | ||
| unlinkSync(inputPath); | ||
| } catch {} | ||
| try { | ||
| unlinkSync(outputPath); | ||
| } catch {} | ||
| } | ||
| }, | ||
| }); | ||
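
The task names its working directory after `Date.now()`, so two runs that start within the same millisecond on one machine would share a directory. Its `finally` block also removes only the two files and leaves the directory behind. One possible alternative, offered as a sketch rather than as the PR's approach, is to use Node's `mkdtempSync` and `rmSync`. The helper names `createWorkDir` and `removeWorkDir` are hypothetical.

```typescript
import { existsSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: mkdtempSync appends random characters to the prefix,
// so concurrent callers get distinct directories.
function createWorkDir(prefix = "lo-"): string {
  return mkdtempSync(join(tmpdir(), prefix));
}

// Hypothetical helper: remove the directory and everything inside it.
// `force: true` keeps this from throwing if the directory is already gone.
function removeWorkDir(dir: string): void {
  rmSync(dir, { recursive: true, force: true });
}

// Usage: create two directories back to back, then clean both up.
const first = createWorkDir();
const second = createWorkDir();
console.log(first !== second, existsSync(first), existsSync(second));
removeWorkDir(first);
removeWorkDir(second);
```

In the task itself, a single `removeWorkDir(workDir)` call in the `finally` block could stand in for the two `unlinkSync` calls.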
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| import { defineConfig } from "@trigger.dev/sdk/v3"; | ||
| import { libreoffice } from "@trigger.dev/build/extensions/libreoffice"; | ||
|
|
||
| export default defineConfig({ | ||
| project: "proj_libreoffice_example", | ||
| build: { | ||
| extensions: [ | ||
| // Installs libreoffice-writer and libreoffice-impress (headless, no X11) | ||
| // along with fonts-liberation and fonts-dejavu-core for accurate rendering. | ||
| libreoffice(), | ||
| ], | ||
| }, | ||
| }); |
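
The reference config above relies on the defaults. As an illustration of the two options declared in `LibreOfficeOptions`, a config that adds spreadsheet support and an extra font package might look like the sketch below. It assumes the extension from this PR is available at the import path used above. The project ref is the placeholder from the reference project, and `fonts-noto` is taken from the example in the option's doc comment.

```typescript
import { defineConfig } from "@trigger.dev/sdk/v3";
// Assumption: this import path only exists if the extension from this PR is present.
import { libreoffice } from "@trigger.dev/build/extensions/libreoffice";

export default defineConfig({
  project: "proj_libreoffice_example", // placeholder project ref
  build: {
    extensions: [
      libreoffice({
        // Installs libreoffice-writer and libreoffice-calc instead of the defaults.
        components: ["writer", "calc"],
        // Installed in addition to fonts-liberation and fonts-dejavu-core.
        extraFonts: ["fonts-noto"],
      }),
    ],
  },
});
```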
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| { | ||
| "compilerOptions": { | ||
| "target": "ES2023", | ||
| "module": "Node16", | ||
| "moduleResolution": "Node16", | ||
| "esModuleInterop": true, | ||
| "strict": true, | ||
| "skipLibCheck": true, | ||
| "customConditions": ["@triggerdotdev/source"], | ||
| "lib": ["DOM", "DOM.Iterable"], | ||
| "noEmit": true | ||
| }, | ||
| "include": ["./src/**/*.ts", "trigger.config.ts"] | ||
| } |
🔴 URL extension extraction fallback `?? "docx"` never triggers for extensionless URLs

When `documentUrl` has no file extension in its pathname (e.g., `https://api.example.com/files/123`), `urlPath.split(".").pop()` returns the entire pathname string (e.g., `/files/123`), not `null`/`undefined`. The `?? "docx"` fallback therefore never fires. This results in `ext` being set to the full pathname, producing an invalid `inputPath` like `/tmp/lo-1234/input./files/123` and an `outputPath` (`input.pdf`) that won't match LibreOffice's actual output filename.

Reproduction and fix
A safer approach would be to extract the basename first and check for a dot:
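
The reviewer's suggested snippet did not survive on this page, so what follows is only a minimal sketch of the idea as described, not the original suggestion. It takes the basename of the URL pathname and falls back to a default when that basename has no extension. The helper name `extensionFromUrl` and its `fallback` parameter are hypothetical.

```typescript
import { posix } from "node:path";

// Hypothetical helper: derive a file extension from a URL, falling back to a
// default when the last path segment has none.
function extensionFromUrl(documentUrl: string, fallback = "docx"): string {
  // URL.pathname excludes the query string and fragment.
  const pathname = new URL(documentUrl).pathname;
  // Look only at the final segment so dots in earlier segments are ignored.
  const base = posix.basename(pathname);
  // extname returns "" when there is no dot, and "." for a trailing dot.
  const ext = posix.extname(base);
  return ext.length > 1 ? ext.slice(1) : fallback;
}

console.log(extensionFromUrl("https://api.example.com/files/123"));
console.log(extensionFromUrl("https://example.com/slides/deck.pptx"));
```

Because the extension is taken from a single path segment, the resulting `input.<ext>` filename cannot contain a `/`, which avoids the invalid `inputPath` described above.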