Commit 4c94c8e

feat: AI Video summarization and subtitle generation with Lingo (Closes #1761) (#1891)
* initial setup
* backend code
* updated env and requirements due to some junk files
* fronted setup with lingo with some vibe code
* Read me update
* added lingo sdk in backend + translated text
* updated requirements
* lang send from frontend
* ui update
* readme update
* gitignore update
* change set added
* signed commit
* ai comments resolve
* added vue18-n
* issue fix
* remove change set file and rerun empty file

---------

Co-authored-by: Sumit Saurabh <62152915+sumitsaurabh927@users.noreply.github.com>
1 parent e605b20 commit 4c94c8e

27 files changed, +3336 -0 lines changed
.changeset/petite-files-allow.md

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
---
---
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local

# Editor directories and files
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

.env
.venv
.venv-new
__pycache__
.vscode

community/video-lingo-ai/README.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# Video Lingo AI – Demo

## Overview

**Video Lingo AI** is a web application that allows users to:

* Upload a video file.
* Extract audio and transcribe it using Whisper AI.
* Optionally generate a summarized version of the transcript.
* View the transcript with timestamps.

**Note:** Due to limited resources, please upload only videos with spoken English.

This tool is built to demonstrate how AI can help automatically understand, summarize, and localize video content.

---

## Features Highlighted

* **Video Processing & Transcription:** Uses Whisper AI to convert spoken content into text segments with timestamps.
* **Summarization:** Generates concise summaries of video content (via a language model).
* **Dynamic Transcript Display:** Shows each segment with start/end times for easy navigation.
* **Multi-language Support:** Designed to translate the transcript and summary into other languages, demonstrating the power of Lingo.dev for localization.
* **Interactive Vue 3 Frontend:** Drag-and-drop video uploads, toggles for summarization, and responsive results display.

---

## Tech Stack

* **Backend:** FastAPI, Python 3.10+, Whisper AI (via faster-whisper), Groq API (for summarization), Lingo.dev SDK (for translation).
* **Frontend:** Vue 3, Composition API, Axios for API requests.
* **Other Libraries:** Pydantic, Requests, pathlib, shutil.

---

## How to Run Locally

1. **Clone the repository:**

```bash
git clone <your-repo-url>
cd video-lingo-ai/api
```

2. **Create and activate a virtual environment:**

```bash
python -m venv .venv
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows
```

3. **Install dependencies:**

```bash
pip install -r requirements.txt
```

4. **Set environment variables:**

Create a `.env` file:

```env
GROQ_API_KEY=<your_groq_api_key>
LINGODOTDEV_API_KEY=<your_lingo_api_key>
```

5. **Run the backend:**

```bash
uvicorn src.main:app --reload
```

6. **Run the frontend:**

```bash
npm install
npm run dev
```

7. **Access the app in your browser:**

```
http://localhost:5173
```

* Drag and drop a video file to upload.
* Toggle summarization if needed.
* View the transcript and summary in English/Hindi.

---

## Lingo.dev Integration

The app is designed to highlight Lingo.dev features:

* Translating dynamic content such as transcripts and summaries.
* Supporting multiple languages for global accessibility.
* Easy integration with web apps for localization workflows.

---

## Notes

* The summarization feature may take a few seconds depending on video length.

Thank you for organizing this.

---
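The README describes transcript segments with start and end times, and the commit title mentions subtitle generation. As a minimal sketch (not part of this commit), segments shaped like the `/process-video` response could be turned into SubRip (SRT) text as shown below. The helper names `format_timestamp` and `segments_to_srt` are hypothetical, and the sketch assumes `start` and `end` are seconds expressed as floats.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(float(seg["start"]))
        end = format_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

A caller would pass the `segments` list from the JSON response and write the returned string to a `.srt` file.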
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
annotated-doc==0.0.4
annotated-types==0.7.0
anyio==4.12.1
av==16.1.0
certifi==2026.1.4
charset-normalizer==3.4.4
click==8.3.1
coloredlogs==15.0.1
ctranslate2==4.6.3
distro==1.9.0
dotenv==0.9.9
exceptiongroup==1.3.1
fastapi==0.128.0
faster-whisper==1.2.1
filelock==3.20.3
flatbuffers==25.12.19
fsspec==2026.1.0
groq==1.0.0
h11==0.16.0
hf-xet==1.2.0
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==1.3.3
humanfriendly==10.0
idna==3.11
lingodotdev==1.3.0
mpmath==1.3.0
nanoid==2.0.0
numpy==2.2.6
onnxruntime==1.23.2
packaging==26.0
pip==22.0.2
protobuf==6.33.4
pydantic==2.12.5
pydantic-core==2.41.5
python-dotenv==1.2.1
python-multipart==0.0.21
pyyaml==6.0.3
requests==2.32.5
setuptools==59.6.0
shellingham==1.5.4
sniffio==1.3.1
starlette==0.50.0
sympy==1.14.0
tokenizers==0.22.2
tqdm==4.67.1
typer-slim==0.21.1
typing-extensions==4.15.0
typing-inspection==0.4.2
urllib3==2.6.3
uvicorn==0.40.0

community/video-lingo-ai/api/src/__init__.py

Whitespace-only changes.
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
from dotenv import load_dotenv
load_dotenv()

from fastapi import FastAPI, UploadFile, File
from fastapi.responses import JSONResponse
from pathlib import Path
import shutil
import tempfile

from .models import whisper, generate_text
from .utils.utils import extract_audio, translate_text_with_lingo
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI(title="Video Lingo AI API")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
# For this demo project, all origins are allowed above.

UPLOAD_DIR = Path(tempfile.gettempdir()) / "video_lingo_ai"
UPLOAD_DIR.mkdir(parents=True, exist_ok=True)

@app.post("/process-video")
async def process_video(
    file: UploadFile = File(...),
    issummarize: bool = False,
    lang: str = "en"
):
    video_path = UPLOAD_DIR / file.filename
    with open(video_path, "wb") as f:
        shutil.copyfileobj(file.file, f)

    audio_file = extract_audio(str(video_path))

    segments, _info = whisper.transcribe(str(audio_file))
    segments = list(segments)

    full_text = " ".join(seg.text for seg in segments)

    if issummarize:
        summary = generate_text(f"Summarize this video:\n{full_text}")
        if lang != "en":
            summary = await translate_text_with_lingo(summary, lang)
        return JSONResponse({"summary": summary})

    result = []
    for seg in segments:
        text = seg.text
        if lang != "en":
            text = await translate_text_with_lingo(text, lang)

        result.append({
            "start": seg.start,
            "end": seg.end,
            "text": text
        })

    return JSONResponse({"segments": result})
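The handler above awaits `translate_text_with_lingo` once per segment, one after another, so a long transcript pays the full round-trip latency for every segment. A minimal sketch of issuing those calls concurrently with `asyncio.gather`, capped by a semaphore, follows. `translate_all` and `fake_translate` are hypothetical names: the fake stands in for the real Lingo.dev call so the example runs without network access, and whether the Lingo.dev API tolerates this level of concurrency has not been verified here.

```python
import asyncio
from typing import Awaitable, Callable, List

Translator = Callable[[str, str], Awaitable[str]]


async def translate_all(
    texts: List[str],
    target_lang: str,
    translate: Translator,
    max_concurrency: int = 5,
) -> List[str]:
    """Translate texts concurrently; results keep the input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def translate_one(text: str) -> str:
        # The semaphore limits how many translations are in flight at once.
        async with semaphore:
            return await translate(text, target_lang)

    return await asyncio.gather(*(translate_one(t) for t in texts))


async def fake_translate(text: str, target_lang: str) -> str:
    # Hypothetical stand-in for translate_text_with_lingo; no network call.
    await asyncio.sleep(0)
    return f"[{target_lang}] {text}"
```

Inside the handler, this would amount to something like `translated = await translate_all([seg.text for seg in segments], lang, translate_text_with_lingo)` before building `result`.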
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
from faster_whisper import WhisperModel
import os
from groq import Groq
import re

whisper = WhisperModel(
    "tiny",
    device="cpu",  # no GPU available, so transcription can be slow
    compute_type="int8"
)

client = Groq(
    api_key=os.getenv("GROQ_API_KEY")
)


def extract_final_answer(text: str) -> str:
    if "FINAL_ANSWER:" in text:
        text = text.split("FINAL_ANSWER:", 1)[1]

    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)

    text = text.strip().strip('"').strip("'")
    text = re.sub(r"^[^A-Za-z0-9]+", "", text)

    return text.strip()

def generate_text(prompt: str) -> str:
    response = client.chat.completions.create(
        model="qwen/qwen3-32b",
        messages=[
            {
                "role": "user",
                "content": (
                    "Summarize the following transcript.\n\n"
                    "STRICT RULES:\n"
                    "- No reasoning\n"
                    "- No explanations\n"
                    "- No analysis\n"
                    "- Output ONLY the final summary\n"
                    "- Start with: FINAL_ANSWER:\n\n"
                    + prompt
                )
            }
        ],
        temperature=0.2,
        max_tokens=512,
    )

    raw = response.choices[0].message.content
    return extract_final_answer(raw)
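One edge case in `extract_final_answer` above: if the model mentions the `FINAL_ANSWER:` marker inside a `<think>` block, splitting on the first marker cuts off the opening `<think>` tag, the regex no longer matches, and reasoning text leaks into the summary. A sketch of a variant that removes think blocks first and then splits on the last marker is shown below. The name `extract_final_answer_v2` is hypothetical and this is not part of the commit.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def extract_final_answer_v2(text: str) -> str:
    # Remove reasoning blocks before looking for the marker.
    text = THINK_BLOCK.sub("", text)

    # Keep only what follows the last marker, if one is present.
    if "FINAL_ANSWER:" in text:
        text = text.rsplit("FINAL_ANSWER:", 1)[1]

    text = text.strip().strip('"').strip("'")
    # Drop leading characters that are not ASCII letters or digits,
    # mirroring the cleanup step in the original function.
    text = re.sub(r"^[^A-Za-z0-9]+", "", text)
    return text.strip()
```

For input such as `<think>I should write FINAL_ANSWER: later</think>` followed by `FINAL_ANSWER: "A short summary."`, this variant returns only the summary text.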
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
import subprocess
import os
from lingodotdev import LingoDotDevEngine

def extract_audio(video_path, output_audio="temp_audio.wav"):

    command = ["ffmpeg", "-y", "-i", video_path, "-ac", "1", "-ar", "16000", output_audio]
    subprocess.run(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    if not os.path.exists(output_audio):
        raise RuntimeError("Audio extraction failed")

    return output_audio


async def translate_text_with_lingo(text: str, target_lang: str) -> str:
    result = await LingoDotDevEngine.quick_translate(
        text,
        api_key=os.getenv("LINGODOTDEV_API_KEY"),
        target_locale=target_lang
    )
    return result
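`extract_audio` above writes to a fixed `temp_audio.wav` in the working directory and discards ffmpeg's output, so concurrent requests can overwrite each other's audio, and a stale file left by an earlier run can hide a failed extraction. A sketch of one alternative, using a unique temporary file and `check=True`, follows. `build_ffmpeg_command` and `extract_audio_checked` are hypothetical names; the ffmpeg flags are the ones used in the commit.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: str, output_audio: str) -> List[str]:
    """Same flags as the commit: overwrite output, mono, 16 kHz sample rate."""
    return ["ffmpeg", "-y", "-i", video_path, "-ac", "1", "-ar", "16000", output_audio]


def extract_audio_checked(video_path: str) -> Path:
    """Extract audio to a unique temp file; raise RuntimeError on failure."""
    # Reserve a unique output path so parallel requests do not collide.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        output_audio = Path(tmp.name)
    try:
        subprocess.run(
            build_ffmpeg_command(video_path, str(output_audio)),
            check=True,            # non-zero exit raises CalledProcessError
            capture_output=True,   # keep stderr for the error message
            text=True,
            errors="replace",
        )
    except FileNotFoundError as exc:
        output_audio.unlink(missing_ok=True)
        raise RuntimeError("ffmpeg is not installed or not on PATH") from exc
    except subprocess.CalledProcessError as exc:
        output_audio.unlink(missing_ok=True)
        raise RuntimeError(f"Audio extraction failed: {exc.stderr.strip()}") from exc
    return output_audio
```

The caller is then responsible for deleting the returned file once transcription is done.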
1.2 MB
Binary file not shown.

community/video-lingo-ai/i18n.json

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
{
  "version": "1.11",
  "locale": {
    "source": "en",
    "targets": ["hi", "es", "fr"]
  },
  "buckets": {
    "json": {
      "include": [
        "i18n/[locale].json"
      ]
    }
  },
  "$schema": "https://lingo.dev/schema/i18n.json"
}
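The `[locale]` placeholder in the bucket's `include` pattern stands for a locale code, so with this config one would expect files such as `i18n/en.json` and `i18n/hi.json`. The sketch below expands the pattern for the source and target locales. `expand_bucket_paths` is a hypothetical helper that only illustrates the shape of the config; it is not how the Lingo.dev CLI resolves paths internally.

```python
import json
from typing import Dict, List

# Inline copy of the config above so the example is self-contained.
CONFIG_TEXT = """
{
  "version": "1.11",
  "locale": {"source": "en", "targets": ["hi", "es", "fr"]},
  "buckets": {"json": {"include": ["i18n/[locale].json"]}},
  "$schema": "https://lingo.dev/schema/i18n.json"
}
"""


def expand_bucket_paths(config: dict) -> Dict[str, List[str]]:
    """Map each locale to the file paths its bucket patterns expand to."""
    locales = [config["locale"]["source"], *config["locale"]["targets"]]
    patterns = [
        pattern
        for bucket in config["buckets"].values()
        for pattern in bucket["include"]
    ]
    return {
        locale: [p.replace("[locale]", locale) for p in patterns]
        for locale in locales
    }


paths = expand_bucket_paths(json.loads(CONFIG_TEXT))
```

Here `paths` maps each of the four locales to a single JSON file under `i18n/`.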
