Over the past 3 to 15 years, TUG has produced a wealth of visual material — architecture models, framework diagrams, conceptual illustrations, presentation visuals — embedded across hundreds of PowerPoint decks, Word documents, and PDFs. Today this visual IP is effectively dark: buried inside documents, scattered across folders, invisible to search.
The Digital Asset Manager creates a searchable catalog so anyone on the team can find what already exists — either for direct reuse or to trace it back to the editable source.
Phases
1
Rendering
Rendering pipeline validated. PPTX, DOCX, PDF all convert to browsable images. Domain model defined. Storage budget measured.
Complete
2
Team Tool
Deploy on Vercel with API routes. Migrate data layer to SQLite. Persist curation decisions across sessions. Catalog full Marketing folder.
In Progress
3
Intelligent Library
AI tagging pipeline (Gemini vs. Claude evaluation). Semantic search by concept. Incremental indexing of new files. Extend beyond Marketing.
Planned
Architecture
UI Layer
React viewer — grid browse, search, comparison, keep/dismiss/delete curation on every asset
API Layer
Next.js API routes on Vercel — serves catalog, thumbnails, previews, accepts curation decisions
Domain Layer
Business logic in Python — extraction, rendering, dedup, tagging, curation rules
Renderer
Media Extractor
Thumbnail Gen
Deduplicator
AI Tagger
Curator
Data Layer
JSON (Phase 1) → SQLite (Phase 2) — asset catalog, source pointers, curation decisions, learned patterns
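As a rough illustration of the Phase 2 data layer, the sketch below stores source pointers, assets, and curation decisions in SQLite using Python's standard `sqlite3` module. The table names, column names, and the `open_catalog` and `record_decision` helpers are assumptions made for this example, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sources (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE           -- pointer back to the original file
);
CREATE TABLE IF NOT EXISTS assets (
    id           INTEGER PRIMARY KEY,
    source_id    INTEGER NOT NULL REFERENCES sources(id),
    page         INTEGER NOT NULL,      -- page or slide number in the source
    preview_path TEXT NOT NULL,
    UNIQUE (source_id, page)
);
CREATE TABLE IF NOT EXISTS curation_decisions (
    asset_id INTEGER PRIMARY KEY REFERENCES assets(id),
    decision TEXT NOT NULL CHECK (decision IN ('keep', 'dismiss', 'delete'))
);
"""


def open_catalog(db_path: str) -> sqlite3.Connection:
    """Open (or create) the catalog database and make sure the tables exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def record_decision(conn: sqlite3.Connection, asset_id: int, decision: str) -> None:
    """Persist the latest decision for an asset, replacing any earlier one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO curation_decisions (asset_id, decision) "
            "VALUES (?, ?)",
            (asset_id, decision),
        )
```

Keeping one row per asset in `curation_decisions` means a decision made in one session is simply overwritten by a later one, which is one simple way to persist curation state across sessions.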
Phase 1 Results — February 2026
Rendering pipeline validated against 11 sample files. LibreOffice headless → PDF → pdftoppm at 150 DPI produces sharp, usable images. Wireframes, diagrams, screenshots, colored text, and logos all render clearly.
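The conversion chain described above can be sketched as two external commands driven from Python. This is a minimal sketch under stated assumptions: the LibreOffice binary is taken to be on the PATH as `soffice`, PNG is assumed as the output format (the format actually used is not stated here), and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def office_to_pdf_cmd(src: Path, out_dir: Path) -> list[str]:
    """Command line to convert a PPTX/DOCX file to PDF with headless LibreOffice."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(src)]


def pdf_to_png_cmd(pdf: Path, out_prefix: Path, dpi: int = 150) -> list[str]:
    """Command line to rasterize every page of a PDF with pdftoppm."""
    # PNG output is an assumption; pdftoppm also supports -jpeg.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


def render(src: Path, work_dir: Path, dpi: int = 150) -> None:
    """Render one source file to page images under work_dir."""
    work_dir.mkdir(parents=True, exist_ok=True)
    if src.suffix.lower() == ".pdf":
        pdf = src  # PDFs skip the LibreOffice step
    else:
        subprocess.run(office_to_pdf_cmd(src, work_dir), check=True)
        pdf = work_dir / (src.stem + ".pdf")
    subprocess.run(pdf_to_png_cmd(pdf, work_dir / src.stem, dpi), check=True)
```

At 150 DPI a US Letter page (8.5 × 11 in) rasterizes to about 1275 × 1650 pixels, which lines up with the DOCX dimensions reported in the results.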
| Type | Files | Pages | Avg KB/page | Dimensions (px) | Notes |
| --- | --- | --- | --- | --- | --- |
| PPTX | 4 | 193 | 134 | 1500 × 900 | 1–11 s conversion |
| DOCX | 3 | 12 | 150 | 1275 × 1651 | <1 s conversion |
| PDF (standard) | 3 | 106 | 141 | 1500 × 844 | No conversion needed |
| PDF (wide) | 1 | 31 | 467 | 3150 × 1153 | Double-page spread |
Storage budget: ~4.8 GB for full Marketing folder (91 PPTX, 117 DOCX, 767 PDF, ~3,200 standalone images). 4.7 GB rendered previews + 236 MB grid thumbnails. Well within Dropbox capacity.
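The sample measurements can be turned into a per-page average with a few lines of arithmetic. The sketch below uses only the page counts and average sizes from the Phase 1 table. The `estimate_gb` helper is hypothetical, and since the total page count of the Marketing folder is not given here, it takes that count as an input rather than reproducing the folder-level estimate. Decimal units (1 GB = 1,000,000 KB) are an assumption.

```python
# (type, pages, average KB per page) copied from the Phase 1 results table
SAMPLES = [
    ("PPTX", 193, 134),
    ("DOCX", 12, 150),
    ("PDF (standard)", 106, 141),
    ("PDF (wide)", 31, 467),
]

total_pages = sum(pages for _, pages, _ in SAMPLES)        # 342 pages
total_kb = sum(pages * kb for _, pages, kb in SAMPLES)     # 57,085 KB
weighted_avg_kb = total_kb / total_pages                   # about 167 KB/page


def estimate_gb(page_count: int, kb_per_page: float = weighted_avg_kb) -> float:
    """Rough preview footprint in GB (decimal units) for a given page count."""
    return page_count * kb_per_page / 1_000_000
```

The wide double-page PDF pulls the weighted average well above the 134 to 150 KB typical of the other types, so the mix of file types matters for any extrapolation.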
Design Properties
Non-destructive
Never modifies, moves, or deletes original files. All outputs live in asset-manager/. "Delete" only flags a catalog entry.
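One way to enforce this property in code is to route every write through a guard that refuses paths outside the output root, and to model "delete" as a flag. The sketch below is illustrative only: `safe_output_path`, `CatalogEntry`, and its fields are hypothetical names, not the project's actual API.

```python
from dataclasses import dataclass
from pathlib import Path

OUTPUT_ROOT = Path("asset-manager")  # all outputs live under this folder


def safe_output_path(relative: str, root: Path = OUTPUT_ROOT) -> Path:
    """Return root/relative, refusing any path that would escape the output root."""
    root_resolved = root.resolve()
    candidate = (root_resolved / relative).resolve()
    if candidate != root_resolved and root_resolved not in candidate.parents:
        raise ValueError(f"refusing to write outside {root}: {relative}")
    return candidate


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are illustrative."""
    source_path: str
    deleted: bool = False


def delete_entry(entry: CatalogEntry) -> None:
    """'Delete' only flags the catalog entry; the original file is untouched."""
    entry.deleted = True
```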
Layered & Swappable
Renderer, tagger, data layer, and UI can each be replaced independently. Defined interfaces between layers.
Model-Agnostic
Tagging pipeline defines an interface (image in, tags + description out) that any vision model can implement. No provider lock-in.
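The "image in, tags + description out" contract could be expressed as a structural interface. The following is a minimal sketch, assuming hypothetical names (`Tagger`, `TagResult`, `StubTagger`, `tag_all`); a real implementation would wrap a vision model's client library behind the same `tag` method.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class TagResult:
    """Output of a tagger: free-form tags plus a short description."""
    tags: list[str] = field(default_factory=list)
    description: str = ""


class Tagger(Protocol):
    """Anything with a matching tag() method satisfies this interface."""

    def tag(self, image: bytes) -> TagResult: ...


class StubTagger:
    """Placeholder tagger for tests; it does not look at the image content."""

    def tag(self, image: bytes) -> TagResult:
        return TagResult(tags=["untagged"],
                         description=f"{len(image)} bytes, not analyzed")


def tag_all(tagger: Tagger, images: dict[str, bytes]) -> dict[str, TagResult]:
    """Run any Tagger implementation over a batch of named images."""
    return {name: tagger.tag(data) for name, data in images.items()}
```

Because `Tagger` is a `Protocol`, provider-specific classes do not need to inherit from it, so swapping models only requires passing a different object to `tag_all`.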
Incremental
Re-runs skip unchanged files. Catalog grows over time rather than being rebuilt from scratch.
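Skipping unchanged files can be sketched with a manifest that maps each path to a content digest. How the project actually detects changes is not stated here; SHA-256 content hashing is an assumption for this example (modification time and size would be a cheaper alternative), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's content, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_process(files: list[Path], manifest: dict[str, str]) -> list[Path]:
    """Return files that are new or whose content differs from the manifest."""
    return [p for p in files if manifest.get(str(p)) != file_digest(p)]


def update_manifest(manifest: dict[str, str], processed: list[Path]) -> None:
    """Record the current digest of each processed file."""
    for p in processed:
        manifest[str(p)] = file_digest(p)
```

On each run, only the paths returned by `files_to_process` go through rendering, after which `update_manifest` records them so the next run skips them.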
Code Inventory
domain/models.py
Core data model — Source, Asset, CurationDecision, Collection, Catalog with JSON serialization
~420 lines
render.py
Main rendering pipeline — crawl folders, convert PPTX/DOCX→PDF→images, generate thumbnails, detect duplicates via perceptual hash
~505 lines
extractor/extract.py
Advanced extraction — pull embedded media from PPTX/DOCX/PDF, standalone image processing, Claude Vision auto-tagging, duplicate detection
~765 lines
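For readers unfamiliar with perceptual hashing, the duplicate detection mentioned for render.py can be illustrated with an average hash. This is a sketch, not the project's implementation: the specific hash variant and threshold used are not stated here, and the code assumes the image has already been downscaled to a small grayscale grid (for example 8 × 8) by an imaging library.

```python
def average_hash(gray: list[list[int]]) -> int:
    """Average hash of a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    """
    pixels = [v for row in gray for v in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for v in pixels:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(a: list[list[int]], b: list[list[int]], max_distance: int = 5) -> bool:
    """Treat two grids as duplicates when their hashes differ in few bits.

    The threshold of 5 is an illustrative assumption.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Unlike a cryptographic hash, small changes in brightness or compression leave most bits unchanged, so near-identical renders of the same slide land within a small Hamming distance of each other.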