Digital Asset Manager — Understanding Workbench

Over the past 3 to 15 years, TUG has produced a wealth of visual material — architecture models, framework diagrams, conceptual illustrations, presentation visuals — embedded across hundreds of PowerPoint decks, Word documents, and PDFs. Today this visual IP is effectively dark: buried inside documents, scattered across folders, invisible to search.

The Digital Asset Manager creates a searchable catalog so anyone on the team can find what already exists — either for direct reuse or to trace it back to the editable source.

Phases

Phase 1: Rendering

Rendering pipeline validated. PPTX, DOCX, and PDF all convert to browsable images. Domain model defined. Storage budget measured.

Status: Complete

Phase 2: Team Tool

Deploy on Vercel with API routes. Migrate data layer to SQLite. Persist curation decisions across sessions. Catalog the full Marketing folder.

Status: In Progress

Phase 3: Intelligent Library

AI tagging pipeline (Gemini vs. Claude evaluation). Semantic search by concept. Incremental indexing of new files. Extend beyond Marketing.

Status: Planned

Architecture

UI Layer

React viewer — grid browse, search, comparison, keep/dismiss/delete curation on every asset

API Layer

Next.js API routes on Vercel — serves catalog, thumbnails, previews, accepts curation decisions

Domain Layer

Business logic in Python — extraction, rendering, dedup, tagging, curation rules

Components: Renderer, Media Extractor, Thumbnail Gen, Deduplicator, AI Tagger, Curator

Data Layer

JSON (Phase 1) → SQLite (Phase 2) — asset catalog, source pointers, curation decisions, learned patterns
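As a rough illustration of what the Phase 2 data layer could look like, the sketch below keeps assets and curation decisions in SQLite using Python's standard sqlite3 module. The table names, column names, and helper functions (assets, curation_decisions, open_catalog, record_decision, and so on) are assumptions made for this example, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS assets (
    id          INTEGER PRIMARY KEY,
    source_path TEXT NOT NULL,      -- pointer back to the original file
    page        INTEGER NOT NULL,   -- page or slide number within the source
    UNIQUE (source_path, page)
);
CREATE TABLE IF NOT EXISTS curation_decisions (
    asset_id   INTEGER PRIMARY KEY REFERENCES assets(id),
    decision   TEXT NOT NULL CHECK (decision IN ('keep', 'dismiss', 'delete')),
    decided_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_asset(conn: sqlite3.Connection, source_path: str, page: int) -> int:
    """Insert one asset row and return its id."""
    cur = conn.execute(
        "INSERT INTO assets (source_path, page) VALUES (?, ?)",
        (source_path, page),
    )
    conn.commit()
    return cur.lastrowid


def record_decision(conn: sqlite3.Connection, asset_id: int, decision: str) -> None:
    """Store the latest decision for an asset, replacing any earlier one."""
    conn.execute(
        "INSERT OR REPLACE INTO curation_decisions (asset_id, decision) VALUES (?, ?)",
        (asset_id, decision),
    )
    conn.commit()


def get_decision(conn: sqlite3.Connection, asset_id: int):
    """Return the stored decision for an asset, or None if it is undecided."""
    row = conn.execute(
        "SELECT decision FROM curation_decisions WHERE asset_id = ?",
        (asset_id,),
    ).fetchone()
    return row[0] if row else None
```

Passing a file path instead of ":memory:" would persist decisions across sessions. The CHECK constraint rejects any decision other than keep, dismiss, or delete.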

Phase 1 Results — February 2026

Rendering pipeline validated against 11 sample files. LibreOffice headless → PDF → pdftoppm at 150 DPI produces sharp, usable images. Wireframes, diagrams, screenshots, colored text, and logos all render clearly.
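The conversion chain described above can be sketched as two external commands driven from Python. This is a minimal sketch, not the contents of render.py: the function names are hypothetical, the PNG output format is an assumption (the source does not name the image format), and the LibreOffice binary may be called something other than soffice on a given machine.

```python
import subprocess
from pathlib import Path


def office_to_pdf_cmd(src: Path, out_dir: Path) -> list[str]:
    """Build the LibreOffice headless command that converts PPTX/DOCX to PDF."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(src),
    ]


def pdf_to_images_cmd(pdf: Path, out_prefix: Path, dpi: int = 150) -> list[str]:
    """Build the pdftoppm command that rasterizes each PDF page at the given DPI."""
    # -png is an assumption; pdftoppm can also write JPEG (-jpeg) or TIFF (-tiff).
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


def render(src: Path, work_dir: Path, dpi: int = 150) -> None:
    """Render one source file to page images inside work_dir."""
    work_dir.mkdir(parents=True, exist_ok=True)
    if src.suffix.lower() == ".pdf":
        pdf = src  # PDFs skip the LibreOffice step
    else:
        subprocess.run(office_to_pdf_cmd(src, work_dir), check=True)
        pdf = work_dir / (src.stem + ".pdf")
    subprocess.run(pdf_to_images_cmd(pdf, work_dir / src.stem, dpi), check=True)
```

Keeping command construction separate from execution makes the pipeline easy to check without LibreOffice or Poppler installed.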

Type	Files	Pages	Avg KB/page	Dimensions (px)	Notes
PPTX	4	193	134	1500 × 900	1–11s conversion
DOCX	3	12	150	1275 × 1651	<1s conversion
PDF (standard)	3	106	141	1500 × 844	No conversion needed
PDF (wide)	1	31	467	3150 × 1153	Double-page spread
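The per-type averages in the table can be combined into a page-weighted figure, which is one way a storage estimate might be built up. The sketch below uses only the numbers from the table. The helper names are hypothetical, and estimate_gb takes the page count as an input because the source does not state the full-folder page count.

```python
# (type, pages, average KB per page), copied from the table above
SAMPLE = [
    ("PPTX", 193, 134),
    ("DOCX", 12, 150),
    ("PDF (standard)", 106, 141),
    ("PDF (wide)", 31, 467),
]


def sample_totals(rows):
    """Return (total pages, total KB, page-weighted average KB per page)."""
    total_pages = sum(pages for _, pages, _ in rows)
    total_kb = sum(pages * kb for _, pages, kb in rows)
    return total_pages, total_kb, total_kb / total_pages


def estimate_gb(page_count: int, avg_kb_per_page: float) -> float:
    """Estimate preview storage in decimal GB (1 GB = 1,000,000 KB)."""
    return page_count * avg_kb_per_page / 1_000_000


pages, kb, avg = sample_totals(SAMPLE)
print(f"{pages} pages, {kb} KB, {avg:.1f} KB/page")
```

For the sample set this gives 342 pages and 57,085 KB, or about 166.9 KB per page. The wide PDF pulls the average well above the 134 to 150 KB of the other types.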

Storage budget: ~4.8 GB for full Marketing folder (91 PPTX, 117 DOCX, 767 PDF, ~3,200 standalone images). 4.7 GB rendered previews + 236 MB grid thumbnails. Well within Dropbox capacity.

Design Properties

Non-destructive

Never modifies, moves, or deletes original files. All outputs live in asset-manager/. "Delete" only flags a catalog entry.
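One way to read the "Delete only flags a catalog entry" rule is as a soft delete. This is a minimal sketch under that assumption: CatalogEntry, delete_asset, and visible are hypothetical names, not the ones in domain/models.py.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CatalogEntry:
    source_path: Path      # pointer to the original file, never written to
    deleted: bool = False  # soft-delete flag kept only in the catalog


def delete_asset(entry: CatalogEntry) -> None:
    """Flag the entry as deleted; the file at source_path is left untouched."""
    entry.deleted = True


def visible(entries: list[CatalogEntry]) -> list[CatalogEntry]:
    """Return the entries that should still appear in the browse grid."""
    return [e for e in entries if not e.deleted]
```

Because the flag lives in the catalog, a delete can be undone by clearing it, and the original document is never at risk.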

Layered & Swappable

Renderer, tagger, data layer, and UI can each be replaced independently. Defined interfaces between layers.

Model-Agnostic

Tagging pipeline defines an interface (image in, tags + description out) that any vision model can implement. No provider lock-in.
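The "image in, tags + description out" contract could be expressed as a Python Protocol, as sketched below. The names TagResult, Tagger, StubTagger, and tag_all are illustrative assumptions. A real implementation would wrap a vision model client where the stub returns placeholder values.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class TagResult:
    tags: list[str] = field(default_factory=list)
    description: str = ""


class Tagger(Protocol):
    """Anything with a matching tag() method satisfies this interface."""

    def tag(self, image_bytes: bytes) -> TagResult: ...


class StubTagger:
    """Placeholder tagger for testing; it does not call any model."""

    def tag(self, image_bytes: bytes) -> TagResult:
        return TagResult(
            tags=["untagged"],
            description=f"{len(image_bytes)} byte image",
        )


def tag_all(tagger: Tagger, images: dict[str, bytes]) -> dict[str, TagResult]:
    """Run whichever tagger is supplied over a batch of named images."""
    return {name: tagger.tag(data) for name, data in images.items()}
```

The pipeline depends only on the Tagger shape, so swapping providers means writing one new class rather than changing callers.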

Incremental

Re-runs skip unchanged files. Catalog grows over time rather than being rebuilt from scratch.
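The source does not say how unchanged files are detected. One common approach, assumed here, is to fingerprint each file by content hash and compare against the fingerprints recorded on the previous run. All function names below are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(paths: list[Path]) -> dict[str, str]:
    """Map each path to its current fingerprint."""
    return {str(p): fingerprint(p) for p in paths}


def changed_files(current: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return paths that are new or whose fingerprint differs from the last run."""
    return sorted(p for p, digest in current.items() if seen.get(p) != digest)
```

Comparing file size and modification time would be cheaper than hashing, at the cost of missing edits that preserve both.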

Code Inventory

domain/models.py (~420 lines): Core data model. Source, Asset, CurationDecision, Collection, and Catalog, with JSON serialization.

render.py (~505 lines): Main rendering pipeline. Crawls folders, converts PPTX/DOCX → PDF → images, generates thumbnails, and detects duplicates via perceptual hash.

extractor/extract.py (~765 lines): Advanced extraction. Pulls embedded media from PPTX/DOCX/PDF, processes standalone images, auto-tags with Claude Vision, and detects duplicates.
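The inventory mentions duplicate detection via perceptual hash without naming the algorithm. As an illustration only, the sketch below uses a simple average hash: it is a stand-in technique, not necessarily what render.py does. It assumes the image has already been decoded to grayscale and downscaled to a small grid (typically 8 × 8) by an imaging library, and the distance threshold is an arbitrary example value.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Hash a small grayscale grid: one bit per pixel, set if above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_duplicate(a: list[list[int]], b: list[list[int]], max_distance: int = 5) -> bool:
    """Treat two grids as duplicates if their hashes differ in few bits."""
    # max_distance is an assumed threshold; it would need tuning on real data.
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Unlike a cryptographic hash, small brightness or compression differences leave most bits unchanged, so near-identical renders of the same slide land within a few bits of each other.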
