The Understanding Group — Internal

Digital Asset Manager

Catalog and manage TUG's visual intellectual property

Over the past 3 to 15 years, TUG has produced a wealth of visual material — architecture models, framework diagrams, conceptual illustrations, presentation visuals — embedded across hundreds of PowerPoint decks, Word documents, and PDFs. Today this visual IP is effectively dark: buried inside documents, scattered across folders, invisible to search.

The Digital Asset Manager creates a searchable catalog so anyone on the team can find what already exists — either for direct reuse or to trace it back to the editable source.

Phases
Phase 1: Rendering (Complete)
Rendering pipeline validated. PPTX, DOCX, PDF all convert to browsable images. Domain model defined. Storage budget measured.
Phase 2: Team Tool (In Progress)
Deploy on Vercel with API routes. Migrate data layer to SQLite. Persist curation decisions across sessions. Catalog full Marketing folder.
Phase 3: Intelligent Library (Planned)
AI tagging pipeline (Gemini vs. Claude evaluation). Semantic search by concept. Incremental indexing of new files. Extend beyond Marketing.
Architecture
UI Layer
React viewer — grid browse, search, comparison, keep/dismiss/delete curation on every asset
API Layer
Next.js API routes on Vercel — serves catalog, thumbnails, previews, accepts curation decisions
Domain Layer
Business logic in Python — extraction, rendering, dedup, tagging, curation rules
Components: Renderer, Media Extractor, Thumbnail Gen, Deduplicator, AI Tagger, Curator
Data Layer
JSON (Phase 1) → SQLite (Phase 2) — asset catalog, source pointers, curation decisions, learned patterns
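A minimal sketch of how curation decisions might persist across sessions once the data layer moves to SQLite. The table name, column names, and helper functions here are assumptions for illustration; the actual Phase 2 schema is not described in this document.

```python
import sqlite3

# Hypothetical schema: one row per asset holding its latest curation decision.
SCHEMA = """
CREATE TABLE IF NOT EXISTS curation_decisions (
    asset_id   TEXT PRIMARY KEY,
    decision   TEXT NOT NULL CHECK (decision IN ('keep', 'dismiss', 'delete')),
    decided_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_decision(conn: sqlite3.Connection, asset_id: str, decision: str) -> None:
    """Store the latest decision for an asset, replacing any earlier one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO curation_decisions (asset_id, decision) "
            "VALUES (?, ?)",
            (asset_id, decision),
        )


def decisions(conn: sqlite3.Connection) -> dict[str, str]:
    """Return a mapping of asset_id to its current decision."""
    rows = conn.execute("SELECT asset_id, decision FROM curation_decisions")
    return dict(rows)
```

Passing a file path instead of ":memory:" to open_catalog keeps decisions between sessions. The CHECK constraint rejects any value other than the three curation actions named in the UI layer.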
Phase 1 Results — February 2026

Rendering pipeline validated against 11 sample files. LibreOffice headless → PDF → pdftoppm at 150 DPI produces sharp, usable images. Wireframes, diagrams, screenshots, colored text, and logos all render clearly.
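The conversion chain above can be sketched as two command-line invocations. This is an illustration, not the contents of render.py: the function names, the soffice binary name (some systems install it as libreoffice), and the choice of PNG output are assumptions. Only the LibreOffice headless → PDF → pdftoppm sequence and the 150 DPI setting come from the results above.

```python
import subprocess
from pathlib import Path

DPI = 150  # resolution reported in the Phase 1 results


def libreoffice_cmd(source: Path, out_dir: Path) -> list[str]:
    """Build the command that converts an Office document to PDF in out_dir."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(source)]


def pdftoppm_cmd(pdf: Path, out_prefix: Path, dpi: int = DPI) -> list[str]:
    """Build the command that rasterizes every PDF page at the given DPI."""
    # -png is an assumption; pdftoppm writes PPM files when no format flag is given.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


def render(source: Path, out_dir: Path) -> None:
    """Render one PPTX, DOCX, or PDF file to page images under out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pdf = source
    if source.suffix.lower() in {".pptx", ".docx"}:
        subprocess.run(libreoffice_cmd(source, out_dir), check=True)
        # LibreOffice names the output after the input file's stem.
        pdf = out_dir / (source.stem + ".pdf")
    subprocess.run(pdftoppm_cmd(pdf, out_dir / source.stem), check=True)
```

The original file is only ever read; all outputs land in out_dir, which matches the non-destructive design property.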

Type            Files  Pages  Avg KB/page  Dimensions (px)  Notes
PPTX            4      193    134          1500 × 900       1–11 s conversion
DOCX            3      12     150          1275 × 1651      <1 s conversion
PDF (standard)  3      106    141          1500 × 844       No conversion needed
PDF (wide)      1      31     467          3150 × 1153      Double-page spread
Storage budget: ~4.8 GB for full Marketing folder (91 PPTX, 117 DOCX, 767 PDF, ~3,200 standalone images). 4.7 GB rendered previews + 236 MB grid thumbnails. Well within Dropbox capacity.
Design Properties
Non-destructive
Never modifies, moves, or deletes original files. All outputs live in asset-manager/. "Delete" only flags a catalog entry.
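One way to picture the "delete only flags a catalog entry" rule is a soft-delete field on each entry. The class and field names below are hypothetical and are not taken from domain/models.py.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    asset_id: str
    source_path: str        # pointer to the original file; never written to
    deleted: bool = False   # "Delete" only sets this flag


@dataclass
class Catalog:
    entries: dict[str, CatalogEntry] = field(default_factory=dict)

    def add(self, entry: CatalogEntry) -> None:
        self.entries[entry.asset_id] = entry

    def delete(self, asset_id: str) -> None:
        """Hide an asset from browsing without touching its source file."""
        self.entries[asset_id].deleted = True

    def restore(self, asset_id: str) -> None:
        """Undo a delete; possible because nothing was actually removed."""
        self.entries[asset_id].deleted = False

    def visible(self) -> list[CatalogEntry]:
        """Entries that should still appear in the grid."""
        return [e for e in self.entries.values() if not e.deleted]
```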
Layered & Swappable
Renderer, tagger, data layer, and UI can each be replaced independently. Defined interfaces between layers.
Model-Agnostic
Tagging pipeline defines an interface (image in, tags + description out) that any vision model can implement. No provider lock-in.
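The "image in, tags + description out" contract could be expressed as a Python Protocol, so that a Gemini-backed or Claude-backed tagger only needs to provide one method. The names Tagger, TagResult, StubTagger, and tag_all are assumptions for this sketch, and the stub makes no model call.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TagResult:
    tags: list[str]
    description: str


class Tagger(Protocol):
    """Anything with this method can serve as the tagging backend."""

    def tag(self, image_bytes: bytes) -> TagResult: ...


class StubTagger:
    """Placeholder backend; a real one would send the image to a vision model."""

    def tag(self, image_bytes: bytes) -> TagResult:
        return TagResult(tags=["untagged"],
                         description=f"{len(image_bytes)}-byte image")


def tag_all(tagger: Tagger, images: dict[str, bytes]) -> dict[str, TagResult]:
    """Run whichever tagger is supplied over a batch of images keyed by asset id."""
    return {asset_id: tagger.tag(data) for asset_id, data in images.items()}
```

Because tag_all depends only on the Protocol, swapping providers means passing a different object and leaves the pipeline code unchanged.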
Incremental
Re-runs skip unchanged files. Catalog grows over time rather than being rebuilt from scratch.
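A sketch of how a re-run might decide which files to skip, assuming change detection by content hash. The document does not say how the pipeline detects changes (modification time and size would be a cheaper alternative), and the function names and manifest shape here are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 of the file contents, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_run(paths: list[Path],
             manifest: dict[str, str]) -> tuple[list[Path], dict[str, str]]:
    """Return the files that are new or changed, plus the updated manifest.

    manifest maps a file path to the fingerprint seen on the previous run.
    """
    changed = []
    updated = dict(manifest)
    for path in paths:
        current = fingerprint(path)
        if manifest.get(str(path)) != current:
            changed.append(path)
        updated[str(path)] = current
    return changed, updated
```

Returning a new manifest instead of mutating the old one lets the caller save it only after rendering succeeds, so a failed run is retried next time.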
Code Inventory
domain/models.py (~420 lines). Core data model: Source, Asset, CurationDecision, Collection, Catalog, with JSON serialization.
render.py (~505 lines). Main rendering pipeline: crawl folders, convert PPTX/DOCX→PDF→images, generate thumbnails, detect duplicates via perceptual hash.
extractor/extract.py (~765 lines). Advanced extraction: pull embedded media from PPTX/DOCX/PDF, standalone image processing, Claude Vision auto-tagging, duplicate detection.
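For readers unfamiliar with perceptual hashing, the sketch below shows one simple variant, the average hash, in pure Python. Which perceptual hash render.py actually uses is not stated here, so treat the algorithm, the function names, and the distance threshold as assumptions. The input is taken to be a small grayscale grid (for example 8 × 8) that an imaging library has already downscaled from the rendered page.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_duplicate(a: list[list[int]], b: list[list[int]],
                 max_distance: int = 5) -> bool:
    """Treat two images as duplicates when their hashes are within max_distance bits."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Unlike a byte-level checksum, this comparison tolerates small differences in brightness or compression, which matters when the same slide appears in several decks.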