Librarian Skills — Understanding Workbench

The Librarian pipeline applies formal information science methods to analyze and organize collections of documents. It's built on the same discipline that professional librarians use when they receive a new collection — profiling what's there, assessing condition, determining what analysis is feasible, and presenting options to the human who will make decisions.

Each skill produces a structured artifact that the next skill consumes. You don't have to run all four — start with Ingestion and it will tell you what's worth doing next.

Skills

Ingestion

Profile a corpus — document count, vocabulary analysis, prior organization, quality flags, and feasibility assessment for different analysis methods. Produces the Collection Record that all other skills reference.

Output: collection-record.json | Dependencies: stdlib only | Status: working
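To make the profiling step concrete, here is a minimal sketch of the kind of counts a Collection Record might hold. It is not the skill's actual implementation: the function name profile_corpus, the field names, and the tokenizer are all assumptions chosen for illustration, and the real collection-record.json schema may differ.

```python
import json
import re
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical tokenizer: lowercase runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def profile_corpus(folder):
    """Return a small, illustrative profile of the .txt files in `folder`."""
    doc_lengths = {}
    vocabulary = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        tokens = TOKEN_RE.findall(text.lower())
        doc_lengths[path.name] = len(tokens)
        vocabulary.update(tokens)

    total_tokens = sum(doc_lengths.values())
    return {
        "document_count": len(doc_lengths),
        "total_tokens": total_tokens,
        "unique_terms": len(vocabulary),
        # Ratio of distinct terms to all tokens; a rough vocabulary-richness signal.
        "type_token_ratio": round(len(vocabulary) / total_tokens, 3) if total_tokens else 0.0,
        "top_terms": vocabulary.most_common(5),
        # Assumed quality flag: documents that contain no tokens at all.
        "quality_flags": {
            "empty_documents": [name for name, n in doc_lengths.items() if n == 0]
        },
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "billing--refunds.txt").write_text(
            "How to request a refund for an order.", encoding="utf-8"
        )
        Path(tmp, "billing--empty.txt").write_text("", encoding="utf-8")
        print(json.dumps(profile_corpus(tmp), indent=2))
```

Like the skill itself, the sketch uses only the standard library. A feasibility assessment could then be derived from these numbers, for example by checking whether the document count is large enough for similarity analysis to be meaningful.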

Enrichment

Classify documents by audience persona, concepts, pain points, and content format. Compute related articles via TF-IDF similarity. Run gap analysis to find underserved personas and thin content areas.

Output: data.json | Dependencies: scikit-learn | Status: working

Output: graph-data.json | Dependencies: scikit-learn + numpy | Status: working
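The related-articles computation can be sketched as follows. The skill is listed as depending on scikit-learn; this sketch deliberately swaps in a standard-library TF-IDF and cosine similarity so that it runs without extra packages. The function names, the smoothed IDF formula, and the top_n parameter are assumptions for illustration, not the skill's actual code.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphanumeric runs.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tfidf_vectors(docs):
    """Map each document id to a sparse {term: weight} TF-IDF vector."""
    tokenized = {doc_id: TOKEN_RE.findall(text.lower()) for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    n_docs = len(docs)
    vectors = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        # Smoothed IDF, so a term found in every document keeps a small weight.
        vectors[doc_id] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_articles(docs, top_n=2):
    """For each document, list the top_n most similar other documents."""
    vectors = tfidf_vectors(docs)
    related = {}
    for doc_id, vec in vectors.items():
        scores = [
            (other, cosine(vec, vectors[other])) for other in vectors if other != doc_id
        ]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        related[doc_id] = scores[:top_n]
    return related


if __name__ == "__main__":
    sample = {
        "reset": "reset your password from the login page",
        "email": "password reset email did not arrive",
        "tax": "invoice billing and tax settings",
    }
    for doc_id, neighbours in related_articles(sample, top_n=1).items():
        print(doc_id, "->", neighbours)
```

With scikit-learn installed, TfidfVectorizer and cosine_similarity would cover the same two steps with far less code; the hand-rolled version is shown only to make the arithmetic visible.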

Briefing

Generate a human-readable capabilities briefing from a Collection Record. The briefing presents two or three recommended approaches, each with its expected effort, its output, and its tradeoffs. The human decides what to do.

Output: briefing.md + package.json | Dependencies: stdlib only | Status: working

Quick Start

Prepare your documents

Put your documents in a single folder as .txt files. If they are PDFs, web pages, or CSV files, extract the text first. Filenames that follow a pattern such as category--title.txt are detected automatically.
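As a rough illustration of how a category--title.txt naming pattern could be detected, the sketch below splits each filename on the double hyphen and reports a pattern only when most files follow it. The function names and the 0.8 threshold are assumptions for this example; the Ingestion skill's actual detection logic is not documented here.

```python
from pathlib import Path


def parse_filename(filename, separator="--"):
    """Split 'category--title.txt' into its parts; category is None if absent."""
    stem = Path(filename).stem
    category, found, title = stem.partition(separator)
    if not found or not category or not title:
        return {"category": None, "title": stem}
    return {"category": category, "title": title}


def detect_prior_organization(filenames, threshold=0.8):
    """Report the categories in use if enough filenames follow the pattern.

    `threshold` is a hypothetical cutoff: the share of files that must match.
    """
    parsed = [parse_filename(name) for name in filenames]
    matched = [p for p in parsed if p["category"] is not None]
    if not parsed or len(matched) / len(parsed) < threshold:
        return None
    return sorted({p["category"] for p in matched})


if __name__ == "__main__":
    names = ["billing--refund-policy.txt", "billing--invoices.txt", "setup--install.txt"]
    print(detect_prior_organization(names))
```

Requiring a majority of files to match avoids treating one stray filename with a double hyphen as evidence of a deliberate scheme.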

Run Ingestion

Tell Claude: "I have a folder of documents at [path]. Run the librarian ingestion skill." This produces a Collection Record profiling your corpus.

Review the Collection Record

Check document count, vocabulary characteristics, quality flags, feasibility assessments, and prioritized recommendations.

Choose next steps

Based on the recommendations, run Enrichment (to classify), Similarity (to map relationships), or Briefing (to get structured options).

View results in the Organizer Console

Copy output JSON files to organizer-data/ and open the Organizer Console for interactive exploration.

Theoretical Foundation

The Librarian pipeline is grounded in Robert Glushko's framework from The Discipline of Organizing (MIT Press). Every organizing system involves identifying resources, describing and classifying them, designing the interactions they support, and maintaining the organization over time.

The three-phase human workflow — Reference Interview, Capabilities Briefing, Collaborative Triage — is adapted from professional library science practice. The reference interview encodes the insight that the first question asked is rarely the actual need. A reference librarian works backward from the question to the task, from the task to the need, from the need to the situation.

The pipeline also draws on S.R. Ranganathan's faceted classification tradition, which holds that resources should be analyzed along multiple independent dimensions and recombined rather than forced into a single hierarchy. In practice, this means a document might be classified by persona, topic, content type, and concept simultaneously — and any of those facets can be the entry point for retrieval.
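The idea that any facet can serve as the entry point for retrieval can be sketched with a small inverted index per facet. The facet names below follow the ones mentioned above (persona, topic, content type, concept); the document ids, facet values, and function names are invented for illustration and do not come from the pipeline.

```python
from collections import defaultdict


def build_facet_index(documents):
    """Build facet -> value -> set of document ids from per-document facets."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, facets in documents.items():
        for facet, values in facets.items():
            for value in values:
                index[facet][value].add(doc_id)
    return index


def find(index, **criteria):
    """Return ids of documents that match every facet=value criterion given."""
    matches = None
    for facet, value in criteria.items():
        ids = index.get(facet, {}).get(value, set())
        matches = set(ids) if matches is None else matches & ids
    return matches if matches is not None else set()


if __name__ == "__main__":
    # Hypothetical documents, each described along several independent facets.
    documents = {
        "doc-1": {"persona": ["admin"], "topic": ["billing"], "content_type": ["how-to"]},
        "doc-2": {"persona": ["admin", "developer"], "topic": ["api"], "content_type": ["reference"]},
        "doc-3": {"persona": ["developer"], "topic": ["billing"], "content_type": ["how-to"]},
    }
    index = build_facet_index(documents)
    print(sorted(find(index, persona="admin")))                      # enter by persona
    print(sorted(find(index, topic="billing")))                      # enter by topic
    print(sorted(find(index, topic="billing", persona="developer")))  # combine facets
```

Because each facet is indexed independently, no single hierarchy is privileged: the same document is reachable by persona, by topic, or by any combination of the two.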
