Resources
Research papers, tools, and references from the talk.
Research Papers
RAG vs Fine-Tuning: Pipelines, Tradeoffs, and a Case Study
Microsoft, EMNLP 2024. The head-to-head comparison that showed RAG at 87.5% accuracy vs fine-tuning at 50.4%. The paper that validated our pivot.
arxiv.org/abs/2312.05934
Physics of Language Models: Knowledge Storage and Extraction
Allen-Zhu & Li, ICLR 2025. Proved that each fact needs 100-1,000 training exposures to be reliably learned. We gave ours about 15. Mystery solved.
arxiv.org/abs/2404.05405
LIMA: Less Is More for Alignment
Zhou et al., 2023. Showed that fine-tuning primarily teaches style and format, not new knowledge. The model learns how you talk, not what you know.
arxiv.org/abs/2305.11206
FineTuneBench: How Well Do Commercial Fine-Tuning APIs Infuse Knowledge?
Stanford, 2024. Found a 37% ceiling on new-knowledge absorption, even with commercial fine-tuning APIs. The problem isn't the technique; it's the approach.
arxiv.org/abs/2311.07059
RAFT: Adapting Language Model to Domain Specific RAG
SMU/Microsoft, 2024. A hybrid fine-tuning + RAG approach. Interesting research, but pure RAG proved simpler and sufficient for our use case.
arxiv.org/abs/2403.10131
Tools We Used
Ollama
Run large language models locally. The engine behind our AI inference. Download a model, run it, talk to it through an API. That simple.
ollama.com
Open WebUI
A self-hosted chat interface for AI. This is what our clinicians actually interact with. Clean UI, conversation history, file uploads.
openwebui.com
Perl / Mojolicious
A real-time web framework for Perl. The backbone of RAG Proxy. Handles HTTP interception, async requests, and WebSocket connections.
mojolicious.org
PDL (Perl Data Language)
Fast numerical computing for Perl. Does the vector math: cosine similarity calculations across thousands of document embeddings in milliseconds.
pdl.perl.org
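The math PDL is doing here is plain linear algebra. A minimal sketch of cosine-similarity ranking, shown in dependency-free Python purely for illustration (the actual system uses PDL, and the toy 3-dimensional vectors and document names below are made up; real embeddings from bge-m3 have 1,024 dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy chunk embeddings; the real ones come from bge-m3 via Ollama.
docs = {
    "mask-policy":   [0.9, 0.1, 0.0],
    "parking":       [0.0, 0.8, 0.6],
    "sterilization": [0.7, 0.2, 0.1],
}
query = [0.9, 0.1, 0.0]

# Rank every chunk by similarity to the query, best match first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked[0])  # the most similar chunk
```

PDL vectorizes this same computation across the whole embedding matrix at once, which is what makes thousands of comparisons per query feasible in milliseconds.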
SQLite
A serverless database engine. Stores our document chunks and metadata. No daemon, no config, no port. Just a file. Perfect for our scale.
sqlite.org
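To make "document chunks and metadata in a single file" concrete, here is a rough sketch using Python's built-in sqlite3 module. The table and column names are hypothetical, not the actual RAG Proxy schema:

```python
import sqlite3

# One file, no daemon, no port. ":memory:" is used here for
# demonstration; the real store would be a path like "rag.db".
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chunks (
        id        INTEGER PRIMARY KEY,
        doc_title TEXT NOT NULL,  -- source document
        body      TEXT NOT NULL,  -- the chunk text itself
        embedding BLOB NOT NULL   -- serialized vector (e.g. packed floats)
    )
""")
conn.execute(
    "INSERT INTO chunks (doc_title, body, embedding) VALUES (?, ?, ?)",
    ("Infection Control Policy",
     "Masks are required in all clinical areas.",
     b"\x00" * 8),  # placeholder bytes standing in for a real embedding
)
conn.commit()

row = conn.execute("SELECT doc_title, body FROM chunks").fetchone()
print(row)
```

At a few thousand chunks, a linear scan over a table like this is entirely adequate; no dedicated vector database required.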
The Stack
bge-m3 Embedding Model
BAAI's multilingual embedding model. Converts text into 1,024-dimensional vectors. Runs locally through Ollama. Handles English, French, and mixed-language queries.
huggingface.co/BAAI/bge-m3
Mojo::SQLite
A tiny SQLite wrapper for Mojolicious. Gives us migrations, connection pooling, and a clean query interface. The glue between Brain.pm and the database.
metacpan.org/pod/Mojo::SQLite
UofT Context
TechKnowFile Conference
University of Toronto's annual IT knowledge-sharing conference. Where this talk is being presented, May 6-7, 2026.
techknowfile.utoronto.ca
Faculty of Dentistry
Canada's largest dental faculty. Home to the RAG Proxy system, which clinicians and staff use daily for policy lookups.
dentistry.utoronto.ca