Resources
Research papers, tools, and references from the talk.
Research Papers
RAG vs Fine-Tuning: Pipelines, Tradeoffs, and a Case Study
Microsoft, EMNLP 2024. The head-to-head comparison that showed RAG at 87.5% accuracy vs fine-tuning at 50.4%. The paper that validated our pivot.
arxiv.org/abs/2312.05934
Physics of Language Models: Knowledge Storage and Extraction
Allen-Zhu & Li, ICLR 2025. Proved that each fact needs 100-1,000 training exposures to be reliably learned. We gave ours about 15. Mystery solved.
arxiv.org/abs/2404.05405
LIMA: Less Is More for Alignment
Zhou et al., 2023. Showed that fine-tuning primarily teaches style and format, not new knowledge. The model learns how you talk, not what you know.
arxiv.org/abs/2305.11206
FineTuneBench: How Well Do Commercial Fine-Tuning APIs Infuse Knowledge?
Stanford, 2024. Found a 37% ceiling on new-knowledge absorption, even with commercial fine-tuning APIs. The problem isn't the technique; it's the approach.
arxiv.org/abs/2311.07059
RAFT: Adapting Language Model to Domain Specific RAG
SMU/Microsoft, 2024. A hybrid fine-tuning + RAG approach. Interesting research, but pure RAG proved simpler and sufficient for our use case.
arxiv.org/abs/2403.10131
Tools We Used
Ollama
Run large language models locally. The engine behind our AI inference. Download a model, run it, talk to it through an API. That simple.
ollama.com
Open WebUI
A self-hosted chat interface for AI. This is what our clinicians actually interact with. Clean UI, conversation history, file uploads.
openwebui.com
Perl / Mojolicious
A real-time web framework for Perl. The backbone of RAG Proxy. Handles HTTP interception, async requests, and WebSocket connections.
mojolicious.org
PDL (Perl Data Language)
Fast numerical computing for Perl. Does the vector math: cosine similarity calculations across thousands of document embeddings in milliseconds.
pdl.perl.org
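The math PDL is doing here is plain linear algebra. A minimal sketch of cosine-similarity ranking, shown in dependency-free Python purely for illustration (the actual system uses PDL, and the toy 3-dimensional vectors and document names below are made up; real embeddings from bge-m3 have 1,024 dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy chunk embeddings; the real ones come from bge-m3 via Ollama.
docs = {
    "mask-policy":   [0.9, 0.1, 0.0],
    "parking":       [0.0, 0.8, 0.6],
    "sterilization": [0.7, 0.2, 0.1],
}
query = [0.9, 0.1, 0.0]

# Rank every chunk by similarity to the query, best match first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked[0])  # the most similar chunk
```

PDL vectorizes this same computation across the whole embedding matrix at once, which is what makes thousands of comparisons per query feasible in milliseconds.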
SQLite
A serverless database engine. Stores our document chunks and metadata. No daemon, no config, no port. Just a file. Perfect for our scale.
sqlite.org
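To make "document chunks and metadata in a single file" concrete, here is a rough sketch using Python's built-in sqlite3 module. The table and column names are hypothetical, not the actual RAG Proxy schema:

```python
import sqlite3

# One file, no daemon, no port. ":memory:" is used here for
# demonstration; the real store would be a path like "rag.db".
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chunks (
        id        INTEGER PRIMARY KEY,
        doc_title TEXT NOT NULL,  -- source document
        body      TEXT NOT NULL,  -- the chunk text itself
        embedding BLOB NOT NULL   -- serialized vector (e.g. packed floats)
    )
""")
conn.execute(
    "INSERT INTO chunks (doc_title, body, embedding) VALUES (?, ?, ?)",
    ("Infection Control Policy",
     "Masks are required in all clinical areas.",
     b"\x00" * 8),  # placeholder bytes standing in for a real embedding
)
conn.commit()

row = conn.execute("SELECT doc_title, body FROM chunks").fetchone()
print(row)
```

At a few thousand chunks, a linear scan over a table like this is entirely adequate; no dedicated vector database required.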
The Stack
bge-m3 Embedding Model
BAAI's multilingual embedding model. Converts text into 1,024-dimensional vectors. Runs locally through Ollama. Handles English, French, and mixed-language queries.
huggingface.co/BAAI/bge-m3
Mojo::SQLite
A tiny SQLite wrapper for Mojolicious. Gives us migrations, connection pooling, and a clean query interface. The glue between Brain.pm and the database.
metacpan.org/pod/Mojo::SQLite
UofT Context
TechKnowFile Conference
University of Toronto's annual IT knowledge-sharing conference. Where this talk is being presented, May 6-7, 2026.
techknowfile.utoronto.ca
Faculty of Dentistry
Canada's largest dental faculty. Home to the RAG Proxy system, which clinicians and staff use daily for policy lookups.
dentistry.utoronto.ca