Glossary
AI and RAG terms explained like you're talking to a smart colleague who hasn't touched AI yet. Read top to bottom and the picture builds.
Core Concepts (Start Here)
RAG Proxy
The local AI middleman we built. It sits between the chat interface and the language model. Every question gets enriched with the right policy excerpts before the AI sees it, so the AI answers from real documents instead of inventing things. Speaks both the OpenAI and Ollama APIs, so any chat client can use it.
RAG (Retrieval Augmented Generation)
Instead of teaching the AI your facts, you search your documents first and paste the relevant passages into the prompt. The AI reads them on the spot and answers from what it just read. Like an open-book exam instead of memorization. This is the core technique behind RAG Proxy.
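The open-book loop can be sketched in a few lines. Everything here is illustrative, not the proxy's actual code: the chunks and vectors are toy values standing in for real policy excerpts and their bge-m3 embeddings.

```python
import numpy as np

# Toy corpus: in the real system these would be policy chunks with
# 1,024-dimensional embeddings; here the vectors are hand-made.
chunks = ["Sharps injuries must be reported within 24 hours.",
          "Masks are required in all clinical areas."]
vectors = np.array([[0.9, 0.1], [0.1, 0.9]], dtype=float)

def retrieve(query_vec, k=1):
    """Rank chunks by cosine similarity to the query vector."""
    q = np.asarray(query_vec, dtype=float)
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

def build_prompt(question, query_vec):
    """Paste the retrieved passages into the prompt, open-book style."""
    context = "\n".join(retrieve(query_vec))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What do I do after a needlestick?", [0.8, 0.2])
```

The prompt that reaches the model now carries the relevant passage, so the model answers from what it just read rather than from memory.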
Brain
A self-contained knowledge domain. One brain holds the dental policies; another could hold HR, IT, or library docs. Each brain is a folder with a SQLite database, a vector matrix, and the source documents. Knowledge stays compartmentalized, so the dental brain never accidentally answers HR questions.
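A hypothetical on-disk layout for one brain (file names are illustrative, not the proxy's actual paths):

```
brains/dental/
├── chunks.db        # SQLite: chunk text + metadata
├── vectors.mat      # PDL matrix: one 1,024-dim row per chunk
└── sources/         # the original policy documents
    ├── sharps-injury-protocol.pdf
    └── infection-control.pdf
```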
Embedding
Turning text into a list of numbers that captures its meaning. Similar sentences get similar numbers. This is the magic that lets the system understand that "needlestick protocol" and "what to do after getting poked by a needle" are asking about the same thing.
Vector
A list of numbers that represents a piece of text in mathematical space. Each number captures some aspect of meaning. Vectors that are close together represent text with similar meanings. Our bge-m3 model produces vectors with 1,024 dimensions.
Chunking
Splitting a long document into smaller pieces (usually a few paragraphs each) so the AI can search through them efficiently. Think of it like cutting a textbook into index cards. Each card is small enough to find quickly but big enough to still make sense on its own.
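A simplified stand-in for the proxy's real chunker: split on blank lines, then pack paragraphs into chunks under a size cap. Real chunkers often add overlap between chunks; this sketch skips that for clarity.

```python
def chunk(text, max_chars=500):
    """Split on blank lines, then pack paragraphs into chunks
    no longer than max_chars each (a simplified illustration)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)   # current card is full; start a new one
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks

doc = ("Para one. " * 30 + "\n\n" + "Para two. " * 30 + "\n\n" + "Para three.")
pieces = chunk(doc)
```

Each resulting "index card" stays under the cap but keeps whole paragraphs, so it still makes sense on its own.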
Cosine Similarity
A math formula that measures how similar two things are by comparing the angle between their vectors. A score of 1.0 means the vectors point the same way (identical meaning), 0.0 means unrelated; negative scores are possible in principle but rare with text embeddings. It's how the system decides which document chunks are most relevant to your question.
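The formula itself is short: the dot product of the two vectors divided by the product of their lengths.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1, 0], [1, 0])   # same direction -> 1.0
cosine_similarity([1, 0], [0, 1])   # perpendicular -> 0.0
```

In practice the vectors have 1,024 dimensions rather than 2, but the math is identical.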
Semantic Search
Finding documents based on meaning rather than exact keywords. Traditional search requires you to guess the right words. Semantic search understands that "infection control after needle incident" matches a document titled "Sharps Injury Protocol" even though the words are different.
Hallucination
When an AI confidently generates information that sounds right but is completely made up. A fine-tuned model might invent an email address that doesn't exist or cite a form that was never created. RAG reduces this by giving the AI real documents to read before answering.
Fine-Tuning
Retraining an AI model on your own data so it absorbs new knowledge permanently. Like teaching someone a subject by having them memorize the textbook. Works well for learning a writing style, but unreliable for learning specific facts unless you repeat each fact hundreds of times.
LoRA (Low-Rank Adaptation)
A technique for fine-tuning that only changes a small fraction of the model's parameters instead of all of them. Makes fine-tuning faster and cheaper, but doesn't solve the fundamental problem of the model inventing facts it was supposed to learn.
RAFT (Retrieval Augmented Fine-Tuning)
A hybrid approach from Microsoft that combines fine-tuning with RAG-style training data. The model learns to answer questions by reading provided documents rather than memorizing facts. Promising in research, but we found pure RAG simpler and effective enough for our needs.
Inference
The act of an AI model generating a response. When you ask a question and the AI answers, that's inference. In our system, inference happens locally on the Mac Studio, not on some server in another country.
LLM (Large Language Model)
The AI brain that reads text and generates responses. "Large" because these models have billions of parameters (adjustable settings) learned from massive amounts of text. Examples include Llama, Mistral, and Qwen. We run ours locally through Ollama.
Token
The basic unit an AI model reads and writes. Roughly a word or a piece of a word. "Dentistry" is one token. "Faculty of Dentistry" is about four. Models have a context window measured in tokens, which limits how much text they can process at once.
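A common rule of thumb is roughly four characters per English token; the true count depends on the specific model's tokenizer. A hedged sketch of using that heuristic to budget against a context window (the window size here is an assumed example, not a property of any particular model):

```python
CONTEXT_WINDOW = 8192  # tokens; varies by model

def estimate_tokens(text):
    """Rough heuristic: ~4 characters per English token.
    Only the model's own tokenizer gives the exact count."""
    return max(1, round(len(text) / 4))

def fits(prompt, reserved_for_answer=512):
    """Check that a prompt leaves the model room to answer."""
    return estimate_tokens(prompt) + reserved_for_answer <= CONTEXT_WINDOW
```

This kind of budgeting matters in RAG: the retrieved passages, the question, and the answer all have to fit inside the same window.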
Compliance & Stack
PHIPA
Personal Health Information Protection Act. Ontario legislation that governs how health information custodians handle patient data. Sending patient-adjacent queries to cloud AI services would create compliance risk, which is why everything stays on-premises.
FIPPA
Freedom of Information and Protection of Privacy Act. Ontario legislation that governs how public institutions (like universities) collect, use, and disclose personal information. One of the key reasons we built a local AI system instead of using a cloud service.
Ollama
An open-source tool that makes it easy to download and run AI models on your own computer. Think of it as the engine that powers the AI. It handles the heavy math (inference) and exposes a simple API that other tools can talk to.
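The shape of a request to Ollama's local HTTP API, which listens on localhost:11434 by default. This sketch only builds the payload; actually sending it requires a running Ollama instance with the named model pulled, and the model name here is just an example.

```python
import json

payload = {
    "model": "llama3",        # any model you've pulled locally
    "prompt": "Summarize the sharps injury protocol.",
    "stream": False,          # ask for one complete response
}
request_body = json.dumps(payload)
# To send:  POST http://localhost:11434/api/generate  with this body.
```

Because the API is plain HTTP with JSON, any language that can make a web request can drive the model, which is what lets a Perl proxy sit in front of it.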
PDL (Perl Data Language)
A Perl extension for fast number-crunching, similar to NumPy in Python. We use it for the vector math in our semantic search: calculating cosine similarities across thousands of document chunks in milliseconds.
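The matrix trick behind that speed, illustrated in NumPy since that's the comparison the entry draws (the production code does the equivalent in PDL). With the chunk vectors pre-normalized, a single matrix-vector product scores every chunk at once instead of looping.

```python
import numpy as np

rng = np.random.default_rng(0)
chunks = rng.normal(size=(5000, 1024))                   # 5,000 chunk vectors
chunks /= np.linalg.norm(chunks, axis=1, keepdims=True)  # pre-normalize rows

query = rng.normal(size=1024)
query /= np.linalg.norm(query)

sims = chunks @ query            # 5,000 cosine similarities in one operation
best = int(np.argmax(sims))      # index of the most relevant chunk
```

Pushing the loop into compiled array code is the whole point of PDL and NumPy alike: the per-chunk work happens in C, not in the interpreter.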
Vector Database
A storage system optimized for saving and searching vectors. Ours is simple: SQLite for the text and metadata, PDL matrices for the vector math. Big companies use specialized databases like Pinecone or Weaviate, but for 94 documents, SQLite is more than enough.
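A minimal version of that SQLite-plus-matrix design, with toy 2-dimensional vectors standing in for real embeddings: text and metadata live in SQLite, vectors live in an array whose row order matches the table's ids.

```python
import sqlite3
import numpy as np

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, doc TEXT, body TEXT)")
rows = [("policy.pdf", "Report sharps injuries immediately."),
        ("policy.pdf", "Wear masks in clinical areas.")]
con.executemany("INSERT INTO chunks (doc, body) VALUES (?, ?)", rows)

vectors = np.array([[0.9, 0.1], [0.1, 0.9]])   # row i belongs to chunk id i+1
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

def search(query_vec, k=1):
    """Score every vector, then fetch the matching text from SQLite."""
    q = query_vec / np.linalg.norm(query_vec)
    top = np.argsort(vectors @ q)[::-1][:k]
    ids = [int(i) + 1 for i in top]             # ids are 1-based
    placeholders = ",".join("?" * len(ids))
    cur = con.execute(f"SELECT body FROM chunks WHERE id IN ({placeholders})", ids)
    return [r[0] for r in cur.fetchall()]

hits = search(np.array([0.8, 0.2]))
```

The split of duties is the design: SQLite is good at text and metadata, the array library is good at math, and at this scale neither needs a specialized database behind it.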