Module 3 — Data and Architecture

Exam weight: 10–15% · Study time: 30–45 minutes · Lessons: Data flow · Suggestion lifecycle & LLM limits

Exam tactic. This module is more technical than the others. The exam asks about data flow and LLM limitations. A good study habit: sketch the data flow on paper and memorize the names of every limitation.

L01 — Data flow and processing

What data does GitHub Copilot use?

Copilot collects context to build a prompt for the LLM. Sources of context:

| Source | Description |
|---|---|
| Active file | Code around the cursor |
| Open files | Other files open in the IDE (limited) |
| Recent edits | Git history and recent changes |
| Comments | Comments adjacent to code |
| Imports / requires | Top of the file (language dependencies) |
| # references | Sources the user added manually |
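The table above can be read as a single bundle of inputs that the extension assembles on every request. A minimal sketch, with all names hypothetical rather than Copilot's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBundle:
    """Hypothetical container for the context sources listed above."""
    active_file: str                                        # code around the cursor
    open_files: list[str] = field(default_factory=list)     # other IDE tabs (limited)
    recent_edits: list[str] = field(default_factory=list)   # git history, recent changes
    comments: list[str] = field(default_factory=list)       # comments adjacent to code
    imports: list[str] = field(default_factory=list)        # top-of-file dependencies
    references: list[str] = field(default_factory=list)     # sources added via #-references
```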

How is the data shared?

Key exam point. Copilot Business and Copilot Enterprise never use prompts or suggestions to train the underlying models. This is one of the most frequently tested facts on GH-300.

Prompt building pipeline

1. User types code or sends a Chat message
2. Copilot extension gathers context (active file, open files, history)
3. Context is tokenized (turned into numbers the LLM understands)
4. Prompt is assembled: system instructions + context + user input (sketched after this list)
5. Prompt is sent to the GitHub Copilot service over HTTPS
6. Service forwards the prompt to the LLM
7. LLM generates a response
8. Proxy filters the response (safety filters)
9. Filtered response is returned to the IDE
10. IDE displays the suggestion to the user
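Steps 3 and 4 are the easiest to lose on the exam, so here is a toy sketch of them. The fixed-width splitter below is a stand-in for a real byte-pair tokenizer; only the structure is meant to be accurate:

```python
def tokenize(text: str, chars_per_token: int = 4) -> list[str]:
    """Crude stand-in for a real tokenizer: fixed-width chunks (~3-4 chars/token)."""
    return [text[i:i + chars_per_token] for i in range(0, len(text), chars_per_token)]

def build_prompt(system: str, context: str, user_input: str) -> str:
    """Step 4: system instructions + gathered context + the user's input."""
    return "\n\n".join([system, context, user_input])

prompt = build_prompt(
    system="You are a coding assistant.",
    context="def add(a, b):",          # context gathered in step 2
    user_input="# complete this function",
)
print(len(tokenize(prompt)), "tokens (rough estimate)")
```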

The context window is finite. Copilot prioritizes context roughly from nearest to farthest: the code immediately around the cursor first, then the rest of the active file, then other open files, and finally supplementary sources such as imports and # references.
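A greedy truncation sketch makes the trade-off concrete. The function name, the priority ordering, and the 8,000-token budget are all illustrative assumptions, not Copilot's documented policy:

```python
def fit_to_window(sources: list[tuple[str, str]], budget_tokens: int,
                  chars_per_token: int = 4) -> list[str]:
    """Take sources in priority order until the context window is full.

    `sources` is (name, text) pairs already sorted highest priority first.
    """
    kept, used = [], 0
    for name, text in sources:
        cost = len(text) // chars_per_token + 1
        if used + cost > budget_tokens:
            break  # window full: lower-priority context is dropped
        kept.append(text)
        used += cost
    return kept

window = fit_to_window(
    [("active_file", "def add(a, b):"), ("open_files", "..."), ("imports", "...")],
    budget_tokens=8000,
)
```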

Proxy filtering

Pre-processing (before the LLM):

- Content exclusions: files and repositories an administrator has excluded never enter the prompt
- Privacy screening: the prompt is checked before it leaves for the model

Post-processing (after the LLM):

- Duplication filter: suggestions that match public code can be blocked
- Security filter: obviously insecure patterns are suppressed
- Content filter: harmful or offensive output is removed
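The post-processing pass can be pictured as a chain of predicate filters over candidate suggestions; anything that trips a filter never reaches the IDE. A hypothetical sketch (the real proxy's rules are not public):

```python
def matches_public_code(suggestion: str) -> bool:
    """Stand-in for the duplication filter (matching against public code)."""
    return False  # the real check compares against an index of public code

def has_insecure_pattern(suggestion: str) -> bool:
    """Stand-in for the security filter (e.g. hard-coded credentials)."""
    return "password = " in suggestion

FILTERS = [matches_public_code, has_insecure_pattern]

def post_process(suggestions: list[str]) -> list[str]:
    """Drop any candidate that trips a filter before it reaches the IDE."""
    return [s for s in suggestions if not any(f(s) for f in FILTERS)]
```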

Data flow — memorize this diagram

IDE (context) → Tokenize → Prompt build → HTTPS → GitHub service
→ Pre-processing (exclusions, privacy) → LLM
→ Post-processing (duplication, security) → HTTPS → IDE (suggestion)

L02 — Suggestion lifecycle and LLM limitations

Suggestion lifecycle, step by step

  1. Trigger — typing (inline), Chat message, or action (Agent / Plan Mode).
  2. Context gathering — extension collects code, comments, imports, open files.
  3. Tokenization — context becomes tokens (~3–4 characters each; see the estimation sketch after this list).
  4. Prompt send — over HTTPS, with auth and subscription validation.
  5. LLM inference — model generates a token response.
  6. Filtering — duplication, security, content filters.
  7. Display in IDE — inline grey text or chat message.
  8. User decision — accept (Tab) or reject (Esc); anonymous feedback may be collected (individual plans).
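Step 3's rule of thumb (roughly 3–4 characters per token for English-like text) gives a quick way to estimate how much of the window a prompt consumes:

```python
def estimate_tokens(text: str, chars_per_token: float = 3.5) -> int:
    """Back-of-envelope token count using the ~3-4 chars/token rule of thumb."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("def add(a, b): return a + b"))  # ~8 tokens
```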

LLM limitations in the Copilot context

Copilot-specific limits

| Limit | Description |
|---|---|
| No internet access | Cannot fetch live data from the web |
| No database access | Cannot read databases directly (unless via MCP) |
| No permission checks | Does not verify whether the user is authorized to use specific APIs or data |
| No memory by default | No automatic memory across sessions |
| Variable language support | Some programming and markup languages have stronger support than others |
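The "no memory by default" row is worth internalizing: the model is stateless, so the client must re-send everything the model should remember. A hypothetical sketch:

```python
def call_llm(messages: list[dict[str, str]]) -> str:
    """Stub for the HTTPS round trip to the model (hypothetical)."""
    return f"(reply based on {len(messages)} messages of context)"

history: list[dict[str, str]] = []  # the client, not the model, keeps memory

def send(user_message: str) -> str:
    """Each turn re-sends the full history: the model itself has no memory."""
    history.append({"role": "user", "content": user_message})
    reply = call_llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply

send("What does this function do?")
send("Now refactor it.")  # works only because turn one was re-sent with this request
```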

Exam-ready checklist (M03)

- Name the context sources: active file, open files, recent edits, comments, imports, # references
- Recite the 10-step prompt pipeline and the data-flow diagram
- Know that Business and Enterprise data is never used for training
- Distinguish pre-processing (exclusions, privacy) from post-processing (duplication, security, content)
- List the Copilot-specific limits: no internet, no database access, no permission checks, no default memory, variable language support

Official source documents