GitHub Copilot and Large Codebases

Working effectively across a large codebase requires reliable context, predictable patterns, and disciplined decomposition. This chapter explains how Copilot uses code context (local and remote indexing), how to augment that context with instructions and prompt files, and practical chunking strategies that scale.

Local indexing

Modern IDEs index the workspace to improve symbol search, navigation, and context available to Copilot. In addition, language servers (Language Server Protocol, LSP) expose structure—types, signatures, references—that helps Copilot generate syntactically valid and idiomatic code. Local indexing is particularly valuable when working offline, on private repositories, or when rapidly iterating in a branch.

Research continues on code representations optimised for large language models (LLMs), aiming to preserve relationships between files, symbols, and architectural boundaries so that changes remain coherent across a codebase.

Remote indexing

For repositories hosted on GitHub.com, Copilot can leverage repository indexes maintained by GitHub to enrich context. This avoids expensive local scans for very large repositories and improves retrieval of related files during suggestions. Remote indexing is complementary to local indexing; together they provide faster, more relevant context without manual curation.

Augmenting context with instructions and prompts

Before instruction files and prompt files were available, teams often built up chat context step by step to guide the model. With repository-scoped guidance, prompts can be shorter and more reliable.

Recommended approach: encode repository-wide conventions once in an instruction file, point it at the relevant supporting documents, and keep chat prompts focused on the task at hand. Teams that previously relied on progressive prompting (several context-setting messages sent before the actual request) can then issue far shorter prompts, because the background travels with every request automatically.
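A hypothetical before-and-after (the stack, class names, and wording are invented for illustration, not taken from the original):

```text
Before, via progressive prompting:
  1. "This service uses Java 17 and Spring Boot; tests use JUnit 5 with Mockito."
  2. "We follow Arrange-Act-Assert and one test class per production class."
  3. "Now add tests for OrderService.calculateDiscount."

After, with an instruction file carrying the background:
  "Add tests for OrderService.calculateDiscount."
```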

Where supported, prompt files (for example, .github/prompts/improve-test-coverage.prompt.md) can encapsulate multi-step guidance and point to the instruction file and relevant documents.
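As a sketch, such a prompt file might look like the following; the front-matter keys follow VS Code's prompt-file format, but the exact keys, steps, and referenced paths are illustrative and may differ across IDEs and Copilot versions:

```markdown
---
mode: agent
description: Improve unit-test coverage for the selected module
---
Follow the conventions in `.github/copilot-instructions.md`.

1. List the public functions in the selected files that lack tests.
2. Generate tests in the project's existing framework and naming style.
3. Run the test suite and report failures before proposing further edits.
```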

Chunking strategies

Effective chunking balances sufficient context for the LLM with human reviewability and token limits.

Strategy 1: Domain-driven chunking

Approach: Divide the codebase by business domains or functional areas rather than technical layers.

Implementation:

Benefits:

Example structure:

/customer-domain/
  ├── api/
  ├── services/
  ├── models/
  └── tests/
/billing-domain/
  ├── api/
  ├── services/
  ├── models/
  └── tests/
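One way to operationalise a layout like this is a small helper that groups files by their top-level directory, so each Copilot session can be fed a single domain at a time. A minimal sketch (the helper name and paths are illustrative):

```python
from collections import defaultdict
from pathlib import PurePosixPath

def chunk_by_domain(paths):
    """Group file paths into chunks keyed by their top-level domain directory."""
    chunks = defaultdict(list)
    for path in paths:
        # The first path component names the domain, e.g. "customer-domain".
        domain = PurePosixPath(path).parts[0]
        chunks[domain].append(path)
    return dict(chunks)

files = [
    "customer-domain/api/customers.py",
    "customer-domain/tests/test_customers.py",
    "billing-domain/services/invoices.py",
]
print(chunk_by_domain(files))
```

Each resulting chunk maps cleanly onto one business domain, which keeps related models, services, and tests together in the context window.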

Strategy 2: Architectural layer chunking

Approach: Separate code by architectural concerns (presentation, business logic, data access).

Implementation:

Benefits:

Example workflow:

  1. Transform data access layer (repositories, DAOs)
  2. Update business logic layer (services, domain models)
  3. Modify presentation layer (controllers, views)
  4. Update cross-cutting concerns (logging, security)

Strategy 3: Dependency-aware chunking

Approach: Chunk code based on dependency relationships to minimise coupling issues.

Implementation:

Benefits:

Copilot integration:
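One way to derive a transformation order is a topological sort of the module dependency graph, so that leaf dependencies are transformed and verified before the code that depends on them. A minimal sketch using Python's standard library (the module names are made up):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each module maps to the modules it depends on.
deps = {
    "api":      {"services"},
    "services": {"models", "utils"},
    "models":   {"utils"},
    "utils":    set(),
}

# static_order() yields each module only after all of its dependencies,
# giving a safe order in which to transform and review chunks.
order = list(TopologicalSorter(deps).static_order())
print(order)  # "utils" comes first, "api" last
```

Feeding chunks to Copilot in this order means that by the time a module is transformed, its dependencies already reflect the new conventions.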

Strategy 4: File size and complexity chunking

Approach: Divide based on file size, cyclomatic complexity, or lines of code.

Implementation:

Benefits:

Practical guidelines:
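A simple implementation is a greedy packer: measure each file (line count here stands in for complexity) and start a new chunk whenever a budget would be exceeded. The 800-line budget and the file names are invented for illustration:

```python
def chunk_by_size(file_lines, budget=800):
    """Greedily pack (path, line_count) pairs into chunks whose total line
    count stays at or under the budget; oversized files get their own chunk."""
    chunks, current, used = [], [], 0
    # Largest files first, so oversized files are isolated early.
    for path, lines in sorted(file_lines, key=lambda f: -f[1]):
        if current and used + lines > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += lines
    if current:
        chunks.append(current)
    return chunks

files = [("big_module.py", 900), ("service.py", 400), ("utils.py", 300), ("cli.py", 150)]
print(chunk_by_size(files))
# → [['big_module.py'], ['service.py', 'utils.py'], ['cli.py']]
```

The budget should be tuned to the model's context window and, just as importantly, to what a human reviewer can absorb in one pass.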

Strategy 5: Test-driven chunking

Approach: Organise chunks around testable units and existing test boundaries.

Implementation:

Benefits:

Copilot workflow:

  1. Include existing tests in chunk context
  2. Generate new tests alongside code changes
  3. Validate transformed code against test suites
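Step 1 above presupposes that each source file's tests can be located mechanically. A sketch that pairs sources with tests via the widespread `test_<module>.py` naming convention (the paths are illustrative; adapt the pattern to your project's convention):

```python
from pathlib import PurePosixPath

def pair_with_tests(sources, tests):
    """Return {source: [matching test files]} using the test_<name>.py convention."""
    pairs = {}
    for src in sources:
        stem = PurePosixPath(src).stem
        pairs[src] = [t for t in tests if PurePosixPath(t).name == f"test_{stem}.py"]
    return pairs

print(pair_with_tests(
    ["app/orders.py", "app/billing.py"],
    ["tests/test_orders.py", "tests/test_billing.py", "tests/test_misc.py"],
))
# → {'app/orders.py': ['tests/test_orders.py'], 'app/billing.py': ['tests/test_billing.py']}
```

Sources whose list comes back empty are good candidates for step 2: generating tests before the transformation begins.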

Strategy 6: API and interface chunking

Approach: Chunk around stable API boundaries and public interfaces.

Implementation:

Benefits:

Example for REST APIs:
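As an illustration (the endpoint names are invented), chunks can track resource boundaries so that each chunk carries one resource's routes, schemas, handlers, and contract tests, with shared middleware handled as its own chunk:

```text
Chunk 1: /customers endpoints (routes, request/response schemas, handlers, contract tests)
Chunk 2: /invoices endpoints  (routes, request/response schemas, handlers, contract tests)
Chunk 3: cross-cutting middleware (authentication, rate limiting, error handling)
```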

Strategy 7: Timeline-based chunking

Approach: Divide work by development phases or sprint boundaries.

Implementation:

Benefits:

Copilot planning:

Chunking decision matrix

When selecting a chunking strategy, consider:

| Factor | Domain | Layer | Dependency | Size | Test | API | Timeline |
|---|---|---|---|---|---|---|---|
| Business logic preservation | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | | |
| Technical complexity | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ | |
| Team coordination | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | | |
| AI context quality | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Risk management | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | |

Combining strategies

In practice, successful transformations often combine multiple strategies:

  1. Phase 1: use dependency-aware chunking to identify transformation order
  2. Phase 2: apply domain-driven chunking for business logic areas
  3. Phase 3: use API chunking for public interfaces
  4. Phase 4: apply file-size chunking for remaining components

Copilot-specific considerations

Token window optimisation:

Prompt efficiency:

Quality assurance:

Documentation strategies

Documentation is a multiplier for Copilot effectiveness. Prioritise high-signal, low-noise artefacts that the model can reference consistently.

Practices:

Key takeaways