GitHub Copilot and Large Codebases

Working effectively across a large codebase requires reliable context, predictable patterns, and disciplined decomposition. This chapter explains how Copilot uses code context (local and remote indexing), how to augment that context with instructions and prompt files, and practical chunking strategies that scale.

Local indexing

Modern IDEs index the workspace to improve symbol search, navigation, and context available to Copilot. In addition, language servers (Language Server Protocol, LSP) expose structure—types, signatures, references—that helps Copilot generate syntactically valid and idiomatic code. Local indexing is particularly valuable when working offline, on private repositories, or when rapidly iterating in a branch.

Research continues on code representations optimised for large language models (LLMs), aiming to preserve relationships between files, symbols, and architectural boundaries so that changes remain coherent across a codebase.

Remote indexing

For repositories hosted on GitHub.com, Copilot can leverage repository indexes maintained by GitHub to enrich context. This avoids expensive local scans for very large repositories and improves retrieval of related files during suggestions. Remote indexing is complementary to local indexing; together they provide faster, more relevant context without manual curation.

Augmenting context with instructions and prompts

Before instruction files and prompt files were available, teams often built up chat context step by step to guide the model. With repository-scoped guidance in place, prompts can be far shorter and more reliable: capture the shared rules once in a copilot-instructions.md file, then keep each individual prompt focused on the task at hand. Worked examples of both styles appear under "Progressive prompting examples" below.

Where supported, prompt files (for example, .github/prompts/improve-test-coverage.prompt.md) can encapsulate multi-step guidance and point to the instruction file and relevant documents.
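For illustration, such a prompt file might look like the following sketch. The front-matter fields and exact format vary by editor and Copilot version, and the body here is hypothetical, not an official template:

```markdown
---
description: Improve test coverage for the selected files
---

Follow the standards in .github/copilot-instructions.md and the testing
guide in docs/testing.md.

1. Identify exported functions in the selected files that lack unit tests.
2. Generate Jest tests alongside the code as *.test.ts files.
3. Mock network and database calls; target at least 80% coverage for new
   or changed code.
```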

Example: basic copilot-instructions.md

# GitHub Copilot Instructions

## Project overview
This repository contains a microservices-based e‑commerce platform using TypeScript, Node.js, and PostgreSQL. Services communicate over HTTP and asynchronous events.

## Coding standards
- Enable TypeScript strict mode
- Follow ESLint rules in .eslintrc.json
- Public functions must include JSDoc
- Prefer dependency injection for testability
- Use consistent, structured logging with context fields (requestId, userId)
- Document architectural decisions and trade-offs in docs/design-decisions.md
- All documentation must be valid Markdown, using British English spelling and grammar; Mermaid diagrams inside fenced code blocks are acceptable for illustrating complex concepts

## Testing requirements
- Minimum 80% coverage for new or changed code
- Use Jest for unit tests; place tests alongside code as *.test.ts
- Mock network and database calls (jest.mock, in‑memory fakes)
- Include contract tests for public APIs where feasible
- All Markdown documentation must pass the linter checks

## Architecture patterns
- Repository pattern for data access
- Service layer encapsulates business rules
- Controllers/handlers deal only with transport concerns
- Prefer Result<T, E> or exceptions consistently for error handling (see libs/result.ts)

## Key documentation
- Architecture overview: docs/architecture.md
- API specifications: docs/api/openapi.yaml
- Database schema: docs/database/schema.sql
- Coding standards: docs/coding-standards.md
- Testing guide: docs/testing.md

## Conventions
- Naming: lowerCamelCase for variables/functions; PascalCase for types/classes
- Errors: map domain errors to HTTP codes in controllers
- Observability: emit metrics for key operations (see docs/observability.md)

Progressive prompting examples

Without instructions:

Generate a new endpoint for user registration in TypeScript (strict mode). Follow the repository pattern, include input validation, return appropriate HTTP status codes, add JSDoc, write unit tests with Jest (mocking the database), and use the shared Result pattern from libs/result.ts.

With instructions in place:

Generate a user registration endpoint with email and password validation.

Chunking strategies

Effective chunking balances sufficient context for the LLM with human reviewability and token limits.

Strategy 1: Domain-driven chunking

Approach: Divide the codebase by business domains or functional areas rather than technical layers.

Implementation:

Benefits:

Example structure:

/customer-domain/
  ├── api/
  ├── services/
  ├── models/
  └── tests/
/billing-domain/
  ├── api/
  ├── services/
  ├── models/
  └── tests/

Strategy 2: Architectural layer chunking

Approach: Separate code by architectural concerns (presentation, business logic, data access).

Implementation:

Benefits:

Example workflow:

  1. Transform data access layer (repositories, DAOs)
  2. Update business logic layer (services, domain models)
  3. Modify presentation layer (controllers, views)
  4. Update cross-cutting concerns (logging, security)
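Transforming one layer at a time is easier when the layers are cleanly separated. A minimal sketch of the three layers from the workflow above (all names are illustrative):

```typescript
// Data access layer: the repository hides the storage mechanism.
interface User { id: string; email: string; }
interface UserRepository { findByEmail(email: string): User | undefined; }

class InMemoryUserRepository implements UserRepository {
  private users: User[] = [{ id: "1", email: "existing@example.com" }];
  findByEmail(email: string): User | undefined {
    return this.users.find((u) => u.email === email);
  }
}

// Business logic layer: the service encapsulates the rules.
class UserService {
  constructor(private readonly repo: UserRepository) {}
  emailIsTaken(email: string): boolean {
    return this.repo.findByEmail(email) !== undefined;
  }
}

// Presentation layer: the handler deals only with transport concerns,
// mapping a domain outcome to an HTTP status code.
function handleCheckEmail(service: UserService, email: string): { status: number } {
  return service.emailIsTaken(email) ? { status: 409 } : { status: 200 };
}
```

Because each layer depends only on the interface beneath it, a chunk can replace the in-memory repository with a real one without touching the service or handler.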

Strategy 3: Dependency-aware chunking

Approach: Chunk code based on dependency relationships to minimise coupling issues.

Implementation:

Benefits:

Copilot integration:
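One concrete way to derive a transformation order is a topological sort of the module dependency graph, so that dependencies are transformed before their dependents. The graph below is illustrative; in practice it could be extracted from import statements or a dependency analysis tool:

```typescript
type Graph = Record<string, string[]>; // module -> modules it depends on

// Depth-first topological sort: dependencies appear before dependents.
function transformationOrder(graph: Graph): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (mod: string): void => {
    if (state.get(mod) === "done") return;
    if (state.get(mod) === "visiting") throw new Error(`dependency cycle at ${mod}`);
    state.set(mod, "visiting");
    for (const dep of graph[mod] ?? []) visit(dep);
    state.set(mod, "done");
    order.push(mod); // pushed only after all dependencies are pushed
  };

  Object.keys(graph).forEach(visit);
  return order;
}

const graph: Graph = {
  controllers: ["services"],
  services: ["repositories", "models"],
  repositories: ["models"],
  models: [],
};
// models and repositories sort before services and controllers.
console.log(transformationOrder(graph));
```

Cycles surface as errors, which is itself useful: a cycle marks a seam that may need refactoring before chunked transformation is safe.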

Strategy 4: File size and complexity chunking

Approach: Divide based on file size, cyclomatic complexity, or lines of code.

Implementation:

Benefits:

Practical guidelines:
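A simple size-based grouping can be sketched as follows; the threshold and file names are illustrative, and a real implementation might weight by cyclomatic complexity instead of raw line counts:

```typescript
interface FileInfo { path: string; lines: number; }

// Group files into review-sized chunks, capped at maxLinesPerChunk.
function chunkBySize(files: FileInfo[], maxLinesPerChunk = 800): FileInfo[][] {
  const chunks: FileInfo[][] = [];
  let current: FileInfo[] = [];
  let currentLines = 0;
  // Largest-first, so an oversized file ends up in a chunk of its own.
  for (const f of [...files].sort((a, b) => b.lines - a.lines)) {
    if (current.length > 0 && currentLines + f.lines > maxLinesPerChunk) {
      chunks.push(current);
      current = [];
      currentLines = 0;
    }
    current.push(f);
    currentLines += f.lines;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Keeping each chunk under a fixed budget makes both human review and the model's context window more predictable.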

Strategy 5: Test-driven chunking

Approach: Organise chunks around testable units and existing test boundaries.

Implementation:

Benefits:

Copilot workflow:

  1. Include existing tests in chunk context
  2. Generate new tests alongside code changes
  3. Validate transformed code against test suites
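Step 3 is often served by characterisation tests: pin down the current behaviour of a chunk before asking Copilot to transform it, then rerun the same cases afterwards. The function under test here is purely illustrative:

```typescript
// Legacy function whose behaviour we want to preserve across a rewrite.
function legacySlug(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
}

// Characterisation cases: concrete input/output pairs that any
// Copilot-assisted rewrite must continue to satisfy.
const cases: Array<[string, string]> = [
  ["Hello World", "hello-world"],
  ["  Trim me  ", "trim-me"],
  ["Already-slugged", "already-slugged"],
];
for (const [input, expected] of cases) {
  if (legacySlug(input) !== expected) {
    throw new Error(`regression for "${input}"`);
  }
}
```

Because the cases encode observed behaviour rather than intended behaviour, they catch accidental changes even where the original code was never formally specified.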

Strategy 6: API and interface chunking

Approach: Chunk around stable API boundaries and public interfaces.

Implementation:

Benefits:

Example for REST APIs:
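As an illustration, REST endpoints can be grouped into chunks by their top-level resource, so each chunk follows a stable API boundary. The paths below are hypothetical:

```typescript
// Group endpoint paths by top-level resource; each group becomes one chunk.
function groupByResource(paths: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const p of paths) {
    const resource = p.split("/").filter(Boolean)[0] ?? "";
    const bucket = groups.get(resource) ?? [];
    bucket.push(p);
    groups.set(resource, bucket);
  }
  return groups;
}

const apiChunks = groupByResource([
  "/users",
  "/users/{id}",
  "/orders/{id}",
  "/orders/{id}/items",
]);
// One chunk per resource: "users" and "orders".
```

In a repository with an OpenAPI specification (such as docs/api/openapi.yaml above), the path list could be read from the spec rather than hard-coded.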

Strategy 7: Timeline-based chunking

Approach: Divide work by development phases or sprint boundaries.

Implementation:

Benefits:

Copilot planning:

Chunking decision matrix

When selecting a chunking strategy, consider:

| Factor | Domain | Layer | Dependency | Size | Test | API | Timeline |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Business logic preservation | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | – | – |
| Technical complexity | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ | – |
| Team coordination | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | – | – |
| AI context quality | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Risk management | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | – |

Combining strategies

In practice, successful transformations often combine multiple strategies:

  1. Phase 1: use dependency-aware chunking to identify transformation order
  2. Phase 2: apply domain-driven chunking for business logic areas
  3. Phase 3: use API chunking for public interfaces
  4. Phase 4: apply file-size chunking for remaining components

Copilot-specific considerations

Token window optimisation:

Prompt efficiency:

Quality assurance:

Documentation strategies

Documentation is a multiplier for Copilot effectiveness. Prioritise high-signal, low-noise artefacts that the model can reference consistently, such as the architecture overview, API specifications, and testing guide listed in the instructions file.

Practices:

Key Takeaways