Hotspots Architecture Improvements Summary

Date: 2026-02-15
Status: Proposal
Full Document: IMPROVEMENTS.md

Executive Summary

This report outlines 12 architectural improvements to Hotspots, organized by priority and impact. The highest-impact improvements focus on performance and scalability, with potential for a 10-100x speedup in typical incremental workflows. Medium-priority improvements enhance maintainability and extensibility, while lower-priority items improve quality of life and developer experience.

High Priority: Performance & Scalability

1. Parallel File Analysis

Impact: 4-8x speedup on multi-core systems
Effort: 2-3 weeks
Approach: Use rayon to parallelize independent file processing
Challenge: Maintain deterministic ordering for output
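Here is a minimal sketch of a parallel pass with deterministic output. It assumes a hypothetical `FileReport` type and a toy `analyze_file` stand-in. The sketch uses `std::thread::scope` so that it runs without dependencies. With rayon, `files.par_iter().map(...).collect::<Vec<_>>()` plays the same role and also preserves input order. The final sort by path makes the output independent of both thread scheduling and directory-walk order.

```rust
use std::thread;

/// Hypothetical per-file result; the real Hotspots types are not shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct FileReport {
    pub path: String,
    pub function_count: usize,
}

/// Toy stand-in for real per-file analysis: counts lines starting with `fn `.
fn analyze_file(path: &str, source: &str) -> FileReport {
    let function_count = source
        .lines()
        .filter(|line| line.trim_start().starts_with("fn "))
        .count();
    FileReport { path: path.to_string(), function_count }
}

/// Analyzes `(path, source)` pairs on up to `workers` threads, then sorts by
/// path so the output order never depends on scheduling or input order.
pub fn analyze_parallel(files: &[(String, String)], workers: usize) -> Vec<FileReport> {
    let workers = workers.max(1);
    let chunk_size = ((files.len() + workers - 1) / workers).max(1);
    let mut reports: Vec<FileReport> = thread::scope(|scope| {
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| {
                // Each chunk is independent, so no locking is needed.
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|(path, src)| analyze_file(path, src))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    reports.sort_by(|a, b| a.path.cmp(&b.path));
    reports
}
```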

2. Incremental Analysis & Caching

Impact: 10-100x faster for incremental changes
Effort: 4-6 weeks
Approach: Cache parsed ASTs and CFGs, only re-analyze changed code
Challenge: Cache invalidation strategy and size management
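One possible invalidation strategy keys each cache entry on a hash of the file's contents, so only files whose bytes changed are re-analyzed. The sketch below assumes a hypothetical `AnalysisCache` that holds a toy result instead of real ASTs and CFGs. It uses FNV-1a because its output is stable across runs and toolchains, so it can be persisted to disk. `std`'s `DefaultHasher` does not guarantee that stability.

```rust
use std::collections::HashMap;

/// FNV-1a (64-bit): deterministic across runs and Rust versions.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hypothetical cached result for one file.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedAnalysis {
    pub content_hash: u64,
    pub line_count: usize,
}

#[derive(Default)]
pub struct AnalysisCache {
    entries: HashMap<String, CachedAnalysis>,
    /// How many times the (expensive) analysis actually ran.
    pub misses: usize,
}

impl AnalysisCache {
    /// Returns the cached result when the content hash is unchanged;
    /// otherwise re-analyzes and replaces the entry.
    pub fn analyze(&mut self, path: &str, source: &str) -> CachedAnalysis {
        let hash = fnv1a(source.as_bytes());
        if let Some(hit) = self.entries.get(path) {
            if hit.content_hash == hash {
                return hit.clone();
            }
        }
        self.misses += 1;
        let fresh = CachedAnalysis { content_hash: hash, line_count: source.lines().count() };
        self.entries.insert(path.to_string(), fresh.clone());
        fresh
    }

    /// Drops entries for deleted files, which keeps the cache size bounded.
    pub fn retain_paths(&mut self, live: &[&str]) {
        self.entries.retain(|path, _| live.contains(&path.as_str()));
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Per-file hashing alone does not cover cross-file results such as call-graph metrics. Those results would still need invalidation whenever a caller or callee changes.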

3. Batched Git Operations

Impact: 10-50x faster git operations for many files
Effort: 1-2 weeks
Approach: Batch multiple file queries into single git commands
Challenge: Operating-system limits on command-line length
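Here is a sketch of batching under an argument-length budget. It assumes the caller supplies the limit, because the real ceiling depends on the operating system. `batch_paths` and `git_log_commands` are hypothetical helpers. `git log --name-only --format=%H -- <paths>` is one standard way to fetch history for many paths in a single process, but the exact query Hotspots needs may differ.

```rust
/// Splits `paths` into batches whose combined argument size stays within
/// `max_arg_bytes`. A single oversized path still gets its own batch.
pub fn batch_paths(paths: &[String], max_arg_bytes: usize) -> Vec<Vec<String>> {
    let mut batches = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut current_bytes = 0;
    for path in paths {
        // +1 approximates the terminator each argument costs.
        let cost = path.len() + 1;
        if !current.is_empty() && current_bytes + cost > max_arg_bytes {
            batches.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current.push(path.clone());
        current_bytes += cost;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

/// Builds one `git log` argument vector per batch (illustrative flags only).
pub fn git_log_commands(paths: &[String], max_arg_bytes: usize) -> Vec<Vec<String>> {
    batch_paths(paths, max_arg_bytes)
        .into_iter()
        .map(|batch| {
            let mut args: Vec<String> = ["log", "--name-only", "--format=%H", "--"]
                .iter()
                .map(|s| s.to_string())
                .collect();
            args.extend(batch);
            args
        })
        .collect()
}
```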

4. Optimize Call Graph Algorithms

Impact: 5-10x faster for large call graphs
Effort: 3-4 weeks
Approach: Incremental PageRank, approximate betweenness centrality, sparse matrix representations
Challenge: Algorithm correctness and determinism
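Here is a minimal sketch of PageRank over a sparse adjacency list, assuming call-graph nodes are indexed `0..n`. The optional `initial` vector enables a warm start, which is one simple form of incremental recomputation. After a small edit, starting from the previous scores usually converges in fewer iterations than starting from a uniform vector. Approximate betweenness is not shown. The loop visits nodes in a fixed index order, so the floating-point sums, and therefore the output, stay deterministic.

```rust
/// Power-iteration PageRank; `edges[u]` lists the callees of node `u`.
/// Returns the scores and the number of iterations used.
pub fn pagerank(
    edges: &[Vec<usize>],
    initial: Option<&[f64]>,
    damping: f64,
    tolerance: f64,
    max_iters: usize,
) -> (Vec<f64>, usize) {
    let n = edges.len();
    if n == 0 {
        return (Vec::new(), 0);
    }
    let uniform = 1.0 / n as f64;
    // Warm start from previous scores when their shape still matches.
    let mut rank: Vec<f64> = match initial {
        Some(prev) if prev.len() == n => prev.to_vec(),
        _ => vec![uniform; n],
    };
    for iteration in 1..=max_iters {
        // Rank held by nodes with no callees is redistributed uniformly.
        let dangling: f64 = (0..n).filter(|&u| edges[u].is_empty()).map(|u| rank[u]).sum();
        let base = (1.0 - damping) * uniform + damping * dangling * uniform;
        let mut next = vec![base; n];
        for (u, callees) in edges.iter().enumerate() {
            if callees.is_empty() {
                continue;
            }
            let share = damping * rank[u] / callees.len() as f64;
            for &v in callees {
                next[v] += share;
            }
        }
        let delta: f64 = next.iter().zip(&rank).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tolerance {
            return (rank, iteration);
        }
    }
    (rank, max_iters)
}
```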

Combined Impact: 10-100x faster analysis for typical incremental workflows

Medium Priority: Architecture & Maintainability

5. Plugin System for Metrics

Impact: High extensibility, low performance overhead
Effort: 3-4 weeks
Approach: Trait-based metric system with plugin registry
Benefit: Extensibility without core changes
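Here is a minimal sketch of the trait-plus-registry shape. It assumes a hypothetical `FunctionInfo` input and `Metric` trait, and the metric keys `cc` and `nd` are illustrative. Core code only iterates over the registry. Adding a metric therefore takes one `impl` and one `register` call.

```rust
/// Hypothetical per-function input handed to every metric.
pub struct FunctionInfo {
    pub name: String,
    pub branches: u32,
    pub max_nesting: u32,
}

/// A metric plugin: a stable key plus a pure computation over one function.
pub trait Metric {
    fn name(&self) -> &'static str;
    fn compute(&self, f: &FunctionInfo) -> f64;
}

/// Cyclomatic complexity as decision points + 1.
pub struct CyclomaticComplexity;
impl Metric for CyclomaticComplexity {
    fn name(&self) -> &'static str { "cc" }
    fn compute(&self, f: &FunctionInfo) -> f64 { f64::from(f.branches) + 1.0 }
}

pub struct NestingDepth;
impl Metric for NestingDepth {
    fn name(&self) -> &'static str { "nd" }
    fn compute(&self, f: &FunctionInfo) -> f64 { f64::from(f.max_nesting) }
}

/// Core code iterates this registry instead of hard-coding metrics.
#[derive(Default)]
pub struct MetricRegistry {
    metrics: Vec<Box<dyn Metric>>,
}

impl MetricRegistry {
    pub fn register(&mut self, metric: Box<dyn Metric>) {
        self.metrics.push(metric);
    }

    /// Evaluates every registered metric in registration order.
    pub fn evaluate(&self, f: &FunctionInfo) -> Vec<(&'static str, f64)> {
        self.metrics.iter().map(|m| (m.name(), m.compute(f))).collect()
    }
}
```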

6. Policy Trait System

Impact: Better maintainability, eliminates duplication
Effort: 2-3 weeks
Approach: Single evaluation loop with trait-based policies
Benefit: Easier to add new policies, testable logic
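Here is a sketch of the single evaluation loop. It assumes hypothetical `Policy`, `FunctionReport`, and `Violation` types and two illustrative policies. Each policy answers one yes/no question, so it can be unit-tested in isolation. Iteration and reporting live in one place.

```rust
/// Hypothetical per-function report that policies inspect.
pub struct FunctionReport {
    pub name: String,
    pub complexity: f64,
    pub risk_delta: f64,
}

#[derive(Debug, PartialEq)]
pub struct Violation {
    pub policy: &'static str,
    pub function: String,
}

/// A policy only decides whether one function violates it.
pub trait Policy {
    fn id(&self) -> &'static str;
    fn violates(&self, f: &FunctionReport) -> bool;
}

pub struct MaxComplexity(pub f64);
impl Policy for MaxComplexity {
    fn id(&self) -> &'static str { "max-complexity" }
    fn violates(&self, f: &FunctionReport) -> bool { f.complexity > self.0 }
}

pub struct NoRiskIncrease;
impl Policy for NoRiskIncrease {
    fn id(&self) -> &'static str { "no-risk-increase" }
    fn violates(&self, f: &FunctionReport) -> bool { f.risk_delta > 0.0 }
}

/// The one evaluation loop shared by every policy.
pub fn evaluate(policies: &[Box<dyn Policy>], functions: &[FunctionReport]) -> Vec<Violation> {
    let mut out = Vec::new();
    for f in functions {
        for p in policies {
            if p.violates(f) {
                out.push(Violation { policy: p.id(), function: f.name.clone() });
            }
        }
    }
    out
}
```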

7. Dependency Injection for Testability

Impact: Better testability, minimal runtime impact
Effort: 4-6 weeks
Approach: Trait-based abstractions for I/O and git operations
Benefit: Unit tests without real filesystem/git
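Here is a sketch of the injection seam. It assumes a hypothetical `GitHistory` trait and a made-up `churn_score` formula. Production code would implement the trait by running git, for example via `std::process::Command`, while tests pass an in-memory fake. Taking the dependency as a generic parameter keeps dispatch static, which keeps the runtime cost minimal.

```rust
use std::collections::HashMap;

/// Abstraction over git history (trait and method names are hypothetical).
pub trait GitHistory {
    fn commit_count(&self, path: &str) -> usize;
}

/// Test double with fixed answers: no repository or filesystem required.
pub struct FakeGit {
    pub counts: HashMap<String, usize>,
}

impl GitHistory for FakeGit {
    fn commit_count(&self, path: &str) -> usize {
        self.counts.get(path).copied().unwrap_or(0)
    }
}

/// Logic depends on the trait, not on a concrete git implementation.
/// The formula is illustrative only.
pub fn churn_score<G: GitHistory>(git: &G, path: &str, complexity: f64) -> f64 {
    complexity * git.commit_count(path) as f64
}
```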

8. Streaming Output for Large Repos

Impact: Constant memory usage, handles very large repos
Effort: 2-3 weeks
Approach: Stream results as they are computed, writing JSON/HTML incrementally
Benefit: Enables analysis of repos with 100k+ functions
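Here is a minimal sketch of streaming JSON output, assuming hypothetical record fields (`name`, `score`). Each record goes to any `std::io::Write` as soon as it is produced, so only the current record is held in memory. Incremental HTML needs more care and is not shown.

```rust
use std::io::{self, Write};

/// Writes a JSON array one record at a time.
pub struct JsonArrayWriter<W: Write> {
    out: W,
    first: bool,
}

impl<W: Write> JsonArrayWriter<W> {
    pub fn new(mut out: W) -> io::Result<Self> {
        out.write_all(b"[")?;
        Ok(Self { out, first: true })
    }

    /// Appends one record. Non-finite scores would need special handling,
    /// since JSON has no NaN or infinity literals.
    pub fn write_record(&mut self, name: &str, score: f64) -> io::Result<()> {
        if !self.first {
            self.out.write_all(b",")?;
        }
        self.first = false;
        write!(self.out, "{{\"name\":\"{}\",\"score\":{}}}", escape_json(name), score)
    }

    /// Closes the array, flushes, and returns the underlying writer.
    pub fn finish(mut self) -> io::Result<W> {
        self.out.write_all(b"]")?;
        self.out.flush()?;
        Ok(self.out)
    }
}

/// Minimal JSON string escaping: quotes, backslashes, control characters.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```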

Lower Priority: Quality of Life

9. Language Plugin System

Impact: Enables community language support
Effort: 6-8 weeks
Approach: Dynamic language registration with plugin API
Benefit: No core changes for new languages
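Here is a sketch of registration by file extension. It assumes a hypothetical `LanguageSupport` trait whose `count_functions` method stands in for real parsing. The plugins here are compiled-in trait objects. Loading them from shared libraries at runtime would need a stable ABI on top of this and is not shown.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical interface a language plugin implements.
pub trait LanguageSupport {
    fn name(&self) -> &'static str;
    fn extensions(&self) -> &'static [&'static str];
    /// Stand-in for parsing: counts function definitions.
    fn count_functions(&self, source: &str) -> usize;
}

pub struct RustSupport;
impl LanguageSupport for RustSupport {
    fn name(&self) -> &'static str { "rust" }
    fn extensions(&self) -> &'static [&'static str] { &["rs"] }
    fn count_functions(&self, source: &str) -> usize { source.matches("fn ").count() }
}

pub struct PythonSupport;
impl LanguageSupport for PythonSupport {
    fn name(&self) -> &'static str { "python" }
    fn extensions(&self) -> &'static [&'static str] { &["py"] }
    fn count_functions(&self, source: &str) -> usize { source.matches("def ").count() }
}

/// Maps extensions to registered plugins; core code only sees the trait.
#[derive(Default)]
pub struct LanguageRegistry {
    by_ext: HashMap<&'static str, usize>,
    languages: Vec<Box<dyn LanguageSupport>>,
}

impl LanguageRegistry {
    pub fn register(&mut self, lang: Box<dyn LanguageSupport>) {
        let idx = self.languages.len();
        for &ext in lang.extensions() {
            self.by_ext.insert(ext, idx);
        }
        self.languages.push(lang);
    }

    /// Finds the plugin for a path by its extension, if any is registered.
    pub fn for_path(&self, path: &str) -> Option<&dyn LanguageSupport> {
        let ext = Path::new(path).extension()?.to_str()?;
        let idx = *self.by_ext.get(ext)?;
        Some(&*self.languages[idx])
    }
}
```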

10. Structured Error Types

Impact: Better error handling and debugging
Effort: 2-3 weeks
Approach: Domain-specific error types instead of generic anyhow errors
Benefit: Programmatic error handling and recovery
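Here is a sketch of a domain error enum, with hypothetical variant names and fields. The enum implements `std::error::Error`, so it can still be converted into an `anyhow::Error` at the binary boundary. Library callers can `match` on its variants to decide whether to recover.

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical domain error type for Hotspots.
#[derive(Debug)]
pub enum HotspotsError {
    Parse { path: PathBuf, line: usize, message: String },
    Git { command: String, exit_code: Option<i32> },
    Config { key: String, reason: String },
    Io(std::io::Error),
}

impl fmt::Display for HotspotsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Parse { path, line, message } => {
                write!(f, "parse error in {}:{}: {}", path.display(), line, message)
            }
            Self::Git { command, exit_code } => {
                write!(f, "git command `{}` failed (exit code {:?})", command, exit_code)
            }
            Self::Config { key, reason } => write!(f, "invalid config key `{}`: {}", key, reason),
            Self::Io(e) => write!(f, "I/O error: {}", e),
        }
    }
}

impl std::error::Error for HotspotsError {
    /// Exposes the underlying cause for error-chain reporting.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Io(e) => Some(e),
            _ => None,
        }
    }
}

impl From<std::io::Error> for HotspotsError {
    fn from(e: std::io::Error) -> Self {
        Self::Io(e)
    }
}

/// Example recovery rule: skip unparseable files, abort on anything else.
pub fn is_recoverable(err: &HotspotsError) -> bool {
    matches!(err, HotspotsError::Parse { .. })
}
```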

11. AST Storage Optimization

Impact: 30-50% memory reduction
Effort: 2-3 weeks
Approach: Lazy AST parsing, compact representations
Benefit: Better for large repos
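Here is a sketch of lazy parsing. It assumes a hypothetical `CompactAst` that stores only function names and start lines in a flat `Vec`, plus a toy `parse` function. `std::cell::OnceCell` defers parsing until the AST is first requested and caches it afterwards. `OnceCell` is not `Sync`. If files are shared across threads during parallel analysis, `std::sync::OnceLock` is the thread-safe counterpart.

```rust
use std::cell::OnceCell;

/// Hypothetical compact AST: a flat list of (function name, start line).
#[derive(Debug)]
pub struct CompactAst {
    pub functions: Vec<(String, u32)>,
}

/// Holds only the source text until the AST is actually needed.
pub struct LazyFile {
    source: String,
    ast: OnceCell<CompactAst>,
}

impl LazyFile {
    pub fn new(source: String) -> Self {
        Self { source, ast: OnceCell::new() }
    }

    /// Parses on the first call; later calls return the cached AST.
    pub fn ast(&self) -> &CompactAst {
        self.ast.get_or_init(|| parse(&self.source))
    }

    pub fn is_parsed(&self) -> bool {
        self.ast.get().is_some()
    }
}

/// Toy parser: records lines starting with `fn ` and their 1-based line number.
fn parse(source: &str) -> CompactAst {
    let functions = source
        .lines()
        .enumerate()
        .filter_map(|(i, line)| {
            let rest = line.trim_start().strip_prefix("fn ")?;
            let name: String = rest
                .chars()
                .take_while(|c| c.is_alphanumeric() || *c == '_')
                .collect();
            Some((name, i as u32 + 1))
        })
        .collect();
    CompactAst { functions }
}
```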

12. Configuration Validation & Schema

Impact: Better developer experience
Effort: 1-2 weeks
Approach: JSON Schema for config, validation with clear errors
Benefit: Catch errors early, IDE autocomplete
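JSON Schema itself requires a schema file and a validator library. The sketch below therefore illustrates only the "clear errors" half, using a hand-written check. It collects every problem and reports each one against its config key, rather than failing on the first. The `Config` fields are hypothetical.

```rust
/// Hypothetical config shape; the real Hotspots keys are not shown here.
pub struct Config {
    pub complexity_threshold: f64,
    pub max_workers: usize,
    pub include: Vec<String>,
}

/// One validation problem, addressed by config key so it is actionable.
#[derive(Debug, PartialEq)]
pub struct ConfigError {
    pub key: &'static str,
    pub message: String,
}

/// Checks every field and returns all problems at once.
pub fn validate(cfg: &Config) -> Result<(), Vec<ConfigError>> {
    let mut errors = Vec::new();
    if !(cfg.complexity_threshold.is_finite() && cfg.complexity_threshold > 0.0) {
        errors.push(ConfigError {
            key: "complexity_threshold",
            message: format!("must be a positive number, got {}", cfg.complexity_threshold),
        });
    }
    if cfg.max_workers == 0 {
        errors.push(ConfigError { key: "max_workers", message: "must be at least 1".to_string() });
    }
    if cfg.include.iter().any(|p| p.trim().is_empty()) {
        errors.push(ConfigError { key: "include", message: "patterns must not be empty".to_string() });
    }
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```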

Implementation Roadmap

Phase 1: Performance (3-6 months)

Focus: Speed and scalability

Parallel file analysis (2-3 weeks)
Batched git operations (1-2 weeks)
Incremental analysis (4-6 weeks)

Expected Outcome: 10-100x faster for incremental workflows

Phase 2: Architecture (6-9 months)

Focus: Maintainability and testability

Policy trait system (2-3 weeks)
Dependency injection (4-6 weeks)
Streaming output (2-3 weeks)

Expected Outcome: Reduced duplication, better testability

Phase 3: Extensibility (9-12 months)

Focus: Plugin systems and extensibility

Metric plugin system (3-4 weeks)
Language plugin system (6-8 weeks)
Structured errors (2-3 weeks)

Expected Outcome: Community extensibility, better error handling

Key Trade-offs

Performance vs. Simplicity

Decision: Start with parallelization, add caching later
Rationale: Parallelization provides immediate benefit with manageable complexity

Extensibility vs. Performance

Decision: Use generics (static dispatch) where possible and trait objects (dynamic dispatch) where necessary
Rationale: Balance between zero-cost abstractions and runtime flexibility
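The generics-versus-trait-objects decision can be sketched with a hypothetical `Scorer` trait. The generic function is monomorphized and can be inlined. The `dyn` version is compiled once and lets the scorer be chosen at runtime, such as from configuration or a plugin, at the cost of a vtable call per use.

```rust
/// Hypothetical scoring strategy.
pub trait Scorer {
    fn score(&self, complexity: f64, churn: f64) -> f64;
}

pub struct Product;
impl Scorer for Product {
    fn score(&self, complexity: f64, churn: f64) -> f64 { complexity * churn }
}

pub struct Weighted {
    pub churn_weight: f64,
}
impl Scorer for Weighted {
    fn score(&self, complexity: f64, churn: f64) -> f64 { complexity + self.churn_weight * churn }
}

/// Generic: one copy per scorer type, calls can be inlined (zero-cost),
/// but the scorer type must be known at compile time.
pub fn total_static<S: Scorer>(scorer: &S, rows: &[(f64, f64)]) -> f64 {
    rows.iter().map(|&(c, ch)| scorer.score(c, ch)).sum()
}

/// Trait object: compiled once, scorer picked at runtime via a vtable.
pub fn total_dynamic(scorer: &dyn Scorer, rows: &[(f64, f64)]) -> f64 {
    rows.iter().map(|&(c, ch)| scorer.score(c, ch)).sum()
}

/// Selecting an implementation by name is only possible with `dyn`.
pub fn scorer_from_config(name: &str) -> Option<Box<dyn Scorer>> {
    match name {
        "product" => Some(Box::new(Product)),
        "weighted" => Some(Box::new(Weighted { churn_weight: 0.5 })),
        _ => None,
    }
}
```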

Memory vs. Speed

Decision: Make configurable, default to balanced
Rationale: Different repos have different constraints

Backward Compatibility

Decision: Version APIs, provide migration guides
Rationale: Most improvements can be additive, but some require breaking changes

Success Metrics

Performance Targets

5-10x faster for large repos (>10k functions)
10-100x faster for incremental analysis
Constant memory usage for streaming output

Maintainability Targets

50% reduction in code duplication
Fewer than 100 lines of code to add a new metric or language
100% test coverage for core components

Scalability Targets

Handle repos with 100k+ functions
Support 10+ languages via plugins
Sub-second analysis for incremental changes

Risk Assessment

Low Risk

Parallel file analysis (proven pattern, rayon is mature)
Batched git operations (straightforward optimization)
Configuration validation (additive feature)

Medium Risk

Incremental analysis (cache invalidation complexity)
Policy trait system (migration effort)
Dependency injection (large refactoring)

High Risk

Call graph optimizations (algorithm correctness)
Language plugin system (API design complexity)
Streaming output (HTML streaming complexity)

Recommendations

Immediate Actions (Next 3 Months)

Implement parallel file analysis — Highest ROI, low risk
Batch git operations — Quick win, significant speedup
Add configuration validation — Improves DX with minimal effort

Short-term (3-6 Months)

Incremental analysis — Enables faster CI feedback
Policy trait system — Reduces maintenance burden
Dependency injection — Improves testability

Long-term (6-12 Months)

Plugin systems — Enables community contributions
Streaming output — Handles very large repos
Error type improvements — Better error handling

Conclusion

The proposed improvements would transform Hotspots from a fast single-threaded tool into a highly scalable, extensible platform. The Phase 1 performance improvements alone could provide a 10-100x speedup for typical incremental workflows, making Hotspots viable for very large codebases and enabling faster CI feedback.

The architectural improvements in Phase 2 would reduce maintenance burden and improve testability, while Phase 3's plugin systems would enable community contributions and long-term extensibility.

Priority: Focus on Phase 1 performance improvements first, as they provide the highest immediate value with manageable risk.

Full Details: See IMPROVEMENTS.md for complete specifications, code examples, and implementation details.
