Status Update: We concluded development in November 2025 following the publication
of KNighter (SOSP '24), which validated a similar approach. Read
my full retrospective here.
The Vision
The Linux kernel is a massive, living codebase. Every day, developers fix bugs, but often these
bugs follow repetitive "anti-patterns"—missing locks, incorrect error handling, or uninitialized
variables.
LinuxGuard began with a simple question: Can we teach an AI to read a bug-fix
commit and write a static analysis tool to prevent that bug forever?
Figure 1: The Vision. Utilizing LLMs to bridge the gap between unstructured
commit messages and structured AST Matchers.
Phase 1: The Hurdle
Our initial attempt was straightforward (the "Naive Pipeline"). We fed the diff and commit message
directly into an LLM (like GPT-4) and asked it to "write a Clang AST Matcher."
Figure 2: The Naive Pipeline. A simple single-pass prompt that often led to
compilation errors.
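In code, the naive pipeline amounted to a single prompt and a single completion. A minimal sketch, assuming a generic `llm` client; the prompt wording and the `llm.generate` call are illustrative, not our exact implementation:

def naive_generate_matcher(llm, commit_message: str, diff: str) -> str:
    # Everything rides on one prompt: bug analysis, pattern design, and C++ generation.
    prompt = (
        "You are a Clang static analysis expert.\n"
        "Here is a Linux kernel bug-fix commit.\n\n"
        f"Commit message:\n{commit_message}\n\n"
        f"Diff:\n{diff}\n\n"
        "Write a Clang AST Matcher (C++) that detects the bug this commit fixes."
    )
    # Single inference step: no intermediate reasoning, no compilation feedback.
    return llm.generate(prompt)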
The Result? Failure. The generated C++ code rarely compiled. The LLM would
hallucinate Clang APIs that didn't exist or write matchers that were syntactically correct but
semantically meaningless. We realized that a single inference step was insufficient for the
complexity of the Clang AST.
The Pivot: Strategic Evolution
Faced with these failures, we had to rethink our approach. We analyzed why the model was
failing and realized there was a reasoning gap. A human developer doesn't just
"output code"; they analyze the bug, formulate a logic, and then iteratively refine their code.
We decided to restructure the pipeline to mimic this cognitive process. The diagram below
illustrates our thought process: moving from "direct generation" to a layered approach that
integrates reasoning (Chain-of-Thought) and feedback
(Self-Correction).
Figure 3: Strategic Evolution. Our reasoning process for evolving the
pipeline from simple prompting to a robust, multi-agent system.
The Solution: An Intelligent Pipeline
Guided by the strategy above, we implemented two key architectural improvements that transformed
LinuxGuard into a working tool.
First, we decoupled analysis from generation. We forced the model to "think" before it coded: explain why the code is buggy, describe the pattern in plain English, and only then generate the C++ matcher code.
Figure 4: Chain-of-Thought. By explicitly separating reasoning from coding,
we significantly improved semantic accuracy.
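A rough sketch of that staged flow, assuming the same generic `llm` client; the stage prompts and helper names are illustrative:

def chain_of_thought_generate(llm, commit_message: str, diff: str) -> dict:
    # Stage 1: reason about the bug in natural language only -- no code yet.
    root_cause = llm.generate(
        f"Explain why this kernel change fixes a bug:\n{commit_message}\n{diff}"
    )
    # Stage 2: distill the fix into a reusable, plain-English anti-pattern.
    pattern = llm.generate(
        f"Describe the general anti-pattern behind this bug:\n{root_cause}"
    )
    # Stage 3: only now ask for C++, grounded in the written-out pattern.
    matcher_cpp = llm.generate(
        f"Write a Clang AST Matcher that detects this pattern:\n{pattern}"
    )
    return {"root_cause": root_cause, "pattern": pattern, "matcher": matcher_cpp}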
Second, to handle syntax errors, we introduced a Repair Agent. When the Clang compiler
throws an error, we capture the `stderr` output and feed it back to the LLM, creating a
closed-loop system that iterates until the checker compiles successfully.
Figure 5: Self-Healing. The system automatically fixes compilation errors
without human intervention.
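The build step itself is ordinary tooling; the key idea is treating compiler diagnostics as just another prompt input. A minimal sketch of the compile-and-capture step, assuming the plugin is built with CMake (the command and paths are illustrative):

import subprocess

def build_and_capture_errors(build_dir: str) -> tuple[bool, str]:
    """Compile the generated plugin and return (success, compiler diagnostics)."""
    # Any build system works; what matters is capturing stderr for the repair prompt.
    result = subprocess.run(
        ["cmake", "--build", build_dir],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr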
System Integration
Finally, the generated checkers were not just theoretical. We built a harness that injects these
dynamically generated matchers into the standard Linux kernel build process as full
clang-tidy plugins, enabling us to scan historical kernel versions.
Figure 6: Real-world Integration. Embedding our AI-generated tools into the
LLVM/Clang ecosystem.
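Concretely, scanning a kernel tree means pointing clang-tidy at the kernel's compile_commands.json with the generated plugin loaded. A rough sketch of that driver, with hypothetical paths and check names:

import subprocess

def scan_kernel_file(source_file: str, kernel_build_dir: str,
                     plugin_path: str, check_name: str) -> str:
    """Run one generated check against a single kernel source file."""
    result = subprocess.run(
        [
            "clang-tidy",
            f"--load={plugin_path}",      # the generated checker, built as a shared library
            f"--checks=-*,{check_name}",  # disable everything except our check
            "-p", kernel_build_dir,       # directory containing compile_commands.json
            source_file,
        ],
        capture_output=True,
        text=True,
    )
    return result.stdout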
The Engine: Pipeline Orchestrator
At the heart of LinuxGuard lies the Orchestrator, a Python-based engine that
manages the entire lifecycle of a checker. The excerpt below highlights the core
generate_checker loop, which drives the Chain-of-Thought generation and manages the
self-healing compilation process.
from typing import Dict, Optional


class PipelineOrchestrator:
    """Orchestrates the complete pipeline with iterative generation and repair."""

    def generate_checker(self, commit_hash: str) -> Optional[Dict]:
        # ... [Initial setup and context loading omitted] ...
        for iteration in range(1, self.max_iterations + 1):
            print(f"━━━ Iteration {iteration}/{self.max_iterations} ━━━")

            # Stage 1: Multi-commit Pattern Extraction
            # We aggregate insights from multiple similar commits to avoid overfitting
            candidate_commits = self.collect_candidate_commits(commit_hash)
            analysis_result = self.analyze_commits(candidate_commits)
            if not analysis_result:
                continue
            pattern, guidance = analysis_result

            # Stages 2 & 3: Synthesis & Implementation (Chain-of-Thought)
            # The model first plans the logic (Module 1), then writes the C++ (Module 2)
            checker_info = self.implement_checker(pattern, guidance)

            # Stage 4: Build & Repair Loop (Self-Healing)
            build_success = False
            for attempt in range(1, self.max_repair_attempts + 1):
                # Try to compile the generated Clang plugin
                build_success, errors = self.build_checker(checker_info)
                if build_success:
                    break
                # If the build fails, feed stderr back to the LLM for repair
                print(f"🔧 Attempting repair ({attempt}/{self.max_repair_attempts})...")
                repaired = self.repair_checker(checker_info, errors, pattern)
                if not repaired:
                    break
            if not build_success:
                continue  # Retry with a fresh generation if repair fails

            # Stage 5: Validation
            # Run the compiled checker on historical kernels to verify findings
            is_valid = self.validate_checker(checker_info, commit_hash)
            if is_valid:
                print("✓ SUCCESS: Valid checker generated!")
                return checker_info
        return None

    def repair_checker(self, checker_info: Dict, errors: str, pattern: Dict) -> bool:
        """Use the LLM to repair compilation errors by analyzing clang diagnostic output."""
        # Load the current (broken) source code
        h_code, cpp_code = self.load_checker_source(checker_info)
        # Create a repair prompt containing the specific compiler errors
        prompt = build_repair_prompt(
            code=(h_code, cpp_code),
            compiler_errors=errors,
            context=pattern,
        )
        # Ask the LLM to fix the syntax/API issues
        response = self.repair_model.generate_content(prompt)
        # ... [Code extraction and file saving omitted] ...
        return True
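A hypothetical invocation of the orchestrator; the constructor arguments are illustrative, since the real class wires in models, prompts, and paths during the omitted setup:

orchestrator = PipelineOrchestrator(max_iterations=3, max_repair_attempts=5)
checker = orchestrator.generate_checker("a1b2c3d4e5f6")  # any kernel bug-fix commit hash
if checker:
    print("Checker ready:", checker)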
Resources