LinuxGuard

Turning Commit History into Security Checkers via LLMs


Status Update: We concluded development in November 2025 following the publication of KNighter (SOSP '24), which validated a similar approach. Read my full retrospective here.
The Vision

The Linux kernel is a massive, living codebase. Developers fix bugs every day, and those bugs often follow repetitive "anti-patterns": missing locks, incorrect error handling, or uninitialized variables. LinuxGuard began with a simple question: Can we teach an AI to read a bug-fix commit and write a static analysis tool to prevent that bug forever?

Figure 1: The Vision. Utilizing LLMs to bridge the gap between unstructured commit messages and structured AST Matchers.

Phase 1: The Hurdle

Our initial attempt was straightforward (the "Naive Pipeline"). We fed the diff and commit message directly into an LLM (like GPT-4) and asked it to "write a Clang AST Matcher."

Figure 2: The Naive Pipeline. A simple single-pass prompt that often led to compilation errors.
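
To make the failure mode concrete, here is a minimal sketch of what that single-pass approach looked like. The prompt wording and the `query_llm` helper are illustrative stand-ins, not the project's actual code:

# Illustrative single-pass pipeline: one prompt in, C++ out.
# `query_llm` is a hypothetical wrapper around any chat-completion API.

NAIVE_PROMPT = """You are a Clang expert.
Here is a Linux kernel bug-fix commit.

Commit message:
{message}

Diff:
{diff}

Write a Clang AST Matcher (C++) that detects the bug this commit fixes."""

def naive_generate_checker(commit_message: str, commit_diff: str, query_llm) -> str:
    """Single inference step: diff + message in, matcher code out."""
    prompt = NAIVE_PROMPT.format(message=commit_message, diff=commit_diff)
    # No reasoning stage, no compile feedback: whatever comes back is final.
    return query_llm(prompt)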

The Result? Failure. The generated C++ code rarely compiled. The LLM would hallucinate Clang APIs that didn't exist or write matchers that were syntactically correct but semantically meaningless. We realized that a single inference step was insufficient for the complexity of the Clang AST.

The Pivot: Strategic Evolution

Faced with these failures, we had to rethink our approach. We analyzed why the model was failing and realized there was a reasoning gap. A human developer doesn't just "output code"; they analyze the bug, formulate the detection logic, and then iteratively refine their implementation.

We decided to restructure the pipeline to mimic this cognitive process. The diagram below illustrates our thought process: moving from "direct generation" to a layered approach that integrates reasoning (Chain-of-Thought) and feedback (Self-Correction).

Figure 3: Strategic Evolution. Our reasoning process for evolving the pipeline from simple prompting to a robust, multi-agent system.
The Solution: An Intelligent Pipeline

Guided by the strategy above, we implemented two key architectural improvements that transformed LinuxGuard into a working tool.

1. Chain-of-Thought Synthesis

We decoupled analysis from generation. We forced the model to "think" before it coded: first, explain why the code is buggy; second, describe the pattern in plain English; and only then, generate the C++ matcher code.

Figure 4: Chain-of-Thought. By explicitly separating reasoning from coding, we significantly improved semantic accuracy.
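
For illustration, a minimal sketch of this staged prompting, reusing the same hypothetical `query_llm` helper as above; the stage prompts are paraphrased, not the project's exact wording:

def cot_generate_checker(commit_message: str, commit_diff: str, query_llm) -> str:
    # Stage 1: reasoning only -- explain WHY the code was buggy.
    root_cause = query_llm(
        "Explain the root cause of the bug fixed by this commit. "
        "Do not write any code.\n\n"
        f"Message:\n{commit_message}\n\nDiff:\n{commit_diff}"
    )
    # Stage 2: generalize -- describe the anti-pattern in plain English.
    pattern = query_llm(
        "Describe the general anti-pattern behind this bug as a "
        "plain-English detection rule:\n\n" + root_cause
    )
    # Stage 3: only now translate the plan into C++ matcher code.
    return query_llm(
        "Implement this detection rule as a Clang AST Matcher in C++, "
        "using only documented clang::ast_matchers APIs:\n\n" + pattern
    )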
2. The Self-Healing Loop

To handle syntax errors, we introduced a Repair Agent. When the Clang compiler throws an error, we capture the `stderr` output and feed it back to the LLM, creating a closed-loop system that iterates until the checker compiles successfully.

Figure 5: Self-Healing. The system automatically fixes compilation errors without human intervention.
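
The core of the loop is unglamorous: compile, capture diagnostics, retry. Below is a minimal sketch of that build step, assuming a Ninja-based plugin build (the actual build setup is omitted from the excerpt further down):

import subprocess

def build_plugin(build_dir: str) -> tuple[bool, str]:
    """Compile the generated checker; return (success, compiler stderr)."""
    result = subprocess.run(
        ["ninja", "-C", build_dir],
        capture_output=True,
        text=True,
    )
    # On failure, result.stderr carries the clang diagnostics that the
    # Repair Agent receives verbatim in its next prompt.
    return result.returncode == 0, result.stderr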
System Integration

Finally, the generated checkers were not just theoretical. We built a harness that injects these dynamically generated matchers into the standard Linux kernel build process as full clang-tidy plugins, enabling us to scan historical kernel versions.

Figure 6: Real-world Integration. Embedding our AI-generated tools into the LLVM/Clang ecosystem.
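
As a rough sketch of what the harness does per file, the call below runs a generated check as a clang-tidy plugin against a kernel compilation database (produced, for example, with the kernel's scripts/clang-tools/gen_compile_commands.py). The plugin path and check name are placeholders, and this assumes a clang-tidy recent enough to support plugin loading via `-load`:

import subprocess

def scan_file(plugin_so: str, check_name: str, build_dir: str, source_file: str) -> str:
    """Run one dynamically loaded check over a single kernel source file."""
    result = subprocess.run(
        [
            "clang-tidy",
            f"-load={plugin_so}",        # the AI-generated checker plugin
            f"-checks=-*,{check_name}",  # disable everything but our check
            "-p", build_dir,             # where compile_commands.json lives
            source_file,
        ],
        capture_output=True,
        text=True,
    )
    return result.stdout  # warnings emitted by the generated checker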
The Engine: Pipeline Orchestrator

At the heart of LinuxGuard lies the Orchestrator, a Python-based engine that manages the entire lifecycle of a checker. The excerpt below highlights the core `generate_checker` loop, which drives the Chain-of-Thought generation and manages the self-healing compilation process.

pipeline_orchestrator.py (Core Logic Excerpt)
from typing import Dict, Optional

class PipelineOrchestrator:
    """Orchestrates the complete pipeline with iterative generation and repair."""

    def generate_checker(self, commit_hash: str) -> Optional[Dict]:
        # ... [Initial setup and context loading omitted] ...

        for iteration in range(1, self.max_iterations + 1):
            print(f"━━━ Iteration {iteration}/{self.max_iterations} ━━━")

            # Stage 1: Multi-commit Pattern Extraction
            # We aggregate insights from multiple similar commits to avoid overfitting
            candidate_commits = self.collect_candidate_commits(commit_hash)
            analysis_result = self.analyze_commits(candidate_commits)
            
            if not analysis_result:
                continue

            pattern, guidance = analysis_result
            
            # Stage 2 & 3: Synthesis & Implementation (Chain-of-Thought)
            # The model first plans the logic (Module 1) then writes the C++ (Module 2)
            checker_info = self.implement_checker(pattern, guidance)

            # Stage 4: Build & Repair Loop (Self-Healing)
            build_success = False
            for attempt in range(1, self.max_repair_attempts + 1):
                
                # Try to compile the generated Clang plugin
                build_success, errors = self.build_checker(checker_info)

                if build_success:
                    break 

                # If build fails, feed stderr back to the LLM for repair
                print(f"🔧 Attempting repair ({attempt}/{self.max_repair_attempts})...")
                repaired = self.repair_checker(checker_info, errors, pattern)
                
                if not repaired:
                    break

            if not build_success:
                continue # Retry with a fresh generation if repair fails

            # Stage 5: Validation
            # Run the compiled checker on historical kernels to verify findings
            is_valid = self.validate_checker(checker_info, commit_hash)

            if is_valid:
                print("✓ SUCCESS: Valid checker generated!")
                return checker_info

        return None

    def repair_checker(self, checker_info: Dict, errors: str, pattern: Dict) -> bool:
        """Use LLM to repair compilation errors by analyzing clang diagnostic output."""
        
        # Load current (broken) source code
        h_code, cpp_code = self.load_checker_source(checker_info)

        # Create repair prompt with the specific compiler errors
        prompt = build_repair_prompt(
            code=(h_code, cpp_code),
            compiler_errors=errors,
            context=pattern
        )

        # Ask LLM to fix the syntax/API issues
        response = self.repair_model.generate_content(prompt)
        
        # ... [Code extraction and file saving omitted] ...

        return True
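
For completeness, a hypothetical driver for this excerpt, assuming a no-argument constructor (the real configuration and setup are elided above):

if __name__ == "__main__":
    orchestrator = PipelineOrchestrator()
    checker_info = orchestrator.generate_checker("deadbeef")  # placeholder hash
    if checker_info is None:
        print("No valid checker produced within the iteration budget.")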
