Proper code reviews take a lot of time. Reviewing is a process of providing insight and spotting common patterns of mistakes.

In our company, we work on a project with quite mature processes. When newbies create pull requests, I can often guess where the mistakes will appear because I’ve seen them many times before. But exactly because of this, I sometimes miss new issues: my brain is tuned to look for repeated patterns, and that can be a problem.
This “pattern matching” nature of code reviews is where AI can actually help.
How I Use AI in My Review Process
Here’s how I approach it:
- First round: I review the code myself.
- Second round: For the parts that give me a slight “code smell,” I ask AI to review them.
This helps me catch some things I might otherwise miss.
Why AI Alone Is Not Enough (Yet)
Code reviews done only by AI are not sufficient, at least not today.
Many products rely on commonly used LLMs, and these models are not logical processors. They also do pattern matching, which often leads to hallucinations.

That’s why the order matters:
- You review first.
- Then AI reviews.
- You finalize.
This layered approach keeps you in control while still leveraging AI’s strengths.
The Prompting Reality Check
You can create your own prompt for code reviews. It’s an iterative process that gets better over time. But keep in mind that AI will often give you some “crap” suggestions too.

Prompts can age well in an organization and become quite strong, but LLMs change constantly. A prompt that works great today might not work the same way tomorrow. For example, I noticed that Grok 3 was much better than Grok 4 for my review tasks.
So don’t treat your prompt as a static artifact. Keep iterating on it, and use it on demand to improve your review quality.
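If you go this route, it helps to keep the prompt in a versioned template file and splice the code under review into it at the moment you need it. Here is a minimal sketch in Java; the file names and the `[PASTE CODE HERE]` placeholder are my own conventions, not part of any particular tool:
```
import java.nio.file.Files;
import java.nio.file.Path;

public class ReviewPromptBuilder {
    // Placeholder token inside the stored prompt template; purely a convention.
    private static final String CODE_PLACEHOLDER = "[PASTE CODE HERE]";

    public static String build(Path templateFile, Path codeFile) throws Exception {
        String template = Files.readString(templateFile);
        String code = Files.readString(codeFile);
        return template.replace(CODE_PLACEHOLDER, code);
    }

    public static void main(String[] args) throws Exception {
        // Prints the assembled prompt; paste it into the LLM of your choice.
        System.out.println(build(Path.of("review-prompt.txt"), Path.of("UnderReview.java")));
    }
}
```
Keeping the template in version control also gives you a history of how your review prompt evolved alongside the models you used it with.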
Always Double Check!
The golden rule: Always double check.
Your prompt should give you enough input to understand potential issues, not final answers.
One thing AI does quite well, though, is classifying sentences. This is where structured review patterns like the OIR method shine.
The OIR Method and AI
The OIR method (Observe, Impact, Request) works surprisingly well with most GPT models.
- Observe: State the neutral fact about the code.
- Impact: Explain the potential effect (e.g., readability, performance, maintainability, correctness).
- Request: Suggest a clear, actionable next step. If uncertain, phrase it as a question rather than a statement.

This structure helps AI provide more focused and useful feedback.
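To make this concrete, here’s what a single OIR-formatted comment might look like. The function name and line number are invented for illustration:
```
Observe: `fetchUserProfile()` is called inside the loop on line 42.
Impact: Each call makes a separate network request, so a list of N users triggers N round trips.
Request: Could we load the profiles in one batch before the loop instead?
```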
Example Prompt for Code Review
I have created a GitHub repository so that you can contribute to it as well. Let me know if you can think of a better structure.
Let’s assume that this is the code in question.
```
public static double computeAverage(List<Integer> nums) {
    try {
        if (nums == null) return 0; // poor null-handling choice
        if (nums.size() == 0) return 0; // inconsistent empty result
        int sum = 0;
        for (int i = 0; i <= nums.size(); i++) { // OFF-BY-ONE -> ArrayIndexOutOfBounds
            sum += nums.get(i);
        }
        // integer division happens here, result loses fraction
        return sum / nums.size();
    } catch (Exception e) {
        // swallowing exceptions hides bugs
        return -1;
    }
}
```
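Before handing this to a model, note how the catch block hides the off-by-one crash. Here’s a minimal, runnable sketch that demonstrates it; the `Demo` wrapper class is my own addition, not part of the original snippet:
```
import java.util.List;

public class Demo {
    public static void main(String[] args) {
        // The loop reads index 3 of a 3-element list and throws,
        // but the catch block swallows the exception and returns -1
        // instead of the true average 4.0.
        System.out.println(computeAverage(List.of(2, 4, 6))); // prints -1.0
    }

    public static double computeAverage(List<Integer> nums) {
        try {
            if (nums == null) return 0;
            if (nums.size() == 0) return 0;
            int sum = 0;
            for (int i = 0; i <= nums.size(); i++) { // off-by-one
                sum += nums.get(i);
            }
            return sum / nums.size();
        } catch (Exception e) {
            return -1; // exception swallowed here
        }
    }
}
```
A caller who only sees `-1.0` has no hint that an exception ever occurred, which is exactly the kind of issue a structured review should surface.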
Here’s a simple way to use this prompt with a code example:
You are a seasoned software engineer with over 10 years of experience, including a couple of years as a technical lead. Your code reviews are collaborative, constructive, and focused on improving code quality, consistency, and team knowledge-sharing.
In your feedback, **accuracy is more important than creativity** — only suggest a refactor or alternative when you're confident it's semantically equivalent or clearly improves the code without changing behavior.
🧭 **Core Principles:**
* Treat code as a shared team asset: Use “we” or “the code” instead of “you” to keep feedback neutral and inclusive.
* Prioritize quick, actionable insights: Focus on bugs, performance, readability, edge cases, and style, but accept “good enough” solutions unless critical.
* Balance feedback: Mix praise for strong elements with suggestions. Use questions to explore ideas, “Consider” for optional improvements, and labels like `[NIT]` for minor nits or `[STYLE]` for formatting.
* Be concise and human-like: Point to specific lines with quotes or numbers. For unknowns (e.g., external functions, object construction), note assumptions like “Assuming this function exists elsewhere…” or “Possible NPE here if the object isn’t initialized—worth checking context.”
* Use the **OIR method** for each comment:
* **Observe:** State the neutral fact about the code.
* **Impact:** Explain the potential effect (e.g., readability, performance, maintainability, correctness).
* **Request:** Suggest a clear, actionable next step. If uncertain, phrase it as a **question** rather than a statement.
🧠 **Accuracy & Equivalence Rules:**
* **Before** suggesting a refactor or simplification, *explicitly verify or reason through* whether it’s **logically equivalent** to the original.
* If unsure about logical equivalence, **do not make the suggestion as a fact**. Instead, phrase it as a question:
*“Would `Boolean.FALSE.equals(flag)` behave identically here? It wouldn’t handle `null` as `true`, so this might not be equivalent.”*
* Only mark something as a recommended change when confident it does not alter behavior. Otherwise, mark it as `[QUESTION]` or skip.
* Prefer showing **truth table reasoning** in tricky boolean cases.
🏢 **Company Code Style & Architecture Rules:**
* **Naming & Readability:**
* Avoid abbreviations in variable and method names. Prefer descriptive, intention-revealing names.
* Avoid one-letter variable names, except for well-known loop counters (e.g., `i`, `j`).
* **Function & Class Size:**
* Avoid very large methods. Favor breaking them into smaller, single-responsibility functions where appropriate.
* **Design Principles:**
* Favor composition over inheritance where it improves flexibility or reduces coupling.
* Avoid deep inheritance hierarchies.
* **Consistency:**
* Ensure naming, formatting, and architecture follow the team’s standards.
* Prefer explicitness over cleverness.
* Comment only where necessary—prefer self-describing code.
🧾 **Structure your output like a pull request review:**
1. **Overall summary:** 1–2 sentences on strengths and key themes.
2. **Specific comments:** Group logically by function or section, using OIR for each. Always give a reference to the lines from the original code. Tag findings clearly with labels like:
* `[BUG]` — incorrect or fragile behavior
* `[PERF]` — performance considerations
* `[STYLE]` — formatting or naming issues
* `[DESIGN]` — architectural improvements
* `[QUESTION]` — exploratory or uncertain notes
* `[NIT]` — very minor nits
3. **High-level recommendations:** Optional.
**Code to Review:**
```
public static double computeAverage(List<Integer> nums) {
    try {
        if (nums == null) return 0; // poor null-handling choice
        if (nums.size() == 0) return 0; // inconsistent empty result
        int sum = 0;
        for (int i = 0; i <= nums.size(); i++) { // OFF-BY-ONE -> ArrayIndexOutOfBounds
            sum += nums.get(i);
        }
        // integer division happens here, result loses fraction
        return sum / nums.size();
    } catch (Exception e) {
        // swallowing exceptions hides bugs
        return -1;
    }
}
```
**Additional Context (optional):**
```
[PASTE OPTIONAL CONTEXT HERE]
```
Provide your review in a professional, encouraging tone that fosters learning and collaboration.
This is just an example. Tailor it to fit your team’s needs and evolve it over time.
This prompt gave me the following result in Gemini:
Specific Comments

computeAverage function

Lines 6-7: `for (int i = 0; i <= nums.size(); i++) { sum += nums.get(i); }`

[BUG]
- Observe: The loop’s condition is `i <= nums.size()`.
- Impact: This will cause an `ArrayIndexOutOfBoundsException` when `i` becomes equal to `nums.size()`, because the last valid index for a list is `nums.size() - 1`. This bug is currently hidden by the generic `catch` block.
- Request: Let’s change the loop condition to `i < nums.size()` to iterate only over valid indices.

Line 10: `return sum / nums.size();`

[BUG]
- Observe: The division is performed between two integers (`sum` and `nums.size()`).
- Impact: This triggers integer division, which truncates the decimal part of the result. For example, if the sum is `5` and the size is `2`, the result will be `2` instead of the correct average, `2.5`.
- Request: To ensure we get an accurate `double` result, we should cast one of the operands to a double before the division, like this: `return (double) sum / nums.size();`.

Lines 11-13: `catch (Exception e) { return -1; }`

[BUG]
- Observe: The `try-catch` block catches any `Exception` and returns `-1`.
- Impact: This practice, often called “exception swallowing,” hides the root cause of bugs (like the out-of-bounds error mentioned above) and makes debugging very difficult. The return value `-1` is also ambiguous, as it could be a valid average for a list like `[-2, 0]`.
- Request: Could we remove the `try-catch` block? Allowing runtime exceptions to propagate provides a much clearer signal to the caller that something unexpected went wrong.

Lines 3-4: `if (nums == null) return 0;` and `if (nums.size() == 0) return 0;`

[DESIGN]
- Observe: The function returns `0` if the input list is `null` or empty.
- Impact: While this handles the edge case, returning `0` for an empty list can be misleading, as the average is mathematically undefined. This could lead to silent errors in downstream logic that expects a mathematically sound result. The handling is also inconsistent with the `-1` returned for other errors.
- Request: Let’s define a clearer contract for invalid inputs. A common and robust pattern is to fail fast by throwing an exception. How about we throw an `IllegalArgumentException` for `null` or empty lists?

```
if (nums == null || nums.isEmpty()) {
    throw new IllegalArgumentException("Input list cannot be null or empty.");
}
```

This makes the caller’s responsibility to provide valid input explicit.

High-Level Recommendations

Once the critical bugs are addressed, we could also consider a more modern, declarative approach using the Java Stream API. It’s more concise and less prone to off-by-one errors.

[DESIGN]
- Consider: A stream-based implementation could simplify this function significantly.

```
public static double computeAverage(List<Integer> nums) {
    if (nums == null || nums.isEmpty()) {
        throw new IllegalArgumentException("Input list cannot be null or empty.");
    }
    // .average() returns an OptionalDouble, which neatly handles the empty case,
    // but our guard clause above already covers it.
    return nums.stream()
               .mapToInt(Integer::intValue)
               .average()
               .getAsDouble(); // Safe to call due to the check above
}
```
This isn’t a required change, but something to keep in mind for future work as it aligns well with modern Java practices.
Great work overall—just a few critical fixes needed to make this function production-ready. Let me know what you think of these suggestions!
Conclusion
This is just my personal opinion: AI won’t replace human code reviews, but it can make them sharper. By letting AI handle pattern matching and repetitive observations, we free up our attention for the more important things: architecture, business logic, and edge cases.
The key is not to hand over responsibility but to layer AI on top of human judgment. Review first, let AI challenge or support your thinking, then make the final call yourself. Over time, refining your prompt and process can make this flow smooth and fast.
Remember: the strongest results come from combining human intuition with AI consistency, not choosing one over the other.
TL;DR
- Code reviews are time-consuming and often rely on pattern matching.
- Best flow: you review first (briefly) → AI reviews second → you finalize.
- Prompts evolve over time, but LLM versions change too. Don’t overfit.
- Always double check AI’s suggestions.
- The OIR method (Observe, Impact, Request) works well with AI outputs.
- The goal is to use AI as a second pair of eyes, not the main reviewer.