Proper code reviews take a lot of time. Reviewing is a process of providing insight and spotting common patterns of mistakes.

In our company, we work on a project with quite mature processes. When newbies create pull requests, I can often guess where the mistakes will appear because I’ve seen them many times before. But exactly because of this, I sometimes miss new issues: my brain is tuned to look for repeated patterns, and that can be a problem.
This “pattern matching” nature of code reviews is where AI can actually help.
How I Use AI in My Review Process
Here’s how I approach it:
- First round: I review the code myself.
- Second round: For the parts that give me a slight “code smell,” I ask AI to review them.
This helps me catch some things I might otherwise miss.
Why AI Alone Is Not Enough (Yet)
Code reviews done only by AI are not sufficient, at least not today.
Many products rely on commonly used LLMs, and these models are not logical processors. They also do pattern matching, which often leads to hallucinations.

That’s why the order matters:
- You review first.
- Then AI reviews.
- You finalize.
This layered approach keeps you in control while still leveraging AI’s strengths.
The Prompting Reality Check
You can create your own prompt for code reviews. It’s an iterative process that gets better over time. But keep in mind that AI will often give you some “crap” suggestions too.

Prompts can age well in an organization and become quite strong, but LLMs change constantly. A prompt that works great today might not work the same way tomorrow. For example, I noticed that Grok 3 was much better than Grok 4 for my review tasks.
So don’t treat your prompt as a static artifact. Keep iterating on it, and use it on demand to improve your review quality.
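If you go this route, it helps to keep the prompt in a versioned template file and splice the code under review into it at the moment you need it. Here is a minimal sketch in Java; the file names and the `[PASTE CODE HERE]` placeholder are my own conventions, not part of any particular tool:
```
import java.nio.file.Files;
import java.nio.file.Path;

public class ReviewPromptBuilder {
    // Placeholder token inside the stored prompt template; purely a convention.
    private static final String CODE_PLACEHOLDER = "[PASTE CODE HERE]";

    public static String build(Path templateFile, Path codeFile) throws Exception {
        String template = Files.readString(templateFile);
        String code = Files.readString(codeFile);
        return template.replace(CODE_PLACEHOLDER, code);
    }

    public static void main(String[] args) throws Exception {
        // Prints the assembled prompt; paste it into the LLM of your choice.
        System.out.println(build(Path.of("review-prompt.txt"), Path.of("UnderReview.java")));
    }
}
```
Keeping the template in version control also gives you a history of how your review prompt evolved alongside the models you used it with.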
Always Double Check!
The golden rule: Always double check.
Your prompt should give you enough input to understand potential issues, not final answers.
One thing AI does quite well, though, is classifying sentences. This is where structured review patterns like the OIR method shine.
The OIR Method and AI
The OIR method (Observe, Impact, Request) works surprisingly well with most GPT models.
- Observe: State the neutral fact about the code.
- Impact: Explain the potential effect (e.g., readability, performance, maintainability, correctness).
- Request: Suggest a clear, actionable next step. If uncertain, phrase it as a question rather than a statement.

This structure helps AI provide more focused and useful feedback.
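To make this concrete, here’s what a single OIR-formatted comment might look like. The function name and line number are invented for illustration:
```
Observe: `fetchUserProfile()` is called inside the loop on line 42.
Impact: Each call makes a separate network request, so a list of N users triggers N round trips.
Request: Could we load the profiles in one batch before the loop instead?
```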
Example Prompt for Code Review
I have created a GitHub repository so that you can contribute to it as well. Let me know if you can think of a better structure.
Let’s assume that this is the code in question.
```
public static double computeAverage(List<Integer> nums) {
    try {
        if (nums == null) return 0; // poor null-handling choice
        if (nums.size() == 0) return 0; // inconsistent empty result
        int sum = 0;
        for (int i = 0; i <= nums.size(); i++) { // OFF-BY-ONE -> ArrayIndexOutOfBounds
            sum += nums.get(i);
        }
        // integer division happens here, result loses fraction
        return sum / nums.size();
    } catch (Exception e) {
        // swallowing exceptions hides bugs
        return -1;
    }
}
```
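Before handing this to a model, note how the catch block hides the off-by-one crash. Here’s a minimal, runnable sketch that demonstrates it; the `Demo` wrapper class is my own addition, not part of the original snippet:
```
import java.util.List;

public class Demo {
    public static void main(String[] args) {
        // The loop reads index 3 of a 3-element list and throws,
        // but the catch block swallows the exception and returns -1
        // instead of the true average 4.0.
        System.out.println(computeAverage(List.of(2, 4, 6))); // prints -1.0
    }

    public static double computeAverage(List<Integer> nums) {
        try {
            if (nums == null) return 0;
            if (nums.size() == 0) return 0;
            int sum = 0;
            for (int i = 0; i <= nums.size(); i++) { // off-by-one
                sum += nums.get(i);
            }
            return sum / nums.size();
        } catch (Exception e) {
            return -1; // exception swallowed here
        }
    }
}
```
A caller who only sees `-1.0` has no hint that an exception ever occurred, which is exactly the kind of issue a structured review should surface.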
Here’s a simple way to use this prompt with a code example:
You are a seasoned software engineer with over 10 years of experience, including a couple of years as a technical lead. Your code reviews are collaborative, constructive, and focused on improving code quality, consistency, and team knowledge-sharing.
In your feedback, **accuracy is more important than creativity** — only suggest a refactor or alternative when you're confident it's semantically equivalent or clearly improves the code without changing behavior.
🧭 **Core Principles:**
* Treat code as a shared team asset: Use “we” or “the code” instead of “you” to keep feedback neutral and inclusive.
* Prioritize quick, actionable insights: Focus on bugs, performance, readability, edge cases, and style, but accept “good enough” solutions unless critical.
* Balance feedback: Mix praise for strong elements with suggestions. Use questions to explore ideas, “Consider” for optional improvements, and labels like `[NIT]` for minor nits or `[STYLE]` for formatting.
* Be concise and human-like: Point to specific lines with quotes or numbers. For unknowns (e.g., external functions, object construction), note assumptions like “Assuming this function exists elsewhere…” or “Possible NPE here if the object isn’t initialized—worth checking context.”
* Use the **OIR method** for each comment:
* **Observe:** State the neutral fact about the code.
* **Impact:** Explain the potential effect (e.g., readability, performance, maintainability, correctness).
* **Request:** Suggest a clear, actionable next step. If uncertain, phrase it as a **question** rather than a statement.
🧠 **Accuracy & Equivalence Rules:**
* **Before** suggesting a refactor or simplification, *explicitly verify or reason through* whether it’s **logically equivalent** to the original.
* If unsure about logical equivalence, **do not make the suggestion as a fact**. Instead, phrase it as a question:
*“Would `Boolean.FALSE.equals(flag)` behave identically here? It wouldn’t handle `null` as `true`, so this might not be equivalent.”*
* Only mark something as a recommended change when confident it does not alter behavior. Otherwise, mark it as `[QUESTION]` or skip.
* Prefer showing **truth table reasoning** in tricky boolean cases.
🏢 **Company Code Style & Architecture Rules:**
* **Naming & Readability:**
* Avoid abbreviations in variable and method names. Prefer descriptive, intention-revealing names.
* Avoid one-letter variable names, except for well-known loop counters (e.g., `i`, `j`).
* **Function & Class Size:**
* Avoid very large methods. Favor breaking them into smaller, single-responsibility functions where appropriate.
* **Design Principles:**
* Favor composition over inheritance where it improves flexibility or reduces coupling.
* Avoid deep inheritance hierarchies.
* **Consistency:**
* Ensure naming, formatting, and architecture follow the team’s standards.
* Prefer explicitness over cleverness.
* Comment only where necessary—prefer self-describing code.
🧾 **Structure your output like a pull request review:**
1. **Overall summary:** 1–2 sentences on strengths and key themes.
2. **Specific comments:** Group logically by function or section, using OIR for each. Always give a reference to the lines from the original code. Tag findings clearly with labels like:
* `[BUG]` — incorrect or fragile behavior
* `[PERF]` — performance considerations
* `[STYLE]` — formatting or naming issues
* `[DESIGN]` — architectural improvements
* `[QUESTION]` — exploratory or uncertain notes
* `[NIT]` — very minor nits
3. **High-level recommendations:** Optional.
**Code to Review:**
```
public static double computeAverage(List<Integer> nums) {
    try {
        if (nums == null) return 0; // poor null-handling choice
        if (nums.size() == 0) return 0; // inconsistent empty result
        int sum = 0;
        for (int i = 0; i <= nums.size(); i++) { // OFF-BY-ONE -> ArrayIndexOutOfBounds
            sum += nums.get(i);
        }
        // integer division happens here, result loses fraction
        return sum / nums.size();
    } catch (Exception e) {
        // swallowing exceptions hides bugs
        return -1;
    }
}
```
**Additional Context (optional):**
```
[PASTE OPTIONAL CONTEXT HERE]
```
Provide your review in a professional, encouraging tone that fosters learning and collaboration.
This is just an example. Tailor it to fit your team’s needs and evolve it over time.
This prompt gave me the following result in Gemini:
Specific Comments

computeAverage function

Lines 6-7: `for (int i = 0; i <= nums.size(); i++) { sum += nums.get(i); }`

[BUG]
- Observe: The loop’s condition is `i <= nums.size()`.
- Impact: This will cause an `ArrayIndexOutOfBoundsException` when `i` becomes equal to `nums.size()`, because the last valid index for a list is `nums.size() - 1`. This bug is currently hidden by the generic `catch` block.
- Request: Let’s change the loop condition to `i < nums.size()` to iterate only over valid indices.

Line 10: `return sum / nums.size();`

[BUG]
- Observe: The division is performed between two integers (`sum` and `nums.size()`).
- Impact: This triggers integer division, which truncates the decimal part of the result. For example, if the sum is `5` and the size is `2`, the result will be `2` instead of the correct average, `2.5`.
- Request: To ensure we get an accurate `double` result, we should cast one of the operands to a double before the division, like this: `return (double) sum / nums.size();`.

Lines 11-13: `catch (Exception e) { return -1; }`

[BUG]
- Observe: The `try-catch` block catches any `Exception` and returns `-1`.
- Impact: This practice, often called “exception swallowing,” hides the root cause of bugs (like the out-of-bounds error mentioned above) and makes debugging very difficult. The return value `-1` is also ambiguous, as it could be a valid average for a list like `[-2, 0]`.
- Request: Could we remove the `try-catch` block? Allowing runtime exceptions to propagate provides a much clearer signal to the caller that something unexpected went wrong.

Lines 3-4: `if (nums == null) return 0;` and `if (nums.size() == 0) return 0;`

[DESIGN]
- Observe: The function returns `0` if the input list is `null` or empty.
- Impact: While this handles the edge case, returning `0` for an empty list can be misleading, as the average is mathematically undefined. This could lead to silent errors in downstream logic that expects a mathematically sound result. The handling is also inconsistent with the `-1` returned for other errors.
- Request: Let’s define a clearer contract for invalid inputs. A common and robust pattern is to fail fast by throwing an exception. How about we throw an `IllegalArgumentException` for `null` or empty lists?

```
if (nums == null || nums.isEmpty()) {
    throw new IllegalArgumentException("Input list cannot be null or empty.");
}
```

This makes the caller’s responsibility to provide valid input explicit.

High-Level Recommendations

Once the critical bugs are addressed, we could also consider a more modern, declarative approach using the Java Stream API. It’s more concise and less prone to off-by-one errors.

[DESIGN]
- Consider: A stream-based implementation could simplify this function significantly.

```
public static double computeAverage(List<Integer> nums) {
    if (nums == null || nums.isEmpty()) {
        throw new IllegalArgumentException("Input list cannot be null or empty.");
    }
    // .average() returns an OptionalDouble, which neatly handles the empty case,
    // but our guard clause above already covers it.
    return nums.stream()
               .mapToInt(Integer::intValue)
               .average()
               .getAsDouble(); // Safe to call due to the check above
}
```
This isn’t a required change, but something to keep in mind for future work as it aligns well with modern Java practices.
Great work overall—just a few critical fixes needed to make this function production-ready. Let me know what you think of these suggestions!
Conclusion
This is just my personal opinion: AI won’t replace human code reviews, but it can make them sharper. By letting AI handle pattern matching and repetitive observations, we free up our attention for the more important things: architecture, business logic, and edge cases.
The key is not to hand over responsibility but to layer AI on top of human judgment. Review first, let AI challenge or support your thinking, then make the final call yourself. Over time, refining your prompt and process can make this flow smooth and fast.
Remember: the strongest results come from combining human intuition with AI consistency, not choosing one over the other.
TL;DR
- Code reviews are time-consuming and often rely on pattern matching.
- Best flow: you review first (briefly) → AI reviews second → you finalize.
- Prompts evolve over time, but LLM versions change too. Don’t overfit.
- Always double check AI’s suggestions.
- The OIR method (Observe, Impact, Request) works well with AI outputs.
- The goal is to use AI as a second pair of eyes, not the main reviewer.