Context-Aware Legal RAG System

Context-Aware Legal RAG System

Context-Aware Legal RAG System

Date

Date

Date

Dec 2025

Dec 2025

Dec 2025

Service

Service

Service

RAG, LLM

RAG, LLM

RAG, LLM

Event

Event

Event

Personal Project

Personal Project

Personal Project

High-Precision Retrieval with Double-Hop Query Reformulation

The Problem

Legal documents are written in complex, formal terminology ("legalese"), while users typically ask questions in casual, colloquial language. Standard RAG systems often fail to bridge this semantic gap; a vector search for "My boss cut my pay" might miss the relevant document titled "Wage Deduction Regulations" because the keywords do not align. This mismatch leads to poor retrieval accuracy and model hallucinations.

The Solution

I engineered a specialized "Double-Hop" RAG architecture designed to translate user intent into legal precision. The system does not just search; it interprets. By implementing an intermediate reasoning layer, the system reformulates ambiguous user questions (e.g., "Can I sell waqf land?") into standardized legal queries (e.g., "Legal status of waqf land transfer under Law No. 41/2004") before executing the final search.

Technical Architecture

The pipeline utilizes a multi-stage retrieval strategy to maximize precision:

  • Query Reformulation Agent: An LLM (Qwen 3 32B) analyzes the initial user input and rewrites it into formal legal queries, bridging the vocabulary gap.

  • Double-Hop Retrieval: The system performs a broad initial search to gather context, refines the query based on that context, and then executes a second "precision hop" to retrieve specific regulations.

  • Cross-Encoder Reranking: To ensure the most relevant regulation appears at the top, the final retrieved documents undergo a reranking process, achieving a 93% Mean Reciprocal Rank (MRR).

Performance & Impact

The system was rigorously benchmarked using an "LLM-as-a-Judge" framework on a custom Indonesian legal dataset.

  • 100% Hit Rate: Successfully retrieved the correct legal document for every test query.

  • 98.6% Faithfulness: The logic layers significantly reduced hallucinations, ensuring answers were strictly grounded in the retrieved statutes.

  • Validation: Proven capability to map highly informal inputs (e.g., "curhat" style questions) to exact legal articles without losing context.

Core Stack

Python, Qwen 3 (via Groq), FAISS, Google Embedding-004, Modal (Serverless Deployment).

More projects

Got questions?

I’m always excited to collaborate on innovative and exciting projects!

E-mail

hafiz@abdielz.tech

Got questions?

I’m always excited to collaborate on innovative and exciting projects!

E-mail

hafiz@abdielz.tech

Got questions?

I’m always excited to collaborate on innovative and exciting projects!

E-mail

hafiz@abdielz.tech

©2026 Muhammad Abdiel Al Hafiz

©2026 Muhammad Abdiel Al Hafiz

©2026 Muhammad Abdiel Al Hafiz

Create a free website with Framer, the website builder loved by startups, designers and agencies.