Context-Aware Legal RAG System

Date: Dec 2025
Service: RAG, LLM
Event: Personal Project

High-Precision Retrieval with Double-Hop Query Reformulation

The Problem

Legal documents are written in complex, formal terminology ("legalese"), while users typically ask questions in casual, colloquial language. Standard RAG systems often fail to bridge this semantic gap: a vector search for "My boss cut my pay" might miss the relevant document titled "Wage Deduction Regulations" because the colloquial wording and the statutory title share almost no vocabulary. This mismatch leads to poor retrieval accuracy and, downstream, to model hallucinations.

The Solution

I engineered a specialized "Double-Hop" RAG architecture that translates user intent into precise legal queries. The system does not just search; it interprets. An intermediate reasoning layer reformulates ambiguous user questions (e.g., "Can I sell waqf land?") into standardized legal queries (e.g., "Legal status of waqf land transfer under Law No. 41/2004") before the final search runs.
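The reformulation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the prompt wording, the function names, and the `fake_llm` stand-in are all assumptions, and in the real system the `llm` callable would wrap the hosted model.

```python
from typing import Callable

# Hypothetical prompt template; the wording used in the project is not published.
REFORMULATION_TEMPLATE = (
    "You are a legal research assistant.\n"
    "Rewrite the user's question as one formal legal search query.\n"
    "Return only the rewritten query.\n\n"
    "User question: {question}\n"
    "Formal legal query:"
)


def build_reformulation_prompt(question: str) -> str:
    """Fill the template with the user's (whitespace-trimmed) question."""
    return REFORMULATION_TEMPLATE.format(question=question.strip())


def reformulate(question: str, llm: Callable[[str], str]) -> str:
    """Ask the model for a formal query; fall back to the original question
    if the model returns an empty string."""
    rewritten = llm(build_reformulation_prompt(question)).strip()
    return rewritten or question.strip()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    if "waqf" in prompt:
        return "Legal status of waqf land transfer under Law No. 41/2004"
    return ""


print(reformulate("Can I sell waqf land?", fake_llm))
```

Passing the model as a plain callable keeps the reformulation logic testable without network access; swapping in a real client only changes what `llm` does.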

Technical Architecture

The pipeline uses a multi-stage retrieval strategy to maximize precision:

Query Reformulation Agent: An LLM (Qwen 3 32B) analyzes the initial user input and rewrites it into formal legal queries, bridging the vocabulary gap.
Double-Hop Retrieval: The system performs a broad initial search to gather context, refines the query based on that context, and then executes a second "precision hop" to retrieve specific regulations.
Cross-Encoder Reranking: To place the most relevant regulation at the top, a cross-encoder reranks the final retrieved documents, achieving a Mean Reciprocal Rank (MRR) of 93%.
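The control flow of the three stages above can be sketched in a few lines. To keep the example self-contained, a toy token-overlap score stands in for both the FAISS vector search and the cross-encoder, and a small glossary stands in for the LLM rewriter; the corpus text, the glossary, and every function name here are illustrative assumptions rather than the project's implementation.

```python
from typing import Callable, List, Set, Tuple

# Toy corpus standing in for the indexed regulations (illustrative text only).
CORPUS: List[str] = [
    "Wage Deduction Regulations: employers may deduct wages only under a written agreement",
    "Annual Leave Regulations: workers are entitled to paid annual leave",
    "Waqf Land Transfer: waqf assets may not be sold or inherited",
]

# Hypothetical colloquial-to-formal glossary; the real system uses an LLM here.
GLOSSARY = {"boss": "employers", "pay": "wages", "cut": "deduct"}


def tokens(text: str) -> Set[str]:
    return set(text.lower().replace(":", " ").split())


def overlap(query: str, doc: str) -> float:
    """Toy relevance score: number of shared tokens (stand-in for a real scorer)."""
    return float(len(tokens(query) & tokens(doc)))


def toy_search(query: str, k: int) -> List[str]:
    """Stand-in for the vector index: top-k documents by token overlap."""
    return sorted(CORPUS, key=lambda d: overlap(query, d), reverse=True)[:k]


def toy_rewrite(question: str, context: List[str]) -> str:
    """Swap a colloquial word for its formal synonym only when that synonym
    actually appears in the hop-1 context."""
    vocab: Set[str] = set().union(*(tokens(d) for d in context))
    words = [GLOSSARY[w] if GLOSSARY.get(w) in vocab else w
             for w in question.lower().split()]
    return " ".join(words)


def double_hop(
    question: str,
    search: Callable[[str, int], List[str]],
    rewrite: Callable[[str, List[str]], str],
    score: Callable[[str, str], float],
    broad_k: int = 5,
    final_k: int = 3,
) -> Tuple[str, List[str]]:
    context = search(question, broad_k)      # hop 1: broad search for context
    refined = rewrite(question, context)     # refine the query using that context
    candidates = search(refined, final_k)    # hop 2: precision search
    ranked = sorted(candidates, key=lambda d: score(refined, d), reverse=True)
    return refined, ranked                   # reranked, best candidate first


refined, ranked = double_hop("my boss cut my pay", toy_search, toy_rewrite, overlap)
print(refined)
print(ranked[0])
```

In this toy setup the original question shares no tokens with the wage regulation, while the refined query does, which is the vocabulary gap the second hop is meant to close.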

Performance & Impact

The system was rigorously benchmarked using an "LLM-as-a-Judge" framework on a custom Indonesian legal dataset.

100% Hit Rate: Successfully retrieved the correct legal document for every test query.
98.6% Faithfulness: The intermediate reasoning layers significantly reduced hallucinations, keeping answers grounded in the retrieved statutes.
Validation: The system mapped highly informal inputs (e.g., "curhat"-style questions, where the user vents about a personal situation) to exact legal articles without losing context.
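For reference, the two retrieval metrics quoted in this write-up (hit rate and MRR) can be computed as in the sketch below. The document IDs and rankings are made-up toy data, not the project's benchmark, and the function names are assumptions; faithfulness is not shown because it depends on an LLM judge.

```python
from typing import List, Sequence


def hit_rate(ranked_lists: Sequence[List[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of queries whose gold document appears in the top-k results."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)


def mean_reciprocal_rank(ranked_lists: Sequence[List[str]], gold: Sequence[str]) -> float:
    """Average of 1/rank of the gold document; a miss contributes 0."""
    total = 0.0
    for ranked, g in zip(ranked_lists, gold):
        if g in ranked:
            total += 1.0 / (ranked.index(g) + 1)
    return total / len(gold)


# Toy data: three queries, gold document "a" ranked 1st, 2nd, and 3rd.
ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
gold = ["a", "a", "a"]

print(hit_rate(ranked_lists, gold, k=3))          # every gold doc is in the top 3
print(mean_reciprocal_rank(ranked_lists, gold))   # (1 + 1/2 + 1/3) / 3
```

The toy data shows why a 100% hit rate and an MRR below 100% are compatible: the correct document is always retrieved, but not always in first position.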

Core Stack

Python, Qwen 3 (via Groq), FAISS, Google text-embedding-004, Modal (serverless deployment)

More projects

Echo Bsmart

Dec 2025

Know Your Sight

Dec 2024

Got questions?

I’m always excited to collaborate on innovative projects!

E-mail

hafiz@abdielz.tech

Phone

+628970889429

Schedule a call
