M.Tech Dissertation · Mid Semester Presentation · April 2026

Road Quality Assessment
Using Computer Vision
and GPS-Based Mapping

Student

Anish A. · 2024TM93051

Program

M.Tech Software Engg · BITS Pilani WILP

Supervisor

Dr. Kavya Manohar

Additional Examiner

Mr. Ashik Salahudeen

The Problem

Navigation systems ignore road surface quality

Maps

Optimise distance & time only

Road surface quality is invisible to every major navigation system today.

🇮🇳 India

Uniquely complex road conditions

High visual clutter, monsoon damage, diverse surfaces. Existing models trained on Japanese, US and European datasets perform poorly here.

Surveys

Government data is not publicly accessible

Municipal and government road condition data is not synced or available for general public use.

Can dashcam footage and AI automatically classify road quality and map it at scale for Indian roads?

End-to-End Pipeline

System Architecture

Dashcam Video

4 cameras
Kerala roads

›

OCR + Filter

~34k frames/cam
→ ~26k after filter

›

Crowdsourced Consensus Annotation

3,211 images · 53 contributors
Gamified platform

›

Model Training

15 architectures
72 HP trials

›

GPS Snapping

OSM / osmnx
UTM projection

›

Interactive Map

Folium / Leaflet
808 segments
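A minimal sketch of the geometry behind the GPS snapping step, assuming the GPS fixes and the OSM road segments have already been projected to planar coordinates in metres (for example UTM). The function names, the segment dictionary and the 25 m cut-off are illustrative assumptions, not the project's actual osmnx-based code.

```python
import math

def project_point_to_segment(p, a, b):
    """Closest point on segment a-b to point p, plus the distance to it.

    All inputs are planar (x, y) tuples in metres, e.g. after a UTM projection.
    """
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment ends.
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return (cx, cy), math.hypot(px - cx, py - cy)

def snap_to_nearest_segment(p, segments, max_dist_m=25.0):
    """Return (segment_id, snapped_point, distance) or None if nothing is close.

    `segments` maps a hypothetical segment id to its two endpoints.
    `max_dist_m` is an assumed cut-off for rejecting GPS outliers.
    """
    best = None
    for seg_id, (a, b) in segments.items():
        snapped, dist = project_point_to_segment(p, a, b)
        if best is None or dist < best[2]:
            best = (seg_id, snapped, dist)
    if best is None or best[2] > max_dist_m:
        return None
    return best

# Toy data: two parallel east-west roads 50 m apart.
segments = {
    "seg_a": ((0.0, 0.0), (100.0, 0.0)),
    "seg_b": ((0.0, 50.0), (100.0, 50.0)),
}
result = snap_to_nearest_segment((40.0, 6.0), segments)
```

Working in projected metres rather than raw latitude/longitude keeps the distance comparison meaningful, which is the usual reason for a UTM step before snapping.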

Key finding

YOLO detection doesn't work

Very poor mAP — and finding individual potholes is not the objective. Segment classification is.

Dataset

3,211 images

Annotated by crowdsourcing · 53 contributors
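The slides do not state the consensus rule used by the annotation platform. The sketch below shows one common option, a majority vote with a minimum number of votes and a minimum agreement ratio; both thresholds and the function name are assumptions for illustration only.

```python
from collections import Counter

def consensus_label(votes, min_votes=3, min_agreement=0.6):
    """Return the agreed label for one image, or None if there is no consensus.

    votes: list of labels submitted by different contributors.
    min_votes, min_agreement: assumed thresholds, not taken from the project.
    """
    if len(votes) < min_votes:
        return None  # not enough contributors have seen this image yet
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) < min_agreement:
        return None  # contributors disagree too much; send back for more votes
    return label

agreed = consensus_label(["Good", "Good", "Fair"])
disputed = consensus_label(["Good", "Fair", "Poor"])
too_few = consensus_label(["Good", "Good"])
```

Because the agreement threshold is above 0.5, a tied vote can never pass, so the tie-breaking order of `most_common` does not affect the result.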

Best model

Swin-Small

81.1% acc · 77% F1 (3-class)
53.8% acc · 47% F1 (5-class)

Output

808 road segments mapped

Kerala · Excellent / Good / Fair / Poor / Invalid

Model Selection

15 architectures tested — Swin-Small wins

Rank	Model	Accuracy	Macro F1
1	swin_small ✓	53.8%	47.0%
2	swin_tiny	51.5%	46.1%
3	resnet34	51.1%	45.4%
4	convnext_tiny	50.4%	44.8%
5	resnet18	49.7%	44.4%
6	vit_small	49.8%	44.0%
7–13	EfficientNet, ViT-Base, ConvNeXt…	45–50%	39–42%

Fixed: lr=1e-4 · weighted loss · 5-class · patience=10 · runtime: 1.6 hours
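The table reports both accuracy and macro F1, and the two can diverge sharply when classes are imbalanced. A small sketch of how both are computed from a confusion matrix; the matrix below is a toy example, not data from this project.

```python
def accuracy_and_macro_f1(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    f1_scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(confusion[k]) - tp                       # truly k, predicted other
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    # Macro F1 is the unweighted mean of per-class F1, so rare classes count fully.
    return correct / total, sum(f1_scores) / n

# Toy 3-class matrix in which the rare third class is never predicted.
toy_confusion = [
    [90, 10, 0],
    [15, 35, 0],
    [5, 5, 0],
]
acc, macro_f1 = accuracy_and_macro_f1(toy_confusion)
```

In this toy case accuracy stays high while macro F1 is pulled down by the class that is never predicted, which is why macro F1 is the more informative column for an imbalanced road-quality dataset.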

Why Swin beats ViT here

Global attention (ViT) is overkill for road texture — a local problem. Swin's hierarchical windows fit the task and generalise better on limited data.

Why not EfficientNet or ResNet?

Pure CNNs lack the long-range context to understand scene-level road quality. Swin combines local attention with multi-scale feature extraction.

15 architectures evaluated in a single systematic sweep

Architecture Deep Dive

Swin Transformer — swin_small_patch4_window7_224

Liu et al., Microsoft Research · "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" · ICCV 2021

Input

224×224×3

↓

Patch Embed
→56×56×96

›

Stage 1

W-MSA
56×56 C=96

SW-MSA

Merge
28×28×192

›

Stage 2

W-MSA
28×28 C=192

SW-MSA

Merge
14×14×384

›

Stage 3

W-MSA
14×14 C=384

SW-MSA (18 blocks in Swin-S, alternating W-MSA and SW-MSA)

Merge
7×7×768

›

Stage 4

W-MSA
7×7 C=768

SW-MSA

AvgPool→FC
3 classes

W-MSA: Window-based Multi-head Self-Attention (7×7 windows)

SW-MSA: Shifted-Window Multi-head Self-Attention (cross-boundary context)
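The feature-map sizes in the diagram follow from simple arithmetic: the 4×4 patch embedding divides the input resolution by 4, and each patch-merging step halves the resolution and doubles the channel count. A short sketch that reproduces the numbers; the function name is illustrative.

```python
def swin_stage_shapes(img_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return (height, width, channels) of the token grid at each stage."""
    res = img_size // patch_size  # 224 / 4 = 56 tokens per side
    dim = embed_dim
    shapes = []
    for _ in range(num_stages):
        shapes.append((res, res, dim))
        res //= 2   # patch merging halves the spatial resolution
        dim *= 2    # and doubles the channel dimension
    return shapes

shapes = swin_stage_shapes()
# [(56, 56, 96), (28, 28, 192), (14, 14, 384), (7, 7, 768)]

# Number of non-overlapping 7x7 attention windows at each stage.
windows_per_stage = [(h // 7) * (w // 7) for h, w, _ in shapes]
# [64, 16, 4, 1]
```

At the last stage a single 7×7 window covers the whole token grid, so attention there is effectively global over the downsampled image.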

Model Specs

Params: ~49M
Input: 224×224
Patch: 4×4
Window: 7×7
Pretrain: IN-22k
LR: 3e-5

Why this fits

Local texture: potholes are local — 7×7 windows match this.

Shifted windows: prevents attention blind spots at boundaries.

Hierarchical: multi-scale features for fine detail + scene context.

Hyperparameter Tuning

72 trials · ~14 hours · optimal config found

Average F1 by parameter value

Learning Rate

3e-5

51.9% ✓

1e-4

51.7%

3e-4

51.7%

Freeze

None

52.1% ← best

Smoothing

None

51.9% ← best

Weighted Sampling

Unweighted

52.0% ← best
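"Average F1 by parameter value" is a marginal average: all trials sharing one value of a parameter are grouped and their macro F1 scores are averaged, regardless of the other settings. A minimal sketch with toy trial records; the record layout, the `"backbone"` freeze value and the scores are hypothetical.

```python
from collections import defaultdict

def average_metric_by_value(trials, param, metric="macro_f1"):
    """Mean of `metric` over all trials that share the same value of `param`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for trial in trials:
        value = trial[param]
        sums[value] += trial[metric]
        counts[value] += 1
    return {value: sums[value] / counts[value] for value in sums}

# Toy trials only; these are not the 72 trials from the study.
trials = [
    {"lr": 3e-5, "freeze": "none", "macro_f1": 0.54},
    {"lr": 3e-5, "freeze": "backbone", "macro_f1": 0.50},
    {"lr": 1e-4, "freeze": "none", "macro_f1": 0.52},
    {"lr": 1e-4, "freeze": "backbone", "macro_f1": 0.48},
]
by_lr = average_metric_by_value(trials, "lr")
by_freeze = average_metric_by_value(trials, "freeze")
```

A marginal average summarises the overall trend for each setting, while the single best trial can score well above it.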

Optimal Configuration

model: swin_small
lr: 3e-5
freeze: none
smoothing: 0.0
loss: unweighted CE
scheduler: cosine
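For reference, a cosine schedule decays the learning rate from its base value along half a cosine wave. The sketch below assumes no warm-up, a floor of zero and a known total step count; none of these details are stated on the slide.

```python
import math

def cosine_lr(step, total_steps, base_lr=3e-5, min_lr=0.0):
    """Cosine-annealed learning rate at `step` out of `total_steps`.

    base_lr matches the tuned value above; min_lr and the lack of warm-up
    are assumptions made for this illustration.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

lr_start = cosine_lr(0, 100)    # equals base_lr
lr_mid = cosine_lr(50, 100)     # about half of base_lr
lr_end = cosine_lr(100, 100)    # decays to min_lr
```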

Best trial (5-class)

53.2%

Accuracy

56.8%

Macro F1

Final Results

3-class merging and retraining deliver the breakthrough

5-Class · Excellent / Good / Fair / Poor / Invalid

53%

Accuracy

57%

Macro F1

Excellent↔Good and Fair↔Poor boundaries are visually ambiguous. Annotators disagreed on these pairs — fine-grained classes don't exist cleanly in the data.

3-Class · Bad / Good / Invalid — Adopted ✓

81%

Accuracy

77%

Macro F1

Excellent+Good→Good · Fair+Poor→Bad. Directly answers the navigation question: "Is this segment safe to drive?"
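The merge described above can be expressed as a plain lookup table applied to the 5-class annotations before retraining. A minimal sketch; the constant and function names are illustrative, and the sample labels are made up.

```python
from collections import Counter

# Mapping stated on the slide: Excellent+Good -> Good, Fair+Poor -> Bad.
# Invalid is kept as its own class.
FIVE_TO_THREE = {
    "Excellent": "Good",
    "Good": "Good",
    "Fair": "Bad",
    "Poor": "Bad",
    "Invalid": "Invalid",
}

def merge_labels(labels):
    """Map a list of 5-class labels to the 3-class scheme."""
    return [FIVE_TO_THREE[label] for label in labels]

five_class = ["Excellent", "Good", "Fair", "Poor", "Invalid", "Good"]
three_class = merge_labels(five_class)
class_counts = Counter(three_class)
```

Merging changes the class balance, so the class counts are worth re-checking before reusing any loss weighting or sampling settings from the 5-class runs.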

Note — Hyperparameters

Trained separately on merged labels, but reuses 5-class hyperparameters. No dedicated HPO run yet — tuning for 3-class is a clear next step.

5-class: 53%

→

3-class: 81%

+28 pp · further gains expected with dedicated HPO

Live Demonstration

Demo

Crowdsourced Annotation Platform

Road Quality Maps
5-class & 3-class

Label Studio Labeling Tool

YOLO Detection Examples

Remaining Work

All tasks to be completed by May 1 (final report submission)

Fix bugs in Maps

Resolve 3-class map rendering issues · validate GPS snapping accuracy · test both aggregation modes (pessimistic / majority)

Merge remaining classification / dataset

Consolidate any outstanding annotations · finalise 3-class label mapping · re-train on full merged dataset

End-to-End Integration Test

Full pipeline run: dashcam video → OCR → filter → classify → GPS snap → Folium map output · validate results on held-out test data

Anonymise & release annotated dataset

Remove personally identifiable information from frames · prepare dataset card · open release for research community

Report writing

Complete dissertation: methodology, results, discussion, conclusions, future work · supervisor review and final submission
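The two aggregation modes named under the map fixes above (pessimistic / majority) can be sketched as follows for the 3-class scheme. The exact semantics are assumptions: "pessimistic" is taken to mean the worst frame label wins, Invalid frames are ignored unless nothing else is available, and majority ties fall back to the worse label.

```python
from collections import Counter

# Assumed severity ordering for the 3-class labels (higher is worse).
SEVERITY = {"Good": 0, "Bad": 1}

def aggregate_segment(frame_labels, mode="majority"):
    """Combine per-frame predictions for one road segment into a single label."""
    valid = [label for label in frame_labels if label != "Invalid"]
    if not valid:
        return "Invalid"  # no usable frame on this segment
    if mode == "pessimistic":
        # One bad frame is enough to mark the whole segment as bad.
        return max(valid, key=SEVERITY.__getitem__)
    if mode == "majority":
        counts = Counter(valid)
        top = max(counts.values())
        tied = [label for label, count in counts.items() if count == top]
        return max(tied, key=SEVERITY.__getitem__)  # tie -> worse label
    raise ValueError(f"unknown mode: {mode}")

frames = ["Good", "Good", "Bad", "Invalid"]
majority_label = aggregate_segment(frames, mode="majority")
pessimistic_label = aggregate_segment(frames, mode="pessimistic")
```

The pessimistic mode trades false alarms for safety, which suits the "is this segment safe to drive?" framing; majority voting is more robust to isolated misclassified frames.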

May 1

Final report submission deadline

All 5 tasks targeted for completion
within April 2026.

No critical risks. GPS, model, and map
pipeline are all operational.

M.Tech Dissertation · BITS Pilani WILP · 2025–2026

Thank You

Road Quality Assessment System Using Computer Vision and GPS-Based Mapping

Student

Anish A. · 2024TM93051

Supervisor

Dr. Kavya Manohar

Additional Examiner

Mr. Ashik Salahudeen
