M.Tech Dissertation · Mid Semester Presentation · April 2026

Road Quality Assessment
Using Computer Vision
and GPS-Based Mapping

Student
Anish A. · 2024TM93051
Program
M.Tech Software Engg · BITS Pilani WILP
Supervisor
Dr. Kavya Manohar
Additional Examiner
Mr. Ashik Salahudeen
The Problem

Navigation systems ignore road surface quality

Maps
Optimise distance & time only
Road surface quality is invisible to every major navigation system today.
🇮🇳 India
Uniquely complex road conditions
High visual clutter, monsoon damage, diverse surfaces. Existing models trained on Japan, US and European datasets perform poorly here.
Surveys
Government data is not publicly accessible
Municipal and government road condition data is neither kept up to date nor published for general public use.
Can dashcam footage and AI automatically classify road quality and map it at scale for Indian roads?
End-to-End Pipeline

System Architecture

1
Dashcam Video
4 cameras
Kerala roads
2
OCR + Filter
~34k frames/cam
→ ~26k after filter
3
Crowdsourced Consensus Annotation
3,211 images · 53 contributors
Gamified platform
4
Model Training
15 architectures
72 HP trials
5
GPS Snapping
OSM / osmnx
UTM projection
6
Interactive Map
Folium / Leaflet
808 segments
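The GPS snapping step can be illustrated without any OSM download. This is a minimal sketch, assuming road segments are already available as pairs of (lat, lon) endpoints. It swaps in a local equirectangular projection for the UTM projection named above and a brute-force search for the osmnx lookup. The function names and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def to_local_xy(lat, lon, lat0, lon0):
    """Approximate metres east/north of (lat0, lon0). Equirectangular, not UTM."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, all in planar metres."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_segment(point, segments):
    """Return (segment_id, distance_m) of the segment closest to a (lat, lon) point."""
    lat0, lon0 = point
    best = None
    for seg_id, (start, end) in segments.items():
        a = to_local_xy(start[0], start[1], lat0, lon0)
        b = to_local_xy(end[0], end[1], lat0, lon0)
        d = point_segment_distance((0.0, 0.0), a, b)
        if best is None or d < best[1]:
            best = (seg_id, d)
    return best

# Made-up segments: two parallel east-west roads roughly 1.1 km apart.
segments = {
    "seg_a": ((9.9300, 76.2600), (9.9300, 76.2700)),
    "seg_b": ((9.9400, 76.2600), (9.9400, 76.2700)),
}
seg_id, dist_m = nearest_segment((9.9302, 76.2650), segments)
```

A distance threshold on `dist_m` could be used to reject frames whose GPS fix is too far from any road; the right cut-off is not stated in the slides.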
Key finding
YOLO detection doesn't work
Very poor mAP — and finding individual potholes is not the objective. Segment classification is.
Dataset
3,211 images
Annotated by crowdsourcing · 53 contributors
Best model
Swin-Small
81.1% acc · 77% F1 (3-class)
53.8% acc · 47% F1 (5-class)
Output
808 road segments mapped
Kerala · Excellent / Good / Fair / Poor / Invalid
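One way to hand classified segments to a Leaflet-based map is GeoJSON. The sketch below assumes each segment is a polyline of (lat, lon) points with one of the five labels listed above. The field names and colour scheme are hypothetical, not taken from the project.

```python
import json

# Hypothetical colour per class, for illustration only.
CLASS_COLOURS = {
    "Excellent": "#1a9850",
    "Good": "#91cf60",
    "Fair": "#fee08b",
    "Poor": "#d73027",
    "Invalid": "#999999",
}

def segments_to_geojson(segments):
    """Convert classified segments into a GeoJSON FeatureCollection dict."""
    features = []
    for seg in segments:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                # GeoJSON coordinate order is [longitude, latitude].
                "coordinates": [[lon, lat] for lat, lon in seg["path"]],
            },
            "properties": {
                "segment_id": seg["id"],
                "quality": seg["quality"],
                "colour": CLASS_COLOURS[seg["quality"]],
            },
        })
    return {"type": "FeatureCollection", "features": features}

example = [
    {"id": "seg_a", "quality": "Poor",
     "path": [(9.9300, 76.2600), (9.9300, 76.2700)]},
]
collection = segments_to_geojson(example)
geojson_text = json.dumps(collection, indent=2)
```

A dict of this shape can be passed to `folium.GeoJson` with a `style_function` that reads the `colour` property.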
Model Selection

15 architectures tested — Swin-Small wins

Rank   Model                                Accuracy   Macro F1
1      swin_small ✓                         53.8%      47.0%
2      swin_tiny                            51.5%      46.1%
3      resnet34                             51.1%      45.4%
4      convnext_tiny                        50.4%      44.8%
5      resnet18                             49.7%      44.4%
6      vit_small                            49.8%      44.0%
7–13   EfficientNet, ViT-Base, ConvNeXt…    45–50%     39–42%
Fixed: lr=1e-4 · weighted loss · 5-class · patience=10 · runtime: 1.6 hours
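The table reports both accuracy and macro F1. As a reminder of how the two differ, here is a minimal sketch on toy labels (not project data): macro F1 averages per-class F1 with equal weight, so a class the model never predicts pulls it well below accuracy.

```python
def accuracy_and_macro_f1(y_true, y_pred, classes):
    """Return (accuracy, macro-averaged F1) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(classes)

# Toy example: the minority class is never predicted.
y_true = ["Good"] * 8 + ["Poor"] * 2
y_pred = ["Good"] * 10
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred, ["Good", "Poor"])
```

Here accuracy is 0.8 while macro F1 is about 0.44, because the "Poor" class contributes an F1 of 0.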
Why Swin beats ViT here
Global attention (ViT) is overkill for road texture — a local problem. Swin's hierarchical windows fit the task and generalise better on limited data.
Why not EfficientNet or ResNet?
Pure CNNs capture less long-range context for judging scene-level road quality, and the best CNN (resnet34) ranked below both Swin variants in this sweep. Swin combines local attention with multi-scale feature extraction.
15
architectures evaluated in a single systematic sweep
Architecture Deep Dive

Swin Transformer — swin_small_patch4_window7_224

Liu et al., Microsoft Research · "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" · ICCV 2021
Input
224×224×3
Patch Embed
→56×56×96
Stage 1
W-MSA
56×56 C=96
SW-MSA
Merge
28×28×192
Stage 2
W-MSA
28×28 C=192
SW-MSA
Merge
14×14×384
Stage 3
W-MSA
14×14 C=384
SW-MSA · 18 blocks in total (9 W-MSA/SW-MSA pairs)
Merge
7×7×768
Stage 4
W-MSA
7×7 C=768
SW-MSA
AvgPool→FC
3 classes
W-MSA: Window-based Multi-head Self-Attention (7×7 windows)
SW-MSA: Shifted-Window Multi-head Self-Attention (cross-boundary context)
Model Specs
Params: ~49M
Input: 224×224
Patch: 4×4
Window: 7×7
Pretrain: IN-22k
LR: 3e-5
Why this fits
Local texture: potholes are local — 7×7 windows match this.

Shifted windows: prevents attention blind spots at boundaries.

Hierarchical: multi-scale features for fine detail + scene context.
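The stage shapes in the diagram follow from simple arithmetic: the 4×4 patch embedding divides the input side by 4, and each patch-merging step halves the resolution and doubles the channels. A small sketch that reproduces those numbers (the function name is hypothetical):

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96,
                      num_stages=4, window=7):
    """Return per-stage token resolution, channel width and window count."""
    side = image_size // patch_size   # 224 / 4 = 56 tokens per side
    channels = embed_dim
    stages = []
    for stage in range(1, num_stages + 1):
        windows_per_side = side // window
        stages.append({
            "stage": stage,
            "resolution": side,
            "channels": channels,
            "windows": windows_per_side ** 2,
        })
        if stage < num_stages:
            # Patch merging between stages: half the side, double the channels.
            side //= 2
            channels *= 2
    return stages

for s in swin_stage_shapes():
    print(f"Stage {s['stage']}: {s['resolution']}x{s['resolution']}, "
          f"C={s['channels']}, {s['windows']} windows of 7x7")
```

By Stage 4 the 7×7 feature map fits in a single window, so attention there covers the whole image.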
Hyperparameter Tuning

72 trials · ~14 hours · optimal config found

Average F1 by parameter value
Learning Rate
3e-5
51.9% ✓
1e-4
51.7%
3e-4
51.7%
Freeze
None
52.1% ← best
Smoothing
None
51.9% ← best
Weighted Sampling
Unweighted
52.0% ← best
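"Average F1 by parameter value" is a marginal mean: group the trials by one parameter and average their F1 regardless of the other settings. A minimal sketch, assuming each trial is stored as a dict. The four trial records and the "backbone" freeze value below are placeholders for illustration, not results from the 72 trials.

```python
from collections import defaultdict

def mean_f1_by_value(trials, parameter):
    """Average macro F1 over all trials sharing each value of `parameter`."""
    scores = defaultdict(list)
    for trial in trials:
        scores[trial[parameter]].append(trial["macro_f1"])
    return {value: sum(vals) / len(vals) for value, vals in scores.items()}

# Placeholder records (hypothetical values, NOT the study's results).
trials = [
    {"lr": 3e-5, "freeze": "none", "macro_f1": 0.54},
    {"lr": 3e-5, "freeze": "backbone", "macro_f1": 0.50},
    {"lr": 1e-4, "freeze": "none", "macro_f1": 0.53},
    {"lr": 1e-4, "freeze": "backbone", "macro_f1": 0.49},
]
by_lr = mean_f1_by_value(trials, "lr")
by_freeze = mean_f1_by_value(trials, "freeze")
```

Because each average pools over all other settings, small gaps between values (such as 51.9% vs 51.7%) say little about interactions between parameters.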
Optimal Configuration
model: swin_small
lr: 3e-5
freeze: none
smoothing: 0.0
loss: unweighted CE
scheduler: cosine
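For reference, a cosine schedule decays the learning rate from its base value along half a cosine wave. The sketch below assumes no warm-up, no restarts and a floor of zero; the slides do not say which variant was used.

```python
import math

def cosine_lr(step, total_steps, base_lr=3e-5, min_lr=0.0):
    """Cosine-annealed learning rate at `step` out of `total_steps`."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Learning rate at the start, midpoint and end of a hypothetical 100-step run.
schedule = [cosine_lr(s, 100) for s in (0, 50, 100)]
```

The rate starts at 3e-5, passes through half that value at the midpoint, and reaches the floor at the final step.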
Best trial (5-class)
53.2%
Accuracy
56.8%
Macro F1
Final Results

3-class merging and retraining delivers the breakthrough

5-Class · Excellent / Good / Fair / Poor / Invalid
53%
Accuracy
57%
Macro F1
Excellent↔Good and Fair↔Poor boundaries are visually ambiguous. Annotators disagreed on these pairs — fine-grained classes don't exist cleanly in the data.
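Annotator disagreement of this kind can be made visible with a simple consensus rule. This is a sketch under an assumed rule (majority vote with a minimum agreement ratio); the rule used by the annotation platform is not described in the slides, and `min_agreement` is a hypothetical parameter.

```python
from collections import Counter

def consensus_label(votes, min_agreement=0.6):
    """Return (label, agreement); label is None if agreement is below the threshold."""
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return (label if agreement >= min_agreement else None), agreement

clear = consensus_label(["Good", "Good", "Good", "Excellent"])
split = consensus_label(["Fair", "Poor", "Fair", "Poor"])
```

In the toy votes above, the first image gets "Good" with 75% agreement, while the Fair/Poor split yields no consensus label.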
3-Class · Bad / Good / Invalid — Adopted ✓
81%
Accuracy
77%
Macro F1
Excellent+Good→Good · Fair+Poor→Bad. Directly answers the navigation question: "Is this segment safe to drive?"
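The merge rule can be written as a lookup table. A minimal sketch; the names `FIVE_TO_THREE` and `merge_labels` are illustrative, and the toy label list is not project data.

```python
from collections import Counter

# 5-class to 3-class mapping as stated on the slide.
FIVE_TO_THREE = {
    "Excellent": "Good",
    "Good": "Good",
    "Fair": "Bad",
    "Poor": "Bad",
    "Invalid": "Invalid",
}

def merge_labels(labels):
    """Map a list of 5-class labels onto the 3-class scheme."""
    return [FIVE_TO_THREE[label] for label in labels]

five_class = ["Excellent", "Good", "Fair", "Poor", "Invalid", "Fair"]
three_class = merge_labels(five_class)
distribution = Counter(three_class)
```

Applying the mapping to the annotations before training, rather than to a 5-class model's predictions, matches the "trained separately on merged labels" note.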
Note — Hyperparameters
Trained separately on merged labels, but reuses 5-class hyperparameters. No dedicated HPO run yet — tuning for 3-class is a clear next step.
5-class: 53%
3-class: 81%
+28 pp · further gains expected with dedicated HPO
Live Demonstration

Demo

1
Crowdsourced Annotation Platform
2
Road Quality Maps
5-class & 3-class
3
Label Studio Labeling Tool
4
YOLO Detection Examples
Remaining Work

All tasks complete by May 1 — final report submission

1
Fix bugs in Maps
Resolve 3-class map rendering issues · validate GPS snapping accuracy · test both aggregation modes (pessimistic / majority)
2
Merge remaining classification / dataset
Consolidate any outstanding annotations · finalise 3-class label mapping · re-train on full merged dataset
3
End-to-End Integration Test
Full pipeline run: dashcam video → OCR → filter → classify → GPS snap → Folium map output · validate results on held-out test data
4
Anonymise & release annotated dataset
Remove personally identifiable information from frames · prepare dataset card · open release for research community
5
Report writing
Complete dissertation: methodology, results, discussion, conclusions, future work · supervisor review and final submission
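The two aggregation modes named in task 1 can be sketched as follows. The severity ordering, the handling of "Invalid" frames and the tie-break are assumptions made for illustration; the project's actual rules may differ.

```python
from collections import Counter

# Assumed severity order: higher means worse road surface.
SEVERITY = {"Excellent": 0, "Good": 1, "Fair": 2, "Poor": 3}

def aggregate_segment(frame_labels, mode="majority"):
    """Combine per-frame labels into one label for a road segment."""
    # Assumption: Invalid frames are ignored unless nothing else is available.
    valid = [label for label in frame_labels if label in SEVERITY]
    if not valid:
        return "Invalid"
    if mode == "pessimistic":
        # Worst label seen anywhere on the segment.
        return max(valid, key=lambda label: SEVERITY[label])
    if mode == "majority":
        counts = Counter(valid)
        top = max(counts.values())
        tied = [label for label, c in counts.items() if c == top]
        # Assumption: ties are broken toward the worse label.
        return max(tied, key=lambda label: SEVERITY[label])
    raise ValueError(f"unknown mode: {mode}")

frames = ["Good", "Good", "Poor", "Invalid"]
majority = aggregate_segment(frames, "majority")
pessimistic = aggregate_segment(frames, "pessimistic")
```

On the toy frames above the majority mode reports "Good" and the pessimistic mode reports "Poor", which is the kind of difference the map test would need to check.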
May 1
Final report submission deadline
All 5 tasks targeted for completion
within April 2026.

No critical risks. GPS, model, and map
pipeline are all operational.
M.Tech Dissertation · BITS Pilani WILP · 2025–2026

Thank You

Road Quality Assessment System Using Computer Vision and GPS-Based Mapping
Student
Anish A. · 2024TM93051
Supervisor
Dr. Kavya Manohar
Additional Examiner
Mr. Ashik Salahudeen