Passenger Behavior Analysis Pipeline

Passenger behavior analysis from large-scale telemetry data. CPU (Dask) vs GPU (RAPIDS cuDF/cuSpatial) comparison.

Data Science, Dask, RAPIDS, GPU, Python

Details

About the project

Built an end-to-end pipeline for analyzing passenger behavior from large-scale telemetry data. Developed two versions of it, one CPU-parallel (Dask) and one GPU-accelerated (RAPIDS), with a focus on scalability and performance.

The pipeline covers data cleaning, spatial and temporal filtering, and per-user metric aggregation. The same logic runs on both backends so that their performance can be compared directly.
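The three stages can be sketched as a single function written against the pandas-style DataFrame API, which Dask and cuDF largely mirror. This is a minimal illustration rather than the project's actual code: the column names (`user_id`, `timestamp`, `lat`, `lon`, `speed`), the bounding-box filter, and the chosen metrics are all assumptions, since the telemetry schema is not described here.

```python
import pandas as pd

# Hypothetical schema: the real telemetry columns are not given in the source.
REQUIRED = ["user_id", "timestamp", "lat", "lon", "speed"]


def run_pipeline(df, bbox, start, end):
    """Clean, filter, and aggregate telemetry rows per user.

    bbox is (min_lon, min_lat, max_lon, max_lat); start/end bound the time window.
    """
    # 1. Cleaning: drop rows that are missing any required field.
    df = df.dropna(subset=REQUIRED)

    # 2. Spatial filter: keep points inside the bounding box.
    min_lon, min_lat, max_lon, max_lat = bbox
    in_box = (
        (df["lon"] >= min_lon) & (df["lon"] <= max_lon)
        & (df["lat"] >= min_lat) & (df["lat"] <= max_lat)
    )

    # 3. Temporal filter: keep rows in the half-open window [start, end).
    in_window = (df["timestamp"] >= start) & (df["timestamp"] < end)
    df = df[in_box & in_window]

    # 4. Per-user aggregation: mean speed and number of retained points.
    per_user = df.groupby("user_id").agg({"speed": "mean", "timestamp": "count"})
    return per_user.rename(columns={"speed": "mean_speed", "timestamp": "n_points"})


# Tiny made-up sample to exercise each stage.
sample = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "timestamp": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 08:05", "2024-01-01 08:10",
        "2024-01-02 09:00", "2024-01-01 08:20",
    ]),
    "lat": [41.0, 41.1, 41.2, 41.0, 50.0],
    "lon": [29.0, 29.1, None, 29.0, 29.0],
    "speed": [10.0, 20.0, 30.0, 40.0, 50.0],
})

result = run_pipeline(
    sample,
    bbox=(28.5, 40.5, 29.5, 41.5),
    start=pd.Timestamp("2024-01-01"),
    end=pd.Timestamp("2024-01-02"),
)
print(result)
```

In the sample, one row is dropped for a missing longitude, one falls outside the time window, and one lies outside the bounding box, so only user 1 remains. Because the function touches only common DataFrame operations, the same body could in principle be handed a Dask or cuDF DataFrame, although a bounding box is only a stand-in for the polygon-based filtering that a library such as cuSpatial provides.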

Highlights

Key features

  • Processed ~42 million rows of data
  • CPU-parallel (Dask) and GPU-accelerated (RAPIDS) implementations
  • Spatial filtering on GPU with cuSpatial
  • Achieved a significant speedup when moving the same workload from CPU (Dask) to GPU (RAPIDS)
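A CPU-versus-GPU comparison of this kind comes down to timing the same workload on each backend and taking the ratio. The harness below is a hedged sketch of one way to do that using only the standard library. The function names `best_time` and `speedup` are hypothetical, and the two workloads are placeholders, not the project's Dask or RAPIDS runs.

```python
import time


def best_time(fn, repeats=3, warmup=1):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn."""
    for _ in range(warmup):
        fn()  # warm-up call: absorbs one-off startup costs before timing begins
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return min(timings)


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds


# Placeholder workloads standing in for the real CPU and GPU pipeline runs.
def fake_cpu_run():
    sum(i * i for i in range(200_000))


def fake_gpu_run():
    sum(i * i for i in range(20_000))


cpu_t = best_time(fake_cpu_run)
gpu_t = best_time(fake_gpu_run)
print(f"CPU: {cpu_t:.4f}s  GPU: {gpu_t:.4f}s  speedup: {speedup(cpu_t, gpu_t):.1f}x")
```

When timing a real Dask pipeline, the callable should include the `.compute()` call, since Dask builds a lazy task graph and does no work until the result is requested. For a fair comparison, both callables should also start from the same point, for example both including or both excluding data loading.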

Tech Stack

Tools used

Python, Dask, NVIDIA RAPIDS, cuDF, cuSpatial, Pandas, GeoPandas