Passenger Behavior Analysis Pipeline
Passenger behavior analysis from large-scale telemetry data. CPU (Dask) vs GPU (RAPIDS cuDF/cuSpatial) comparison.
Data Science, Dask, RAPIDS, GPU, Python
Details
About the project
Built an end-to-end pipeline for analyzing passenger behavior from large-scale telemetry data. Developed both a CPU-parallel and a GPU-accelerated version, with a focus on scalability and performance.
The pipeline covers data cleaning, spatial and temporal filtering, and per-user metric aggregation. Ran the same logic on both backends to compare performance.
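As a rough illustration of those stages, here is a minimal sketch in plain pandas. The column names (`user_id`, `lon`, `lat`, `ts`), the bounding-box filter, and the chosen metrics are assumptions made for the example, not the project's actual schema or logic. Dask and cuDF both expose a largely pandas-like DataFrame API, so a function in this style could in principle be reused across backends, though that is not verified here.

```python
import pandas as pd


def run_pipeline(df, bbox, start, end):
    """Clean, filter by bounding box and time window, then aggregate per user.

    All column names are hypothetical placeholders for this sketch.
    """
    min_lon, min_lat, max_lon, max_lat = bbox

    # 1. Cleaning: drop rows missing any field the later steps rely on.
    df = df.dropna(subset=["user_id", "lon", "lat", "ts"])

    # 2. Spatial filter: a simple axis-aligned bounding box.
    in_box = (
        (df["lon"] >= min_lon) & (df["lon"] <= max_lon)
        & (df["lat"] >= min_lat) & (df["lat"] <= max_lat)
    )

    # 3. Temporal filter: half-open window [start, end).
    in_window = (df["ts"] >= start) & (df["ts"] < end)

    df = df[in_box & in_window]

    # 4. Per-user metrics: point count, first and last timestamp.
    return df.groupby("user_id")["ts"].agg(["count", "min", "max"])


# Tiny made-up sample: one row has no user, one lies outside the box.
sample = pd.DataFrame({
    "user_id": ["a", "a", "b", "b", None],
    "lon": [10.0, 10.5, 10.2, 50.0, 10.1],
    "lat": [20.0, 20.5, 20.2, 20.0, 20.1],
    "ts": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 09:00",
        "2024-01-01 08:30", "2024-01-01 08:45", "2024-01-01 08:15",
    ]),
})

result = run_pipeline(
    sample,
    bbox=(9.0, 19.0, 11.0, 21.0),
    start=pd.Timestamp("2024-01-01 00:00"),
    end=pd.Timestamp("2024-01-02 00:00"),
)
print(result)
```

Keeping every stage inside one function that takes a DataFrame is one way to keep the CPU and GPU runs comparable: only the object passed in changes, not the logic.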
Highlights
Key features
- Processed ~42 million rows of data
- CPU-parallel (Dask) and GPU-accelerated (RAPIDS) implementations
- Spatial filtering on GPU with cuSpatial
- Achieved significant speedup moving from CPU to GPU
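One way a CPU-versus-GPU speedup like this could be measured is sketched below. The helper names `time_backend` and `speedup` are hypothetical and not taken from the project, and the workload shown is a stand-in. Each callable passed in should force its result to be computed; for Dask that means calling `.compute()`, since Dask builds lazy task graphs.

```python
import time


def time_backend(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()  # fn must fully materialize its result before returning
        best = min(best, time.perf_counter() - t0)
    return best


def speedup(baseline_seconds, candidate_seconds):
    """Ratio of baseline time to candidate time (values above 1 mean faster)."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


# Stand-in workload; in practice each callable would run the full pipeline
# on one backend with the same input data.
elapsed = time_backend(lambda: sum(range(100_000)), repeats=2)
print(f"best of 2 runs: {elapsed:.6f} s")
```

Taking the best of several runs reduces the effect of one-off costs such as warm-up or caching, which matters when the two backends have very different startup behavior.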
Tech Stack
Tools used
Python, Dask, NVIDIA RAPIDS, cuDF, cuSpatial, Pandas, GeoPandas