Medium
150 PTS
3 Days
142 Solvers
Data Pipeline Vectorization
Objective
Rewrite the given pure-Python loop that processes the 10 GB astronomy dataset so that it uses NumPy vectorization. The current data ingestion pipeline filters and normalizes star coordinate data with nested Python for-loops and takes 45 minutes to run. Vectorize the mathematical operations with NumPy to bring the execution time under 30 seconds.
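The challenge does not show the original loop or state the exact filter and normalization rules, so the following is only a minimal sketch of the general pattern under stated assumptions: each star has hypothetical `x`, `y`, `z` coordinates and a magnitude `mag`, the filter keeps stars with `mag <= max_mag`, and normalization scales each coordinate vector to unit length. The function names are illustrative, not part of the task.

```python
import numpy as np


def normalize_loop(rows, max_mag):
    """Slow baseline: row-by-row Python loop (assumed shape of the original)."""
    out = []
    for x, y, z, mag in rows:
        if mag <= max_mag:  # assumed filter: keep bright-enough stars
            norm = (x * x + y * y + z * z) ** 0.5
            if norm > 0:  # skip zero vectors to avoid division by zero
                out.append((x / norm, y / norm, z / norm))
    return out


def normalize_vectorized(coords, mags, max_mag):
    """Same logic on whole arrays: coords has shape (n, 3), mags has shape (n,)."""
    kept = coords[mags <= max_mag]         # boolean mask replaces the `if`
    norms = np.linalg.norm(kept, axis=1)   # one norm per row
    nonzero = norms > 0                    # drop zero vectors
    # Broadcasting divides each row by its own norm.
    return kept[nonzero] / norms[nonzero, None]
```

The boolean mask and the broadcasted division move the per-row work out of the Python interpreter and into NumPy's compiled loops, which is typically where most of the speedup comes from.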
Requirements
- NumPy
- Pandas
- Execution time < 30s
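A 10 GB file may not fit in memory at once, so one possible way to combine the Pandas and NumPy requirements is to read the data in chunks and vectorize within each chunk. The sketch below assumes a CSV source with hypothetical columns `x`, `y`, `z`, and `mag`, and the same assumed filter and unit-length normalization; the actual file format and rules are not given in the task. `time.perf_counter` is used to check the run against the time limit.

```python
import io
import time

import numpy as np
import pandas as pd


def process_in_chunks(source, chunksize=1_000_000, max_mag=6.0):
    """Read a CSV in chunks and apply a vectorized filter + normalize to each."""
    parts = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Column names are assumptions for illustration.
        coords = chunk[["x", "y", "z"]].to_numpy(dtype=np.float64)
        mags = chunk["mag"].to_numpy(dtype=np.float64)
        kept = coords[mags <= max_mag]
        norms = np.linalg.norm(kept, axis=1)
        ok = norms > 0
        parts.append(kept[ok] / norms[ok, None])
    return np.concatenate(parts) if parts else np.empty((0, 3))


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real dataset.
    demo = io.StringIO("x,y,z,mag\n3,4,0,5\n0,0,2,7\n1,1,1,3\n")
    start = time.perf_counter()
    result = process_in_chunks(demo, chunksize=2)
    elapsed = time.perf_counter() - start
    print(result.shape, f"{elapsed:.3f}s")
```

Chunk size trades memory for per-chunk overhead. For a dataset this large, a binary format may load much faster than CSV, but that depends on how the data is actually stored.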