Product Proposal

Python
Author

Boon Chin Look

Published

April 11, 2025

Product proposal report (PDF) | Link to code & datasets used

This project had both collaborative and individual components. As a team, we developed a product concept and identified a dataset relevant to it. The task was to pitch an application designed for students, with a focus on finance, innovation, and enterprise. We presented the pitch as a video, which I’ve embedded on the first page of the PDF above, which also contains my individual report.

Individually, each of us was responsible for going deeper into the technical side of the pitch: conducting data analysis, identifying the market gap, understanding target customers, and outlining a strategy for selling the product. While teamwork was encouraged, each member was expected to make a unique and meaningful contribution in their final report.

Below are key snippets of code I wrote during the data analysis phase, which played a critical role in shaping our product’s direction. I led this part of the project, as it closely aligned with my strengths. A preliminary dataset informed our pricing strategy, while an Uber analysis dataset helped me identify market gaps and opportunities. The snippets below come from the latter dataset and show its role in our decision-making.

Importing Data and Initial Analysis

Importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
import seaborn as sns
Importing dataset
df = pd.read_csv("UberDataset.csv")
df.head()
START_DATE END_DATE CATEGORY START STOP MILES PURPOSE
0 01/01/2016 21:11 01/01/2016 21:17 Business Fort Pierce Fort Pierce 5.1 Meal/Entertain
1 01/02/2016 01:25 01/02/2016 01:37 Business Fort Pierce Fort Pierce 5.0 NaN
2 01/02/2016 20:25 01/02/2016 20:38 Business Fort Pierce Fort Pierce 4.8 Errand/Supplies
3 01/05/2016 17:31 01/05/2016 17:45 Business Fort Pierce Fort Pierce 4.7 Meeting
4 01/06/2016 14:42 01/06/2016 15:49 Business Fort Pierce West Palm Beach 63.7 Customer Visit
Looking at data types of columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1155 entries, 0 to 1154
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   START_DATE  1155 non-null   object 
 1   END_DATE    1155 non-null   object 
 2   CATEGORY    1155 non-null   object 
 3   START       1155 non-null   object 
 4   STOP        1155 non-null   object 
 5   MILES       1155 non-null   float64
 6   PURPOSE     653 non-null    object 
dtypes: float64(1), object(6)
memory usage: 63.3+ KB
Standardize column names to lowercase for consistency
df.columns = df.columns.str.lower()
df.head()
start_date end_date category start stop miles purpose
0 01/01/2016 21:11 01/01/2016 21:17 Business Fort Pierce Fort Pierce 5.1 Meal/Entertain
1 01/02/2016 01:25 01/02/2016 01:37 Business Fort Pierce Fort Pierce 5.0 NaN
2 01/02/2016 20:25 01/02/2016 20:38 Business Fort Pierce Fort Pierce 4.8 Errand/Supplies
3 01/05/2016 17:31 01/05/2016 17:45 Business Fort Pierce Fort Pierce 4.7 Meeting
4 01/06/2016 14:42 01/06/2016 15:49 Business Fort Pierce West Palm Beach 63.7 Customer Visit
Checking for missing values in the dataset
df.isnull().sum()
#purpose has 502 missing values; no other column has any
start_date      0
end_date        0
category        0
start           0
stop            0
miles           0
purpose       502
dtype: int64
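To put the missing-value counts in proportion (in the output above, 502 of 1,155 rows, roughly 43%, have no purpose), the counts can be paired with percentages. This is a minimal sketch on a made-up toy frame, not the actual dataset; the names toy and summary are hypothetical.

```python
import pandas as pd

# Toy frame standing in for the trip data (illustrative values only)
toy = pd.DataFrame({
    "miles": [5.1, 5.0, 4.8, 4.7],
    "purpose": ["Meeting", None, None, "Commute"],
})

# Count of missing values per column
missing_counts = toy.isnull().sum()

# Share of missing values per column, as a percentage
missing_share = toy.isnull().mean() * 100

# Side-by-side summary for quick inspection
summary = pd.DataFrame({"missing": missing_counts,
                        "percent": missing_share.round(1)})
print(summary)
```

Seeing the share next to the count makes it easier to decide whether a column should be filled, as done with purpose later in this post, or dropped.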
Simple descriptive statistics of each numerical column
df.describe()
miles
count 1155.000000
mean 10.566840
std 21.579106
min 0.500000
25% 2.900000
50% 6.000000
75% 10.400000
max 310.300000

Data Cleaning

Checked & removed duplicates
duplicated_data = df[df.duplicated()]
print(duplicated_data.shape)

df = df.drop_duplicates(keep='first')
(1, 7)
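As a small illustration of what the duplicate check does, the sketch below runs the same two calls on a made-up three-row frame. The data and the names toy, dupes, and deduped are hypothetical.

```python
import pandas as pd

# Toy frame with one exact duplicate row (illustrative values only)
toy = pd.DataFrame({
    "start": ["A", "A", "B"],
    "stop": ["B", "B", "C"],
    "miles": [1.0, 1.0, 2.0],
})

# duplicated() flags rows that repeat an earlier row across all columns
dupes = toy[toy.duplicated()]
print(dupes.shape)

# keep="first" retains the first occurrence and drops later repeats
deduped = toy.drop_duplicates(keep="first")
print(len(deduped))
```

Because duplicated() compares every column by default, two trips only count as duplicates if all of their fields match.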
Encode categorical variables using one-hot encoding
data_encoded = pd.get_dummies(df, columns=['category', 'purpose'], drop_first=True)
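One thing worth checking with one-hot encoding is how missing values interact with drop_first=True. The sketch below, on a made-up toy frame, suggests that a row with a missing purpose gets all-zero purpose columns, which makes it look the same as the dropped baseline category. Filling with "Unknown" before encoding, as in the missing-value step later in this post, keeps the two apart. The data and variable names are hypothetical.

```python
import pandas as pd

# Toy frame with one missing purpose (illustrative values only)
toy = pd.DataFrame({
    "category": ["Business", "Personal", "Business"],
    "purpose": ["Meeting", None, "Commute"],
})

# Encoding with the missing value still present
encoded = pd.get_dummies(toy, columns=["category", "purpose"], drop_first=True)
print(list(encoded.columns))

# Filling first gives the missing rows their own indicator column
filled = toy.assign(purpose=toy["purpose"].fillna("Unknown"))
encoded_filled = pd.get_dummies(filled, columns=["category", "purpose"],
                                drop_first=True)
print(list(encoded_filled.columns))
```

In the first result, row 1 (missing purpose) and row 2 (Commute, the dropped baseline) both have purpose_Meeting set to false, so they cannot be told apart from the encoded columns alone.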
Converted date columns to datetime
df["start_date"] = pd.to_datetime(df["start_date"], errors="coerce")
df["end_date"] = pd.to_datetime(df["end_date"], errors="coerce")
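With errors="coerce", any string that cannot be parsed becomes NaT instead of raising an error, so it can be useful to count how many values failed. The sketch below uses a made-up series and assumes a month/day/year format, which matches the sample rows shown earlier but is an assumption about the full file.

```python
import pandas as pd

# Toy series: two valid timestamps and one unparseable entry (illustrative)
raw = pd.Series(["01/01/2016 21:11", "01/02/2016 01:25", "not a date"])

# An explicit format avoids ambiguity between month-first and day-first dates;
# errors="coerce" turns unparseable strings into NaT
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M", errors="coerce")

# Count how many values could not be parsed
n_failed = parsed.isna().sum()
print(n_failed)
```

A non-zero count here would signal rows that silently lost their timestamp and may need attention before sorting or indexing by date.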
Handling missing values
# Handle missing values by dropping rows with missing 'start' or 'stop' and filling 'purpose' with 'Unknown'
df.dropna(subset=['start', 'stop'], inplace=True)
df['purpose'] = df['purpose'].fillna("Unknown") 
Feature engineering, created new column for trip date
df["date"] = df["start_date"].dt.date
Ensured the dataset is sorted by date
df = df.sort_values('start_date')
df.set_index('start_date', inplace=True)

Analysis

Returns first few rows of dataset to check
df.head()
end_date category start stop miles purpose date
start_date
2016-01-01 21:11:00 2016-01-01 21:17:00 Business Fort Pierce Fort Pierce 5.1 Meal/Entertain 2016-01-01
2016-01-02 01:25:00 2016-01-02 01:37:00 Business Fort Pierce Fort Pierce 5.0 Unknown 2016-01-02
2016-01-02 20:25:00 2016-01-02 20:38:00 Business Fort Pierce Fort Pierce 4.8 Errand/Supplies 2016-01-02
2016-01-05 17:31:00 2016-01-05 17:45:00 Business Fort Pierce Fort Pierce 4.7 Meeting 2016-01-05
2016-01-06 14:42:00 2016-01-06 15:49:00 Business Fort Pierce West Palm Beach 63.7 Customer Visit 2016-01-06
Code for plot of trip frequency over time
# Count trips per calendar day once, then reuse the result for both axes
trips_per_day = df["date"].value_counts().sort_index()
plt.figure(figsize=(11, 7))
sns.lineplot(x=trips_per_day.index, y=trips_per_day.values, marker="o")
plt.xlabel("Date")
plt.ylabel("Number of Trips")
plt.title("Uber Trips Over Time")
plt.xticks(rotation=45)
plt.show()
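Since the frame is indexed by start_date, an alternative way to count trips per day is to resample the datetime index. Unlike value_counts, this also produces zero-count rows for days with no trips, which can matter when reading gaps in a line plot. This is a sketch on made-up timestamps; the names toy and daily are hypothetical.

```python
import pandas as pd

# Toy frame indexed by trip start time (illustrative values only)
idx = pd.to_datetime([
    "2016-01-01 21:11",
    "2016-01-02 01:25",
    "2016-01-02 20:25",
    "2016-01-05 17:31",
])
toy = pd.DataFrame({"miles": [5.1, 5.0, 4.8, 4.7]}, index=idx)

# Count rows per calendar day; days without trips appear with a count of 0
daily = toy.resample("D").size()
print(daily)
```

Swapping "D" for "W" would give weekly totals, which may be easier to read over a full year of trips.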

Code for plot of top 10 most common pickup and drop-off locations
top_pickups = df["start"].value_counts().head(10)
top_dropoffs = df["stop"].value_counts().head(10)

fig, axes = plt.subplots(1, 2, figsize=(9, 3))

sns.barplot(x=top_pickups.values, y=top_pickups.index, ax=axes[0], palette="Blues_r", hue=top_pickups.index, dodge=False)  
axes[0].set_title("Top 10 Pickup Locations")
axes[0].set_xlabel("Number of Trips")
axes[0].legend([],[], frameon=False)  
sns.barplot(x=top_dropoffs.values, y=top_dropoffs.index, ax=axes[1], palette="Greens_r", hue=top_dropoffs.index, dodge=False)
axes[1].set_title("Top 10 Drop-off Locations")
axes[1].set_xlabel("Number of Trips")
axes[1].legend([],[], frameon=False) 

plt.tight_layout()
plt.show()

Code for plot of count of trips by purpose
trip_purpose_counts = df["purpose"].value_counts()
plt.figure(figsize=(9, 5))
sns.barplot(x=trip_purpose_counts.index, y=trip_purpose_counts.values, hue=trip_purpose_counts.index, palette="Oranges", legend=False) 
plt.xlabel("Purpose of Trip")
plt.ylabel("Number of Trips")
plt.title("Trip Purpose Distribution")
plt.xticks(rotation=45)
plt.show()

#The most common purpose is Unknown; the second is Meeting, which could be related to commuting. Trips labelled as commutes are few, but the plot below shows they have the highest average distance by purpose. This supports our product, a student carpooling application that would help with their commute.

Code for plot of average trip distance by purpose
avg_miles_purpose = df.groupby("purpose")["miles"].mean().sort_values()
plt.figure(figsize=(9, 5))
sns.barplot(x=avg_miles_purpose.index, y=avg_miles_purpose.values, hue=avg_miles_purpose.index, dodge=False, palette="Blues", legend=False)  
plt.xlabel("Purpose of Trip")
plt.ylabel("Average Miles Traveled")
plt.title("Average Trip Distance by Purpose")
plt.xticks(rotation=45)
plt.show()
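Because some purposes have only a few trips, an average on its own can be misleading. A minimal sketch, on made-up numbers, of reporting the trip count next to each mean so that small groups are visible; the data and the name summary are hypothetical.

```python
import pandas as pd

# Toy data: one purpose with several trips, one with a single long trip
toy = pd.DataFrame({
    "purpose": ["Meeting", "Meeting", "Meeting", "Commute"],
    "miles": [10.0, 20.0, 30.0, 180.0],
})

# Mean distance and number of trips per purpose, sorted by mean
summary = (toy.groupby("purpose")["miles"]
              .agg(["mean", "count"])
              .sort_values("mean"))
print(summary)
```

Here the highest average rests on a single trip, which is the kind of caveat worth stating alongside a bar chart of means.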

Code for plot of trips by hour of the day
df["hour"] = df.index.hour
plt.figure(figsize=(9, 5))
sns.countplot(x=df["hour"], hue=df["hour"], dodge=False, palette="coolwarm", legend=False) 
plt.xlabel("Hour of the Day")
plt.ylabel("Number of Trips")
plt.title("Trips by Hour of the Day")
plt.show()
#Trip hours broadly align with college hours.
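One way to put a number on how well trip hours match a timetable is to compute the share of trips that start inside a chosen window. The sketch below uses made-up hours and an assumed 09:00 to 17:00 window; both the window and the names hours, in_window, and share are hypothetical, not taken from the report.

```python
import pandas as pd

# Toy series of trip start hours (illustrative values only)
hours = pd.Series([8, 9, 13, 14, 15, 17, 22, 23])

# between() is inclusive on both ends by default
in_window = hours.between(9, 17)

# The mean of a boolean series is the fraction of True values
share = in_window.mean()
print(f"{share:.1%} of trips start between 09:00 and 17:00")
```

On the real data, hours would be the hour column created above, and the window should be set from the actual college timetable.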

Code for plot of number of trips by trip category
plt.figure(figsize=(9, 5))
sns.countplot(x=df["category"], hue=df["category"], palette="viridis", legend=False)
plt.xlabel("Trip Category")
plt.ylabel("Number of Trips")
plt.title("Business vs Personal Trips")
plt.show()

© Boon Chin Look, 2025