🚀 Projects
📊 Customer Churn Prediction | Link
- Objective: Built a classification model to predict telecom customer churn using a real-world dataset.
- Approach:
  - Cleaned and preprocessed the data: handled missing values, label-encoded categorical features, normalized numerical columns.
  - Trained multiple models (Logistic Regression, Decision Tree, Random Forest).
  - Selected the best-performing model using cross-validated F1-score (see the sketch after this project).
- Metrics Used: Accuracy, Precision, Recall, F1-score, Confusion Matrix.
- Libraries/Tools: Python, Pandas, Scikit-learn
- Dataset: ~7,000 rows
- 🎯 Delivered actionable insights such as how contract type and monthly charges affect churn.
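A minimal sketch of the preprocessing and model-selection steps described above, using scikit-learn. The file name `telco_churn.csv`, the `Churn` column, and the "Yes"/"No" label encoding are illustrative assumptions, not the exact project code.

```python
# Hedged sketch of the churn preprocessing + cross-validated model selection;
# file name, column names, and the "Yes"/"No" target encoding are assumptions.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("telco_churn.csv").dropna()           # hypothetical file; drop rows with missing values

y = (df.pop("Churn") == "Yes").astype(int)             # assumed "Yes"/"No" churn labels
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(include="object").columns

for col in cat_cols:                                   # label-encode categorical features
    df[col] = LabelEncoder().fit_transform(df[col])
df[num_cols] = StandardScaler().fit_transform(df[num_cols])  # normalize numerical columns

# Compare candidate models by 5-fold cross-validated F1 and keep the best.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}
scores = {name: cross_val_score(m, df, y, cv=5, scoring="f1").mean() for name, m in models.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1]))   # best model first
```

Using `scoring="f1"` inside `cross_val_score` mirrors the F1-based selection criterion listed above.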
🌍 Cost of International Education | Link
- Objective: Predicted international education costs using regression modeling to uncover pricing dynamics.
- Approach:
  - Performed EDA to identify key cost drivers (e.g., country, degree type, institution type).
  - Engineered features: imputation, encoding, scaling.
  - Compared an ANN with Linear Regression and Random Forest baselines (see the sketch below).
- Metrics Used: MAE, RMSE
- Libraries/Tools: Python, Pandas, Scikit-learn, Seaborn, Plotly
- Dataset: ~1,000 rows
- 🧠 Achieved ~90% accuracy on the test set with the ANN.
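An illustrative comparison of the three regressors on MAE and RMSE. The file `education_costs.csv` and the `total_cost` target are hypothetical, and scikit-learn's `MLPRegressor` stands in for the ANN here; the project may have used a different framework for the network itself.

```python
# Illustrative regression comparison on MAE/RMSE; file name, target column,
# and the MLPRegressor stand-in for the ANN are assumptions, not project code.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor        # stands in for the ANN
from sklearn.metrics import mean_absolute_error, mean_squared_error

df = pd.read_csv("education_costs.csv").dropna()       # hypothetical file and columns
y = df.pop("total_cost")                               # assumed target column
X = StandardScaler().fit_transform(pd.get_dummies(df, drop_first=True))  # encode + scale

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=42),
    "ANN (MLP)": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=42),
}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f"{name:18s}  MAE={mae:,.0f}  RMSE={rmse:,.0f}")
```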
💼 LinkedIn Data Jobs Analysis | Link
- Objective: Explored hiring trends, regional insights, and in-demand skills from LinkedIn job postings.
- Approach:
  - Cleaned and standardized inconsistent job titles using RegEx.
  - Performed grouped analysis to surface hiring trends and skill demand (see the sketch below).
  - Summarized technical and soft skills from job descriptions.
- Visuals: Bar plots, heatmaps, distributions.
- Libraries/Tools: Python, Pandas, Seaborn, Plotly, RegEx
- Dataset: ~300 rows
- 🔍 Offered a data-driven view of the data science job market.
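A short pandas/RegEx sketch of the title normalization and grouped analysis. The file name, column names (`job_title`, `location`), canonical role buckets, and regex patterns are illustrative assumptions rather than the project's actual rules.

```python
# Sketch of RegEx-based title cleaning plus grouped demand analysis; column names,
# role buckets, and patterns are illustrative assumptions.
import re
import pandas as pd

df = pd.read_csv("linkedin_jobs.csv")                  # hypothetical file name

def normalize_title(title: str) -> str:
    """Collapse inconsistent job titles into a small set of canonical roles."""
    t = re.sub(r"\(.*?\)|[-–|,].*$", "", str(title)).strip().lower()  # drop parentheticals/suffixes
    if re.search(r"data\s*scien", t):
        return "Data Scientist"
    if re.search(r"machine\s*learning|ml\s*engineer", t):
        return "ML Engineer"
    if re.search(r"data\s*analyst|analytics", t):
        return "Data Analyst"
    return "Other"

df["role"] = df["job_title"].apply(normalize_title)

# Grouped analysis: overall demand by role, then role counts per location.
print(df["role"].value_counts())
print(df.groupby(["location", "role"]).size().unstack(fill_value=0).head())
```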
📰 Real vs Fake News Classification | Link
- Objective: Developed a binary text classification model to detect fake news using semantic embeddings.
- Approach:
  - Preprocessed text (tokenization, stop-word removal, lemmatization).
  - Generated 300-dimensional Word2Vec embeddings from the Google News corpus.
  - Built document-level vectors by averaging word embeddings.
  - Trained and tuned SVM and Logistic Regression classifiers (see the sketch below).
- Metrics Used: Accuracy, Precision, Recall, F1-score
- Libraries/Tools: Python, Gensim, Scikit-learn, NLP
- Dataset: ~17,000 rows
- 🤖 Outperformed traditional bag-of-words classifiers.
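A hedged sketch of the embedding-averaging classifier. It assumes gensim's downloadable `word2vec-google-news-300` vectors and a hypothetical `news.csv` with `text` and `label` columns, and shows only the Logistic Regression variant; the text is assumed to be already preprocessed.

```python
# Hedged sketch: average pretrained Word2Vec vectors per document, then classify.
# The data file, column names, and pretrained-model choice are assumptions.
import numpy as np
import pandas as pd
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

w2v = api.load("word2vec-google-news-300")             # 300-dim Google News vectors (large download)

def doc_vector(tokens):
    """Average the Word2Vec vectors of in-vocabulary tokens (zeros if none found)."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

df = pd.read_csv("news.csv")                           # hypothetical file: "text", "label" columns
X = np.vstack([doc_vector(str(t).split()) for t in df["text"]])
X_train, X_test, y_train, y_test = train_test_split(X, df["label"], test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))  # precision, recall, F1, accuracy
```

Averaging word vectors gives each article a dense 300-dimensional representation, which is what lets these models beat sparse bag-of-words baselines on this task.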