

Muhammad Ihtisham Ul Haq

Data Scientist | NLP Engineer

📍 Lahore, Pakistan
📞 +92-303-9229203
📧 ahtiisham.maliik@gmail.com

LinkedIn GitHub Kaggle


👨‍💼 Profile Summary

Data Scientist with a strong focus on Natural Language Processing (NLP), machine learning, and predictive modeling. Skilled in data preprocessing, feature engineering, and data visualization using Python, SQL, and libraries such as Pandas, Scikit-learn, and PyTorch. Brings a software engineering background that adds strong programming and analytical problem-solving skills. Experienced in building end-to-end data science workflows and extracting insights from both structured and unstructured data. Passionate about using data to uncover patterns, solve real-world problems, and support evidence-based decision-making in collaborative teams.


💼 Professional Experience

Data Science Internship | SkilledScore

🗓️ March 2025 – May 2025
- Worked on five end-to-end projects addressing real-world problems: churn prediction, sentiment analysis, fraud detection, time series forecasting, and data pipeline development.
- Applied machine learning, NLP, and time series techniques using Python and SQL; delivered insights through dashboards and automation, enhancing both technical proficiency and business understanding.

Software Engineer | i2c Inc.

🗓️ May 2023 – January 2025
- Performed data analysis and automated reporting using SQL and Python to monitor critical applications and support decision-making.
- Used monitoring tools such as Dynatrace, Xymon, Wazuh, and Nagios to analyze application metrics, identify anomalies, and ensure SLA compliance.


🎓 Education

BS Information Technology
PUCIT – Punjab University College of Information Technology
📅 October 2018 – July 2022


🛠️ Skills

  • Programming Languages: Python, SQL
  • Data Tools: Excel, Power BI
  • Libraries/Frameworks: Scikit-learn, PyTorch, TensorFlow, NumPy, Pandas, Matplotlib, Seaborn, Plotly, spaCy, Gensim, fastText
  • Concepts & Techniques: Statistical Machine Learning, Deep Learning, Natural Language Processing (NLP), Time Series Analysis, Data Visualization, Predictive Modeling, Feature Engineering

📂 Projects

📊 Customer Churn Prediction

  • Objective: Built a classification model to predict telecom customer churn using a real-world dataset.
  • Approach:
    • Cleaned and preprocessed data: handled missing values, label-encoded categorical features, normalized numerical columns.
    • Trained multiple models (Logistic Regression, Decision Tree, Random Forest).
    • Selected the best-performing model using cross-validated F1-score.
  • Metrics Used: Accuracy, Precision, Recall, F1-score, Confusion Matrix.
  • Libraries/Tools: Python, Pandas, Scikit-learn
  • Dataset: ~7,000 rows
  • 🎯 Delivered actionable insights such as how contract type and monthly charges affect churn.
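The evaluation step of this project (confusion matrix, accuracy, precision, recall, F1, and picking the model with the best cross-validated F1) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: it assumes churn is encoded as 1 and retention as 0, and the labels, predictions, and per-fold F1 scores in the demo are hypothetical. In practice, Scikit-learn's `sklearn.metrics` functions such as `confusion_matrix` and `f1_score` compute the same quantities.

```python
from typing import Dict, List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN for a binary task where 1 = churned (assumed encoding)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["tp"] += 1
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1
        elif actual == 1 and predicted == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1, **c}


def select_best_model(fold_f1: Dict[str, List[float]]) -> str:
    """Return the model name with the highest mean cross-validated F1-score."""
    return max(fold_f1, key=lambda name: sum(fold_f1[name]) / len(fold_f1[name]))


if __name__ == "__main__":
    # Hypothetical labels and predictions for eight customers
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))

    # Hypothetical per-fold F1 scores for the three model families
    scores = {"logistic_regression": [0.60, 0.62], "decision_tree": [0.55, 0.58], "random_forest": [0.64, 0.61]}
    print(select_best_model(scores))
```

Selecting on F1 rather than accuracy is a common choice for churn data, where the churned class is usually the minority and accuracy alone can look high even when most churners are missed.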

🌍 Cost of International Education

  • Objective: Predicted international education costs using regression modeling to uncover pricing dynamics.
  • Approach:
    • Performed EDA to identify key factors (e.g., country, degree type, institution type).
    • Prepared features through imputation, categorical encoding, and scaling.
    • Compared an artificial neural network (ANN) against Linear Regression and Random Forest models.
  • Metrics Used: MAE, RMSE
  • Libraries/Tools: Python, Pandas, Scikit-learn, Seaborn, Plotly
  • Dataset: ~1,000 rows
  • 🧠 Achieved ~90% test accuracy using ANN.
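The two regression metrics listed for this project can be written out directly: MAE is the mean of the absolute errors |y − ŷ|, and RMSE is the square root of the mean of the squared errors (y − ŷ)². The sketch below is a hypothetical illustration, not the project's code; the cost values and the `model_a` / `model_b` predictions are invented to show how competing models could be ranked.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: squares errors first, so large misses weigh more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rank_by_rmse(y_true: Sequence[float], predictions: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort candidate models from lowest (best) to highest RMSE."""
    scores = [(name, rmse(y_true, preds)) for name, preds in predictions.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical costs (e.g. in thousands of USD) and predictions from two models
    y_true = [10.0, 20.0, 30.0, 40.0]
    candidates = {
        "model_a": [12.0, 18.0, 33.0, 39.0],  # several small errors
        "model_b": [10.0, 25.0, 30.0, 35.0],  # two larger errors
    }
    for name, preds in candidates.items():
        print(name, mae(y_true, preds), rmse(y_true, preds))
    print(rank_by_rmse(y_true, candidates))
```

Reporting both metrics is useful because they can disagree in degree: in the demo, `model_b` has a MAE only 0.5 higher than `model_a`, but its two 5-unit misses push its RMSE noticeably higher.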

💼 LinkedIn Data Jobs Analysis

  • Objective: Explored hiring trends, regional insights, and in-demand skills from LinkedIn job postings.
  • Approach:
    • Cleaned inconsistent job titles using RegEx.
    • Performed grouped aggregations (e.g., by region and job title) to identify hiring trends and demand.
    • Extracted and summarized the technical and soft skills mentioned in job descriptions.
  • Visuals: Bar plots, heatmaps, distribution plots.
  • Libraries/Tools: Python, Pandas, Seaborn, Plotly, RegEx
  • Dataset: ~300 rows
  • 🔍 Offered a data-driven view of the data science job market.
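Cleaning inconsistent job titles with RegEx usually means lowercasing, stripping punctuation, and mapping variants onto a canonical label before counting. The sketch below is a hypothetical version of that step using only Python's `re` and `collections` modules; the rule list (`TITLE_RULES`) and the sample titles are assumptions for illustration, not the rules used in the project.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical normalization rules: (pattern, canonical title). First match wins.
TITLE_RULES = [
    (re.compile(r"\b(ml|machine learning)\s+engineer\b"), "Machine Learning Engineer"),
    (re.compile(r"\bdata\s+scientist\b"), "Data Scientist"),
    (re.compile(r"\bdata\s+analyst\b"), "Data Analyst"),
    (re.compile(r"\bdata\s+engineer\b"), "Data Engineer"),
]


def normalize_title(raw: str) -> str:
    """Map a messy job title to a canonical label, or 'Other' if no rule matches."""
    text = raw.lower()
    text = re.sub(r"[^a-z0-9+ ]+", " ", text)  # punctuation such as '.', '-', '(' becomes a space
    text = re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace
    for pattern, canonical in TITLE_RULES:
        if pattern.search(text):
            return canonical
    return "Other"


def count_titles(raw_titles: Iterable[str]) -> Counter:
    """Count postings per canonical title to estimate demand."""
    return Counter(normalize_title(title) for title in raw_titles)


if __name__ == "__main__":
    sample = [
        "Sr. Data Scientist (NLP)",
        "ML Engineer - Remote",
        "Data-Analyst II",
        "Senior Data Engineer, GCP",
        "Lead Data Scientist",
        "Product Manager",
    ]
    print(count_titles(sample).most_common())
```

Keeping the rules as an ordered list makes precedence explicit: a title that mentions both roles is assigned to whichever rule appears first.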

📰 Real vs Fake News Classification

  • Objective: Developed a binary text classification model to detect fake news using semantic embeddings.
  • Approach:
    • Preprocessed text (tokenization, stop-word removal, lemmatization).
    • Represented words with pretrained 300-dimensional Word2Vec embeddings trained on the Google News corpus.
    • Built document-level vectors by averaging word embeddings.
    • Trained and tuned SVM and Logistic Regression models.
  • Metrics Used: Accuracy, Precision, Recall, F1-score
  • Libraries/Tools: Python, Gensim, Scikit-learn
  • Dataset: ~17,000 rows
  • 🤖 Outperformed traditional bag-of-words classifiers.
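Building a document-level vector by averaging word embeddings can be sketched as follows. To keep the example self-contained, it assumes a tiny hypothetical 3-dimensional embedding dictionary in place of the real 300-dimensional Google News vectors (which, in Gensim, are typically loaded as a `KeyedVectors` object). Out-of-vocabulary tokens are skipped, and a document with no known tokens falls back to a zero vector; both are assumed design choices, not details confirmed by the project.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def document_vector(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of in-vocabulary tokens; return a zero vector if none are known."""
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return [0.0] * dim
    # Element-wise mean across all known word vectors
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical toy embeddings; real Word2Vec vectors would have 300 dimensions
    toy_embeddings = {
        "election": [1.0, 0.0, 2.0],
        "fraud": [3.0, 2.0, 0.0],
        "claim": [2.0, 4.0, 1.0],
    }
    # Tokens as they might look after tokenization, stop-word removal, and lemmatization
    tokens = ["election", "fraud", "claim", "unknownword"]
    print(document_vector(tokens, toy_embeddings, dim=3))
```

The resulting fixed-length vectors can be passed directly to classifiers such as SVM or Logistic Regression. Averaging discards word order, which is the main trade-off of this representation.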

📜 Certifications

  • Deep Learning: Beginner to Advanced – CodeBasics (May 2025)
  • Smart Tips: Soft Skills for Technical Professionals – Udemy (February 2025)
  • The Data Science Course: Complete Bootcamp 2024 – Udemy (November 2024)