Profile

Thathsara Rajapaksha

Data Scientist & AI/ML Engineer

About Me

Results-driven Data Science graduate with First Class Honours and practical industry experience as an AI Engineer. Proven expertise in developing and deploying machine learning models, building data pipelines, and creating interactive visualizations using Python, TensorFlow, PyTorch, and SQL. Proficient in transforming complex datasets into actionable insights through statistical analysis and visualization tools such as Power BI and Tableau. Adept at applying analytical thinking and innovative problem-solving to support data-driven decision-making in dynamic professional environments.

Education

BSc (Honours) in Data Science - UGC Approved

National School of Business Management (NSBM) - Homagama, Sri Lanka | Dec 2025

GPA: 3.76/4.0 (First Class Honours)

Coursework: Machine Learning, Deep Learning, Big Data Analytics, Statistical Methods, Data Visualization, Cloud Computing, Natural Language Processing

GCE Advanced Level (Physical Science Stream - Mathematics)

Mahinda Rajapaksha College - Homagama, Sri Lanka | 2020

3 passes in GCE Advanced Level

Combined Mathematics: C, Physics: S, Chemistry: S, English: B

GCE Ordinary Level Examination | 2016

5 As and 4 Bs

Achievements

1st Place - DataXplore 2025

Won 1st place in a national-level Data Science competition organized by the Statistics Society of the University of Sri Jayewardenepura. The multi-stage challenge covered domains such as Exploratory Data Analysis, Machine Learning, Model Explainability, Time Series Analysis, and Statistical Reasoning.

Research Publication - ICACT International Conference (2025)

Presented our research on urban computing for sustainable university development at NSBM's 2nd International Conference on Advanced Computing, an event exploring the future of AI and intelligent innovations.

Experience

AI Engineer Intern

Codebell PVT LTD, Dikwella Rd, Beliatta, Sri Lanka | Feb 2025 - Aug 2025

  • Developed end-to-end ML pipelines spanning computer vision (CNN transfer learning), NLP (BERT/DistilBERT for text classification and sentiment analysis), and traditional ML (XGBoost, hybrid recommenders, contextual bandits), achieving roughly 80-90% model accuracy across production projects.
  • Optimized models for production deployment using quantization (40% size reduction), distillation, and caching strategies; implemented fairness-aware ML with bias mitigation.
  • Delivered production-ready models with comprehensive documentation, collaborating with cross-functional teams and communicating results to stakeholders.

Customer Relations Officer

LB Finance, Sri Lanka | Nov 2021 - Apr 2022

  • Managed financial reporting processes and maintained accurate customer account documentation.
  • Provided customer service and resolved client inquiries, developing strong communication and stakeholder management skills.

Projects

Credit Card Fraud Detection - MLOps Pipeline

End-to-end fraud detection system featuring automated experiment tracking, model registry management, and production-grade drift monitoring.

Python, MLflow, Docker, Flask

IoT Energy Consumption Anomaly Detection

Developed a Lambda Architecture-based system for analyzing IoT energy consumption data with batch and real-time processing for anomaly detection.

Apache Hadoop, Apache Spark, Kafka, Docker

Autonomous AI Agent Framework

Architected a cyclic AI agent using LangGraph and Gemini 2.5 to perform autonomous web research. Features a Dockerized FastAPI microservice with a Streamlit UI for real-time, grounded task execution.

Python, LangGraph, Docker, FastAPI, Gemini 2.5

Booking Cancellation Analytics

Analyzed 27,500 booking records to identify patterns in cancellations and no-shows, optimizing revenue and resource planning.

Python, Scikit-learn, Data Analysis, Customer Segmentation

AI Portfolio Intelligence

A Neuro-Symbolic financial agent that orchestrates deterministic tools (Prophet, Risk Metrics) with LLM reasoning. Features autonomous multi-asset analysis, ensemble forecasting, and hybrid execution modes.

Python, LangGraph, Streamlit, Docker, Gemini

Agent Data Analyst

An autonomous AI agent that writes, debugs, and executes Python code to analyze raw data. Features a cyclic "Reason-Act-Observe" architecture using LangGraph and Gemini 2.5.

Python, LangGraph, FastAPI, Docker, Gemini

AutoML Classification System

A complete AutoML system that trains 6 models (including XGBoost, LightGBM, and CatBoost) with Optuna hyperparameter optimization, achieving 86.89% accuracy. Split architecture: Colab for training, Streamlit for deployment.

Python, Colab, XGBoost, Streamlit

Strategic Customer Intelligence

A hybrid AI system for precision marketing. Combines unsupervised learning (K-Means) for segmentation with supervised learning (XGBoost) to predict campaign buy-in, reaching an AUC of 0.90.

Python, Scikit-learn, XGBoost, Looker

Optimized Sentiment Service

High-performance financial sentiment analysis using a Quantized (Int8) DistilBERT model. Reduced model size by 75% and boosted CPU inference speed by 3.5x using ONNX Runtime.

Python, Hugging Face, ONNX, Gradio

Recommendation System (MLOps)

An end-to-end recommendation engine featuring a decoupled microservices architecture. Includes XGBoost inference, SHAP explainability, and automated CI/CD pipelines.

Python, FastAPI, CI/CD

Transformer Fine-Tuning Hub

A progressive collection of NLP projects mastering LLM adaptation, from full BERT fine-tuning to parameter-efficient LoRA (95% reduction).

Python, PyTorch, Hugging Face

NLP Question Answering

A comparative study of traditional ML and Transformer architectures (DistilBERT) on the SQuAD dataset, achieving 98% accuracy in question classification.

Python, Hugging Face, Scikit-learn

SentiViz: Text Sentiment Analysis Tool

Developed an interactive sentiment analysis application that combines RoBERTa and VADER technologies with real-time visualizations.

Python, Hugging Face, Flask

Colombo Apartments Pricing Analysis

Conducted comprehensive exploratory data analysis on Colombo's rental apartment market, analyzing 250+ property listings.

Python, Pandas, Matplotlib, Scikit-learn

Binary Image Classification

Uses transfer learning with MobileNetV2 and VGG16 architectures to identify brain tumors in MRI scans.

Python, TensorFlow, Transfer Learning, Computer Vision

Revenue Forecasting Tool

Developed and deployed a Streamlit-based web application that uses LSTM models to predict stock prices for multiple brands.

Python, Streamlit, LSTM

Smart ATS Resume Optimizer

User-friendly web application using Streamlit and Google's Generative AI to enhance resume quality across different experience levels.

Python, Streamlit, Generative AI, NLP

Amazon Sentiment Analysis

Analyzes Amazon user reviews to determine sentiment towards products using a pre-trained BERT model.

Python, BERT, NLP, Hugging Face

Customer Churn Prediction

Predicts customer churn using XGBoost, Random Forest, and Logistic Regression with hyperparameter tuning.

XGBoost, Random Forest, Hyperparameter Tuning

Business Decision-Making Dashboard

Translated business requirements into interactive dashboards for real-time tracking of key performance indicators (KPIs).

Power BI, Tableau, Data Visualization

Customer Segmentation

Segments customers with the KMeans algorithm, using the Elbow method to choose the number of clusters.

Python, Scikit-learn, KMeans Clustering

Technical Expertise

Programming Languages

Python 95%
SQL 90%
R 85%
Java 60%

Machine Learning & AI

TensorFlow, PyTorch, Keras, MLflow, Scikit-learn, PySpark, Hugging Face, LangGraph, LangChain, GenAI

Data Visualization

Power BI, Tableau, Looker, Matplotlib, Seaborn, Plotly

Cloud & Databases

Hadoop, AWS, Azure, Google Cloud, Snowflake, MS SQL Server, Docker

Web Technologies

Flask, FastAPI, HTML, CSS, JavaScript, Gradio, Git

Soft Skills

Problem Solving

Skilled at analyzing complex problems and devising effective solutions

Technical Communication

Able to convey complex technical concepts to diverse audiences

Collaboration

Experienced in working effectively within cross-functional teams

Adaptability

Quick to learn and adapt to new technologies and environments

Continuous Learning

Committed to staying updated with industry trends and advancements