Sankalp Biswal

10+ Projects
2.5 Yrs Experience
~$200k Impact

About Me

I'm a data scientist with expertise in machine learning, statistical analysis, predictive modeling, and data visualization. With over 2.5 years of experience developing predictive algorithms, conducting statistical testing, building interactive dashboards, and creating intelligent systems using RAG and LLMs across cloud platforms, I turn complex datasets into actionable business insights. My background in consulting and data science gives me a balance of client-facing and technical skills, which helps me bridge the gap between advanced analytics and strategic business value using Python, SQL, and machine learning frameworks.

Programming

Python, SQL, R, JavaScript

Machine Learning

scikit-learn, PyTorch, TensorFlow, Keras, LangChain, MLflow

Cloud & Tools

AWS, Azure, GCP, Docker, Snowflake, Databricks

Analytics

Tableau, Power BI, Pandas, NumPy, A/B Testing, Statistics

Experience & Education

Work Exp
Data Scientist Intern
Allagash Brewing Company, Boston, MA
Jan 2025 - Mar 2025
Developed a real-time truck recommendation app in Streamlit, reducing shipping costs by 10% ($150K annually).
Work Exp
Data Scientist Intern - Conversational AI
DataWorksAI, Boston, MA
Apr 2024 - Jun 2024
Built and deployed a RAG-based LLM chatbot for university students.
Education
Master's in Data Analytics (Artificial Machine Intelligence)
Northeastern University, Boston, MA
Sep 2023 - Mar 2025
GPA: 4.00/4.00. Specialized in AI System Technologies, LLM+RAG, Predictive Analytics, and Healthcare Data Applications.
Work Exp
Data Scientist
Falcon Cables, New Delhi, India
Feb 2022 - Aug 2023
Designed ETL workflows and Power BI dashboards, improving efficiency by 20%.
Work Exp
Analyst - Technology Consulting
Ernst & Young LLP, Gurugram, India
Jun 2021 - Jan 2022
Researched non-banking financial companies (NBFCs) and shaped an $11M transformation.
Education
Bachelor of Technology in Computer Science
Jaypee Institute of Information Technology, Delhi, India
Jul 2017 - Aug 2021
CGPA: 7.3/10. Co-authored research paper on diabetic retinopathy detection.

Featured Projects

Text-to-SQL Agentic Healthcare Chatbot
Built a RAG-based LLM assistant using LangChain, Pinecone, and Snowflake to generate insights and visualizations from structured healthcare data via SQL generation with Gemini. Performed ETL on 2M+ rows of HIPAA-compliant NY SPARCS data.
LangChain, Pinecone, Snowflake, Gemini, ReAct Agent
Customer Churn Prediction with MLOps
Built a comprehensive churn prediction pipeline on 500K+ customer records using PySpark and AWS services with full MLOps implementation. Implemented CI/CD pipeline with GitHub Actions and performance monitoring in Grafana and MLflow.
PySpark, AWS Glue, SageMaker, MLflow, Grafana
Ad Campaign A/B Testing & Optimization
Applied causal inference via A/B testing on 580K+ users to identify statistically significant conversion drivers and optimize ad campaigns. Measured a 43% relative uplift in conversion for the test group.
A/B Testing, Chi-Squared, Mann-Whitney U, Causal Inference, Python
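As a rough illustration of the statistics behind this kind of test, the sketch below computes a relative uplift and a Pearson chi-squared test on a 2x2 conversion table using only the Python standard library. The counts are hypothetical, chosen only so the uplift works out to 43%; they are not the project's data, and the function names are illustrative rather than taken from the project code.

```python
from math import erfc, sqrt


def relative_uplift(control_rate, test_rate):
    """Relative change in conversion rate of the test group versus control."""
    return (test_rate - control_rate) / control_rate


def chi_squared_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Rows are the two groups; columns are converted / not converted.
    Returns the test statistic and the p-value for 1 degree of freedom.
    """
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 1,000 users per group.
control_users, control_conversions = 1000, 100  # 10.0% conversion
test_users, test_conversions = 1000, 143        # 14.3% conversion

uplift = relative_uplift(control_conversions / control_users,
                         test_conversions / test_users)
stat, p_value = chi_squared_2x2(control_conversions, control_users,
                                test_conversions, test_users)

print(f"relative uplift: {uplift:.1%}")
print(f"chi-squared: {stat:.3f}, p-value: {p_value:.4f}")
```

A chi-squared test suits binary outcomes such as converted versus not converted; a rank-based test like Mann-Whitney U is the usual choice for skewed continuous metrics.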
Chicago Traffic Crash Analysis
Analyzed 1.85M+ traffic crash records using Databricks, Delta Lake, and Azure Blob Storage. Merged datasets to uncover crash trends by weather, lighting, and causes, and built an interactive Databricks dashboard for safety insights.
Databricks, Azure Blob Storage, Delta Lake, PySpark, SQL
Student Math Score Prediction – End-to-End ML Deployment
Built an end-to-end ML pipeline (R² = 0.88) with automated model selection across multiple regressors. Deployed a real-time app on AWS using Flask, Docker, and GitHub Actions CI/CD.
Python, Scikit-learn, XGBoost, CatBoost, Flask, Docker, AWS EC2, GitHub Actions
Logistics Optimization Suite – Allagash Brewing Company
Built a Logistics Optimization Suite with a Smart Truck Recommender web app (Streamlit plus weather and routing APIs) and Power BI dashboards to cut freight costs, improve route efficiency, and optimize pallet utilization.
Streamlit, Google Maps API, OpenWeather API, Python (pandas, requests), Power BI, Logistics Optimization
Impact of the 2008 Financial Crisis on Heart-Disease Mortality (DiD)
Applied Difference-in-Differences to evaluate the effect of the 2008 financial crisis on heart-disease mortality, combining CDC, BLS, World Bank, and Yahoo Finance data to measure economic and health impacts.
Python, Pandas, Statsmodels, Difference-in-Differences, CDC, BLS, World Bank, Yahoo Finance
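For readers unfamiliar with Difference-in-Differences, the minimal sketch below shows the basic two-group, two-period estimator computed from group means. The mortality figures and group labels are hypothetical and are not results from this project; the project itself lists Statsmodels, where the same estimate is typically obtained as the coefficient on a treatment-by-period interaction term in a regression.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two Difference-in-Differences estimate from group means.

    The change in the control group stands in for what would have happened
    to the treated group without the event (parallel-trends assumption).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical mortality rates per 100,000 people (illustrative only).
treated_pre = [200.0, 210.0]   # more-exposed regions, before the event
treated_post = [220.0, 230.0]  # more-exposed regions, after the event
control_pre = [180.0, 190.0]   # less-exposed regions, before the event
control_post = [185.0, 195.0]  # less-exposed regions, after the event

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate: {effect:.1f} per 100,000")
```

The estimate is only as credible as the parallel-trends assumption, so in practice it is worth comparing pre-period trends between the two groups before interpreting it.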
Airbnb Booking & Pricing Dashboard
Developed an interactive Tableau dashboard to analyze Airbnb bookings and pricing trends across New York City. Visualized total bookings, average prices, and room-type distribution with dynamic filters and geographical heatmaps to support data-driven pricing and strategy decisions for property managers.
Tableau, Data Visualization, Interactive Dashboard, Airbnb Data

Services Offered

Data Science
End-to-end data science solutions from data collection and preprocessing to advanced analytics, predictive modeling, and actionable insights that drive strategic business decisions.
Data Engineering
Designing and automating scalable data pipelines and infrastructure using cloud-native services and orchestration tools for both batch and streaming workflows.
Machine Learning
Building, validating, and deploying ML models to automate predictions and uncover patterns in large-scale structured datasets with domain-aligned business value.
Data Visualization
Developing insightful dashboards and visual stories that make complex data accessible and actionable for decision-makers across business domains.
Chatbot Development
Creating intelligent conversational AI solutions including RAG-based chatbots, voice AI agents, and automated customer service systems using cutting-edge LLM technologies.
n8n Workflow Development
Designing and implementing automated workflows and business process automation using n8n and other no-code/low-code platforms for operational efficiency.

Certifications

Power BI for Data Analysts
Issuer: LinkedIn
Issued: March 2025
Skills: Microsoft Power BI, DAX, Data Visualization
Azure Administration Essential Training
Issuer: LinkedIn
Issued: June 2024
Skills: Microsoft Azure, Cloud Administration
Introductory AI Literacy
Issuer: Northeastern University
Awarded: April 2024
Skills: Artificial Intelligence Fundamentals, AI Ethics
Data Visualization and Storytelling Basics
Issuer: Northeastern University
Awarded: April 2024
Skills: Data Visualization, Storytelling with Data
Introduction to Analytics
Issuer: Northeastern University
Awarded: November 2023
Skills: Analytics Fundamentals, Data Analysis

Get In Touch

Email

sankalpbiswal99@gmail.com

Phone

+1 (617) 256 1543

Location

Boston, Massachusetts

Status

Available for freelance or full-time opportunities