Projects
Over time, I have got hands dirty in projects from multiple realms. Here is a snapshot.
SRT : Shiprocket || WHJ : Whitehat Jr || EXL: EXL Services || NVDA: Nvidia
Professional
Generative AI
Algo/Tools/Libraries Used: langchain
,GPTs
, OpenAI
, Vector DataStores
,whisper
,gpt-4
,LLaMA
[NVDA] Advanced RAG System for Comprehensive Code Analysis and Generation
[NVDA] Advanced Multimodal Assistant with Dynamic Information Retrieval
[SRT] Engineered “SR-Copilot”, a RAG-based application, to streamline e-commerce and SR product interactions by efficiently addressing user queries and suggesting pertinent products and support ticket updates. Link
[SRT] Prototyped an automated pipeline for processing customer support calls, rating interaction quality, transcribing content, identifying key pointers, and assessing buyer sentiment. (ASR→NLP→TTS)
Impact: Reduced the number of agent call service requests by 7.6%.[SRT] Created a LLM-based chat app for non-technical stakeholders, to answer & execute database queries over Snowflake and Redshift in natural language.
Tabular Machine Learning
Algo/Tools/Libraries Used: pandas
,polars
,xgboost
,catboost
,scikit-learn
,rapids
,dask
,AWS Sagemaker
,S3
,ELK
- [SRT] An ML pipeline that predicts a customers propensity to reject an ecommerce order at delivery. Link
Impact: Saved over 35MM in shipping costs for over 250+ D2C sellers. - [SRT] An ML solution combining NLP and tabular data analysis to identify fraudulent seller behaviors.
Impact: Achieved 18% reduction in fraud cases of multiple categories (KYC, Weight Fraud etc). - [WHJ] A classifier model to identify promising retention leads for sales pitch prioritization.
Impact: Enhanced retention performance by 8%. - [EXL] A suite of 12 ML models for an insurance client, that aims to mark a potential customers:
- propensity to respond and convert to a marketing campaign.
- estimated chargeable premium and loss ratio, if converted.
Implemented an automated 1-window customer selection and offer allocation framework on top of models.
Impact: The process boosted performance by +27%.
Natural Language Processing
Algo/Tools/Libraries Used: PyTorch
, BERT
, Word2Vec
, NER
, nltk
, flashtext
, gensim
, rapidfuzz
,transformers
,sentiment analysis
,semantic search
, clustering
, text similarity
- [SRT] An unsupervised learning framework to enchance, standardise and validate over 6MM delivery addresses.
Impact: A SoTA address intelligence agent that can successfully parse over 100K localities from 500+ Indian districts. - [SRT] Developed an address deduplication and syntax correction pipeline, whole suite led to +20% increased deliveries.
Impact: Successful identification of customers belonging to same household, leading to enhanced customer segmentations.
- [SRT] Engineered an intelligent system for instant product categorization, categorising over 1.3 million uniquely named products.
Impact: Categorisation helped in better targeting for our other products.
Clustering and Segmentation
Algo/Tools/Libraries Used: DBSCAN
,k-means
,sklearn
,ANOVA
,noSQL
,mongo
- [SRT] Created a custom seller segmentation for 100K+ D2C sellers registered on the platform.
- [SRT] Designed a novel clustering method to segregate over 100 million unique buyers for tagging and behavioral analysis.
- [WHJ] An SQL-based segmentation for improved student-teacher mappings, for better learning outcomes.
Forecasting and Geo-Spatial
Algo/Tools/Libraries Used: fb-prophet
,time-series analysis
, ARIMA
, kepler
, geopandas
,scipy
,statmodels
,spatial-clustering
- [SRT] Probed hyper-local e-commerce geo-coordinate data to identify ideal locations for dark stores.
- [SRT] Undertook demand forecasting for fast moving goods for enhanced control over inventory and order management.
Miscellaneous
Algo/Tools/Libraries Used: SQL
,Redshift
,Tableau
,Snowflake
,Excel
,Google Workspace APIs
- [ALL] Data Pipelines to establish data for training and to update the features for inference.
- [ALL] Custom dashboards and reports in GSheets and Tableau to track performance of models and raise flags for changes.
- [EXL] Uncoverted unoptimised SAS-based code to R and Python modules, for faster processing and cheaper executions by saving license costs.
- [EXL] Optimised SQL data processing pipelines from Oracle to Redshift.
Personal
Generative AI
- A audio to text ML app that converts expenses to JSON and builds a custom report. Link