KSHITIJ

my projects

  • corrective rag with adaptive retrieval

      Built a Corrective RAG system that implements adaptive retrieval with document grading, hallucination detection, and web search fallback. Uses LangGraph for workflow orchestration, FireCrawl for web scraping, Tavily for search, and GPT4All embeddings. Includes a grading pipeline that assesses document relevance and answer quality and triggers automatic correction.

    tech used: Python, LangGraph, LangChain, Transformers, PyTorch, Tavily API, FireCrawl, GPT4All, Ollama, Llama 3.2, ChromaDB, Hallucination Detection, Document Grading, Web Search Integration, LangSmith
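
      The grade-then-fall-back idea described above can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph workflow: `corrective_retrieve`, `keyword_grader`, `local_store`, and `fake_web_search` are hypothetical names, the keyword grader stands in for an LLM-based grader, and falling back only when no document passes grading is an assumption.

```python
from typing import Callable, List


def keyword_grader(question: str, document: str) -> bool:
    """Toy relevance check (assumption): True if question and document share a word."""
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return bool(q_words & d_words)


def corrective_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
) -> List[str]:
    """Keep only documents the grader accepts; use web search if none survive."""
    docs = retrieve(question)
    relevant = [d for d in docs if grade(question, d)]
    if not relevant:
        # Correction step: local retrieval produced nothing usable.
        relevant = web_search(question)
    return relevant


# Hypothetical stand-ins for a vector store and a web search client.
def local_store(question: str) -> List[str]:
    return ["ChromaDB stores vector embeddings", "Paris hosts museums"]


def fake_web_search(question: str) -> List[str]:
    return ["web result about " + question]


print(corrective_retrieve("how are embeddings stored", local_store, keyword_grader, fake_web_search))
print(corrective_retrieve("weather today", local_store, keyword_grader, fake_web_search))
```

      Passing the retriever, grader, and search function in as callables keeps the control flow separate from any particular library, so each piece can be swapped or tested on its own.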

  • repopulation rag with evaluation pipeline

      Developed a RAG system with dynamic database updates and an evaluation pipeline. Implemented retrieval metrics (MRR, nDCG@k, Precision@k, Recall@k) and generation metrics (F1, EM, ROUGE-L, BLEU, Faithfulness). Used BGE embeddings and Llama 3.2 3B for generation, with automated testing and performance benchmarking.

    tech used: Python, LangChain, ChromaDB, Hugging Face, Ollama, PyMuPDF, Pytest, SentenceTransformer, Transformers, PyTorch, YAML, BGE Embeddings, Llama 3.2
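
      For reference, the retrieval metrics named above can be written in a few lines of standard-library Python. This is a sketch under assumptions, not the project's evaluation code: relevance is treated as binary, the function names are illustrative, and MRR would be the mean of `reciprocal_rank` over all queries.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if there is none."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Example query: two relevant documents, found at ranks 2 and 4.
ranked_ids = ["d3", "d1", "d7", "d2"]
relevant_ids = {"d1", "d2"}
print(precision_at_k(ranked_ids, relevant_ids, 2))
print(recall_at_k(ranked_ids, relevant_ids, 2))
print(reciprocal_rank(ranked_ids, relevant_ids))
print(ndcg_at_k(ranked_ids, relevant_ids, 4))
```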

  • vanilla rag (retrieval augmented generation)

      Built a Retrieval-Augmented Generation (RAG) system that processes markdown documents, creates vector embeddings using SentenceTransformer, and stores them in ChromaDB. Features document chunking, similarity search, and automated workflow with comprehensive logging and error handling.

    tech used: Python, LangChain, Unstructured-Markdown, ChromaDB, SentenceTransformer, Transformers, PyTorch, Pandas, Matplotlib, Seaborn, YAML, Streamlit
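
      The chunking and similarity search steps can be illustrated without any dependencies. This is a rough sketch only: the project uses SentenceTransformer embeddings and ChromaDB, whereas the hypothetical `chunk_text` and `top_k` helpers below use fixed-size character windows and hand-written vectors to show the mechanics.

```python
import math
from typing import List, Sequence


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k stored vectors most similar to the query."""
    order = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return order[:k]


print(chunk_text("abcdefghij", size=4, overlap=2))
print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2))
```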

  • hindi speech emotion recognition

      Built an emotion recognition system for Hindi speech using MFCC feature extraction and multiple ML/DL models. Implemented data augmentation (pitch shifting, time stretching, noise addition) to address dataset limitations. Compared CNN, MLP, Random Forest, and KNN models, reaching 64% accuracy with Random Forest. Created a Streamlit web application for real-time emotion prediction with MFCC visualization.

    tech used: Python, Speech Processing, MFCC, CNN, Random Forest, KNN, MLP, Data Augmentation, Streamlit, Audio Signal Processing, Feature Engineering, Model Comparison, Hindi NLP
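
      Of the augmentation techniques listed, noise addition is the simplest to show. The sketch below is an assumption about how it could be done, not the project's code: the hypothetical `add_noise` helper adds Gaussian noise scaled to a target signal-to-noise ratio, and a pure-Python sine wave stands in for a real recording.

```python
import math
import random
from typing import List, Optional


def add_noise(signal: List[float], snr_db: float, rng: Optional[random.Random] = None) -> List[float]:
    """Return a copy of `signal` with Gaussian noise at roughly `snr_db` dB SNR."""
    if not signal:
        return []
    if rng is None:
        rng = random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Stand-in "audio": a sine wave sampled at 1000 points.
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
noisy = add_noise(clean, snr_db=20.0, rng=random.Random(0))

# Measure the SNR that was actually achieved on this sample.
noise = [n - c for n, c in zip(noisy, clean)]
measured_snr = 10 * math.log10(
    (sum(c * c for c in clean) / len(clean)) / (sum(e * e for e in noise) / len(noise))
)
print(round(measured_snr, 1))
```

      Seeding the random generator makes the augmented sample reproducible, which helps when comparing models trained on the same augmented set.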

  • cattle CVD detection via retina image classification

      Developed a deep learning system to detect cardiovascular disease in cattle through retina image analysis. Implemented a preprocessing pipeline with green channel extraction, CLAHE, and vessel enhancement. Built a ResNet101-based classifier with Grad-CAM interpretability, DVC pipeline management, and a Flask web interface. Addressed class imbalance in the data and deployed the model in a Docker container.

    tech used: Python, TensorFlow, Keras, CNN, ResNet50v2, VGG16, ResNet101, CLAHE, Grad-CAM, DVC, Flask, Docker, TensorBoard, Computer Vision

  • coccidiosis detection in chickens

      Built a CNN model using VGG16 (transfer learning) to detect coccidiosis from chicken fecal images. Preprocessed data with augmentation and normalization, managed configuration via YAML with logging, and deployed with Flask, Docker, and a GitHub Actions CI/CD pipeline.

    tech used: Python, TensorFlow, Keras, Scikit-learn, ImageDataGenerator, DVC, Flask, Docker, GitHub Actions

  • updated portfolio (2025)

      Built on the T3 Stack to combine the serverless capabilities of Next.js with the flexibility of a monorepo. Currently implementing an Admin Portal for updating data directly in the database.

    tech used: Next.js, TypeScript, Tailwind CSS, Framer Motion, Monorepo-TurboRepo, Vercel

  • my old portfolio (2024)

      Not importing all of that here again; the older projects are listed there.

    tech used: React.js, Chakra UI, Markdown It, Framer Motion