Aarushi Singh

cs · ml research · engineering

about

i'm a computer science undergraduate specializing in artificial intelligence, with a strong interest in backend engineering, distributed systems, and end-to-end machine learning pipelines. i enjoy exploring ai engineering workflows, experimenting with langchain, agents, and model integration, and designing scalable system architectures.

i have worked with c++, python, and scala, along with tools like docker, kafka, and various cloud platforms, and gained industry exposure during my internship at microsoft. my current focus is machine learning research.

some of my recent research interests include:

  • multimodal llms and vision-language integration.
  • context engineering and retrieval-augmented generation.
  • agentic infrastructure and multi-agent systems.

open source

experience

software engineer intern · jun – aug 2025
microsoft

worked on the azure data spark dev native execution engine (nee) using c++, scala, velox, gluten, docker, and azure devops. integrated fuzz-testing pipelines, improved operator reliability, and enhanced ci/cd diagnostics for large-scale distributed sql execution.

undergraduate ml researcher · aug 2024 – present
bennett university

researched recommender systems and computer vision using pytorch and tensorflow. improved ndcg/mrr on matrix factorization (mf) models and benchmarked cnn and transformer architectures for emotion recognition on fer2013.

projects

fine-tuning & evaluation of pegasus transformers

fine-tuned pegasus on the aeslc dataset for abstractive summarization in low-resource email domains, benchmarking domain transferability.

supply chain algorithmic optimization

designed shortest-path routing using dijkstra's algorithm and built custom c++ regression models for yield prediction and demand forecasting.

enterprise ai workflows

end-to-end workflow system integrating azure openai and ai search for enterprise-grade automation.

distributed log aggregation system

high-throughput log ingestion and search system handling 50k+ logs/sec using go concurrency and kafka.

seq2seq summarization

sequence-to-sequence summarization model using encoder-decoder transformers.

volatility surface modelling (gan/vae)

generative models to produce smooth volatility surfaces for option pricing.

clipdb

privacy-first clipboard history manager with fuzzy search and aes-256 encryption.