aarushi singh

about

i'm a computer science undergraduate specializing in artificial intelligence, with a strong interest in backend engineering, distributed systems, and end-to-end machine learning pipelines. i enjoy exploring ai engineering workflows, experimenting with langchain, agents, and model integration, and designing scalable system architectures.

i have worked with c++, python, and scala, along with tools like docker, kafka, and various cloud platforms, and gained industry exposure during my internship at microsoft. my current focus is machine learning research.

some of my recent research interests include:

methods for improving interpretability and controllability of large models after training.
techniques for making ml models robust to noisy, real-world data.
lightweight models and optimization methods for faster inference.

experience

software engineer intern

microsoft · june - august 2025

worked on the azure data spark dev native execution engine (nee) using c++, scala, velox, gluten, docker, and azure devops. integrated fuzz-testing pipelines, improved operator reliability, and enhanced ci/cd diagnostics for large-scale distributed sql execution.

undergraduate ml researcher

bennett university · august 2024 - ongoing

researched recommender systems and computer vision using pytorch and tensorflow. improved ndcg/mrr on matrix factorization (mf) models and benchmarked cnn and transformer architectures for emotion recognition on fer2013.

skills

c++, python, go, java, scala, bash, docker, git, azure devops, kafka, fastapi, postgresql, mysql, pytorch, tensorflow, keras, hugging face, langchain, numpy, pandas, opencv, linux, jupyter, jira

projects

enterprise ai workflows

python, fastapi, azure openai, azure ai search, docker, react

end-to-end workflow system integrating azure openai and azure ai search for enterprise-grade automation.

distributed log aggregation system

go, kafka, postgresql, docker

high-throughput log ingestion and search system handling 50k+ logs/sec using go concurrency and kafka.

seq2seq summarization

python, transformers, seq2seq

sequence-to-sequence summarization model using encoder-decoder transformers.

volatility surface modelling (gan/vae)

python, gan/vae, quantlib, numpy/pandas

generative models to produce smooth volatility surfaces for option pricing.

clipdb

python, cli, open-source

privacy-first clipboard history manager with fuzzy search and aes-256 encryption.