First research experience in AI

ft. Viveka 1.0 - sophomore year

Background and Motivation

I knew how to read math research papers from school. But I used to feel very lost while reading AI research papers. The way authors casually write loss functions, which gave me no intuition on first read. Those plots in which the bars also had lines above and below, as if they were groundwater pumps ~~(standard deviation)~~. Super long papers with random technical words in the appendix. And loads of references and citations. Scary, isn’t it?

Everyone comes to college with a dream. Mine was to create impact: to build something meaningful and to solve problems that matter (very generic, but yea). And from everything that I tried in my freshie year, math in AI seemed most like my thing.

As I read more papers, the fear went away. But I still couldn’t understand papers properly. And something worse was implementation! God forbid, I would never have wanted to code if GPT did not exist. (The GPT from back then was so trashy at coding.) Implementing papers was very hard for me.

Especially the knowledge gap. Being open to not knowing and learning stuff on the way, and accepting it every second while you do something, is not easy. At least not for first timers. Definitely not for me personally, as a girl in IIT Madras. The reason I say this is: yes, it is hard to find girls around with a strong passion for tech, even if it is IIT. The scenario is quite the opposite for guys. This does create fomo¹ on speed.

All these issues got sorted with my AI club mini project, Pix2Pix, and a growing interest in and thirst for reading GNN papers.

Comes closer - summer. While looking for summer programs/research opportunities in AI, I came across MATS - Machine Learning Alignment & Theory Scholars Program. It looked very ambitious and interesting from the FAQs and project streams. I read a couple of blogs by Neel Nanda, both technical and personal. I wanted to apply, so I clicked on the link. It asked for my domains of interest - Alignment, Mechanistic interpretability, Policy, AI Security, AI theory, etc. ~~Aren’t all of these the same?~~ Went to the alumni page, saw all PhDs, and did not open the site again for a few days.

Viveka 1.0

Out of nowhere, I came to know that the AI club has a project next year, “Viveka”, aiming to mitigate hallucinations in LLMs through a mechanistic interpretability approach, led by the two best seniors in AI from the 2023 batch, Jayden and Harshith. And that they also have a fellowship and a mentor onboarded. Now obviously, this is my best bet and this is what I am doing next year.

Comes the application process. You can access the application here.

Deadly app, given that I had to do Appian hackathon final round, math club coordinator app, and something else (idr²) simultaneously.
Insane competition
At home, so couldn’t just not sleep.

All the more reason to app, isn’t it?

Was the app fun- insanely (but also a torture to some extent).
Was it math- definitely.
Got a very steep learning curve- obviously.
Comfortably reading papers now- at the speed and in the way I want- For sure.
Got better at implementation- yes, to a great extent.
Did I know I would get rejected- Obviously!

I mean, so oversmart of me to not put in an AI club coordinator app ~~and instead put math coord~~. My assumption: the club people would want coordinators, so since I did not put in a coordinator app, anyone with a slightly weaker project app but a better coord app would get in, and I would get rejected.

Did I do okay in the interview- yes. Did I do a good job in the Post interview task³ - Very good, they say. Did I get in? Yess!!!!

How has the tenure been- Cannot quantify how much I learned. This blog is a way to attempt walking through the year. It has been a grind that I always wanted, and shall cherish. Looking forward to more! (Viveka 2.0)

Technical outcomes – will be released after the internship drive and summer are over, and they will be found here.

The Tenure

Upskilling. Our PLs⁴ realized that the app had already upskilled us a lot. To get through the competition, all of us had already read interpretability papers and blogs. So we just had a small upskilling task on TransformerLens, a library for mechanistic interpretability of generative language models.

Literature review. The PLs and a few experienced PMs (again I felt fomo because I was not among them) had made a paper dump on Notion, and everyone had to take 3-4 of these papers and present them to the entire team (11 people in total) in a meet.
4 hours of meeting on the first day, only half done. Another 4 hours next day. ~~Hallucination hallucination all over the brain.~~ Crazy ideas inflow. We had a viveka group, a viveka Idea dump group, divided into subgroups- idea dump there as well. The energy was crazy. Crazy team.

The first set of experiments. Jayden and I were working on factual recall circuits. Arsh, Sharan and Sriram took up Truth is Universal and similar papers. Eshika and Saahil took up Truthflow; Samrudh and Sriram took up Sparse Autoencoders. Later Pakshal joined Jayden and me, Vedant joined … (I don’t remember)

We all worked over the summer break. When we came to insti⁵, our in-person meets were so full of energy. Everyone explained their setup, what worked and what did not, cross-questioning each other and asking doubts. After a sequence of such meets, we packed some ideas and came up with some new setups. (I remember Sharan coming to an in-person meet despite having a fever!)

Some IOI⁶ in circuits, tuned lens, Norm lens. (Oh tuned lens was another crazy experience!- Intense memory optimization learned. First experience on Jarvis⁷. First all nighter for project ;) )
Later Pakshal and I wrote a blog.

Non-linear truth representation. I would go attend all subgroup meets. For Truthflow, we somehow did a slightly different implementation by accident and got significantly better accuracy. For a long time none of us really understood flow models, so we gave up.

Then our new experiment setups were a result of the many assumptions in the literature back then. We hypothesized that general truth, if it exists, lies in a low-dimensional subspace and is non-linear.
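
To make that hypothesis a bit more concrete, here is a toy sketch in NumPy. It is not our experiment design and uses no real model: the "activations" are synthetic, the label rule (an XOR-like product of two hidden coordinates) is an assumption chosen only because it is low-dimensional and non-linear, and every function name is made up for illustration. The point is only that a linear probe on the full vector can sit near chance while a non-linear probe inside a small subspace does much better.

```python
import numpy as np


def make_toy_activations(n=2000, d=64, noise=0.05, seed=0):
    """Synthetic 'activations' whose label depends non-linearly on a hidden 2-D subspace."""
    rng = np.random.default_rng(seed)
    # Random orthonormal basis (d x 2) for the hidden low-dimensional subspace.
    basis, _ = np.linalg.qr(rng.normal(size=(d, 2)))
    z = rng.normal(size=(n, 2))  # latent coordinates inside the subspace
    # Assumed XOR-like rule: the label is not linearly separable in z.
    y = (z[:, 0] * z[:, 1] > 0).astype(float)
    x = z @ basis.T + noise * rng.normal(size=(n, d))
    return x, y


def with_bias(features):
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_probe(features, y):
    """Least-squares probe onto +/-1 targets (a simple stand-in for logistic regression)."""
    weights, *_ = np.linalg.lstsq(with_bias(features), 2.0 * y - 1.0, rcond=None)
    return weights


def accuracy(features, y, weights):
    predictions = with_bias(features) @ weights > 0
    return float(np.mean(predictions == (y > 0.5)))


def quadratic_features(p):
    """Coordinates plus all pairwise products (degree-2 polynomial features)."""
    k = p.shape[1]
    products = [p[:, i] * p[:, j] for i in range(k) for j in range(i, k)]
    return np.hstack([p, np.stack(products, axis=1)])


def run_demo(seed=0, n_train=1500, k=2):
    x, y = make_toy_activations(seed=seed)
    x_train, y_train = x[:n_train], y[:n_train]
    x_test, y_test = x[n_train:], y[n_train:]

    # 1) Linear probe on the full activation vector.
    linear_acc = accuracy(x_test, y_test, fit_probe(x_train, y_train))

    # 2) PCA (fit on the training split only) to get a k-dim subspace,
    #    then a quadratic probe inside that subspace.
    mean = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)

    def project(a):
        return (a - mean) @ vt[:k].T

    q_train = quadratic_features(project(x_train))
    q_test = quadratic_features(project(x_test))
    quadratic_acc = accuracy(q_test, y_test, fit_probe(q_train, y_train))
    return linear_acc, quadratic_acc


if __name__ == "__main__":
    lin, quad = run_demo()
    print(f"linear probe accuracy:    {lin:.2f}")
    print(f"quadratic probe accuracy: {quad:.2f}")
```

On real residual stream activations nothing is this clean, of course: the subspace is unknown, PCA may not find it, and the right non-linearity is exactly the open question.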

We came up with a rigorous experiment design after lots of iterations. We will release the pre-print/technical report soon post summer.

But apart from this, obviously a team of 11 people was not all working on this. We eventually packed⁸ Circuits (which I was working on) as well: it was very microscopic, did not generalize, and assumed that residual stream activations have a language interpretation.

Sick of language – toy models. Playing with language without a mathematical framework for the internals of large transformers became annoying. Just because it worked well enough, why was the model space (middle layers) of the transformer assumed to have a token space (language) interpretation? To me, intuitively, it would not.
With hallucinations, good enough does not work and the good that existed in literature back then was – not so good.

We strongly wanted some concrete mathematical framework. For me, there was just one question: however in the world, whatever in the world, figure out how to detect hallucinations robustly from a transformer residual stream. So I was trying exactly that. We looked into various domains and frontier mech interp research, and read an insane number of papers. And up comes the Simplex Progress Report.

Being a math lover and finding out that transformers represent fractal-like structures linearly in the residual stream is - incredible!

Over the past semester, a subgroup of our team has been working with toy models, where the ground truth distribution is known (unlike language) and is more rigorous to play with and to look for inside transformers. We tried understanding various behaviours like ICL and superposition, and extending the framework itself further. We will release blogs/papers within a month after the fifth semester starts and the intern drive is over.
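
A minimal sketch of what "known ground truth" can mean in such a toy setting, assuming the data comes from a small hidden Markov model. The 3-state process below, its parameters (`stay`, `emit_match`) and all function names are hypothetical and picked only for illustration; they are not the processes from the project. Because the generating process is known, the optimal observer's belief over hidden states after each token can be computed exactly with a Bayesian update, and that exact quantity is something one can then go looking for in a trained transformer's activations.

```python
import numpy as np


def make_toy_hmm(stay=0.8, emit_match=0.7, n=3):
    """Labeled transition tensor T[x, i, j] = P(emit token x, move to state j | state i)."""
    state_transition = np.full((n, n), (1 - stay) / (n - 1))
    np.fill_diagonal(state_transition, stay)
    emission = np.full((n, n), (1 - emit_match) / (n - 1))  # emission[i, x] = P(x | i)
    np.fill_diagonal(emission, emit_match)
    return emission.T[:, :, None] * state_transition[None, :, :]


def stationary_distribution(T):
    """Stationary distribution of the hidden-state Markov chain."""
    state_matrix = T.sum(axis=0)  # P(j | i), tokens marginalized out
    eigenvalues, eigenvectors = np.linalg.eig(state_matrix.T)
    v = np.abs(np.real(eigenvectors[:, np.argmax(np.real(eigenvalues))]))
    return v / v.sum()


def sample_tokens(T, length, rng):
    """Sample a token sequence from the process, starting from the stationary distribution."""
    n_states = T.shape[1]
    state = rng.choice(n_states, p=stationary_distribution(T))
    tokens = []
    for _ in range(length):
        joint = T[:, state, :].ravel()  # joint over (token, next_state)
        token, state = divmod(rng.choice(joint.size, p=joint), n_states)
        tokens.append(int(token))
    return tokens


def update_belief(belief, token, T):
    """Bayesian update of the belief over hidden states after seeing one token."""
    unnormalized = belief @ T[token]
    return unnormalized / unnormalized.sum()


def belief_trajectory(T, tokens):
    """Beliefs before any token and after each token; shape (len(tokens) + 1, n_states)."""
    beliefs = [stationary_distribution(T)]
    for token in tokens:
        beliefs.append(update_belief(beliefs[-1], token, T))
    return np.array(beliefs)


def next_token_probs(belief, T):
    """Optimal next-token distribution given the current belief."""
    return T.sum(axis=2) @ belief


if __name__ == "__main__":
    T = make_toy_hmm()
    tokens = sample_tokens(T, 10, np.random.default_rng(0))
    print(tokens)
    print(belief_trajectory(T, tokens).round(3))
```

Each row of the belief trajectory is a point on the probability simplex over hidden states, which is why plotting many of them gives a geometric picture to compare against a linear readout of activations.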

I also feel privileged to have found out about and read the beautiful math behind Statistical (and Singular) learning theory, Computational Mechanics and a lot more in the course of this tenure (Which also motivated my mini-project SOLSTICE at Math club!).

Moreover, I am thankful to have developed a lot of meta-life and research skills, including applying to research internships, how to not get burnt out, how to balance life and research, and when to stop going ahead in a particular direction.

The Viveka team is crazy. I consistently felt a dedicated, passionate and motivated energy. The super long meetings on ideating and thinking through hypotheses and experiments have been the most fun part for me. Our meets never ended at the CFI terrace. We would stop again and discuss on the ground floor inside CFI⁹. Then, just outside. Then at the turn that connects CFI to the academic road. Then at the turn from RJN to the hostel area. Wherever on campus any of us met, at any event, all that we discussed was Viveka.

Thanks a lot Jayden, Harshith for the project and everyone in the team for the great tenure!
Really grateful to Grad Capital for the Atomic grant and ExceptionRaised! for funding, and to Dr. Nitish Mital and Professor Avishek Chatterjee for mentoring us through the project.

Looking forward to my tenure as AI Club strategist and obviously, Viveka 2.0!

¹ Fear of missing out
² I don’t remember
³ Presenting this paper in 2.5 days of preparation- when I had just understood the transformer architecture and just trained my first LSTM on sentiment classification.
⁴ Project leads
⁵ Insti is IITM lingo for institute (basically university/ college)
⁶ Indirect object identification
⁷ Jarvis- A GPU computing cluster where we can rent GPUs, based out of Chennai
⁸ Stopped working on Circuits
⁹ Center for Innovation, IIT M