Artificial Intelligence Thoughts

This is written as if spoken aloud, so it is not very structured: a mix of philosophy and transformer knowledge that came as 3 am thoughts. I do think that “dynamic effective context” is what will lead us to (even if suboptimal) world models. Furthermore, could test-time training/adaptation and latent-space reasoning combine with these to give “general” intelligence? At the end of the day, it comes down to Occam’s razor.

It is beautiful to see and try to understand the current era.

Humans only learn what they perceive in some form. There are no gradients we compute to backpropagate. We do not retrieve from our weights, only in context. That is why all of us have different world models.

In a transformer, in-context learning is an emergent property [1], [2]. Suppose the model is extremely large. In this regime, the model does not “recall” “facts” from its weights anymore. So now the weights are used at a different level of abstraction. Yes, different contexts are “recalled” differently: eating food, or some similar skill, is very different from memorizing Saint-Venant’s principle. But isn’t eating food context-induced prediction itself? I can recollect the way my muscles should move better when food is in front of me than when I am idle.

And when we leave the world, the weights probably remain unchanged? Or better yet, the human body comes with a set of weights, and God (the optimizer) trains your weights each time you go from one life to the next, based on the loss and accuracy achieved during the previous life. (So the context you see is the test set (and the training set for the next life), but generated autoregressively by yourself! Only your birthplace, the starting state, is given.) How you did in this epoch determines your weights in the next life. But the entire life is only one batch, one epoch. Weights remain unchanged within a life.
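A toy way to picture "weights stay frozen, context decides the behaviour" is a single softmax-attention read over the examples in the context. This is only a sketch under my own assumptions, not how a real transformer is trained or how [1] or [2] set things up: `in_context_predict`, `sharpness`, and the two example contexts are hypothetical names and made-up data.

```python
import math

def in_context_predict(context, query, sharpness=4.0):
    """Predict y for `query` by attending over (x, y) pairs in `context`.

    `sharpness` plays the role of the frozen "weights" (a hypothetical
    stand-in); nothing in this function is ever updated.
    """
    # Score each context example by how close its x is to the query.
    scores = [-sharpness * (x - query) ** 2 for x, _ in context]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    # The prediction is the attention-weighted average of the context's y values.
    return sum(e / total * y for e, (_, y) in zip(exps, context))

# Same frozen function, two different contexts, two different behaviours.
context_a = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]    # examples of y = 2x
context_b = [(0.0, 0.0), (1.0, -1.0), (2.0, -2.0)]  # examples of y = -x

pred_a = in_context_predict(context_a, query=1.0)
pred_b = in_context_predict(context_b, query=1.0)
print(pred_a, pred_b)
```

The same query gets a different answer purely because the context changed, which is the sense in which the context, not a weight update, carries the "learning" here.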

Attention to future tokens is not masked. In rare instances, it is possible to predict the future. (I am not entirely sure how to explain this yet, but sometimes we can think in dimensions such that we are able to make thoughts come to real life in this heavily stochastic world. This somehow relates to being able to see future tokens. But wait, then generation is not autoregressive anymore.)
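For reference, this is what masking future tokens means mechanically in a decoder-style transformer. It is a minimal sketch with made-up equal scores; `attention_weights` is a hypothetical helper written for this note, not a library function.

```python
import math

def attention_weights(scores, causal):
    """Row-wise softmax over a square score matrix, optionally causally masked."""
    n = len(scores)
    weights = []
    for i in range(n):
        # With a causal mask, position i may only look at positions 0..i;
        # future positions get a score of -inf, hence a weight of exactly 0.
        row = [scores[i][j] if (not causal or j <= i) else -math.inf
               for j in range(n)]
        top = max(row)
        exps = [math.exp(s - top) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

scores = [[0.0] * 4 for _ in range(4)]  # every pair of tokens scores the same
masked = attention_weights(scores, causal=True)
unmasked = attention_weights(scores, causal=False)

print(masked[0])    # first token sees only itself
print(unmasked[0])  # first token also sees the three "future" tokens
```

Without the mask, the first position already spreads attention over tokens that have not been generated yet, which is exactly why unmasked attention and step-by-step autoregressive generation do not fit together.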

Under anxious states, the context range we can effectively perceive decreases. The set of weights that act upon our beliefs (activations) is confined to a much smaller subspace.
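One hedged way to put a number on "effective context" is the exponential of the entropy of an attention distribution, which behaves like a count of the tokens that actually receive attention. Treating anxiety as a lower softmax temperature is purely my assumption for illustration, and it only models the shrinking context, not the confined subspace of weights. The names `effective_context`, `calm`, and `anxious` and the scores are made up.

```python
import math

def softmax(scores, temperature=1.0):
    """Softmax with a temperature; lower temperature gives sharper weights."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def effective_context(weights):
    """exp(entropy): roughly how many tokens receive meaningful attention."""
    entropy = -sum(w * math.log(w) for w in weights if w > 0.0)
    return math.exp(entropy)

# Hypothetical attention scores over eight context tokens.
scores = [3.0, 1.0, 0.5, 0.2, 0.1, 0.0, 0.0, 0.0]

calm = effective_context(softmax(scores, temperature=2.0))
anxious = effective_context(softmax(scores, temperature=0.25))
print(calm, anxious)
```

Uniform attention over n tokens gives an effective context of n, and attention on a single token gives 1, so the sharper "anxious" distribution ends up with the smaller value of the two.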

That being said, if we were to construct world models, even suboptimal ones, it seems most tractable to create heavy in-context learners, so much so that the contexts define the world model.

Dynamic effective in-context learners, I believe, are what we need. Going from out-of-distribution generalization [2] to general intelligence…

References

[1] Language Models are Few-Shot Learners
[2] In-Context Learning Strategies Emerge Rationally