Published: 2023-05-08

<aside> <img src="/icons/push-pin_gray.svg" alt="/icons/push-pin_gray.svg" width="40px" /> This post reflects my understanding as of May 2023. Some parts are more fleshed out than others. Feedback is welcome.

</aside>

<aside> <img src="/icons/push-pin_gray.svg" alt="/icons/push-pin_gray.svg" width="40px" /> I looked into this as part of my work with New Science.

</aside>

Author: Yaroslav Shipilov (@TheSlavant)

TL;DR

Given differences between organic brains and Transformers and human cognitive limits, it is likely that LLMs learn performance-boosting token relationships in training data that are not discernible to humans. This may lead LLMs to correct solutions through seemingly nonsensical, humanly-uninterpretable token sequences, or Alien Chains of Thought.
Alien CoT is likely if AI labs optimize for answer accuracy or token efficiency in the future. Can LLMs produce Alien CoT today despite RLHF? I propose that it may be possible if we simulate the right optimization pressure via prompt engineering.
Eliciting Alien CoT would be significant. On the one hand, it would help us understand the limits of our reliance on LLMs and help define how we align future LLMs. On the other hand, we may be able to use newly discovered relationships to advance our own knowledge.
Reach out if you are interested in working on this.

Contents

LLMs can use chains of thought that don’t make sense to humans

Just as there are odors that dogs can smell and we cannot, as well as sounds that dogs can hear and we cannot, so too there are wavelengths of light we cannot see and flavors we cannot taste. Why then, given our brains wired the way they are, does the remark, "Perhaps there are thoughts we cannot think," surprise you?

Richard Hamming, The Unreasonable Effectiveness of Mathematics

We may say they are travelers to unimaginable lands — lands of which otherwise we should have no idea or conception.