release truth vectors? #3
Comments
Here are the t_g and t_p truth vectors for […]. I've done some initial tests on these and seem to be getting sensible results (for example, when I look at […]). I can post other results on this thread, including truth vectors for other models, and/or consolidate these into a pull request if there is interest.
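As a quick sanity check on vectors like these, the scoring step can be sketched as below. This is a minimal illustration, not the repo's actual code: it assumes a "truth vector" built as a (normalized) difference of mean hidden states between true and false statements, with synthetic activations standing in for real model hidden states.

```python
import numpy as np

def truth_vector(h_true, h_false):
    # Difference-of-means direction between hidden states of true and
    # false statements -- one common construction for such vectors;
    # the repo's exact recipe for t_g / t_p may differ.
    v = h_true.mean(axis=0) - h_false.mean(axis=0)
    return v / np.linalg.norm(v)

def cosine_scores(hidden, v):
    # Cosine similarity of each hidden state to the truth vector.
    hidden = hidden / np.linalg.norm(hidden, axis=-1, keepdims=True)
    return hidden @ v

# Toy example: synthetic "activations" with separated means.
rng = np.random.default_rng(0)
h_true = rng.normal(1.0, 1.0, size=(50, 16))
h_false = rng.normal(-1.0, 1.0, size=(50, 16))

t_g = truth_vector(h_true, h_false)
scores_true = cosine_scores(h_true, t_g)
scores_false = cosine_scores(h_false, t_g)
print(scores_true.mean() > scores_false.mean())
```

On real data one would extract hidden states at a fixed layer (e.g. the layer where separation peaks) rather than sample them randomly.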
Out of curiosity I generated truth vectors for the new DeepSeek Distill Llama model. Most of the stats seemed reasonable; the only surprise was that the separation scores were lower. But the shape was right, and they still peaked around layer 12. My plan is to examine data that scores high (or low) in cosine similarity to t_g in this model but not in the llama-3-8b-chat model, to get a sense of what the 'diff' between these truth vectors might be.
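The cross-model comparison described above could be sketched as follows. This is a hypothetical helper, not code from the repo: it assumes you already have per-example cosine scores against t_g from each model, and simply ranks examples by the gap between them.

```python
import numpy as np

def diff_examples(scores_a, scores_b, k=5):
    # Indices of the k examples that score highest against t_g in
    # model A relative to model B -- candidates for inspecting the
    # "diff" between the two models' truth vectors.
    # scores_a / scores_b: hypothetical per-example cosine scores.
    gap = np.asarray(scores_a) - np.asarray(scores_b)
    return np.argsort(-gap)[:k]

# Toy scores for five examples under two models.
scores_distill = np.array([0.9, 0.1, 0.5, 0.8, 0.2])
scores_chat = np.array([0.2, 0.1, 0.6, 0.9, 0.8])
print(diff_examples(scores_distill, scores_chat, k=2))  # -> [0 1]
```

Running the same ranking with the arguments swapped surfaces examples the chat model considers "true-like" but the distill model does not.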
Hey Tom, thanks a lot for doing these very interesting experiments. Great to see that the DeepSeek Distill Llama also has this internal truthfulness representation. I am very sorry that I did not respond until now; I went on vacation and then forgot. I will monitor issues that have been opened in this repo more closely now :)
Thanks for providing code to replicate the experiments!
Could you also provide the (optimal) truth vectors for the supported models?