What exactly does llama_get_embeddings return? #3643
-
Hi, I've been going through the code trying to understand what llama_get_embeddings returns, but I can't figure it out. I'm trying to use stablebeluga-13b.Q8_0.gguf as an encoder to populate a database of text embeddings. I've already read some issues mentioning that using Llama models for this task gives bad results, but I wanted to try it anyway. I'm using the Python wrapper and basically running this code:
I've tracked the calls in the Python wrapper code, and it seems to end up calling llama_cpp.llama_get_embeddings, so that's why I'm asking in this repository. I'm not sure where the embedding values come from. Obviously, I'm interested in getting a representation of the whole text (or N texts) passed as input to the function. I've done some basic tests using the embeddings, and the results are weird. I wanted to post them here, since I haven't seen similar results elsewhere. The following plots are embeddings of a text reshaped into a matrix, so that each pixel is one value of the embedding. The subplot on the left is the embedding of an auto-generated sentence, and the one on the right is the embedding of the same sentence with its words shuffled. I'm just posting a few examples, but I've generated many more.
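For what it's worth, a common way to quantify the original-vs-shuffled comparison described above is cosine similarity between the two embedding vectors. Here is a minimal self-contained sketch; the vectors below are random stand-ins (5120 is the embedding size of a 13B Llama model), not real output from llama_get_embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice these would come from llama_get_embeddings.
rng = np.random.default_rng(0)
emb_original = rng.normal(size=5120)
emb_shuffled = rng.normal(size=5120)

print(cosine_similarity(emb_original, emb_original))  # ~1.0 for identical vectors
print(cosine_similarity(emb_original, emb_shuffled))  # near 0 for unrelated random vectors
```

If the model's embeddings were meaningful, the shuffled sentence should still land fairly close to the original (shared vocabulary), while an unrelated sentence should score lower.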
Replies: 3 comments
-
There might be an issue with the embedding functionality when using CUDA: #3625
-
Just to chime in here: I'm interested in an answer to some of the questions @Trawczynski posed above! What exactly is being returned in the embeddings? I can't quite work out what's happening under the hood. Is this the last hidden state or the pooled hidden state? Or something else?
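To make the "last hidden state vs. pooled hidden state" distinction concrete, here is a toy numpy sketch (random data, toy shapes; not llama.cpp's actual code). The last hidden state has one vector per token, while pooling collapses it to a single vector for the whole sequence:

```python
import numpy as np

n_tokens, n_embd = 8, 5120  # toy sequence length and embedding size
rng = np.random.default_rng(0)

# The last hidden state: one contextualized vector per input token.
last_hidden = rng.normal(size=(n_tokens, n_embd))

# Two common ways to get a single sequence-level embedding from it:
last_token_emb = last_hidden[-1]        # take the final token's vector, shape (n_embd,)
mean_pooled = last_hidden.mean(axis=0)  # mean-pool over tokens, shape (n_embd,)

print(last_hidden.shape, last_token_emb.shape, mean_pooled.shape)
```

Which of these a given API returns (per-token states, last-token vector, or a pooled vector) is exactly the ambiguity being asked about here.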
-
Just in case anyone is still wondering, the question has been answered in #7087. `llama_get_embeddings` returns the embeddings from the last hidden layer, so the embeddings are contextualized (i.e. they have been processed by the transformer) and should be meaningful. For example, in Phi3:

```
Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (rotary_emb): Phi3RotaryEmbedding()
          (o_proj): QuantLinear()
          (qkv_proj): QuantLinear()
        )
        (mlp): Phi3MLP(
          (activation_fn): SiLU()
          (down_proj): QuantLinear()
          (gate_up_proj): QuantLinear()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
```

Embeddings are returned before the `lm_head`.