What exactly does llama_get_embeddings return? #3643
-
Hi, I've been going through the code trying to understand what llama_get_embeddings returns, but I can't figure it out. I'm trying to use stablebeluga-13b.Q8_0.gguf as an encoder to populate a database of text embeddings. I've already read some issues mentioning that using Llama models for this task gives bad results, but I wanted to try it anyway. I'm using the Python wrapper and basically running this code:
I've tracked the calls in the Python wrapper code, and it seems to end up calling llama_cpp.llama_get_embeddings, so that's why I'm asking in this repository. I'm not sure where the embedding values come from. Obviously, I'm interested in getting a representation of the whole text (or N texts) passed as input to the function. I've done some basic tests using the embeddings, and the results are weird. I wanted to post them here, since I haven't seen similar results elsewhere. The following plots are embeddings of a text reshaped into a matrix, so that each pixel is one value of the embedding. The subplot on the left is the embedding of an auto-generated sentence, and the one on the right is the embedding of the same sentence with its words shuffled. I'm just posting a few examples, but I've generated many more.
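For what it's worth, a common way to quantify the original-vs-shuffled comparison described above is cosine similarity between the two embedding vectors. Here is a minimal self-contained sketch; the vectors below are random stand-ins (5120 is the embedding size of a 13B Llama model), not real output from llama_get_embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice these would come from llama_get_embeddings.
rng = np.random.default_rng(0)
emb_original = rng.normal(size=5120)
emb_shuffled = rng.normal(size=5120)

print(cosine_similarity(emb_original, emb_original))  # ~1.0 for identical vectors
print(cosine_similarity(emb_original, emb_shuffled))  # near 0 for unrelated random vectors
```

If the model's embeddings were meaningful, the shuffled sentence should still land fairly close to the original (shared vocabulary), while an unrelated sentence should score lower.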
Replies: 3 comments
-
There might be an issue with the embedding functionality when using CUDA: #3625
-
Just to chime in here: I'm interested in an answer to some of the questions @Trawczynski posed above! What exactly is being returned in the embeddings? I can't quite work out what's happening under the hood. Is this the last hidden state or the pooled hidden state? Or something else?
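To make the "last hidden state vs. pooled hidden state" distinction concrete, here is a toy numpy sketch (random data, toy shapes; not llama.cpp's actual code). The last hidden state has one vector per token, while pooling collapses it to a single vector for the whole sequence:

```python
import numpy as np

n_tokens, n_embd = 8, 5120  # toy sequence length and embedding size
rng = np.random.default_rng(0)

# The last hidden state: one contextualized vector per input token.
last_hidden = rng.normal(size=(n_tokens, n_embd))

# Two common ways to get a single sequence-level embedding from it:
last_token_emb = last_hidden[-1]        # take the final token's vector, shape (n_embd,)
mean_pooled = last_hidden.mean(axis=0)  # mean-pool over tokens, shape (n_embd,)

print(last_hidden.shape, last_token_emb.shape, mean_pooled.shape)
```

Which of these a given API returns (per-token states, last-token vector, or a pooled vector) is exactly the ambiguity being asked about here.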
-
Just in case anyone is still wondering, the question has been answered in #7087. `llama_get_embeddings` returns the embeddings from the last hidden layer, so the embeddings are contextualized (i.e. they have been processed by the transformer) and should be meaningful. For example, in Phi3:

```
Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (rotary_emb): Phi3RotaryEmbedding()
          (o_proj): QuantLinear()
          (qkv_proj): QuantLinear()
        )
        (mlp): Phi3MLP(
          (activation_fn): SiLU()
          (down_proj): QuantLinear()
          (gate_up_proj): QuantLinear()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
```

Embeddings are returned before the `lm_head`.