
Min P style sampling - an alternative to Top P/TopK #27670


Closed
kalomaze opened this issue Nov 23, 2023 · 25 comments · Fixed by #30639
Labels
Feature request (Request for a new feature)

Comments

@kalomaze

kalomaze commented Nov 23, 2023

Feature request

This is a sampler method already present in other LLM inference backends that aims to simplify the truncation process & help compensate for the flaws/failings of Top P & Top K.
Min P.

[image]

What Min P is doing is simple: we are setting a minimum percentage value that a token must reach to be considered during sampling. However, this is not a hard limit. The minimum 'scales' based on the top token's probability. So, if you have a Min P value of 0.1 (for example), your base Min P requirement is 10%; if your top token then has 25% probability, the sampler will only consider tokens that have at least 2.5% probability.
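To make that concrete, here is a minimal sketch of the rule in plain PyTorch, written just for this comment; the function name and default value are illustrative, not any particular backend's implementation:

import torch

def min_p_mask(logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    # Keep only tokens whose probability is at least min_p * p(most likely token).
    probs = torch.softmax(logits, dim=-1)
    top_prob = probs.max(dim=-1, keepdim=True).values
    threshold = min_p * top_prob  # e.g. 0.1 * 0.25 = 0.025 in the example above
    return logits.masked_fill(probs < threshold, float("-inf"))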

This method subjectively seems to improve results across the board with no noticeable downside, and has already been merged into several other FOSS LLM backends (listed later in this thread).

I would suggest a default of 0.05.

Motivation

I noticed certain 'flaws' in the popular Top P sampling method:

  • When the model does not have sufficient confidence/concentration on the next token candidate(s), it's possible for the sampler to consider many tokens that are highly unlikely compared to the few choices it has confidence in.
  • Top K helps limit the number of 'low confidence' tokens outright, as a supplement to Top P, but this often comes at the cost of token choice diversity (often arbitrarily).
  • In addition to this, Top P can sometimes cut reasonable tokens. What if there's a 90.1% probability token, followed by a 9% probability token? A Top P value of 0.90 would completely gloss over the 9% token in this instance (see the worked example below).
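A quick worked example of that last point, in plain Python (the exact probabilities are hypothetical, and the 0.05 Min P value is just the suggested default):

probs = [0.901, 0.090, 0.005, 0.004]  # hypothetical next-token probabilities, sorted descending

# Top P (nucleus) with top_p = 0.90: keep the smallest prefix whose cumulative probability reaches top_p
top_p, kept, cumulative = 0.90, [], 0.0
for p in probs:
    kept.append(p)
    cumulative += p
    if cumulative >= top_p:
        break
print(kept)  # [0.901] -- the 9% token is cut

# Min P with min_p = 0.05: keep every token with p >= min_p * p(top token)
min_p = 0.05
print([p for p in probs if p >= min_p * probs[0]])  # [0.901, 0.09] -- the 9% token survives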

[image]

For this reason I made Min P, which seems to have positive reception across the board.

Your contribution

I may consider making a PR for this.

@ArthurZucker added the Feature request label on Nov 23, 2023
@ArthurZucker
Collaborator

fyi @gante 🤗

@gante
Member

gante commented Nov 29, 2023

Hi @kalomaze 👋 Thank you for opening this issue!

In addition to Temperature, Top p, and Top k, which apply distribution-agnostic transformations, we have three other distribution-aware transformations:

  1. Typical P Decoding
  2. Epsilon Sampling
  3. Eta Sampling

These techniques do a similar thing to what you mention: they apply a "Top p"-like transformation, adjusted by the probability distribution.

Since we already have similar techniques, backed up by papers with benchmarks, I'm reluctant to add this technique without further benchmarks. Maintenance is a heavy long-term burden in transformers that we want to contain 🤗

@kalomaze
Author

kalomaze commented Nov 30, 2023

> Hi @kalomaze 👋 Thank you for opening this issue!
>
> In addition to Temperature, Top p, and Top k, which apply distribution-agnostic transformations, we have three other distribution-aware transformations:
>
>   1. Typical P Decoding
>   2. Epsilon Sampling
>   3. Eta Sampling
>
> These techniques do a similar thing to what you mention: they apply a "Top p"-like transformation, adjusted by the probability distribution.
>
> Since we already have similar techniques, backed up by papers with benchmarks, I'm reluctant to add this technique without further benchmarks. Maintenance is a heavy long-term burden in transformers that we want to contain 🤗

The scalability of Min P in comparison to Top P seems to be objectively more consistent, beyond just theorycrafting.

Min P is also highly interpretable in comparison to Locally Typical sampling, which gets into denser, more subjective interpretations of information theory and raises the question of whether it's overdesigned. This makes Typical sampling less intuitive for the end user.

In addition to this, Typical sampling, Epsilon sampling, and Eta sampling have seen extremely limited real-world adoption in open source LLM interfaces, which at large have continued to use Top K and Top P instead. Beyond those two, Mirostat has seen mild popularity, but I would argue the latter two samplers (Epsilon sampling, Eta sampling) are perhaps less proven in terms of subjective quality.

In conclusion, Min P:

  • Is more interpretable to end users and developers than the methods you listed, and carries less risk of unintended behavior while achieving the same goal as Top P / Top K, compared to Typical sampling, which is less proven in the 'real world'.
  • Scales more consistently than Nucleus sampling in practice, as mentioned earlier.
  • Has consistently seen positive reception and adoption from the open source language model community at large, to the point where most inference backends (vllm, llama.cpp, exllamav2, text-generation-webui's HF loaders, etc.) have adopted it:
[images]

I will also note that a common issue for open source language models is the lack of truly objective metrics for testing beyond manual human analysis; so any apparently 'standard' testing metric should be given serious scrutiny before it is treated as an absolute and final measure by which to compare sampler methods.

If there are any specific metrics you would like to see on any specific models, I can try to provide them to support my case beyond the subjective results and widespread adoption of the technique (which I figured would stand out on their own, but having numbers would be beneficial... assuming we can trust the numbers, which is an assumption I'm hesitant to make without sufficient fundamental evidence for their use beyond "arxiv papers used it").

@gante
Member

gante commented Nov 30, 2023

@kalomaze precisely because in the past we've added techniques that had some results but ended up not having much use (like Eta sampling), I'm asking for additional validation :) For instance, Eta sampling had a blind human preference test, where it was shown to be preferred over top p, with a relatively low sample size (N=294). However, the upside (and the marketing) was not large enough, so the community decided to stick with simpler, established techniques like top p.

Just because other repos have merged your technique does not make it inherently good. ML is a data-driven science, so let's collect data -- I have yet to see any data beyond a few examples. Note that this is nothing against your creation; I actually agree with it in principle. transformers is a large library with a few maintainers, so we have to be conscious of what we add here.

A good test would be to compare your technique against others with blind human preference 🤗 There is nothing better than human preference -- I'd be happy to participate in the evaluation.

@kalomaze
Author

kalomaze commented Nov 30, 2023

> A good test would be to compare your technique against others with blind human preference 🤗 There is nothing better than human preference -- I'd be happy to participate in the evaluation.

Do we have enough people who are willing to test / evaluate this to rule out the margin of error, though? The main thing we are looking for is to minimize the outliers that get included when improving the truncation schemes (and those are usually low probability to begin with). Outliers are going to be hard to test for without sufficient data if you sample normally, unless we change the sampler to only pick the least likely token (as a way to measure the truncation consistency directly).

I've done exactly that before for Top P and Min P, and I saw that Min P was an obvious improvement. Would you like me to reproduce that experiment but with Typical sampling? (Llama.cpp, my inference engine of choice, has a broken implementation of Typical sampling at the moment, but there is a PR to fix it that I can use; Eta/Epsilon sampling just aren't adopted anywhere else in the LLM world, so I'd have to learn how to use Transformers to test those, which seems like it will be necessary for my future LLM tests anyway.)

I'm also aware that an appeal to popularity isn't hard evidence, but I think it's a stronger marker in this case than it would otherwise be given the context of LLM benchmarks and especially certain metrics (e.g perplexity) being dubiously unreliable markers of quality in the ML space.

@gante
Member

gante commented Nov 30, 2023

> Do we have enough people who are willing to test / evaluate this to rule out the margin of error, though?

Between your reddit and my twitter/LI reaches, we will definitely have more than enough people to run a proper study. If you agree to build the interface for the study (e.g. through a HF spaces), I'd be more than happy to promote it! I also have the power to allocate GPUs to a space in order to run the study 💪

> The main thing we are looking for is to minimize the outliers that get included when improving the truncation schemes (and those are usually low probability to begin with). Outliers are going to be hard to test for without sufficient data if you sample normally, unless we change the sampler to only pick the least likely token (as a way to measure the truncation consistency directly).

I agree that the biggest difference is in the outliers. However, each output may have tens or hundreds of tokens, so the effect of bad "1% probability tokens" is not that hard to observe :) If there is noticeable human preference after >1000 samples, then we can be sure that it makes a difference.

Also, if the test turns out to be a success, you'd gain much more power over the distribution of your technique :D There are no questions over human preference.

> especially certain metrics (e.g perplexity) being dubiously unreliable markers of quality in the ML space.

100% agreed

@kalomaze
Author

kalomaze commented Nov 30, 2023

> > Do we have enough people who are willing to test / evaluate this to rule out the margin of error, though?
>
> Between your reddit and my twitter/LI reaches, we will definitely have more than enough people to run a proper study. If you agree to build the interface for the study (e.g. through a HF spaces), I'd be more than happy to promote it! I also have the power to allocate GPUs to a space in order to run the study 💪
>
> > The main thing we are looking for is to minimize the outliers that get included when improving the truncation schemes (and those are usually low probability to begin with). Outliers are going to be hard to test for without sufficient data if you sample normally, unless we change the sampler to only pick the least likely token (as a way to measure the truncation consistency directly).
>
> I agree that the biggest difference is in the outliers. However, each output may have tens or hundreds of tokens, so the effect of bad "1% probability tokens" is not that hard to observe :) If there is noticeable human preference after >1000 samples, then we can be sure that it makes a difference.
>
> Also, if the test turns out to be a success, you'd gain much more power over the distribution of your technique :D There are no questions over human preference.
>
> > especially certain metrics (e.g perplexity) being dubiously unreliable markers of quality in the ML space.
>
> 100% agreed

Understood; I've never made a HF space, so that'd be new territory for me, though I'll look into it for sure (since having empirical data would be helpful).

What would be a fair comparison value to Top P? Or would you prefer something where all methods are evaluated? (That might be too aggressive, though.) The next problem, I think, is finding an 'equivalent scale' for all methods. The scale of Min P is obvious and understood, but for Epsilon etc. it's difficult for me to determine...

@gante
Member

gante commented Dec 1, 2023

@kalomaze I'd suggest starting simple, going against top p alone. Less work and straight to the point. If we realize we're gathering enough participants, then we can expand it to multiple models and multiple strategies, for a better overview.

I can help you with any roadblock or questions you have along the way: the results are very much of interest to me! 💛

(and I'm crossing my fingers for Min P to be successful!)

@kalomaze
Author

kalomaze commented Dec 1, 2023

> @kalomaze I'd suggest starting simple, going against top p alone. Less work and straight to the point. If we realize we're gathering enough participants, then we can expand it to multiple models and multiple strategies, for a better overview.
>
> I can help you with any roadblock or questions you have along the way: the results are very much of interest to me! 💛
>
> (and I'm crossing my fingers for Min P to be successful!)

I see, that's very doable.

How about:

  • Top P 0.98 vs Min P 0.02
  • Top P 0.95 vs Min P 0.05
  • Top P 0.90 vs Min P 0.1
  • Top P 0.80 vs Min P 0.2

At temperature 1.0?
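(For illustration, a sketch of how such a sweep could be scripted once both knobs are available in the same backend -- e.g. transformers after the min_p PR discussed later in this thread; the model, prompt, and token budget below are placeholders, not part of the proposed study:)

from transformers import pipeline, set_seed

pairings = [(0.98, 0.02), (0.95, 0.05), (0.90, 0.10), (0.80, 0.20)]  # (top_p, min_p) pairs from the list above
pipe = pipeline("text-generation", "mistralai/Mistral-7B-v0.1")  # placeholder model
prompt = "Write the opening paragraph of a short story about a lighthouse keeper."

for top_p, min_p in pairings:
    set_seed(0)
    top_p_output = pipe(prompt, do_sample=True, temperature=1.0, top_p=top_p, max_new_tokens=128)
    set_seed(0)
    # top_p=1.0 disables nucleus sampling so only the min_p filter is active
    min_p_output = pipe(prompt, do_sample=True, temperature=1.0, min_p=min_p, top_p=1.0, max_new_tokens=128)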

@gante
Member

gante commented Dec 1, 2023

@kalomaze sounds good (I'm assuming you have a better sense than I do of what a good pairing looks like :) )

I'd perhaps suggest lowering the temperature a bit, to 0.7-0.8 (which is what most LLMs use by default nowadays)

@kalomaze
Author

kalomaze commented Dec 3, 2023

> I'd perhaps suggest lowering the temperature a bit, to 0.7-0.8 (which is what most LLMs use by default nowadays)

The OpenAI API docs suggest either lowering temperature or using Top P, but not both, which seems to imply truncation sampling was intended for use with a standard temperature (which makes sense to me); and the default temperature provided for GPT is 1.0 in the first place.
Temperature 1.0 is also representative of the original logit scores transformed into probabilities, rather than an arbitrary transformation, so it makes the most sense to me to compare at this value (unless you have other reasons for a different one).

@gante
Member

gante commented Dec 5, 2023

@kalomaze temperature can be seen as a post-hoc calibration of the model logits -- an underconfident model should use a temperature below 1.0 and vice versa. You can also see it as sharpening (<1.0) or flattening (>1.0) the probability distribution. It does have some overlap with top p, with the difference that top p acts on the probabilities and temperature on the log probabilities -- after top p, you can end up with the same possible tokens, but the temperature will have an impact on their relative distribution.
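(A tiny illustration of that sharpening/flattening effect, with made-up logits:)

import torch

logits = torch.tensor([2.0, 1.0, 0.0])
for temperature in (0.7, 1.0, 1.5):
    # Lower temperature sharpens the distribution, higher temperature flattens it.
    print(temperature, torch.softmax(logits / temperature, dim=-1))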

The optimal temperature changes across models and tasks, with llama models excelling around ~0.7 for most tasks. For instance, the starcoder model is recommended to be used with temperatures around ~0.3 :) My suggestion for 0.7-0.8 assumed the use of models like llama or mistral

@menhguin
Contributor

menhguin commented Apr 8, 2024

Hi @gante, just an update on this. I'm Minh, co-author of kalomaze's research paper introducing Min P. I've started running EleutherAI's eval harness on Min P, and early results seem quite strong.
[image: GSM8K_COT results chart]

This is a test of Top_P = 0.9 vs Min_P = 0.9, for GSM8K_COT (8-shot), exact match on the EleutherAI eval harness.

Strangely enough, Min_P = 0.9 is optimal at high temps. Min_P =0.1 at temp = 2 will get you 6%, which is better than Top_P's 0%, but nowhere near as impressive as 0.40%.

We'll be conducting more tests to isolate confounding variables, optimise performance, run other evals, etc. But for now you can replicate this quite easily with the following settings (min_p = 0.9, top_p = 1 (disabled), and temp = 2 to 3). There might be some bugs/errors, but getting basically zero performance drop at temp = 3 seems quite significant.

# remove the --wandb_args line below if you don't want to log results to wandb
!python /content/lm-evaluation-harness/lm_eval \
    --model vllm \
    --model_args pretrained=mistralai/Mistral-7B-v0.1,dtype=auto \
    --batch_size "auto" \
    --tasks gsm8k_cot \
    --num_fewshot 8 \
    --wandb_args project=lm-eval-harness-integration \
    --log_samples \
    --output_path ./lm-eval-output/ \
    --gen_kwargs min_p=0.9,top_p=1,temperature=3,do_sample=True \
    --device cuda

The test Colab used vLLM, since min_p is not available in HF Transformers yet. Trying to figure out how to share the wandb results directly.
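(For reference, roughly the same settings expressed through vLLM's offline API might look like the sketch below; the prompt and max_tokens are placeholders, and the min_p argument assumes the vLLM PR referenced later in this thread:)

from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-v0.1", dtype="auto")
params = SamplingParams(min_p=0.9, top_p=1.0, temperature=3.0, max_tokens=256)
outputs = llm.generate(["Q: What is 12 * 7? A: Let's think step by step."], params)
print(outputs[0].outputs[0].text)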

Edit: here's a colab where you can replicate the Min_P tests at temperature = 9 https://colab.research.google.com/drive/1-gGcr7AyU9BdgkTxF8CTVQ9MpoqWvZhJ

So far these are just mathematical reasoning evals I can run quickly on colab (vs setting up human preference). Do let us know if you have any creativity-focused evals in mind, or if you'd still prefer setting up human preference evals resembling LMSys Chatbot Arena.

@menhguin
Contributor

menhguin commented Apr 8, 2024

Updated with better labelling and more optimised settings:

  • I tested Min_P = 0.9 vs Top_P = 0.1 for consistency; in practice, scores in the 0.7-0.8 range were comparable and warrant testing
  • With Min_P = 0.9, there was negligible performance degradation from temp 0-5, and marginal degradation at 5-10
  • Score degradation is basically linear from temp 5 to 30+. Will optimise more later; I've only spent about 4 hours tweaking settings. It's hard to find optimal test settings that make sense, but every head-to-head matchup seems to be favourable for min_p (0.9 vs 0.1, 0.8 vs 0.2, 0.5 vs 0.5, 0.1 vs 0.9)

There's an argument to be made that no one would use 0.1 top_p anyway, but hmm, it's hard to figure out what settings should represent "realistic" user behaviour, since we don't ... have much info on user sampler preferences?

Will now test other evals, try to quantify creativity/diversity and try human preference evals. Again, any suggestions welcome.
[image: results graph]

@gante
Member

gante commented Apr 19, 2024

@menhguin thank you for the thorough investigation -- this is the sort of data I was looking for! There seems to be a parameterization range in which min_p excels, which means we should add it to 🤗 transformers.

@menhguin @kalomaze would any of you be interested in opening a PR?


> since we don't ... have much info on user sampler preferences

I struggle with this limitation myself 💔 It would be cool to have an LMSYS-like table for generation parameterization!

@menhguin
Contributor

menhguin commented Apr 19, 2024

> @menhguin thank you for the thorough investigation -- this is the sort of data I was looking for! There seems to be a parameterization range in which min_p excels, which means we should add it to 🤗 transformers.

@gante Just an update on evals done on VLLM:

I did a closer investigation of actual user preferences (based on the SillyTavern Discord). Users tend to prefer min_p values of 0.05 and 0.1. Here's the updated graph: Top_P is not as hilariously outmatched at these settings, but the difference is still very significant.
[image: updated graph]

Here is another independent eval done by the creator of AQLM, this time on EQ-Bench's creative writing eval (from https://t.me/senior_augur/76).
[image: EQ-Bench creative writing eval results]

We're almost done with evals (just beefing up the methodology to pre-empt the critiques of random peer reviewers), and hope to finalise the paper by the end of the month.

Again, we've proven quantitative improvements on the relevant benchmarks and we have user reports of their preferences. Theoretically, the only thing we don't have is a user preference ranking LMSys-style, but that feels ... outside our scope.

@kalomaze says he'll do a PR when he wakes up

I feel like the most convincing argument is that min p has no noticeable downside at more deterministic settings (lower temp, less selective sampling), and a noticeable upside at less deterministic settings (higher temp, more selective sampling). So it's arguably a straight improvement, unless someone discovers something really new.

@Hellisotherpeople

This is a rare massive L from huggingface for putting Kalomaze and the related folks through what is useful but ultimately unnecessary work to prove what the community already knows - which is that Min P style sampling is awesome.

Hugging Face is the "atlas" holding up the rest of the NLP ecosystem. They have a duty to support as many samplers as possible, even those with weaker justifications for implementing them.

Witnessing this has made me quite sad about the state of LLMs in general. Dynamic samplers (and especially Min_P) are straight up better than top_p/top_k, and even more sophisticated techniques like typicality sampling (which is also dynamic) are still regarded as not as good by a lot of actual users on r/localllama today.

Slowing down the proliferation of techniques like this does a whole lot to hurt the general perception of the quality of open source LLMs, and incentivizes the community to push towards infinitely scaling parameter counts as the only "solution" to issues with LLM output.

@Hellisotherpeople

Also @menhguin and @kalomaze, I'm extremely interested in helping out on the research paper you two are writing in any way that I can. I have access to significant amounts of compute resources and a rather large network of professionals who will be more easily persuaded about the merits of this technique than the folks in this issue thread.

@amyeroberts
Collaborator

Thank you @menhguin for such detailed deep-dives into this and for the graphs!

@Hellisotherpeople It might seem counter-intuitive, but being selective about what we do and don't add to the library actually helps us move faster. We receive lots of PRs, issues, and feature requests every day, and every new addition carries a maintenance burden. It's therefore important that we're selective, to make sure additions have high impact and that the time spent adding and maintaining them is valuable. Asking for results and/or evidence of community interest is pretty standard. Even when those are provided, sometimes there are other reasons it makes sense to add something later.

It might be frustrating to not see something you want immediately added to the library. The great thing about open-source is you can freely build and share your own code adding this feature!

@gante
Member

gante commented May 3, 2024

@Hellisotherpeople Another way of seeing it is as follows: we are selective about what we add here, and yet it took me 2 weeks to get back to this message -- other requests, bugfixes, and general maintenance got in the way.

How much time would it take us, the maintainers, to have reasonable reply times if we were to accept most suggestions? How would you, a user, be able to find the tool that you need, in a vast sea of tools and flags? Curation through evidence ends up helping both sides, especially in the long run 🤗

@gante
Member

gante commented May 3, 2024

@menhguin @kalomaze let me know if you have the bandwidth to open a PR, otherwise I'd be happy to do so 🤗

@menhguin
Contributor

menhguin commented May 3, 2024

> @menhguin @kalomaze let me know if you have the bandwidth to open a PR, otherwise I'd be happy to do so 🤗

@gante Kalo is away rn; I'm gonna guess the answer is "yes"

Kalo's currently working on Quadratic Sampling: ggml-org/llama.cpp#6445
I'm trying to finish up the actual Min P paper within the next 2 weeks + grinding some leetcode for AI Safety research programs + prepping my research skills for my Hume AI internship.

I'm new + not super familiar with the HF Transformers repo, so it might end up in limbo this month. Honestly I don't mind trying to do it before my internship on the 20th, but I'm trying not to break your prod with weird edge-case bugs, so I don't mind you doing it haha.

You can reference the code from other inference engine PRs here:

  • ggml-org/llama.cpp#3841
  • vllm-project/vllm#1642
  • This is the exact implementation I'm referencing for the paper (see sampler_hijack.py): https://github.com/oobabooga/text-generation-webui/pull/4449/files#diff-51532f26c3cdbf11129835e199b8f4f621d2ec5968ce71760d149e251ff51526

@gante
Member

gante commented May 3, 2024

@menhguin #30639 is up -- I'm double-checking the quality of the outputs, but that should be it!

If you can, have a look at the PR 😉

@menhguin
Contributor

menhguin commented May 3, 2024

@gante I've reviewed it. It seems fine at a glance since you mainly referenced the original implementation + changed the relevant HF transformers files.

My main comments are about min_p values being the opposite of comparable top_p values and how that might confuse users, but that's not a blocker. The functionality is there, so it seems OK.

The only part I might sorta worry about is logits_process (#30639 (comment)), due to the aforementioned values issue. I can attempt to figure out what that does tomorrow, if you haven't by then.

@gante
Member

gante commented May 9, 2024

Added 👍

Here's a simple example running on main:

import torch
from transformers import pipeline, set_seed

set_seed(0)

# Chat prompt in the standard role/content format
chat = [
    {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
]

pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
# min_p keeps the output coherent even at a relatively high temperature
response = pipe(chat, max_new_tokens=512, do_sample=True, min_p=0.08, temperature=1.5)
print(response[0]['generated_text'][-1]['content'])
