
Commit 1b94658

auto-generating sphinx docs
1 parent 13e8c0d commit 1b94658

3 files changed, +5 -5 lines changed

main/_sources/tutorials/llama_kd_tutorial.rst.txt (+2 -2)

@@ -93,9 +93,9 @@ First, make sure that you have downloaded all the model weights. For this exampl

 .. code-block:: bash

-tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf_token <HF_TOKEN>
+tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>

-tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf_token <HF_TOKEN>
+tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>

 Then, we will fine-tune the teacher model using LoRA. Based on our experiments and previous work,
 we've found that KD performs better when the teacher model is already fine-tuned on the target dataset.
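
For reference, after this change the two download commands in the tutorial read as follows, and the teacher fine-tuning step that the tutorial text goes on to describe would typically be run next. The download commands are taken verbatim from the diff above; the final tune run line is only a sketch, assuming torchtune's lora_finetune_single_device recipe and the llama3_1/8B_lora_single_device config name, neither of which is part of this commit.

    # Download commands as they appear after this commit (token flag renamed to --hf-token)
    tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>
    tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>

    # Teacher LoRA fine-tune referenced by the surrounding tutorial text
    # (recipe and config name assumed; not part of this diff)
    tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device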

main/searchindex.js (+1 -1)

Some generated files are not rendered by default.

main/tutorials/llama_kd_tutorial.html (+2 -2)

@@ -546,9 +546,9 @@ <h2>KD recipe in torchtune<a class="headerlink" href="#kd-recipe-in-torchtune" t

 <p>With torchtune, we can easily apply knowledge distillation to Llama3, as well as other LLM model families.
 Let’s take a look at how you could distill a model using torchtune’s <a class="reference external" href="https://github.com/pytorch/torchtune/blob/4234b78b914af23384ce0348f564e2119d107a96/recipes/knowledge_distillation_single_device.py">KD recipe</a>.</p>
 <p>First, make sure that you have downloaded all the model weights. For this example, we’ll use the Llama3.1-8B as teacher and Llama3.2-1B as student.</p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tune<span class="w"> </span>download<span class="w"> </span>meta-llama/Meta-Llama-3.1-8B-Instruct<span class="w"> </span>--output-dir<span class="w"> </span>/tmp/Meta-Llama-3.1-8B-Instruct<span class="w"> </span>--ignore-patterns<span class="w"> </span><span class="s2">&quot;original/consolidated.00.pth&quot;</span><span class="w"> </span>--hf_token<span class="w"> </span>&lt;HF_TOKEN&gt;
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tune<span class="w"> </span>download<span class="w"> </span>meta-llama/Meta-Llama-3.1-8B-Instruct<span class="w"> </span>--output-dir<span class="w"> </span>/tmp/Meta-Llama-3.1-8B-Instruct<span class="w"> </span>--ignore-patterns<span class="w"> </span><span class="s2">&quot;original/consolidated.00.pth&quot;</span><span class="w"> </span>--hf-token<span class="w"> </span>&lt;HF_TOKEN&gt;

-tune<span class="w"> </span>download<span class="w"> </span>meta-llama/Llama-3.2-1B-Instruct<span class="w"> </span>--output-dir<span class="w"> </span>/tmp/Llama-3.2-1B-Instruct<span class="w"> </span>--ignore-patterns<span class="w"> </span><span class="s2">&quot;original/consolidated.00.pth&quot;</span><span class="w"> </span>--hf_token<span class="w"> </span>&lt;HF_TOKEN&gt;
+tune<span class="w"> </span>download<span class="w"> </span>meta-llama/Llama-3.2-1B-Instruct<span class="w"> </span>--output-dir<span class="w"> </span>/tmp/Llama-3.2-1B-Instruct<span class="w"> </span>--ignore-patterns<span class="w"> </span><span class="s2">&quot;original/consolidated.00.pth&quot;</span><span class="w"> </span>--hf-token<span class="w"> </span>&lt;HF_TOKEN&gt;
 </pre></div>
 </div>
 <p>Then, we will fine-tune the teacher model using LoRA. Based on our experiments and previous work,