
Commit 1b94658

auto-generating sphinx docs
1 parent 13e8c0d commit 1b94658

3 files changed, +5 -5 lines changed

main/_sources/tutorials/llama_kd_tutorial.rst.txt (+2 -2)

@@ -93,9 +93,9 @@ First, make sure that you have downloaded all the model weights. For this exampl

 .. code-block:: bash

-tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf_token <HF_TOKEN>
+tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>

-tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf_token <HF_TOKEN>
+tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>

 Then, we will fine-tune the teacher model using LoRA. Based on our experiments and previous work,
 we've found that KD performs better when the teacher model is already fine-tuned on the target dataset.
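
For reference, after this change the two download commands in the tutorial read as follows, and the teacher fine-tuning step that the tutorial text goes on to describe would typically be run next. The download commands are taken verbatim from the diff above; the final tune run line is only a sketch, assuming torchtune's lora_finetune_single_device recipe and the llama3_1/8B_lora_single_device config name, neither of which is part of this commit.

    # Download commands as they appear after this commit (token flag renamed to --hf-token)
    tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>
    tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <HF_TOKEN>

    # Teacher LoRA fine-tune referenced by the surrounding tutorial text
    # (recipe and config name assumed; not part of this diff)
    tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device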

main/searchindex.js (+1 -1)

Some generated files are not rendered by default.

main/tutorials/llama_kd_tutorial.html (+2 -2)

@@ -546,9 +546,9 @@ <h2>KD recipe in torchtune<a class="headerlink" href="#kd-recipe-in-torchtune" t

 <p>With torchtune, we can easily apply knowledge distillation to Llama3, as well as other LLM model families.
 Let’s take a look at how you could distill a model using torchtune’s <a class="reference external" href="https://github.com/pytorch/torchtune/blob/4234b78b914af23384ce0348f564e2119d107a96/recipes/knowledge_distillation_single_device.py">KD recipe</a>.</p>
 <p>First, make sure that you have downloaded all the model weights. For this example, we’ll use the Llama3.1-8B as teacher and Llama3.2-1B as student.</p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tune<span class="w"> </span>download<span class="w"> </span>meta-llama/Meta-Llama-3.1-8B-Instruct<span class="w"> </span>--output-dir<span class="w"> </span>/tmp/Meta-Llama-3.1-8B-Instruct<span class="w"> </span>--ignore-patterns<span class="w"> </span><span class="s2">&quot;original/consolidated.00.pth&quot;</span><span class="w"> </span>--hf_token<span class="w"> </span>&lt;HF_TOKEN&gt;
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>tune<span class="w"> </span>download<span class="w"> </span>meta-llama/Meta-Llama-3.1-8B-Instruct<span class="w"> </span>--output-dir<span class="w"> </span>/tmp/Meta-Llama-3.1-8B-Instruct<span class="w"> </span>--ignore-patterns<span class="w"> </span><span class="s2">&quot;original/consolidated.00.pth&quot;</span><span class="w"> </span>--hf-token<span class="w"> </span>&lt;HF_TOKEN&gt;

-tune<span class="w"> </span>download<span class="w"> </span>meta-llama/Llama-3.2-1B-Instruct<span class="w"> </span>--output-dir<span class="w"> </span>/tmp/Llama-3.2-1B-Instruct<span class="w"> </span>--ignore-patterns<span class="w"> </span><span class="s2">&quot;original/consolidated.00.pth&quot;</span><span class="w"> </span>--hf_token<span class="w"> </span>&lt;HF_TOKEN&gt;
+tune<span class="w"> </span>download<span class="w"> </span>meta-llama/Llama-3.2-1B-Instruct<span class="w"> </span>--output-dir<span class="w"> </span>/tmp/Llama-3.2-1B-Instruct<span class="w"> </span>--ignore-patterns<span class="w"> </span><span class="s2">&quot;original/consolidated.00.pth&quot;</span><span class="w"> </span>--hf-token<span class="w"> </span>&lt;HF_TOKEN&gt;
 </pre></div>
 </div>
 <p>Then, we will fine-tune the teacher model using LoRA. Based on our experiments and previous work,