<span id="faq"></span><h1><span class="yiyi-st" id="yiyi-50">Frequently Asked Questions (FAQ)</span></h1>
 <blockquote>
 <p>Original: <a href="http://pandas.pydata.org/pandas-docs/stable/faq.html">http://pandas.pydata.org/pandas-docs/stable/faq.html</a></p>
 <p>Translator: <a href="https://github.com/wizardforcel">飞龙</a> <a href="http://usyiyi.cn/">UsyiyiCN</a></p>
 <p>Proofreader: (position open)</p>
 </blockquote>

<div class="section" id="dataframe-memory-usage">
<span id="df-memory-usage"></span><h2><span class="yiyi-st" id="yiyi-51">DataFrame memory usage</span></h2>
<p><span class="yiyi-st" id="yiyi-52">As of pandas version 0.15.0, the memory usage of a DataFrame (including the index) is shown when its <code class="docutils literal"><span class="pre">info</span></code> method is called.</span><span class="yiyi-st" id="yiyi-53"> The configuration option <code class="docutils literal"><span class="pre">display.memory_usage</span></code> (see <a class="reference internal" href="options.html#options"><span class="std std-ref">Options and Settings</span></a>) specifies whether the memory usage is displayed when <code class="docutils literal"><span class="pre">df.info()</span></code> is called.</span></p>
<p><span class="yiyi-st" id="yiyi-54">For example, the memory usage of the DataFrame below is shown when calling <code class="docutils literal"><span class="pre">df.info()</span></code>:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="n">dtypes</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'int64'</span><span class="p">,</span> <span class="s1">'float64'</span><span class="p">,</span> <span class="s1">'datetime64[ns]'</span><span class="p">,</span> <span class="s1">'timedelta64[ns]'</span><span class="p">,</span>
<span class="gp">   ...:</span> <span class="s1">'complex128'</span><span class="p">,</span> <span class="s1">'object'</span><span class="p">,</span> <span class="s1">'bool'</span><span class="p">]</span>
<span class="gp">   ...:</span>

<span class="gp">In [2]: </span><span class="n">n</span> <span class="o">=</span> <span class="mi">5000</span>

<span class="gp">In [3]: </span><span class="n">data</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">([</span> <span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">t</span><span class="p">))</span>
<span class="gp">   ...:</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">dtypes</span><span class="p">])</span>
<span class="gp">   ...:</span>

<span class="gp">In [4]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>

<span class="gp">In [5]: </span><span class="n">df</span><span class="p">[</span><span class="s1">'categorical'</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s1">'object'</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'category'</span><span class="p">)</span>

<span class="gp">In [6]: </span><span class="n">df</span><span class="o">.</span><span class="n">info</span><span class="p">()</span>
<span class="go">&lt;class 'pandas.core.frame.DataFrame'&gt;</span>
<span class="go">RangeIndex: 5000 entries, 0 to 4999</span>
<span class="go">Data columns (total 8 columns):</span>
<span class="go">bool               5000 non-null bool</span>
<span class="go">complex128         5000 non-null complex128</span>
<span class="go">datetime64[ns]     5000 non-null datetime64[ns]</span>
<span class="go">float64            5000 non-null float64</span>
<span class="go">int64              5000 non-null int64</span>
<span class="go">object             5000 non-null object</span>
<span class="go">timedelta64[ns]    5000 non-null timedelta64[ns]</span>
<span class="go">categorical        5000 non-null category</span>
<span class="go">dtypes: bool(1), category(1), complex128(1), datetime64[ns](1), float64(1), int64(1), object(1), timedelta64[ns](1)</span>
<span class="go">memory usage: 284.1+ KB</span>
</pre></div>
</div>
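<p><span class="yiyi-st" id="yiyi-55">The <code class="docutils literal"><span class="pre">+</span></code> symbol indicates that the true memory usage could be higher, because pandas does not count the memory used by the values (Python objects) stored in columns with <code class="docutils literal"><span class="pre">dtype=object</span></code>.</span></p>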
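<p>A minimal sketch of where the <code class="docutils literal">+</code> comes from, assuming a recent pandas release and NumPy: under the default (shallow) accounting, an <code class="docutils literal">object</code> column is charged only for its array of pointers, one per row, so the Python objects those pointers refer to are left out. The column names and data are illustrative only, and <code class="docutils literal">info(buf=...)</code> is used just to capture the printed report as a string.</p>

```python
import io

import numpy as np
import pandas as pd

n = 1000
df = pd.DataFrame({
    "ints": np.arange(n, dtype="int64"),
    # Explicit dtype=object so the strings are stored as Python objects.
    "text": pd.Series(["some fairly long string value"] * n, dtype=object),
})

# Shallow accounting: the object column costs one pointer per row.
pointer_size = np.dtype(object).itemsize
shallow = df.memory_usage(index=False)
print(shallow)

# Capture the info() report and pick out its "memory usage" line.
buf = io.StringIO()
df.info(buf=buf)
memory_line = next(line for line in buf.getvalue().splitlines()
                   if line.startswith("memory usage"))
print(memory_line)  # carries the "+" qualifier because of the object column
```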
<div class="versionadded">
<p><span class="yiyi-st" id="yiyi-56"><span class="versionmodified">New in version 0.17.1.</span></span></p>
</div>
<p><span class="yiyi-st" id="yiyi-57">Passing the <code class="docutils literal"><span class="pre">memory_usage='deep'</span></code> argument produces a more accurate memory usage report that accounts for the full memory used by the contained objects.</span><span class="yiyi-st" id="yiyi-58"> This is optional because the deeper introspection can be expensive.</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [7]: </span><span class="n">df</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="n">memory_usage</span><span class="o">=</span><span class="s1">'deep'</span><span class="p">)</span>
<span class="go">&lt;class 'pandas.core.frame.DataFrame'&gt;</span>
<span class="go">RangeIndex: 5000 entries, 0 to 4999</span>
<span class="go">Data columns (total 8 columns):</span>
<span class="go">bool               5000 non-null bool</span>
<span class="go">complex128         5000 non-null complex128</span>
<span class="go">datetime64[ns]     5000 non-null datetime64[ns]</span>
<span class="go">float64            5000 non-null float64</span>
<span class="go">int64              5000 non-null int64</span>
<span class="go">object             5000 non-null object</span>
<span class="go">timedelta64[ns]    5000 non-null timedelta64[ns]</span>
<span class="go">categorical        5000 non-null category</span>
<span class="go">dtypes: bool(1), category(1), complex128(1), datetime64[ns](1), float64(1), int64(1), object(1), timedelta64[ns](1)</span>
<span class="go">memory usage: 401.2 KB</span>
</pre></div>
</div>
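<p>The following sketch, again assuming a recent pandas release and made-up data, compares shallow and deep accounting column by column with <code class="docutils literal">DataFrame.memory_usage(deep=True)</code>. Only the <code class="docutils literal">object</code> column changes, because fixed-width NumPy columns already report their full size; with <code class="docutils literal">memory_usage='deep'</code> the <code class="docutils literal">info</code> report no longer appends the <code class="docutils literal">+</code> qualifier.</p>

```python
import io

import numpy as np
import pandas as pd

n = 1000
words = [f"value-{i}" for i in range(n)]  # distinct Python str objects
df = pd.DataFrame({
    "ints": np.arange(n, dtype="int64"),
    "text": pd.Series(words, dtype=object),
})

# Shallow (pointer-only) versus deep (pointer + str object) accounting.
comparison = pd.DataFrame({
    "shallow": df.memory_usage(),
    "deep": df.memory_usage(deep=True),
})
print(comparison)

# With memory_usage="deep", info() drops the "+" suffix.
buf = io.StringIO()
df.info(buf=buf, memory_usage="deep")
deep_line = next(line for line in buf.getvalue().splitlines()
                 if line.startswith("memory usage"))
print(deep_line)
```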
<p><span class="yiyi-st" id="yiyi-59">By default the display option is set to <code class="docutils literal"><span class="pre">True</span></code>, but it can be explicitly overridden by passing the <code class="docutils literal"><span class="pre">memory_usage</span></code> argument when calling <code class="docutils literal"><span class="pre">df.info()</span></code>.</span></p>
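<p>A small sketch of that interaction, assuming a recent pandas release: <code class="docutils literal">pd.option_context</code> switches <code class="docutils literal">display.memory_usage</code> off temporarily, and an explicit <code class="docutils literal">memory_usage=True</code> argument still brings the memory line back. <code class="docutils literal">info_text</code> is a hypothetical helper that captures the report through the <code class="docutils literal">buf</code> parameter of <code class="docutils literal">info</code>.</p>

```python
import io

import pandas as pd

df = pd.DataFrame({"a": range(100), "b": [0.5] * 100})


def info_text(frame, **kwargs):
    """Hypothetical helper: return the DataFrame.info() report as a string."""
    buf = io.StringIO()
    frame.info(buf=buf, **kwargs)
    return buf.getvalue()


with pd.option_context("display.memory_usage", False):
    hidden = info_text(df)                    # follows the option: no memory line
    shown = info_text(df, memory_usage=True)  # explicit argument overrides it

print("memory usage" in hidden, "memory usage" in shown)
```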
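<p><span class="yiyi-st" id="yiyi-60">The memory usage of each column can be found by calling the <code class="docutils literal"><span class="pre">memory_usage</span></code> method.</span><span class="yiyi-st" id="yiyi-61"> This returns a Series whose index holds the column names and whose values are the memory usage of each column in bytes.</span><span class="yiyi-st" id="yiyi-62"> For the DataFrame above, the memory usage of each column and the total memory usage of the DataFrame can be found with the <code class="docutils literal"><span class="pre">memory_usage</span></code> method:</span></p>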
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [8]: </span><span class="n">df</span><span class="o">.</span><span class="n">memory_usage</span><span class="p">()</span>
<span class="gr">Out[8]: </span>
<span class="go">Index                 72</span>
<span class="go">bool                5000</span>
<span class="go">complex128         80000</span>
<span class="go">datetime64[ns]     40000</span>
<span class="go">float64            40000</span>
<span class="go">int64              40000</span>
<span class="go">object             40000</span>
<span class="go">timedelta64[ns]    40000</span>
<span class="go">categorical         5800</span>
<span class="go">dtype: int64</span>

<span class="c"># total memory usage of dataframe</span>
<span class="gp">In [9]: </span><span class="n">df</span><span class="o">.</span><span class="n">memory_usage</span><span class="p">()</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
<span class="gr">Out[9]: </span><span class="mi">290872</span>
</pre></div>
</div>
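<p>For fixed-width NumPy dtypes the figures above are simply the number of rows times the dtype's item size (for example 5000 × 16 bytes = 80000 for <code class="docutils literal">complex128</code>). A sketch that checks this, assuming a recent pandas release and zero-filled columns chosen only for illustration:</p>

```python
import numpy as np
import pandas as pd

n = 5000
df = pd.DataFrame({
    "int64": np.zeros(n, dtype="int64"),
    "float64": np.zeros(n, dtype="float64"),
    "complex128": np.zeros(n, dtype="complex128"),
    "bool": np.zeros(n, dtype="bool"),
})

usage = df.memory_usage(index=False)
# Expected size of each column: rows x bytes per element.
expected = pd.Series({col: n * df[col].dtype.itemsize for col in df.columns})
print(pd.DataFrame({"memory_usage": usage, "rows x itemsize": expected}))
```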
<p><span class="yiyi-st" id="yiyi-63">By default, the memory usage of the DataFrame's index is included in the returned Series; it can be suppressed by passing the <code class="docutils literal"><span class="pre">index=False</span></code> argument:</span></p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [10]: </span><span class="n">df</span><span class="o">.</span><span class="n">memory_usage</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="gr">Out[10]: </span>
<span class="go">bool                5000</span>
<span class="go">complex128         80000</span>
<span class="go">datetime64[ns]     40000</span>
<span class="go">float64            40000</span>
<span class="go">int64              40000</span>
<span class="go">object             40000</span>
<span class="go">timedelta64[ns]    40000</span>
<span class="go">categorical         5800</span>
<span class="go">dtype: int64</span>
</pre></div>
</div>
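<p>The sketch below, assuming a recent pandas release and illustrative data, checks that the difference between the totals with and without <code class="docutils literal">index=False</code> is what <code class="docutils literal">Index.memory_usage</code> reports. It also suggests why the <code class="docutils literal">Index</code> entry above is so small: a default <code class="docutils literal">RangeIndex</code> stores only its start, stop and step rather than one value per row.</p>

```python
import numpy as np
import pandas as pd

n = 1000
values = np.arange(n, dtype="int64")

default_idx = pd.DataFrame({"a": values})                    # RangeIndex
explicit_idx = pd.DataFrame({"a": values}, index=values * 2)  # materialized int64 index

for name, frame in [("RangeIndex", default_idx), ("int64 index", explicit_idx)]:
    with_index = frame.memory_usage().sum()
    without_index = frame.memory_usage(index=False).sum()
    # The difference is what the index itself reports.
    print(name, with_index - without_index, frame.index.memory_usage())
```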
<p><span class="yiyi-st" id="yiyi-64">The memory usage displayed by the <code class="docutils literal"><span class="pre">info</span></code> method uses the <code class="docutils literal"><span class="pre">memory_usage</span></code> method to determine the memory usage of the DataFrame, and formats the output in human-readable units (base-2 representation, i.e. 1 KB = 1024 bytes).</span></p>
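<p>The base-2 formatting can be sketched with a small function. <code class="docutils literal">sizeof_fmt</code> is a hypothetical helper written for this example in the style of the <code class="docutils literal">memory usage</code> line, not a public pandas API; feeding it the 290872-byte total from <code class="docutils literal">Out[9]</code> reproduces the <code class="docutils literal">284.1+ KB</code> printed by <code class="docutils literal">df.info()</code>.</p>

```python
def sizeof_fmt(num, size_qualifier=""):
    """Hypothetical helper: format a byte count in base-2 units (1 KB = 1024 bytes)."""
    for unit in ["bytes", "KB", "MB", "GB", "TB"]:
        if num < 1024.0:
            return f"{num:3.1f}{size_qualifier} {unit}"
        num /= 1024.0
    return f"{num:3.1f}{size_qualifier} PB"


# 290872 bytes / 1024 = 284.05..., rounded to one decimal place.
print(sizeof_fmt(290872, "+"))  # 284.1+ KB
```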
<p><span class="yiyi-st" id="yiyi-65">See also <a class="reference internal" href="categorical.html#categorical-memory"><span class="std std-ref">Categorical Memory Usage</span></a>.</span></p>
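<p>As a rough illustration of why the <code class="docutils literal">categorical</code> column above is so much smaller than the <code class="docutils literal">object</code> column it was built from: a categorical stores a small integer code per row plus each distinct value once, so the 5800 bytes reported above are consistent with 5000 one-byte codes plus 100 eight-byte category values. The sketch below, assuming a recent pandas release and made-up labels, compares deep memory usage for a low-cardinality string column stored both ways.</p>

```python
import numpy as np
import pandas as pd

n = 5000
rng = np.random.default_rng(0)
labels = rng.choice(["low", "medium", "high"], size=n)

as_object = pd.Series(labels, dtype=object)
as_category = as_object.astype("category")

# deep=True so the object Series is also charged for its str objects.
object_bytes = as_object.memory_usage(index=False, deep=True)
category_bytes = as_category.memory_usage(index=False, deep=True)
print(object_bytes, category_bytes)
```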
</div>
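<div class="section" id="byte-ordering-issues">
<h2><span class="yiyi-st" id="yiyi-66">Byte-Ordering Issues</span></h2>
<p><span class="yiyi-st" id="yiyi-67">Occasionally you may have to deal with data that were created on a machine with a different byte order than the one on which you are running Python.</span><span class="yiyi-st" id="yiyi-68"> To deal with this issue, convert the underlying NumPy array to the native system byte order <em>before</em> passing it to Series/DataFrame/Panel constructors, using something similar to the following:</span></p>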
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [11]: </span><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)),</span> <span class="s1">'>i4'</span><span class="p">)</span>  <span class="c1"># big endian</span>

<span class="gp">In [12]: </span><span class="n">newx</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">dtype</span><span class="o">.</span><span class="n">newbyteorder</span><span class="p">(</span><span class="s1">'='</span><span class="p">))</span>  <span class="c1"># convert to native byte order, values preserved</span>

<span class="gp">In [13]: </span><span class="n">s</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">Series</span><span class="p">(</span><span class="n">newx</span><span class="p">)</span>
</pre></div>
</div>
<p><span class="yiyi-st" id="yiyi-69">See the <a class="reference external" href="http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html">NumPy documentation on byte order</a> for more details.</span></p>
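<p>When a DataFrame is assembled from several arrays of mixed origin, the conversion can be wrapped in a helper. <code class="docutils literal">to_native</code> below is hypothetical, written for this sketch; it relies on NumPy's <code class="docutils literal">dtype.isnative</code> and <code class="docutils literal">dtype.newbyteorder('=')</code>, and uses <code class="docutils literal">astype</code> so the stored bytes are actually converted rather than merely reinterpreted.</p>

```python
import numpy as np
import pandas as pd


def to_native(arr):
    """Hypothetical helper: return arr in the machine's native byte order.

    astype() converts the stored bytes, so the values are preserved;
    arrays that are already native are returned unchanged.
    """
    if arr.dtype.isnative:
        return arr
    return arr.astype(arr.dtype.newbyteorder("="))


big_endian = np.arange(10, dtype=">i4")
little_endian = np.arange(10, dtype="<f8")

df = pd.DataFrame({
    "a": to_native(big_endian),
    "b": to_native(little_endian),
})
print(df.dtypes)
```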
</div>
<div class="section" id="visualizing-data-in-qt-applications">
<h2><span class="yiyi-st" id="yiyi-70">Visualizing Data in Qt applications</span></h2>
<p><span class="yiyi-st" id="yiyi-71">There is no support for such visualization in pandas.</span><span class="yiyi-st" id="yiyi-72"> However, the external module <a class="reference external" href="https://github.com/datalyze-solutions/pandas-qt">pandas-qt</a> provides this functionality.</span></p>
</div>