Commit 8404aeb

[Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, at most 64 hardware threads could be used, in some cases dispatched on only one CPU socket.

== Background ==

Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU: a hyper-thread if SMT is active, a core otherwise. There's a limit of 32 processors on older 32-bit versions of Windows, later raised to 64 processors on 64-bit versions of Windows. This limit comes from the affinity mask, which historically has the width of a pointer (sizeof(void*)). Consequently, the concept of "processor groups" was introduced to deal with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to enable that programmatically, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries: one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64 cores, 128 hyper-threads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation will only get worse in the years to come, as dies with more cores appear on the market.

== The problem ==

The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, the original intention is lost; only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std::thread::hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==

To solve this, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The decision on the number of threads to use is deferred as late as possible, to the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), a ThreadCount of 0 now means to use all possible hardware CPU (SMT) threads. Providing a ThreadCount above the maximum number of threads has no effect; the maximum is used instead. heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* is used. When LLVM_ENABLE_THREADS is OFF, the threading APIs always return 1, to ensure any caller loops are exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
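For illustration, a minimal sketch of the resulting API, using the names introduced by this patch (the surrounding function and its task are hypothetical):

    #include "llvm/Support/ThreadPool.h"
    #include "llvm/Support/Threading.h"

    void runBackendJobs() {
      // The strategy carries the "one thread per hardware core" intention all
      // the way to thread creation, instead of collapsing it to an unsigned.
      llvm::ThreadPool Pool(llvm::heavyweight_hardware_concurrency());

      // ThreadCount == 0 requests every hardware (SMT) thread; values above
      // the hardware maximum are clamped to the maximum.
      llvm::ThreadPool AllThreads(llvm::hardware_concurrency(/*ThreadCount=*/0));

      Pool.async([] { /* a ThinLTO-style backend job would run here */ });
      Pool.wait();
    }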
Parent: d9049e8 · Commit: 8404aeb

37 files changed: +406 -143 lines

clang-tools-extra/clang-doc/tool/ClangDocMain.cpp

+1 -2

@@ -268,8 +268,7 @@ int main(int argc, const char **argv) {
     Error = false;
     llvm::sys::Mutex IndexMutex;
     // ExecutorConcurrency is a flag exposed by AllTUsExecution.h
-    llvm::ThreadPool Pool(ExecutorConcurrency == 0 ? llvm::hardware_concurrency()
-                                                   : ExecutorConcurrency);
+    llvm::ThreadPool Pool(llvm::hardware_concurrency(ExecutorConcurrency));
     for (auto &Group : USRToBitcode) {
       Pool.async([&]() {
         std::vector<std::unique_ptr<doc::Info>> Infos;

clang-tools-extra/clangd/TUScheduler.cpp

+1 -7

@@ -842,13 +842,7 @@ std::string renderTUAction(const TUAction &Action) {
 } // namespace

 unsigned getDefaultAsyncThreadsCount() {
-  unsigned HardwareConcurrency = llvm::heavyweight_hardware_concurrency();
-  // heavyweight_hardware_concurrency may fall back to hardware_concurrency.
-  // C++ standard says that hardware_concurrency() may return 0; fallback to 1
-  // worker thread in that case.
-  if (HardwareConcurrency == 0)
-    return 1;
-  return HardwareConcurrency;
+  return llvm::heavyweight_hardware_concurrency().compute_thread_count();
 }

 FileStatus TUStatus::render(PathRef File) const {

clang-tools-extra/clangd/index/Background.cpp

+3 -2

@@ -148,9 +148,10 @@ BackgroundIndex::BackgroundIndex(
       CDB.watch([&](const std::vector<std::string> &ChangedFiles) {
         enqueue(ChangedFiles);
       })) {
-  assert(ThreadPoolSize > 0 && "Thread pool size can't be zero.");
+  assert(Rebuilder.TUsBeforeFirstBuild > 0 &&
+         "Thread pool size can't be zero.");
   assert(this->IndexStorageFactory && "Storage factory can not be null!");
-  for (unsigned I = 0; I < ThreadPoolSize; ++I) {
+  for (unsigned I = 0; I < Rebuilder.TUsBeforeFirstBuild; ++I) {
     ThreadPool.runAsync("background-worker-" + llvm::Twine(I + 1), [this] {
       WithContext Ctx(this->BackgroundContext.clone());
       Queue.work([&] { Rebuilder.idle(); });

clang-tools-extra/clangd/index/Background.h

+1 -1

@@ -135,7 +135,7 @@ class BackgroundIndex : public SwapIndex {
       Context BackgroundContext, const FileSystemProvider &,
       const GlobalCompilationDatabase &CDB,
       BackgroundIndexStorage::Factory IndexStorageFactory,
-      size_t ThreadPoolSize = llvm::heavyweight_hardware_concurrency(),
+      size_t ThreadPoolSize = 0, // 0 = use all hardware threads
       std::function<void(BackgroundQueue::Stats)> OnProgress = nullptr);
   ~BackgroundIndex(); // Blocks while the current task finishes.

clang-tools-extra/clangd/index/BackgroundRebuild.h

+3 -1

@@ -49,7 +49,9 @@ class BackgroundIndexRebuilder {
 public:
   BackgroundIndexRebuilder(SwapIndex *Target, FileSymbols *Source,
                            unsigned Threads)
-      : TUsBeforeFirstBuild(Threads), Target(Target), Source(Source) {}
+      : TUsBeforeFirstBuild(llvm::heavyweight_hardware_concurrency(Threads)
+                                .compute_thread_count()),
+        Target(Target), Source(Source) {}

   // Called to indicate a TU has been indexed.
   // May rebuild, if enough TUs have been indexed.

clang/lib/Tooling/AllTUsExecution.cpp

+1 -2

@@ -114,8 +114,7 @@ llvm::Error AllTUsToolExecutor::execute(
   auto &Action = Actions.front();

   {
-    llvm::ThreadPool Pool(ThreadCount == 0 ? llvm::hardware_concurrency()
-                                           : ThreadCount);
+    llvm::ThreadPool Pool(llvm::hardware_concurrency(ThreadCount));
     for (std::string File : Files) {
       Pool.async(
           [&](std::string Path) {

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp

+2 -1

@@ -106,7 +106,8 @@ DependencyScanningFilesystemSharedCache::
   // sharding gives a performance edge by reducing the lock contention.
   // FIXME: A better heuristic might also consider the OS to account for
   // the different cost of lock contention on different OSes.
-  NumShards = std::max(2u, llvm::hardware_concurrency() / 4);
+  NumShards =
+      std::max(2u, llvm::hardware_concurrency().compute_thread_count() / 4);
   CacheShards = std::make_unique<CacheShard[]>(NumShards);
 }

clang/tools/clang-scan-deps/ClangScanDeps.cpp

+4 -10

@@ -485,15 +485,9 @@ int main(int argc, const char **argv) {

   DependencyScanningService Service(ScanMode, Format, ReuseFileManager,
                                     SkipExcludedPPRanges);
-#if LLVM_ENABLE_THREADS
-  unsigned NumWorkers =
-      NumThreads == 0 ? llvm::hardware_concurrency() : NumThreads;
-#else
-  unsigned NumWorkers = 1;
-#endif
-  llvm::ThreadPool Pool(NumWorkers);
+  llvm::ThreadPool Pool(llvm::hardware_concurrency(NumThreads));
   std::vector<std::unique_ptr<DependencyScanningTool>> WorkerTools;
-  for (unsigned I = 0; I < NumWorkers; ++I)
+  for (unsigned I = 0; I < Pool.getThreadCount(); ++I)
     WorkerTools.push_back(std::make_unique<DependencyScanningTool>(Service));

   std::vector<SingleCommandCompilationDatabase> Inputs;

@@ -508,9 +502,9 @@ int main(int argc, const char **argv) {

   if (Verbose) {
     llvm::outs() << "Running clang-scan-deps on " << Inputs.size()
-                 << " files using " << NumWorkers << " workers\n";
+                 << " files using " << Pool.getThreadCount() << " workers\n";
   }
-  for (unsigned I = 0; I < NumWorkers; ++I) {
+  for (unsigned I = 0; I < Pool.getThreadCount(); ++I) {
     Pool.async([I, &Lock, &Index, &Inputs, &HadErrors, &FD, &WorkerTools,
                 &DependencyOS, &Errs]() {
       llvm::StringSet<> AlreadySeenModules;

lld/ELF/SyntheticSections.cpp

+4 -4

@@ -2747,8 +2747,8 @@ createSymbols(ArrayRef<std::vector<GdbIndexSection::NameAttrEntry>> nameAttrs,
   size_t numShards = 32;
   size_t concurrency = 1;
   if (threadsEnabled)
-    concurrency =
-        std::min<size_t>(PowerOf2Floor(hardware_concurrency()), numShards);
+    concurrency = std::min<size_t>(
+        hardware_concurrency().compute_thread_count(), numShards);

   // A sharded map to uniquify symbols by name.
   std::vector<DenseMap<CachedHashStringRef, size_t>> map(numShards);

@@ -3191,8 +3191,8 @@ void MergeNoTailSection::finalizeContents() {
   // operations in the following tight loop.
   size_t concurrency = 1;
   if (threadsEnabled)
-    concurrency =
-        std::min<size_t>(PowerOf2Floor(hardware_concurrency()), numShards);
+    concurrency = std::min<size_t>(
+        hardware_concurrency().compute_thread_count(), numShards);

   // Add section pieces to the builders.
   parallelForEachN(0, concurrency, [&](size_t threadId) {

llvm/include/llvm/LTO/LTO.h

+2 -1

@@ -227,7 +227,8 @@ using ThinBackend = std::function<std::unique_ptr<ThinBackendProc>(
     AddStreamFn AddStream, NativeObjectCache Cache)>;

 /// This ThinBackend runs the individual backend jobs in-process.
-ThinBackend createInProcessThinBackend(unsigned ParallelismLevel);
+/// The default value means to use one job per hardware core (not hyper-thread).
+ThinBackend createInProcessThinBackend(unsigned ParallelismLevel = 0);

 /// This ThinBackend writes individual module indexes to files, instead of
 /// running the individual backend jobs. This backend is for distributed builds
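With the new default argument, a caller no longer needs to query a concurrency value up front (a hypothetical usage sketch):

    // 0 (the default) means one backend job per hardware core.
    lto::ThinBackend Backend = lto::createInProcessThinBackend();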

llvm/include/llvm/Support/ThreadPool.h

+11 -6

@@ -13,7 +13,9 @@
 #ifndef LLVM_SUPPORT_THREAD_POOL_H
 #define LLVM_SUPPORT_THREAD_POOL_H

+#include "llvm/ADT/BitVector.h"
 #include "llvm/Config/llvm-config.h"
+#include "llvm/Support/Threading.h"
 #include "llvm/Support/thread.h"

 #include <future>

@@ -38,12 +40,11 @@ class ThreadPool {
   using TaskTy = std::function<void()>;
   using PackagedTaskTy = std::packaged_task<void()>;

-  /// Construct a pool with the number of threads found by
-  /// hardware_concurrency().
-  ThreadPool();
-
-  /// Construct a pool of \p ThreadCount threads
-  ThreadPool(unsigned ThreadCount);
+  /// Construct a pool using the hardware strategy \p S for mapping hardware
+  /// execution resources (threads, cores, CPUs)
+  /// Defaults to using the maximum execution resources in the system, but
+  /// excluding any resources contained in the affinity mask.
+  ThreadPool(ThreadPoolStrategy S = hardware_concurrency());

   /// Blocking destructor: the pool will wait for all the threads to complete.
   ~ThreadPool();

@@ -68,6 +69,8 @@ class ThreadPool {
   /// It is an error to try to add new tasks while blocking on this call.
   void wait();

+  unsigned getThreadCount() const { return ThreadCount; }
+
 private:
   /// Asynchronous submission of a task to the pool. The returned future can be
   /// used to wait for the task to finish and is *non-blocking* on destruction.

@@ -94,6 +97,8 @@ class ThreadPool {
   /// Signal for the destruction of the pool, asking thread to exit.
   bool EnableFlag;
 #endif
+
+  unsigned ThreadCount;
 };
 }
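A short usage sketch of the new constructor and getThreadCount() accessor (the task body is hypothetical):

    // Default strategy: all hardware (SMT) threads across all processor
    // groups, minus anything excluded by the start-up affinity mask.
    llvm::ThreadPool Pool;

    // A capped pool; the strategy clamps the request to what the hardware offers.
    llvm::ThreadPool Small(llvm::hardware_concurrency(4));
    for (unsigned I = 0; I < Small.getThreadCount(); ++I)
      Small.async([I] { /* e.g. set up one worker-local tool instance */ });
    Small.wait();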

llvm/include/llvm/Support/Threading.h

+55 -14

@@ -14,6 +14,7 @@
 #ifndef LLVM_SUPPORT_THREADING_H
 #define LLVM_SUPPORT_THREADING_H

+#include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/FunctionExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Config/llvm-config.h" // for LLVM_ON_UNIX

@@ -143,20 +144,52 @@ void llvm_execute_on_thread_async(
 #endif
 }

-/// Get the amount of currency to use for tasks requiring significant
-/// memory or other resources. Currently based on physical cores, if
-/// available for the host system, otherwise falls back to
-/// thread::hardware_concurrency().
-/// Returns 1 when LLVM is configured with LLVM_ENABLE_THREADS=OFF
-unsigned heavyweight_hardware_concurrency();
-
-/// Get the number of threads that the current program can execute
-/// concurrently. On some systems std::thread::hardware_concurrency() returns
-/// the total number of cores, without taking affinity into consideration.
-/// Returns 1 when LLVM is configured with LLVM_ENABLE_THREADS=OFF.
-/// Fallback to std::thread::hardware_concurrency() if sched_getaffinity is
-/// not available.
-unsigned hardware_concurrency();
+/// This tells how a thread pool will be used
+class ThreadPoolStrategy {
+public:
+  // The default value (0) means all available threads should be used,
+  // excluding affinity mask. If set, this value only represents a suggested
+  // high bound, the runtime might choose a lower value (not higher).
+  unsigned ThreadsRequested = 0;
+
+  // If SMT is active, use hyper threads. If false, there will be only one
+  // std::thread per core.
+  bool UseHyperThreads = true;
+
+  /// Retrieves the max available threads for the current strategy. This
+  /// accounts for affinity masks and takes advantage of all CPU sockets.
+  unsigned compute_thread_count() const;
+
+  /// Assign the current thread to an ideal hardware CPU or NUMA node. In a
+  /// multi-socket system, this ensures threads are assigned to all CPU
+  /// sockets. \p ThreadPoolNum represents a number bounded by [0,
+  /// compute_thread_count()).
+  void apply_thread_strategy(unsigned ThreadPoolNum) const;
+};
+
+/// Returns a thread strategy for tasks requiring significant memory or other
+/// resources. To be used for workloads where hardware_concurrency() proves to
+/// be less efficient. Avoid this strategy if doing lots of I/O. Currently
+/// based on physical cores, if available for the host system, otherwise falls
+/// back to hardware_concurrency(). Returns 1 when LLVM is configured with
+/// LLVM_ENABLE_THREADS = OFF
+inline ThreadPoolStrategy
+heavyweight_hardware_concurrency(unsigned ThreadCount = 0) {
+  ThreadPoolStrategy S;
+  S.UseHyperThreads = false;
+  S.ThreadsRequested = ThreadCount;
+  return S;
+}
+
+/// Returns a default thread strategy where all available hardware resources
+/// are to be used, except for those initially excluded by an affinity mask.
+/// This function takes affinity into consideration. Returns 1 when LLVM is
+/// configured with LLVM_ENABLE_THREADS=OFF.
+inline ThreadPoolStrategy hardware_concurrency(unsigned ThreadCount = 0) {
+  ThreadPoolStrategy S;
+  S.ThreadsRequested = ThreadCount;
+  return S;
+}

 /// Return the current thread id, as used in various OS system calls.
 /// Note that not all platforms guarantee that the value returned will be

@@ -184,6 +217,14 @@ void llvm_execute_on_thread_async(
 /// the operation succeeded or failed is returned.
 void get_thread_name(SmallVectorImpl<char> &Name);

+/// Returns a mask that represents on which hardware thread, core, CPU, NUMA
+/// group, the calling thread can be executed. On Windows, threads cannot
+/// cross CPU boundaries.
+llvm::BitVector get_thread_affinity_mask();
+
+/// Returns how many physical CPUs or NUMA groups the system has.
+unsigned get_cpus();
+
 enum class ThreadPriority {
   Background = 0,
   Default = 1,
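To make the division of labor concrete, here is a sketch of how a pool could consume a ThreadPoolStrategy (llvm::ThreadPool does this internally; spawnWorkers() itself is hypothetical):

    #include "llvm/Support/Threading.h"
    #include <thread>
    #include <vector>

    void spawnWorkers(llvm::ThreadPoolStrategy S) {
      // Resolve the thread count as late as possible; this accounts for the
      // affinity mask and, on Windows, spans all processor groups.
      unsigned N = S.compute_thread_count();
      std::vector<std::thread> Workers;
      for (unsigned I = 0; I < N; ++I)
        Workers.emplace_back([S, I] {
          // Bind this worker to its ideal CPU socket or NUMA group.
          S.apply_thread_strategy(I);
          // ... pull tasks from a shared queue ...
        });
      for (std::thread &W : Workers)
        W.join();
    }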

llvm/lib/CodeGen/ParallelCG.cpp

+1 -1

@@ -51,7 +51,7 @@ std::unique_ptr<Module> llvm::splitCodeGen(
   // Create ThreadPool in nested scope so that threads will be joined
   // on destruction.
   {
-    ThreadPool CodegenThreadPool(OSs.size());
+    ThreadPool CodegenThreadPool(hardware_concurrency(OSs.size()));
     int ThreadCount = 0;

     SplitModule(

llvm/lib/DWARFLinker/DWARFLinker.cpp

+1 -1

@@ -2446,7 +2446,7 @@ bool DWARFLinker::link() {
     }
     EmitLambda();
   } else {
-    ThreadPool Pool(2);
+    ThreadPool Pool(hardware_concurrency(2));
     Pool.async(AnalyzeAll);
     Pool.async(CloneAll);
     Pool.wait();

llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp

+1 -1

@@ -445,7 +445,7 @@ Error DwarfTransformer::convert(uint32_t NumThreads) {

   // Now parse all DIEs in case we have cross compile unit references in a
   // thread pool.
-  ThreadPool pool(NumThreads);
+  ThreadPool pool(hardware_concurrency(NumThreads));
   for (const auto &CU : DICtx.compile_units())
     pool.async([&CU]() { CU->getUnitDIE(false /*CUDieOnly*/); });
   pool.wait();

llvm/lib/ExecutionEngine/Orc/LLJIT.cpp

+2 -1

@@ -157,7 +157,8 @@ LLJIT::LLJIT(LLJITBuilderState &S, Error &Err)

   if (S.NumCompileThreads > 0) {
     TransformLayer->setCloneToNewContextOnEmit(true);
-    CompileThreads = std::make_unique<ThreadPool>(S.NumCompileThreads);
+    CompileThreads =
+        std::make_unique<ThreadPool>(hardware_concurrency(S.NumCompileThreads));
     ES->setDispatchMaterialization(
         [this](JITDylib &JD, std::unique_ptr<MaterializationUnit> MU) {
           // FIXME: Switch to move capture once we have c++14.

llvm/lib/LTO/LTO.cpp

+3 -3

@@ -477,8 +477,7 @@ LTO::RegularLTOState::RegularLTOState(unsigned ParallelCodeGenParallelismLevel,
 LTO::ThinLTOState::ThinLTOState(ThinBackend Backend)
     : Backend(Backend), CombinedIndex(/*HaveGVs*/ false) {
   if (!Backend)
-    this->Backend =
-        createInProcessThinBackend(llvm::heavyweight_hardware_concurrency());
+    this->Backend = createInProcessThinBackend();
 }

 LTO::LTO(Config Conf, ThinBackend Backend,

@@ -1095,7 +1094,8 @@ class InProcessThinBackend : public ThinBackendProc {
       const StringMap<GVSummaryMapTy> &ModuleToDefinedGVSummaries,
       AddStreamFn AddStream, NativeObjectCache Cache)
       : ThinBackendProc(Conf, CombinedIndex, ModuleToDefinedGVSummaries),
-        BackendThreadPool(ThinLTOParallelismLevel),
+        BackendThreadPool(
+            heavyweight_hardware_concurrency(ThinLTOParallelismLevel)),
         AddStream(std::move(AddStream)), Cache(std::move(Cache)) {
     for (auto &Name : CombinedIndex.cfiFunctionDefs())
       CfiFunctionDefs.insert(

llvm/lib/LTO/LTOBackend.cpp

+2 -1

@@ -375,7 +375,8 @@ void codegen(const Config &Conf, TargetMachine *TM, AddStreamFn AddStream,
 void splitCodeGen(const Config &C, TargetMachine *TM, AddStreamFn AddStream,
                   unsigned ParallelCodeGenParallelismLevel,
                   std::unique_ptr<Module> Mod) {
-  ThreadPool CodegenThreadPool(ParallelCodeGenParallelismLevel);
+  ThreadPool CodegenThreadPool(
+      heavyweight_hardware_concurrency(ParallelCodeGenParallelismLevel));
   unsigned ThreadCount = 0;
   const Target *T = &TM->getTarget();
llvm/lib/LTO/ThinLTOCodeGenerator.cpp

+3-3
Original file line numberDiff line numberDiff line change
@@ -80,8 +80,8 @@ extern cl::opt<std::string> RemarksFormat;
8080

8181
namespace {
8282

83-
static cl::opt<int>
84-
ThreadCount("threads", cl::init(llvm::heavyweight_hardware_concurrency()));
83+
// Default to using one job per hardware core in the system
84+
static cl::opt<int> ThreadCount("threads", cl::init(0));
8585

8686
// Simple helper to save temporary files for debug.
8787
static void saveTempBitcode(const Module &TheModule, StringRef TempDir,
@@ -1042,7 +1042,7 @@ void ThinLTOCodeGenerator::run() {
10421042

10431043
// Parallel optimizer + codegen
10441044
{
1045-
ThreadPool Pool(ThreadCount);
1045+
ThreadPool Pool(heavyweight_hardware_concurrency(ThreadCount));
10461046
for (auto IndexCount : ModulesOrdering) {
10471047
auto &Mod = Modules[IndexCount];
10481048
Pool.async([&](int count) {

llvm/lib/Support/Host.cpp

+5 -2

@@ -1266,7 +1266,7 @@ StringRef sys::getHostCPUName() { return "generic"; }
 // On Linux, the number of physical cores can be computed from /proc/cpuinfo,
 // using the number of unique physical/core id pairs. The following
 // implementation reads the /proc/cpuinfo format on an x86_64 system.
-static int computeHostNumPhysicalCores() {
+int computeHostNumPhysicalCores() {
   // Read /proc/cpuinfo as a stream (until EOF reached). It cannot be
   // mmapped because it appears to have 0 size.
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> Text =

@@ -1312,7 +1312,7 @@ static int computeHostNumPhysicalCores() {
 #include <sys/sysctl.h>

 // Gets the number of *physical cores* on the machine.
-static int computeHostNumPhysicalCores() {
+int computeHostNumPhysicalCores() {
   uint32_t count;
   size_t len = sizeof(count);
   sysctlbyname("hw.physicalcpu", &count, &len, NULL, 0);

@@ -1326,6 +1326,9 @@ static int computeHostNumPhysicalCores() {
   }
   return count;
 }
+#elif defined(_WIN32)
+// Defined in llvm/lib/Support/Windows/Threading.inc
+int computeHostNumPhysicalCores();
 #else
 // On other systems, return -1 to indicate unknown.
 static int computeHostNumPhysicalCores() { return -1; }
