So uh...how do you know which pin_cpus to use?

Never really understood this:

"The correct pin_cpus setting to use is dependent on your CPU architecture. Intel and AMD systems enumerate processors in different ways. Using the wrong pin_cpus setting can cause poor performance."

Is there a magical formula? Any advice would help, thanks.

James

The best thing to do is to install the hwloc package and use the lstopo or lstopo-no-graphics tool to render a big ASCII art image of the system.

On CentOS 7 this works:

lstopo-no-graphics --of txt
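
If hwloc isn't installed yet, it's in the stock repos; something like this should do it (assuming CentOS 7 with yum, adjust for your distro):

# install hwloc (provides lstopo-no-graphics, hwloc-ps, hwloc-calc, ...)
sudo yum install -y hwloc
# same rendering, with PCI devices hidden if the picture gets too busy
lstopo-no-graphics --no-io --of txt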

You'll get something that looks like this:

https://www.open-mpi.org/projects/hwloc/lstopo/images/2XeonE5v2+2cuda+1display_v1.11.png

or

https://www.open-mpi.org/projects/hwloc/lstopo/images/4Opteron6200.v1.11.png

The numbers towards the bottom are the cpu ids. So you can see that using something like

1,3,5,7,9,11,13,15,17,19,21,23,25

on an Intel CPU would be the worst thing you could do, since 21, 23, and 25 are on the same physical cores as 1, 3, and 5.

Oh, I should add: "...on that particular system." On some of our NUMA machines the allocation is different and 1,3,5,7,9 would be the right CPUs to use!
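
For context, this setting lives per worker in broctl's node.cfg. A rough sketch of an entry using that NUMA example, where host, interface, and lb_method are placeholders you'd replace with your own:

[worker-1]
type=worker
host=localhost
interface=eth0
lb_method=pf_ring
lb_procs=5
pin_cpus=1,3,5,7,9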

Ok cool thanks Justin...so basically I wanna stagger these out so I don't have several processes on the same core ya?

cat /proc/cpuinfo | egrep "processor|core id"
processor : 0
core id : 0
processor : 1
core id : 0
processor : 2
core id : 1
processor : 3
core id : 1
processor : 4
core id : 2
processor : 5
core id : 2
processor : 6
core id : 3
processor : 7
core id : 3
processor : 8
core id : 4
processor : 9
core id : 4
processor : 10
core id : 5
processor : 11
core id : 5
processor : 12
core id : 0
processor : 13
core id : 0
processor : 14
core id : 1
processor : 15
core id : 1
processor : 16
core id : 2
processor : 17
core id : 2
processor : 18
core id : 3
processor : 19
core id : 3
processor : 20
core id : 4
processor : 21
core id : 4
processor : 22
core id : 5
processor : 23
core id : 5
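
One wrinkle: core id repeats per physical package, so on a multi-socket box it's worth pulling in physical id as well, or letting lscpu (from util-linux) line everything up:

egrep "processor|physical id|core id" /proc/cpuinfo
lscpu -e=CPU,NODE,SOCKET,CORE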

1,3,5,7,9,11 seem to be the best ones here. Thanks...that's super helpful!

James

Possibly... I'd check it against what hwloc says. I think just turning off hyper-threading makes this even easier, since that completely removes the possibility of accidentally pinning two workers to the same core.
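
A quick way to see whether HT is on at all:

# "Thread(s) per core: 2" means hyper-threading is enabled, 1 means it is off
lscpu | egrep 'Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'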

Sweet...thanks Justin...hwloc is a cool app!

James

Yeah... it can be a bit confusing though since it has both a 'logical' (-l) and a 'physical' (-p) view.

I _think_ that the CPU ids in the physical view match what taskset uses via broctl.

Fortunately, you can run hwloc-ps -p and compare which PIDs are mapped to which CPUs to verify it is working right.
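
For example:

# physical (OS) indexes, which should match what taskset uses
lstopo-no-graphics -p --no-io --of console
# hwloc's logical numbering, for comparison
lstopo-no-graphics -l --no-io --of console
# processes that have an explicit cpu binding, shown with physical indexes
hwloc-ps -p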

2.6 kernels on Linux enumerate hyper-threads in a different way than 3.x and 4.x do:

2.6:

Core 0, thread 0
Core 0, thread 1

etc.

3.x / 4.x:

Cores 0-N on CPU 0 (the first thread of each core),
then the same on CPU 1,
then the second threads on CPU 0,
then the second threads on CPU 1.
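
Whichever kernel you're on, the sysfs topology files give the authoritative mapping, so you can always check directly (standard Linux sysfs paths):

# logical cpus that share cpu0's physical core (i.e. its HT siblings)
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
# the core and package behind cpu0
cat /sys/devices/system/cpu/cpu0/topology/core_id
cat /sys/devices/system/cpu/cpu0/topology/physical_package_id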

Results for HT vs. cross-NUMA are about to be published soon :wink:
I don't like cache misses when CPU 1 is reaching for data on node 0, though. It is not about cross-NUMA bandwidth; it's the fact that in the worst case you have 67 ns to process the smallest packet at 10Gbit. And an L3 hit on Ivy Bridge is at least 15 ns.
A miss is 5x that.
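
(For reference, the 67 ns is just the wire time of a minimum-size frame, counting preamble and inter-frame gap:)

# 64 B frame + 8 B preamble + 12 B inter-frame gap = 84 B = 672 bits on the wire
# 672 bits / 10 Gbit/s = 67.2 ns per packet
awk 'BEGIN { printf "%.1f ns\n", 84 * 8 / 10e9 * 1e9 }'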

Ah! That explains a lot. I wonder if the NUMA allocation changed too. We just upgraded some machines from CentOS 6 to 7 and I was wondering why the meticulously written node.cfg we had been using for months now appeared completely wrong.

I wonder if broctl should support hwloc for CPU pinning instead of taskset. I wouldn't mind having an 'auto' mode that just does the right thing.
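
A rough manual approximation, sketched with a placeholder command since broctl doesn't do this today, would be to launch a worker under hwloc-bind instead of taskset:

# bind a process to core 2 (hwloc location syntax; indexes are hwloc-logical unless you pass -p)
hwloc-bind core:2 -- some-worker-command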

It looks like on our dual-socket NUMA box we should be using

0,2,4,6,8,10,12,14 for one 10G card and
1,3,5,7,9,11,13,15 for the other 10G card.

0-19 are the physical cores and 20-39 are the HT siblings, but using 0,1,2,3 flips between NUMA nodes, which is not what anyone wants.
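
To double-check which CPUs live on which node, and which node each NIC actually hangs off (assuming numactl is installed; the interface names are placeholders):

# cpu-to-node map and per-node memory
numactl --hardware
# NUMA node a NIC's PCI slot is attached to (-1 means no affinity reported)
cat /sys/class/net/eth0/device/numa_node
cat /sys/class/net/eth1/device/numa_node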

Lesson learned for me: never answer from a phone, especially when trying to cover NUMA allocation for 56 threads on 4 inches of screen :wink:

I take back what I said. Here is how it looks; I'm in front of a server with 2x NICs. I have E5-2697 v3 CPUs here, 14 physical cores per CPU, HT enabled, kernel 4.4.something.

0-13 - NUMA node 0, CPU 0, first hardware thread of each core

14-27 - NUMA node 1, CPU 1, first hardware thread of each core

28-41 - NUMA node 0, CPU 0, second hardware thread of each core

42-55 - NUMA node 1, CPU 1, second hardware thread of each core

1st card should use virtual cores (AKA threads) 0-13 + 28-41

2nd card should use 14-27 + 42-55
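
An easy sanity check for that split is the NUMA summary in lscpu; on a box laid out like this it should report node0 as 0-13,28-41 and node1 as 14-27,42-55:

lscpu | grep 'NUMA node'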