* hugepage compaction causes performance drop
@ 2015-11-19  9:29 Aaron Lu
  2015-11-19 13:29 ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Aaron Lu @ 2015-11-19  9:29 UTC (permalink / raw)
  To: linux-mm; +Cc: Huang Ying, Dave Hansen, Tim Chen, lkp

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

Hi,

One VM-related test case run by LKP on a Haswell-EP server with 128GiB of
memory showed that the compaction code causes a performance drop of about
30%. To illustrate the problem, I've simplified the test with a program
called usemem (see attached). The test goes like this:
1 Boot up the server;
2 modprobe scsi_debug (a module that can use memory as a SCSI device),
  with dev_size set to 4/5 of free memory, i.e. about 100GiB. Use it as
  swap.
3 Run the usemem test, which uses mmap to map a MAP_PRIVATE | MAP_ANON
  region sized to 3/4 of (remaining_free_memory + swap), and then writes
  to that region sequentially to trigger page faults and swap-out.

The above test runs with two configs for the two sysfs files below:
/sys/kernel/mm/transparent_hugepage/enabled
/sys/kernel/mm/transparent_hugepage/defrag
1 transparent hugepage and defrag are both set to always; let's call it
  the always-always case;
2 transparent hugepage is set to always while defrag is set to never;
  let's call it the always-never case.

The output from the always-always case is:
Setting up swapspace version 1, size = 104627196 KiB
no label, UUID=aafa53ae-af9e-46c9-acb9-8b3d4f57f610
cmdline: /lkp/aaron/src/bin/usemem 99994672128
99994672128 transferred in 95 seconds, throughput: 1003 MB/s

And the output from the always-never case is:
Setting up swapspace version 1, size = 104629244 KiB
no label, UUID=60563c82-d1c6-4d86-b9fa-b52f208097e9
cmdline: /lkp/aaron/src/bin/usemem 99995965440
99995965440 transferred in 67 seconds, throughput: 1423 MB/s

The vmstat and perf-profile are also attached; please let me know if you
need any more information, thanks.

[-- Attachment #2: swap_test.tar.xz --]
[-- Type: application/x-xz, Size: 297576 bytes --]
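For reference, the setup described above boils down to roughly the shell
sketch below. It is not the actual LKP job file: the scsi_debug device name
(/dev/sdX) and the dev_size_mb value are placeholders that have to be adapted
to the machine, and the usemem size argument is simply the one from the log
above.

  # create a ~100GiB RAM-backed SCSI disk and use it as swap
  modprobe scsi_debug dev_size_mb=102400    # ~4/5 of free memory, adjust as needed
  mkswap /dev/sdX                           # replace sdX with the scsi_debug disk
  swapon /dev/sdX

  # always-always case
  echo always > /sys/kernel/mm/transparent_hugepage/enabled
  echo always > /sys/kernel/mm/transparent_hugepage/defrag
  # for the always-never case, instead:
  #   echo never > /sys/kernel/mm/transparent_hugepage/defrag

  # map and sequentially write an anonymous region of the given size
  /lkp/aaron/src/bin/usemem 99994672128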
* Re: hugepage compaction causes performance drop
  2015-11-19  9:29 hugepage compaction causes performance drop Aaron Lu
@ 2015-11-19 13:29 ` Vlastimil Babka
  2015-11-20  8:55   ` Aaron Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2015-11-19 13:29 UTC (permalink / raw)
  To: Aaron Lu, linux-mm
  Cc: Huang Ying, Dave Hansen, Tim Chen, lkp, Andrea Arcangeli,
      David Rientjes, Joonsoo Kim

+CC Andrea, David, Joonsoo

On 11/19/2015 10:29 AM, Aaron Lu wrote:
> Hi,
>
> One VM-related test case run by LKP on a Haswell-EP server with 128GiB of
> memory showed that the compaction code causes a performance drop of about
> 30%. To illustrate the problem, I've simplified the test with a program
> called usemem (see attached). The test goes like this:
> 1 Boot up the server;
> 2 modprobe scsi_debug (a module that can use memory as a SCSI device),
>   with dev_size set to 4/5 of free memory, i.e. about 100GiB. Use it as
>   swap.
> 3 Run the usemem test, which uses mmap to map a MAP_PRIVATE | MAP_ANON
>   region sized to 3/4 of (remaining_free_memory + swap), and then writes
>   to that region sequentially to trigger page faults and swap-out.
>
> The above test runs with two configs for the two sysfs files below:
> /sys/kernel/mm/transparent_hugepage/enabled
> /sys/kernel/mm/transparent_hugepage/defrag
> 1 transparent hugepage and defrag are both set to always; let's call it
>   the always-always case;
> 2 transparent hugepage is set to always while defrag is set to never;
>   let's call it the always-never case.
>
> The output from the always-always case is:
> Setting up swapspace version 1, size = 104627196 KiB
> no label, UUID=aafa53ae-af9e-46c9-acb9-8b3d4f57f610
> cmdline: /lkp/aaron/src/bin/usemem 99994672128
> 99994672128 transferred in 95 seconds, throughput: 1003 MB/s
>
> And the output from the always-never case is:
> Setting up swapspace version 1, size = 104629244 KiB
> no label, UUID=60563c82-d1c6-4d86-b9fa-b52f208097e9
> cmdline: /lkp/aaron/src/bin/usemem 99995965440
> 99995965440 transferred in 67 seconds, throughput: 1423 MB/s

So yeah, this is an example of a workload that gets no benefit from THPs
but pays all of their cost. Fixing that is non-trivial and I admit I
haven't pushed my prior efforts there too much lately... But it's also
possible there still are actual compaction bugs making the issue worse.

> The vmstat and perf-profile are also attached; please let me know if you
> need any more information, thanks.

Output from vmstat (the tool) isn't much useful here; a periodic "cat
/proc/vmstat" would be much better.

The perf profiles are somewhat weirdly sorted by children cost (?), but I
noticed a very high cost (46%) in pageblock_pfn_to_page(). This could be
due to a very large but sparsely populated zone. Could you provide
/proc/zoneinfo?

If the compaction scanners behave strangely due to a bug, enabling the
ftrace compaction tracepoints should help find the cause. That might
produce a very large output, but maybe it would be enough to see some
parts of it (i.e. towards the beginning, middle and end of the experiment).

Vlastimil
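As an aside, the data being asked for here can be collected with nothing more
than a shell. A minimal sketch, assuming a tracefs mount at
/sys/kernel/debug/tracing and arbitrary output file names:

  # one-shot zone layout
  cat /proc/zoneinfo > zoneinfo.txt

  # periodic raw counters while the test runs
  while sleep 1; do echo "=== $(date +%s) ==="; cat /proc/vmstat; done > proc-vmstat.log &

  # enable the compaction tracepoints and capture them
  echo 1 > /sys/kernel/debug/tracing/events/compaction/enable
  cat /sys/kernel/debug/tracing/trace_pipe > compaction-trace.log &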
* Re: hugepage compaction causes performance drop
  2015-11-19 13:29 ` Vlastimil Babka
@ 2015-11-20  8:55   ` Aaron Lu
  2015-11-20  9:33     ` Aaron Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Aaron Lu @ 2015-11-20  8:55 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm
  Cc: Huang Ying, Dave Hansen, Tim Chen, lkp, Andrea Arcangeli,
      David Rientjes, Joonsoo Kim

On 11/19/2015 09:29 PM, Vlastimil Babka wrote:
> +CC Andrea, David, Joonsoo
>
> On 11/19/2015 10:29 AM, Aaron Lu wrote:
>> The vmstat and perf-profile are also attached; please let me know if you
>> need any more information, thanks.
>
> Output from vmstat (the tool) isn't much useful here; a periodic "cat
> /proc/vmstat" would be much better.

No problem.

> The perf profiles are somewhat weirdly sorted by children cost (?), but I
> noticed a very high cost (46%) in pageblock_pfn_to_page(). This could be
> due to a very large but sparsely populated zone. Could you provide
> /proc/zoneinfo?

Is a one-time /proc/zoneinfo enough, or also a periodic one?

> If the compaction scanners behave strangely due to a bug, enabling the
> ftrace compaction tracepoints should help find the cause. That might
> produce a very large output, but maybe it would be enough to see some
> parts of it (i.e. towards the beginning, middle and end of the experiment).

I'll see how to do this; I've never used ftrace before.

Thanks for the quick response.

Regards,
Aaron
* Re: hugepage compaction causes performance drop
  2015-11-20  8:55 ` Aaron Lu
@ 2015-11-20  9:33   ` Aaron Lu
  2015-11-20 10:06     ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Aaron Lu @ 2015-11-20  9:33 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm
  Cc: Huang Ying, Dave Hansen, Tim Chen, lkp, Andrea Arcangeli,
      David Rientjes, Joonsoo Kim

[-- Attachment #1: Type: text/plain, Size: 835 bytes --]

On 11/20/2015 04:55 PM, Aaron Lu wrote:
> On 11/19/2015 09:29 PM, Vlastimil Babka wrote:
>> +CC Andrea, David, Joonsoo
>>
>> On 11/19/2015 10:29 AM, Aaron Lu wrote:
>>> The vmstat and perf-profile are also attached; please let me know if you
>>> need any more information, thanks.
>>
>> Output from vmstat (the tool) isn't much useful here; a periodic "cat
>> /proc/vmstat" would be much better.
>
> No problem.
>
>> The perf profiles are somewhat weirdly sorted by children cost (?), but I
>> noticed a very high cost (46%) in pageblock_pfn_to_page(). This could be
>> due to a very large but sparsely populated zone. Could you provide
>> /proc/zoneinfo?
>
> Is a one-time /proc/zoneinfo enough, or also a periodic one?

Please see attached; note that this is a new run, so the perf profile is a
little different.

Thanks,
Aaron

[-- Attachment #2: zoneinfo --]
[-- Type: text/plain, Size: 36523 bytes --]

/proc/zoneinfo
Node 0, zone      DMA
  pages free     3950
        min      2
        low      2
        high     3
        scanned  0
        spanned  4095
        present  3994
        managed  3973
        protection: (0, 1873, 64327, 64327)
        start_pfn: 1
  [... nr_* counters and per-CPU pageset details trimmed ...]
Node 0, zone    DMA32
  pages free     62829
        min      327
        low      408
        high     490
        scanned  0
        spanned  1044480
        present  495951
        managed  479559
        protection: (0, 0, 62453, 62453)
        start_pfn: 4096
  [... nr_* counters and per-CPU pageset details trimmed ...]
Node 0, zone   Normal
  pages free     13732
        min      10921
        low      13651
        high     16381
        scanned  0
        spanned  16252928
        present  16252928
        managed  15988216
        protection: (0, 0, 0, 0)
        start_pfn: 1048576
  [... nr_* counters and per-CPU pageset details trimmed ...]
Node 1, zone   Normal
  pages free     6322599
        min      11276
        low      14095
        high     16914
        scanned  0
        spanned  16777216
        present  16777216
        managed  16507772
        protection: (0, 0, 0, 0)
        start_pfn: 17301504
  [... nr_* counters and per-CPU pageset details trimmed ...]

[-- Attachment #3: proc-vmstat.gz --]
[-- Type: application/gzip, Size: 22205 bytes --]

[-- Attachment #4: perf-profile.xz --]
[-- Type: application/x-xz, Size: 116760 bytes --]
* Re: hugepage compaction causes performance drop
  2015-11-20  9:33 ` Aaron Lu
@ 2015-11-20 10:06   ` Vlastimil Babka
  2015-11-23  8:16     ` Joonsoo Kim
  2015-11-24  2:45     ` Joonsoo Kim
  0 siblings, 2 replies; 15+ messages in thread
From: Vlastimil Babka @ 2015-11-20 10:06 UTC (permalink / raw)
  To: Aaron Lu, linux-mm
  Cc: Huang Ying, Dave Hansen, Tim Chen, lkp, Andrea Arcangeli,
      David Rientjes, Joonsoo Kim

On 11/20/2015 10:33 AM, Aaron Lu wrote:
> On 11/20/2015 04:55 PM, Aaron Lu wrote:
>> On 11/19/2015 09:29 PM, Vlastimil Babka wrote:
>>> +CC Andrea, David, Joonsoo
>>>
>>> On 11/19/2015 10:29 AM, Aaron Lu wrote:
>>>> The vmstat and perf-profile are also attached; please let me know if
>>>> you need any more information, thanks.
>>>
>>> Output from vmstat (the tool) isn't much useful here; a periodic "cat
>>> /proc/vmstat" would be much better.
>>
>> No problem.
>>
>>> The perf profiles are somewhat weirdly sorted by children cost (?), but
>>> I noticed a very high cost (46%) in pageblock_pfn_to_page(). This could
>>> be due to a very large but sparsely populated zone. Could you provide
>>> /proc/zoneinfo?
>>
>> Is a one-time /proc/zoneinfo enough, or also a periodic one?
>
> Please see attached; note that this is a new run, so the perf profile is a
> little different.
>
> Thanks,
> Aaron

Thanks.

DMA32 is a bit sparse:

Node 0, zone    DMA32
  pages free     62829
        min      327
        low      408
        high     490
        scanned  0
        spanned  1044480
        present  495951
        managed  479559

Since the other zones are much larger, this is probably not the culprit.
But tracepoints should tell us more. I have a theory that updating the
free scanner's cached pfn doesn't happen if it aborts due to
need_resched() during isolate_freepages(), before hitting a valid
pageblock, if the zone has a large hole in it. But zoneinfo doesn't tell
us whether the large difference between "spanned" and "present"/"managed"
is due to one large hole or many smaller holes...

compact_migrate_scanned 1982396
compact_free_scanned 40576943
compact_isolated 2096602
compact_stall 9070
compact_fail 6025
compact_success 3045

So it's struggling to find free pages, no wonder about that. I'm working
on a series that should hopefully help here, and so is Joonsoo.
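As a side note, the spanned-vs-present gap per zone can be pulled straight
out of /proc/zoneinfo. A small awk sketch, assuming only the field layout
shown above:

  awk '/^Node/   { zone = $0 }
       /spanned/ { sp = $2 }
       /present/ { printf "%-30s spanned %9d present %9d holes %9d\n",
                           zone, sp, $2, sp - $2 }' /proc/zoneinfo

With the numbers above it reports a hole of 548529 pages (a bit over 2GiB)
in Node 0 DMA32 and essentially none in the two Normal zones, which fits the
"probably not the culprit" assessment.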
* Re: hugepage compaction causes performance drop
  2015-11-20 10:06 ` Vlastimil Babka
@ 2015-11-23  8:16   ` Joonsoo Kim
  2015-11-23  8:33     ` Aaron Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Joonsoo Kim @ 2015-11-23  8:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Aaron Lu, linux-mm, Huang Ying, Dave Hansen, Tim Chen, lkp,
      Andrea Arcangeli, David Rientjes

On Fri, Nov 20, 2015 at 11:06:46AM +0100, Vlastimil Babka wrote:
> On 11/20/2015 10:33 AM, Aaron Lu wrote:
> > On 11/20/2015 04:55 PM, Aaron Lu wrote:
> > > On 11/19/2015 09:29 PM, Vlastimil Babka wrote:
> > > > +CC Andrea, David, Joonsoo
> > > >
> > > > On 11/19/2015 10:29 AM, Aaron Lu wrote:
> > > > > The vmstat and perf-profile are also attached; please let me know
> > > > > if you need any more information, thanks.
> > > >
> > > > Output from vmstat (the tool) isn't much useful here; a periodic
> > > > "cat /proc/vmstat" would be much better.
> > >
> > > No problem.
> > >
> > > > The perf profiles are somewhat weirdly sorted by children cost (?),
> > > > but I noticed a very high cost (46%) in pageblock_pfn_to_page().
> > > > This could be due to a very large but sparsely populated zone.
> > > > Could you provide /proc/zoneinfo?
> > >
> > > Is a one-time /proc/zoneinfo enough, or also a periodic one?
> >
> > Please see attached; note that this is a new run, so the perf profile
> > is a little different.
> >
> > Thanks,
> > Aaron
>
> Thanks.
>
> DMA32 is a bit sparse:
>
> Node 0, zone    DMA32
>   pages free     62829
>         min      327
>         low      408
>         high     490
>         scanned  0
>         spanned  1044480
>         present  495951
>         managed  479559
>
> Since the other zones are much larger, this is probably not the culprit.
> But tracepoints should tell us more. I have a theory that updating the
> free scanner's cached pfn doesn't happen if it aborts due to
> need_resched() during isolate_freepages(), before hitting a valid
> pageblock, if the zone has a large hole in it. But zoneinfo doesn't tell
> us whether the large difference between "spanned" and "present"/"managed"
> is due to one large hole or many smaller holes...
>
> compact_migrate_scanned 1982396
> compact_free_scanned 40576943
> compact_isolated 2096602
> compact_stall 9070
> compact_fail 6025
> compact_success 3045
>
> So it's struggling to find free pages, no wonder about that. I'm

Numbers look fine to me. I guess this performance degradation is caused by
the COMPACT_CLUSTER_MAX change (from 32 to 256). THP allocation is async,
so it should be aborted quickly. But after isolating 256 migratable pages,
it can't be aborted and will finish migrating those 256 pages (at least in
the current implementation).

Aaron, could you please test again with COMPACT_CLUSTER_MAX set to 32
(in swap.h)?

And please attach the always-always vmstat numbers, too.

Thanks.
* Re: hugepage compaction causes performance drop
  2015-11-23  8:16 ` Joonsoo Kim
@ 2015-11-23  8:33   ` Aaron Lu
  2015-11-23  9:24     ` Joonsoo Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Aaron Lu @ 2015-11-23  8:33 UTC (permalink / raw)
  To: Joonsoo Kim, Vlastimil Babka
  Cc: linux-mm, Huang Ying, Dave Hansen, Tim Chen, lkp, Andrea Arcangeli,
      David Rientjes

[-- Attachment #1: Type: text/plain, Size: 881 bytes --]

On 11/23/2015 04:16 PM, Joonsoo Kim wrote:
> Numbers look fine to me. I guess this performance degradation is caused by
> the COMPACT_CLUSTER_MAX change (from 32 to 256). THP allocation is async,
> so it should be aborted quickly. But after isolating 256 migratable pages,
> it can't be aborted and will finish migrating those 256 pages (at least in
> the current implementation).
>
> Aaron, could you please test again with COMPACT_CLUSTER_MAX set to 32
> (in swap.h)?

This is what I found in include/linux/swap.h:

#define SWAP_CLUSTER_MAX 32UL
#define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX

It looks like it is already 32, or am I looking in the wrong place?

BTW, I'm using v4.3 for all these tests, and I just checked v4.4-rc2;
the above definition doesn't change there.

> And please attach the always-always vmstat numbers, too.

Sure, attached the vmstat tool output, taken every second.

Thanks,
Aaron

[-- Attachment #2: vmstat --]
[-- Type: text/plain, Size: 9298 bytes --]

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- -----timestamp-----
 r  b     swpd     free  buff  cache   si      so     bi      bo     in    cs us sy id wa st                 CST
 6  1        0 25647504   580 626540    0       0      0       0     66    19  0  1 99  0  0 2015-11-20 02:19:37
 1  0        0 25563796   580 638000    0       0      0       0    769  6085  0  1 99  0  0 2015-11-20 02:19:38
 1  0        0  1437496   580 638300    0       0      0       0   1203   641  0  1 99  0  0 2015-11-20 02:19:45
 1  0   460904   439088   284 631300 1020  465260   1020  465260   7392  6468  0  2 98  0  0 2015-11-20 02:19:46
 3  1  1656704   371028   148 633716 2216 1203792   2216 1203792 253072  5293  0  4 95  1  0 2015-11-20 02:19:47
 [... remaining once-per-second samples trimmed ...]
 1  0 71882880   384664    92 635176    0 1114192      0 1114192   1454  5193  0  1 99  0  0 2015-11-20 02:20:59
 1  0 43315772   379880    92 634952   40  634800     40  634800   1382  3564  0  1 99  0  0 2015-11-20 02:21:00
 0  0    33600 25734156    92 634884 4536       0   4536       0   2215  4864  0  1 98  1  0 2015-11-20 02:21:01
* Re: hugepage compaction causes performance drop
  2015-11-23  8:33 ` Aaron Lu
@ 2015-11-23  9:24   ` Joonsoo Kim
  2015-11-24  3:40     ` Aaron Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Joonsoo Kim @ 2015-11-23  9:24 UTC (permalink / raw)
  To: Aaron Lu
  Cc: Joonsoo Kim, Vlastimil Babka, Linux Memory Management List,
      Huang Ying, Dave Hansen, Tim Chen, lkp, Andrea Arcangeli,
      David Rientjes

2015-11-23 17:33 GMT+09:00 Aaron Lu <aaron.lu@intel.com>:
> On 11/23/2015 04:16 PM, Joonsoo Kim wrote:
>> Numbers look fine to me. I guess this performance degradation is caused by
>> the COMPACT_CLUSTER_MAX change (from 32 to 256). THP allocation is async,
>> so it should be aborted quickly. But after isolating 256 migratable pages,
>> it can't be aborted and will finish migrating those 256 pages (at least in
>> the current implementation).

Let me correct the above comment: it can be aborted after some tries.

>> Aaron, could you please test again with COMPACT_CLUSTER_MAX set to 32
>> (in swap.h)?
>
> This is what I found in include/linux/swap.h:
>
> #define SWAP_CLUSTER_MAX 32UL
> #define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX
>
> It looks like it is already 32, or am I looking in the wrong place?
>
> BTW, I'm using v4.3 for all these tests, and I just checked v4.4-rc2;
> the above definition doesn't change there.

Sorry, I looked at the linux-next tree and, there, it is 128. Please
ignore my comment! :)

>> And please attach the always-always vmstat numbers, too.
>
> Sure, attached the vmstat tool output, taken every second.

Oops... I'd like to see a 1-sec-interval "cat /proc/vmstat" for
always-never.

Thanks.
* Re: hugepage compaction causes performance drop
  2015-11-23  9:24 ` Joonsoo Kim
@ 2015-11-24  3:40   ` Aaron Lu
  2015-11-24  4:55     ` Joonsoo Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Aaron Lu @ 2015-11-24  3:40 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Joonsoo Kim, Vlastimil Babka, Linux Memory Management List,
      Huang Ying, Dave Hansen, Tim Chen, lkp, Andrea Arcangeli,
      David Rientjes

[-- Attachment #1: Type: text/plain, Size: 509 bytes --]

On 11/23/2015 05:24 PM, Joonsoo Kim wrote:
> 2015-11-23 17:33 GMT+09:00 Aaron Lu <aaron.lu@intel.com>:
>> On 11/23/2015 04:16 PM, Joonsoo Kim wrote:
>>>
>>> And please attach the always-always vmstat numbers, too.
>>
>> Sure, attached the vmstat tool output, taken every second.
>
> Oops... I'd like to see a 1-sec-interval "cat /proc/vmstat" for
> always-never.

Here it is, the proc-vmstat for always-never.

BTW, I'm still learning how to do proper ftrace for this case, so it may
take a while.

Thanks,
Aaron

[-- Attachment #2: proc-vmstat.gz --]
[-- Type: application/gzip, Size: 17041 bytes --]
* Re: hugepage compaction causes performance drop
  2015-11-24  3:40 ` Aaron Lu
@ 2015-11-24  4:55   ` Joonsoo Kim
  2015-11-24  7:27     ` Aaron Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Joonsoo Kim @ 2015-11-24  4:55 UTC (permalink / raw)
  To: Aaron Lu
  Cc: Vlastimil Babka, Linux Memory Management List, Huang Ying,
      Dave Hansen, Tim Chen, lkp, Andrea Arcangeli, David Rientjes

On Tue, Nov 24, 2015 at 11:40:28AM +0800, Aaron Lu wrote:
> On 11/23/2015 05:24 PM, Joonsoo Kim wrote:
> > 2015-11-23 17:33 GMT+09:00 Aaron Lu <aaron.lu@intel.com>:
> >> On 11/23/2015 04:16 PM, Joonsoo Kim wrote:
> >>>
> >>> And please attach the always-always vmstat numbers, too.
> >>
> >> Sure, attached the vmstat tool output, taken every second.
> >
> > Oops... I'd like to see a 1-sec-interval "cat /proc/vmstat" for
> > always-never.
>
> Here it is, the proc-vmstat for always-never.

Okay. In this case, compaction never happens.

Could you show a 1-sec-interval "cat /proc/pagetypeinfo" for
always-always?

> BTW, I'm still learning how to do proper ftrace for this case, so it may
> take a while.

You can do it simply with trace-cmd:

sudo trace-cmd record -e compaction &
run test program
fg
Ctrl + c
sudo trace-cmd report

Thanks.
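A plain-shell sketch of how both requests above could be satisfied in one
run; the log file names are arbitrary and the usemem arguments are just
placeholders taken from the earlier log:

  # 1-sec-interval pagetypeinfo samples during the always-always run
  while sleep 1; do echo "=== $(date +%s) ==="; cat /proc/pagetypeinfo; done > pagetypeinfo.log &
  SAMPLER=$!

  # record the compaction tracepoints around the workload itself
  trace-cmd record -e compaction ./usemem 99994672128
  trace-cmd report > compaction.trace

  kill $SAMPLER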
* Re: hugepage compaction causes performance drop
  2015-11-24  4:55 ` Joonsoo Kim
@ 2015-11-24  7:27   ` Aaron Lu
  2015-11-24  8:29     ` Joonsoo Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Aaron Lu @ 2015-11-24  7:27 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Vlastimil Babka, Linux Memory Management List, Huang Ying,
      Dave Hansen, Tim Chen, lkp, Andrea Arcangeli, David Rientjes

On 11/24/2015 12:55 PM, Joonsoo Kim wrote:
> On Tue, Nov 24, 2015 at 11:40:28AM +0800, Aaron Lu wrote:
>> BTW, I'm still learning how to do proper ftrace for this case, so it may
>> take a while.
>
> You can do it simply with trace-cmd:
>
> sudo trace-cmd record -e compaction &
> run test program
> fg
> Ctrl + c
> sudo trace-cmd report

Thanks for the tip, I just recorded it like this:
trace-cmd record -e compaction ./usemem xxx

Due to the big size of trace.out (6MB after compression), I've uploaded it:
https://drive.google.com/open?id=0B49uX3igf4K4UkJBOGt3cHhOU00

The pagetypeinfo, perf and proc-vmstat data are also there.

Regards,
Aaron
* Re: hugepage compaction causes performance drop
  2015-11-24  7:27 ` Aaron Lu
@ 2015-11-24  8:29   ` Joonsoo Kim
  2015-11-25 12:44     ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Joonsoo Kim @ 2015-11-24  8:29 UTC (permalink / raw)
  To: Aaron Lu
  Cc: Vlastimil Babka, Linux Memory Management List, Huang Ying,
      Dave Hansen, Tim Chen, lkp, Andrea Arcangeli, David Rientjes

On Tue, Nov 24, 2015 at 03:27:43PM +0800, Aaron Lu wrote:
> On 11/24/2015 12:55 PM, Joonsoo Kim wrote:
> > On Tue, Nov 24, 2015 at 11:40:28AM +0800, Aaron Lu wrote:
> >> BTW, I'm still learning how to do proper ftrace for this case, so it
> >> may take a while.
> >
> > You can do it simply with trace-cmd:
> >
> > sudo trace-cmd record -e compaction &
> > run test program
> > fg
> > Ctrl + c
> > sudo trace-cmd report
>
> Thanks for the tip, I just recorded it like this:
> trace-cmd record -e compaction ./usemem xxx
>
> Due to the big size of trace.out (6MB after compression), I've uploaded it:
> https://drive.google.com/open?id=0B49uX3igf4K4UkJBOGt3cHhOU00
>
> The pagetypeinfo, perf and proc-vmstat data are also there.

Thanks.

Okay, the output proves the theory. pagetypeinfo shows that there are too
many unmovable pageblocks. isolate_freepages() should skip these, so it's
not easy to reach a suitable pageblock before need_resched(). Hence, the
cached pfn doesn't get updated. (You can see the unchanged free_pfn with
'grep compaction_begin tracepoint-output'.)

But I don't think that updating the cached pfn is enough to solve your
problem; a more complex change would be needed, I guess.

Thanks.
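The two checks mentioned here can be reproduced from the uploaded data with
something along these lines. This is only a sketch: it assumes the usual
"Number of blocks type" table in /proc/pagetypeinfo, the default trace-cmd
report format, and the placeholder file names from the earlier capture sketch.

  # how many pageblocks of each migratetype does each zone have?
  grep -A 6 'Number of blocks type' pagetypeinfo.log | head -12

  # does the free scanner's cached position (free_pfn) ever move
  # between compaction attempts?
  grep compaction_begin compaction.trace | head -20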
* Re: hugepage compaction causes performance drop
  2015-11-24  8:29 ` Joonsoo Kim
@ 2015-11-25 12:44   ` Vlastimil Babka
  2015-11-26  5:47     ` Aaron Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2015-11-25 12:44 UTC (permalink / raw)
  To: Joonsoo Kim, Aaron Lu
  Cc: Linux Memory Management List, Huang Ying, Dave Hansen, Tim Chen,
      lkp, Andrea Arcangeli, David Rientjes

On 11/24/2015 09:29 AM, Joonsoo Kim wrote:
> On Tue, Nov 24, 2015 at 03:27:43PM +0800, Aaron Lu wrote:
>
> Thanks.
>
> Okay, the output proves the theory. pagetypeinfo shows that there are too
> many unmovable pageblocks. isolate_freepages() should skip these, so it's
> not easy to reach a suitable pageblock before need_resched(). Hence, the
> cached pfn doesn't get updated. (You can see the unchanged free_pfn with
> 'grep compaction_begin tracepoint-output'.)

Hm, to me it seems that the scanners meet a lot, so they restart at the
zone boundaries and that's fine. There's nothing to cache.

> But I don't think that updating the cached pfn is enough to solve your
> problem; a more complex change would be needed, I guess.

One factor is probably that THP only uses async compaction, and async
compaction doesn't result in deferred compaction, which would otherwise
help here. It also means that the pageblock_skip bits are not being reset
except by kswapd...

Oh, and pageblock_pfn_to_page() is done before checking the pageblock skip
bits, so that's why it's prominent in the profiles. Although it was less
prominent (9% vs 46% before) in the last data... was perf collected while
tracing, thus generating extra noise?
* Re: hugepage compaction causes performance drop
  2015-11-25 12:44 ` Vlastimil Babka
@ 2015-11-26  5:47   ` Aaron Lu
  0 siblings, 0 replies; 15+ messages in thread
From: Aaron Lu @ 2015-11-26  5:47 UTC (permalink / raw)
  To: Vlastimil Babka, Joonsoo Kim
  Cc: Linux Memory Management List, Huang Ying, Dave Hansen, Tim Chen,
      lkp, Andrea Arcangeli, David Rientjes

On 11/25/2015 08:44 PM, Vlastimil Babka wrote:
> On 11/24/2015 09:29 AM, Joonsoo Kim wrote:
>> On Tue, Nov 24, 2015 at 03:27:43PM +0800, Aaron Lu wrote:
>>
>> Thanks.
>>
>> Okay, the output proves the theory. pagetypeinfo shows that there are
>> too many unmovable pageblocks. isolate_freepages() should skip these, so
>> it's not easy to reach a suitable pageblock before need_resched().
>> Hence, the cached pfn doesn't get updated. (You can see the unchanged
>> free_pfn with 'grep compaction_begin tracepoint-output'.)
>
> Hm, to me it seems that the scanners meet a lot, so they restart at the
> zone boundaries and that's fine. There's nothing to cache.
>
>> But I don't think that updating the cached pfn is enough to solve your
>> problem; a more complex change would be needed, I guess.
>
> One factor is probably that THP only uses async compaction, and async
> compaction doesn't result in deferred compaction, which would otherwise
> help here. It also means that the pageblock_skip bits are not being reset
> except by kswapd...
>
> Oh, and pageblock_pfn_to_page() is done before checking the pageblock skip
> bits, so that's why it's prominent in the profiles. Although it was less
> prominent (9% vs 46% before) in the last data... was perf collected while
> tracing, thus generating extra noise?

perf is always run during these tests: it starts 25 seconds after the test
starts, to give the test some time to eat the remaining free memory so that
by the time perf starts collecting data, swap-out should already be under
way. The perf data is collected for 10 seconds. I guess the test runs
slower under trace-cmd than before, so perf is collecting data in a
different time window.

Regards,
Aaron
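For completeness, the perf collection described above amounts to something
like the following; the output file name and the system-wide/call-graph
options are assumptions rather than the actual LKP job settings, and the
usemem invocation is again just a placeholder.

  ./usemem 99994672128 &                         # workload under test
  sleep 25                                       # let it eat the remaining free memory
  perf record -a -g -o perf.data -- sleep 10     # then sample everything for 10 seconds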
* Re: hugepage compaction causes performance drop
  2015-11-20 10:06 ` Vlastimil Babka
  2015-11-23  8:16   ` Joonsoo Kim
@ 2015-11-24  2:45   ` Joonsoo Kim
  1 sibling, 0 replies; 15+ messages in thread
From: Joonsoo Kim @ 2015-11-24  2:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Aaron Lu, linux-mm, Huang Ying, Dave Hansen, Tim Chen, lkp,
      Andrea Arcangeli, David Rientjes

On Fri, Nov 20, 2015 at 11:06:46AM +0100, Vlastimil Babka wrote:
> On 11/20/2015 10:33 AM, Aaron Lu wrote:
> > On 11/20/2015 04:55 PM, Aaron Lu wrote:
> > > On 11/19/2015 09:29 PM, Vlastimil Babka wrote:
> > > > +CC Andrea, David, Joonsoo
> > > >
> > > > On 11/19/2015 10:29 AM, Aaron Lu wrote:
> > > > > The vmstat and perf-profile are also attached; please let me know
> > > > > if you need any more information, thanks.
> > > >
> > > > Output from vmstat (the tool) isn't much useful here; a periodic
> > > > "cat /proc/vmstat" would be much better.
> > >
> > > No problem.
> > >
> > > > The perf profiles are somewhat weirdly sorted by children cost (?),
> > > > but I noticed a very high cost (46%) in pageblock_pfn_to_page().
> > > > This could be due to a very large but sparsely populated zone.
> > > > Could you provide /proc/zoneinfo?
> > >
> > > Is a one-time /proc/zoneinfo enough, or also a periodic one?
> >
> > Please see attached; note that this is a new run, so the perf profile
> > is a little different.
> >
> > Thanks,
> > Aaron
>
> Thanks.
>
> DMA32 is a bit sparse:
>
> Node 0, zone    DMA32
>   pages free     62829
>         min      327
>         low      408
>         high     490
>         scanned  0
>         spanned  1044480
>         present  495951
>         managed  479559
>
> Since the other zones are much larger, this is probably not the culprit.
> But tracepoints should tell us more. I have a theory that updating the
> free scanner's cached pfn doesn't happen if it aborts due to
> need_resched() during isolate_freepages(), before hitting a valid
> pageblock, if the zone has a large hole in it. But zoneinfo doesn't

Today I revisited this issue and yes, I think your theory is right.
isolate_freepages() will not update the cached pfn until it calls
isolate_freepages_block(). So, if there are many holes, many unmovable
pageblocks or many !isolation_suitable() pageblocks, the cached pfn will
not be updated if compaction aborts due to need_resched().

zoneinfo shows that there are not many holes, so I guess this problem is
caused by the latter two cases. It would be better to update the cached
pfn in these cases too. Although I haven't seen your solution yet, I guess
it will help here.

Thanks.