linux-mm.kvack.org archive mirror
* [RFC PATCH 0/3] Cgroup-based THP control
@ 2024-10-30  8:33 gutierrez.asier
  2024-10-30  8:33 ` [RFC PATCH 1/3] mm: Add thp_flags control for cgroup gutierrez.asier
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: gutierrez.asier @ 2024-10-30  8:33 UTC
  To: akpm, david, ryan.roberts, baohua, willy, peterx, hannes, hocko,
	roman.gushchin, shakeel.butt, muchun.song
  Cc: cgroups, linux-mm, linux-kernel, stepanov.anatoly,
	alexander.kozhevnikov, guohanjun, weiyongjun1, wangkefeng.wang,
	judy.chenhui, yusongping, artem.kuzin, kang.sun

From: Asier Gutierrez <gutierrez.asier@huawei-partners.com>

Currently, THP modes are set globally. This is overkill when only a specific
application or set of applications benefits from THP. Moreover, different
applications may need different THP settings. Here we propose a cgroup-based
THP control mechanism.

A THP interface is added to the memory cgroup subsystem. The existing global
THP control semantics are preserved for backward compatibility: when a THP mode
is set globally, the change is propagated to all memory cgroups. However, when
a particular cgroup changes its THP policy, the global THP policy in sysfs
remains the same.

Two new memcg files are exposed: memory.thp_enabled and memory.thp_defrag. They
use exactly the same format as the global THP enabled/defrag files.
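
For illustration only (not part of this series): since both files use the
"always [madvise] never" format, where the active mode is the bracketed token,
a user-space tool could extract the active mode roughly as sketched below. The
helper name thp_active_mode() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed token of a THP mode line such as
 * "always [madvise] never" into out (NUL-terminated).
 * Returns 0 on success, -1 if no well-formed "[mode]" token is found or the
 * token does not fit into out.
 */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
	const char *start, *end;
	size_t len;

	assert(line != NULL && out != NULL);

	start = strchr(line, '[');
	end = start ? strchr(start, ']') : NULL;
	if (!start || !end)
		return -1;

	len = (size_t)(end - start - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, start + 1, len);
	out[len] = '\0';
	return 0;
}
```

A line without a bracketed token, such as the "Mixed state" message that the
global file may return, makes the helper return -1.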

Child cgroups inherit the THP settings of their parent cgroup upon creation.
Later mode changes in a particular cgroup aren't propagated to its child
cgroups.
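
The propagation rules described so far can be summarized with a toy user-space
model. This is only a sketch of the semantics under the assumptions stated in
this cover letter, not the kernel implementation; struct cg_model and all
model_*() names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum thp_mode { THP_ALWAYS, THP_MADVISE, THP_NEVER };

#define MAX_GROUPS 16

/* Hypothetical stand-in for the memcg hierarchy; mode[0] is the root. */
struct cg_model {
	enum thp_mode mode[MAX_GROUPS];
	size_t count;
};

void model_init(struct cg_model *m, enum thp_mode root_mode)
{
	m->mode[0] = root_mode;
	m->count = 1;
}

/* A new child copies its parent's mode once, at creation time. */
int model_create_child(struct cg_model *m, int parent)
{
	assert(parent >= 0 && (size_t)parent < m->count);
	if (m->count >= MAX_GROUPS)
		return -1;
	m->mode[m->count] = m->mode[parent];
	return (int)m->count++;
}

/* A per-cgroup write changes only that cgroup, not its children. */
void model_set_cgroup(struct cg_model *m, int cg, enum thp_mode mode)
{
	assert(cg >= 0 && (size_t)cg < m->count);
	m->mode[cg] = mode;
}

/* A global (sysfs) write is propagated to every cgroup. */
void model_set_global(struct cg_model *m, enum thp_mode mode)
{
	size_t i;

	for (i = 0; i < m->count; i++)
		m->mode[i] = mode;
}

/* "Mixed state": the mode is not the same for every cgroup. */
int model_is_mixed(const struct cg_model *m)
{
	size_t i;

	for (i = 1; i < m->count; i++)
		if (m->mode[i] != m->mode[0])
			return 1;
	return 0;
}
```

In this model, setting "always" on one child of a "madvise" root leaves the
root untouched and makes model_is_mixed() return 1 until the next global write.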

During the memory cgroup attachment stage, the corresponding slots are added to
or removed from khugepaged according to the THP policy.

Usage examples:

Set "madvise" mode globally:
# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
# cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

All the settings are propagated:
# cat /sys/fs/cgroup/memory.thp_enabled
always [madvise] never

# cat /sys/fs/cgroup/test/memory.thp_enabled
always [madvise] never

Set "always" for some specific cgroup:
# echo always > /sys/fs/cgroup/test/memory.thp_enabled
# cat /sys/fs/cgroup/test/memory.thp_enabled
[always] madvise never

The root cgroup remains in "madvise" mode:
# cat /sys/fs/cgroup/memory.thp_enabled
always [madvise] never

When reading the global settings, we now get a "mixed state" warning because
the THP mode isn't the same for every cgroup:
# cat /sys/kernel/mm/transparent_hugepage/enabled
Mixed state: see particular memcg flags!

Set the THP mode globally again and check that it is propagated everywhere:
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]

# cat /sys/fs/cgroup/memory.thp_enabled
always madvise [never]

# cat /sys/fs/cgroup/test/memory.thp_enabled
always madvise [never]

Here is a simple demo with a test that performs an anonymous mmap() followed by
a series of random reads. The system is rebooted between the cases.
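
The test program itself is not included in this posting. A minimal sketch of
such a workload, assuming a hypothetical function name and caller-chosen
mapping size, read count and seed, could look like this:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS with strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical workload: map len bytes of anonymous memory, populate it by
 * writing, then do nreads single-byte reads at pseudo-random offsets.
 * Returns the sum of the bytes read (so the reads cannot be optimized away),
 * or -1 if mmap() fails.
 */
long random_read_workload(size_t len, size_t nreads, unsigned int seed)
{
	unsigned char *buf;
	long sum = 0;
	size_t i;

	assert(len > 0);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 1, len);	/* touch every page so it is faulted in */

	srand(seed);
	for (i = 0; i < nreads; i++) {
		size_t off = (size_t)rand() % len;

		sum += ((volatile unsigned char *)buf)[off];
	}

	munmap(buf, len);
	return sum;
}
```

The sketch deliberately makes no madvise() call: whether the mapping is backed
by huge pages then depends only on the THP mode in effect for the process. To
inspect smaps as in the cases below, a real test would have to pause before
munmap().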

Case 1: Global THP - always. No cgroup.

// Global THP stats:
AnonHugePages:    391168 kB
FileHugePages:    120832 kB
FilePmdMapped:     67584 kB

// THP stats from *smaps* of the testing process
AnonHugePages:     12288 kB

Case 2: Global THP - never. Cgroup - always.

// Global THP stats:
AnonHugePages:     12288 kB
FileHugePages:      2048 kB
FilePmdMapped:      2048 kB

// THP stats from *smaps* of the testing process
AnonHugePages:     12288 kB

// The cgroup THP stats (in bytes)
anon_thp 12582912
file_thp 2097152

There is a large difference between the two cases in global THP usage
(391168 kB vs. 12288 kB of AnonHugePages), while the test process gets the same
12288 kB of huge pages in both. This shows that the cgroup approach is
beneficial when a specific application or set of applications needs THP but
changing the application code is not an option.

TODO list:

1. Anonymous mTHP
2. Fine-grained mode selection for different VMA types ("anon|exec|ro|file"),
   to support combinations such as "always + exec", "always + anon", etc.
3. Per-cgroup limit for the THP usage


Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
Signed-off-by: Anatoly Stepanov <stepanov.anatoly@huawei.com>
Reviewed-by: Alexander Kozhevnikov <alexander.kozhevnikov@huawei-partners.com>

Asier Gutierrez, Anatoly Stepanov (3):
  mm: Add thp_flags control for cgroup
  mm: Support for huge pages in cgroups
  mm: Add thp_defrag control for cgroup


 include/linux/huge_mm.h    |  23 +++-
 include/linux/khugepaged.h |   2 +-
 include/linux/memcontrol.h |  28 ++++
 mm/huge_memory.c           | 207 ++++++++++++++++++-----------
 mm/khugepaged.c            |   8 +-
 mm/memcontrol.c            | 262 +++++++++++++++++++++++++++++++++++++
 6 files changed, 449 insertions(+), 81 deletions(-)

-- 
2.34.1




end of thread, other threads:[~2024-11-01 16:01 UTC | newest]

Thread overview: 27+ messages
2024-10-30  8:33 [RFC PATCH 0/3] Cgroup-based THP control gutierrez.asier
2024-10-30  8:33 ` [RFC PATCH 1/3] mm: Add thp_flags control for cgroup gutierrez.asier
2024-10-30  8:33 ` [RFC PATCH 2/3] mm: Support for huge pages in cgroups gutierrez.asier
2024-10-30  8:33 ` [RFC PATCH 3/3] mm: Add thp_defrag control for cgroup gutierrez.asier
2024-10-30  8:38 ` [RFC PATCH 0/3] Cgroup-based THP control Michal Hocko
2024-10-30 12:51   ` Gutierrez Asier
2024-10-30 13:27     ` Michal Hocko
2024-10-30 14:58       ` Gutierrez Asier
2024-10-30 15:15         ` Michal Hocko
2024-10-31  6:06           ` Stepanov Anatoly
2024-10-31  8:33             ` Michal Hocko
2024-10-31 14:37               ` Stepanov Anatoly
2024-11-01  7:35                 ` Michal Hocko
2024-11-01 11:54                   ` Stepanov Anatoly
2024-11-01 13:15                     ` Michal Hocko
2024-11-01 13:24                       ` Stepanov Anatoly
2024-11-01 13:28                         ` Michal Hocko
2024-11-01 13:39                           ` Stepanov Anatoly
2024-11-01 13:50                             ` Michal Hocko
2024-11-01 14:03                               ` Stepanov Anatoly
2024-11-01 16:01                 ` Matthew Wilcox
2024-10-30 13:14 ` Matthew Wilcox
2024-10-30 13:16   ` David Hildenbrand
2024-10-30 14:45 ` Chris Down
2024-10-30 15:04   ` Michal Hocko
2024-10-30 15:08 ` Johannes Weiner
2024-11-01 12:44   ` Stepanov Anatoly
