From: Stefan Roesch
To: David Hildenbrand
Cc: kernel-team@fb.com, linux-mm@kvack.org, riel@surriel.com, mhocko@suse.com, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, akpm@linux-foundation.org, hannes@cmpxchg.org, Bagas Sanjaya
Subject: Re: [PATCH v4 1/3] mm: add new api to enable ksm per process
Date: Mon, 03 Apr 2023 08:50:55 -0700
In-reply-to: <7ed4308d-b400-d2bb-b539-3fe418862ab8@redhat.com>
References: <20230310182851.2579138-1-shr@devkernel.io> <20230310182851.2579138-2-shr@devkernel.io> <7ed4308d-b400-d2bb-b539-3fe418862ab8@redhat.com>

David Hildenbrand writes:

> On 10.03.23 19:28, Stefan Roesch wrote:
>> Patch series "mm: process/cgroup ksm support", v3.
>>
>> So far KSM can only be enabled by calling madvise for memory regions.
>> To be able to use KSM for more workloads, KSM needs to have the ability to be
>> enabled / disabled at the process / cgroup level.
>>
>> Use case 1:
>> The madvise call is not available in the programming language. An
>> example of this is programs with forked workloads using a garbage
>> collected language without pointers. In such a language madvise cannot
>> be made available.
>>
>> In addition the addresses of objects get moved around as they are
>> garbage collected. KSM sharing needs to be enabled "from the outside"
>> for these types of workloads.
>
> I guess the interpreter could enable it (like a memory allocator could enable it
> for the whole heap). But I get that it's much easier to enable this per-process,
> and eventually only when a lot of the same processes are running in that
> particular environment.
>

We don't want it to be enabled for all workloads of that interpreter; instead
we want to be able to select the workloads for which we enable KSM.

>> Use case 2:
>> The same interpreter can also be used for workloads where KSM brings
>> no benefit or even has overhead. We'd like to be able to enable KSM on
>> a workload by workload basis.
>
> Agreed. A per-process control is also helpful to identify workloads where KSM
> might be beneficial (and to which degree).
>
>> Use case 3:
>> With the madvise call sharing opportunities are only enabled for the
>> current process: it is a workload-local decision. A considerable number
>> of sharing opportunities may exist across multiple workloads or jobs.
>> Only a higher level entity like a job scheduler or container can know
>> for certain if it's running one or more instances of a job. That job
>> scheduler however doesn't have the necessary internal workload knowledge
>> to make targeted madvise calls.
>>
>> Security concerns:
>> In previous discussions security concerns have been brought up. The
>> problem is that an individual workload does not have the knowledge about
>> what else is running on a machine.
>> Therefore it has to be very
>> conservative in what memory areas can be shared or not. However, if the
>> system is dedicated to running multiple jobs within the same security
>> domain, it's the job scheduler that has the knowledge that sharing can be
>> safely enabled and is even desirable.
>>
>> Performance:
>> Experiments with using UKSM have shown a capacity increase of around
>> 20%.
>>
>
> As raised, it would be great to include more details about the workload where
> this particularly helps (e.g., a lot of Django processes operating in the same
> domain).
>

In the next version of the patch series I can add that the Django processes
are part of the same domain.

>> 1. New options for prctl system command
>> This patch series adds two new options to the prctl system call.
>> The first one allows to enable KSM at the process level and the second
>> one to query the setting.
>>
>> The setting will be inherited by child processes.
>>
>> With the above setting, KSM can be enabled for the seed process of a
>> cgroup and all processes in the cgroup will inherit the setting.
>>
>> 2. Changes to KSM processing
>> When KSM is enabled at the process level, the KSM code will iterate
>> over all the VMA's and enable KSM for the eligible VMA's.
>>
>> When forking a process that has KSM enabled, the setting will be
>> inherited by the new child process.
>>
>> In addition when KSM is disabled for a process, KSM will be disabled
>> for the VMA's where KSM has been enabled.
>
> Do we want to make MADV_MERGEABLE/MADV_UNMERGEABLE fail while the new prctl is
> enabled for a process?

I decided to allow enabling KSM with prctl even when MADV_MERGEABLE is in use;
this allows more flexibility.

>
>> 3. Add general_profit metric
>> The general_profit metric of KSM is specified in the documentation,
>> but not calculated. This adds the general profit metric to
>> /sys/kernel/debug/mm/ksm.
>>
>> 4. Add more metrics to ksm_stat
>> This adds the process profit and ksm type metric to
>> /proc//ksm_stat.
>>
>> 5. Add more tests to ksm_tests
>> This adds an option to specify the merge type to the ksm_tests.
>> This allows to test madvise and prctl KSM. It also adds a new option
>> to query if prctl KSM has been enabled. It adds a fork test to verify
>> that the KSM process setting is inherited by child processes.
>>
>> An update to the prctl(2) manpage has been proposed at [1].
>>
>> This patch (of 3):
>>
>> This adds a new prctl API to enable and disable KSM on a per process
>> basis instead of only at the VMA basis (with madvise).
>>
>> 1) Introduce new MMF_VM_MERGE_ANY flag
>>
>> This introduces the new MMF_VM_MERGE_ANY flag. When this flag
>> is set, kernel samepage merging (ksm) gets enabled for all vma's of a
>> process.
>>
>> 2) add flag to __ksm_enter
>>
>> This change adds the flag parameter to __ksm_enter. This allows to
>> distinguish if ksm was called by prctl or madvise.
>>
>> 3) add flag to __ksm_exit call
>>
>> This adds the flag parameter to the __ksm_exit() call. This allows
>> to distinguish if this call is for a prctl or madvise invocation.
>>
>> 4) invoke madvise for all vmas in scan_get_next_rmap_item
>>
>> If the new flag MMF_VM_MERGE_ANY has been set for a process, iterate
>> over all the vmas and enable ksm if possible. For the vmas that can be
>> ksm enabled this is only done once.
>>
>> 5) support disabling of ksm for a process
>>
>> This adds the ability to disable ksm for a process if ksm has been
>> enabled for the process.
>>
>> 6) add new prctl option to get and set ksm for a process
>>
>> This adds two new options to the prctl system call
>> - enable ksm for all vmas of a process (if the vmas support it).
>> - query if ksm has been enabled for a process.
>
>
> Did you consider, instead of handling MMF_VM_MERGE_ANY in a special way, to
> instead make it reuse the existing MMF_VM_MERGEABLE/VM_MERGEABLE infrastructure?
> Especially:
>
> 1) During prctl(MMF_VM_MERGE_ANY), set VM_MERGEABLE on all applicable
>    compatible VMAs.
>    Further, set MMF_VM_MERGEABLE and enter KSM if not
>    already set.
>
> 2) When creating a new, compatible VMA and MMF_VM_MERGE_ANY is set, set
>    VM_MERGEABLE?
>
> Then you can avoid all runtime checks for compatible VMAs and only look at the
> VM_MERGEABLE flag. In fact, the VM_MERGEABLE will be completely expressive then
> for all VMAs. You don't need vma_ksm_mergeable() then.
>

I didn't consider the above approach; I can have a look. I can see the benefit
of not needing vma_ksm_mergeable().

> Another thing to consider is interaction with arch/s390/mm/gmap.c: s390x/kvm
> does not support KSM and it has to disable it for all VMAs. We have to find a
> way to fence the prctl (for example, fail setting the prctl after
> gmap_mark_unmergeable() ran, and make gmap_mark_unmergeable() fail if the prctl
> ran -- or handle it gracefully in some other way).
>
>

I'll have a look.

>> Link: https://lkml.kernel.org/r/20230227220206.436662-1-shr@devkernel.io [1]
>> Link: https://lkml.kernel.org/r/20230224044000.3084046-1-shr@devkernel.io
>> Link: https://lkml.kernel.org/r/20230224044000.3084046-2-shr@devkernel.io
>> Signed-off-by: Stefan Roesch
>> Cc: David Hildenbrand
>> Cc: Johannes Weiner
>> Cc: Michal Hocko
>> Cc: Rik van Riel
>> Cc: Bagas Sanjaya
>> Signed-off-by: Andrew Morton
>> ---
>>  include/linux/ksm.h            | 14 ++++--
>>  include/linux/sched/coredump.h |  1 +
>>  include/uapi/linux/prctl.h     |  2 +
>>  kernel/sys.c                   | 27 ++++++++++
>>  mm/ksm.c                       | 90 +++++++++++++++++++++++-----------
>>  5 files changed, 101 insertions(+), 33 deletions(-)
>> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
>> index 7e232ba59b86..d38a05a36298 100644
>> --- a/include/linux/ksm.h
>> +++ b/include/linux/ksm.h
>> @@ -18,20 +18,24 @@
>>  #ifdef CONFIG_KSM
>>  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>>  		unsigned long end, int advice, unsigned long *vm_flags);
>> -int __ksm_enter(struct mm_struct *mm);
>> -void __ksm_exit(struct mm_struct *mm);
>> +int __ksm_enter(struct mm_struct *mm, int flag);
>> +void __ksm_exit(struct mm_struct *mm, int flag);
>>  static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
>>  {
>> +	if (test_bit(MMF_VM_MERGE_ANY, &oldmm->flags))
>> +		return __ksm_enter(mm, MMF_VM_MERGE_ANY);
>>  	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
>> -		return __ksm_enter(mm);
>> +		return __ksm_enter(mm, MMF_VM_MERGEABLE);
>>  	return 0;
>>  }
>>  static inline void ksm_exit(struct mm_struct *mm)
>>  {
>> -	if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
>> -		__ksm_exit(mm);
>> +	if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
>> +		__ksm_exit(mm, MMF_VM_MERGE_ANY);
>> +	else if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
>> +		__ksm_exit(mm, MMF_VM_MERGEABLE);
>>  }
>>  /*
>> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
>> index 0e17ae7fbfd3..0ee96ea7a0e9 100644
>> --- a/include/linux/sched/coredump.h
>> +++ b/include/linux/sched/coredump.h
>> @@ -90,4 +90,5 @@ static inline int get_dumpable(struct mm_struct *mm)
>>  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
>>  				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
>> +#define MMF_VM_MERGE_ANY	29
>>  #endif /* _LINUX_SCHED_COREDUMP_H */
>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>> index 1312a137f7fb..759b3f53e53f 100644
>> --- a/include/uapi/linux/prctl.h
>> +++ b/include/uapi/linux/prctl.h
>> @@ -290,4 +290,6 @@ struct prctl_mm_map {
>>  #define PR_SET_VMA		0x53564d41
>>  # define PR_SET_VMA_ANON_NAME		0
>> +#define PR_SET_MEMORY_MERGE		67
>> +#define PR_GET_MEMORY_MERGE		68
>>  #endif /* _LINUX_PRCTL_H */
>> diff --git a/kernel/sys.c b/kernel/sys.c
>> index 495cd87d9bf4..edc439b1cae9 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -15,6 +15,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>  #include
>>  #include
>> @@ -2661,6 +2662,32 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>>  	case PR_SET_VMA:
>>  		error = prctl_set_vma(arg2, arg3, arg4, arg5);
>>  		break;
>> +#ifdef CONFIG_KSM
>> +	case PR_SET_MEMORY_MERGE:
>> +		if (!capable(CAP_SYS_RESOURCE))
>> +			return -EPERM;
>> +
>> +		if (arg2) {
>> +			if (mmap_write_lock_killable(me->mm))
>> +				return -EINTR;
>> +
>> +			if (!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags))
>> +				error = __ksm_enter(me->mm, MMF_VM_MERGE_ANY);
>
> Hm, I think this might be problematic if we already called __ksm_enter() via
> madvise(). Maybe we should really consider making MMF_VM_MERGE_ANY set
> MMF_VM_MERGEABLE instead. Like:
>
> error = 0;
> if(test_bit(MMF_VM_MERGEABLE, &me->mm->flags))
> 	error = __ksm_enter(me->mm);
> if (!error)
> 	set_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
>

If we make that change, we would no longer be able to distinguish whether
MMF_VM_MERGEABLE or MMF_VM_MERGE_ANY has been set.

>> +			mmap_write_unlock(me->mm);
>> +		} else {
>> +			__ksm_exit(me->mm, MMF_VM_MERGE_ANY);
>
> Hm, I'd prefer if we really only call __ksm_exit() when we really exit the
> process. Is there a strong requirement to optimize disabling of KSM or would it
> be sufficient to clear the MMF_VM_MERGE_ANY flag here?
>

Then we still have the mm_slot allocated until the process gets terminated.

> Also, I wonder what happens if we have another VMA in that process that has it
> enabled ..
>
> Last but not least, wouldn't we want to do the same thing as MADV_UNMERGEABLE
> and actually unmerge the KSM pages?
>

Do you want to call unmerge for all VMA's?

>
> It smells like it could be simpler and more consistent to handle by letting
> PR_SET_MEMORY_MERGE piggy-back on MMF_VM_MERGEABLE/VM_MERGEABLE and mimic what
> ksm_madvise() does simply for all VMAs.
>
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -534,16 +534,58 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
>>  	return (ret & VM_FAULT_OOM) ? -ENOMEM : 0;
>>  }
>> +static bool vma_ksm_compatible(struct vm_area_struct *vma)
>> +{
>> +	/*
>> +	 * Be somewhat over-protective for now!
>> +	 */
>> +	if (vma->vm_flags & (VM_MERGEABLE | VM_SHARED | VM_MAYSHARE |
>> +			     VM_PFNMAP | VM_IO | VM_DONTEXPAND |
>> +			     VM_HUGETLB | VM_MIXEDMAP))
>> +		return false;		/* just ignore the advice */
>
> That comment is kind-of stale and ksm_madvise() specific.
>

I'll remove the comment.

>> +
>
> The VM_MERGEABLE check is really only used for ksm_madvise() to return
> immediately. I suggest keeping it in ksm_madvise() -- "Already enabled".
> Returning "false" in that case looks wrong (it's not broken because you do an
> early check in vma_ksm_mergeable(), it's just semantically weird).
>

I'll move the check into ksm_madvise() and remove it here.
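
For reference, here is a minimal userspace sketch of how the two prctl options from the quoted include/uapi/linux/prctl.h hunk might be called. The helper names ksm_process_set() and ksm_process_get() are hypothetical and not part of the patch. The sketch also assumes that PR_GET_MEMORY_MERGE reports the setting through the prctl() return value, which the quoted part of the diff does not show.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Option numbers copied from the quoted include/uapi/linux/prctl.h hunk.
 * They are defined here only as a fallback for older userspace headers.
 */
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/*
 * Hypothetical helper: enable (enable != 0) or disable (enable == 0) KSM
 * for the calling process. Returns 0 on success or a negative errno value.
 */
int ksm_process_set(int enable)
{
	unsigned long arg2 = enable ? 1UL : 0UL;

	if (prctl(PR_SET_MEMORY_MERGE, arg2, 0UL, 0UL, 0UL) == -1)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: query the per-process setting.
 * ASSUMPTION: the kernel reports the setting via the prctl() return value;
 * the quoted diff does not include the PR_GET_MEMORY_MERGE case.
 * Returns the value reported by the kernel, or a negative errno value.
 */
int ksm_process_get(void)
{
	int ret = prctl(PR_GET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);

	return ret == -1 ? -errno : ret;
}
```

A job launcher would call ksm_process_set(1) in the seed process before forking its workers. With the patch as quoted, a caller without CAP_SYS_RESOURCE should expect -EPERM, and a kernel that does not know the option returns -EINVAL.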