From: David Hildenbrand <david@redhat.com>
To: Stefan Roesch <shr@devkernel.io>, kernel-team@fb.com
Cc: linux-mm@kvack.org, riel@surriel.com, mhocko@suse.com,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
akpm@linux-foundation.org, hannes@cmpxchg.org,
willy@infradead.org, Bagas Sanjaya <bagasdotme@gmail.com>
Subject: Re: [PATCH v6 1/3] mm: add new api to enable ksm per process
Date: Wed, 12 Apr 2023 17:40:32 +0200
Message-ID: <b3bac995-0d87-a4d7-b261-9cbe3aa901af@redhat.com>
In-Reply-To: <20230412031648.2206875-2-shr@devkernel.io>
[...]
Thanks for giving my suggestions a try. I think we can further
improve/simplify some things. I added some comments, but might have more
regarding MMF_VM_MERGE_ANY / MMF_VM_MERGEABLE.
[I'll try reworking your patch after I send this mail to play with some
simplifications]
> arch/s390/mm/gmap.c | 1 +
> include/linux/ksm.h | 23 +++++--
> include/linux/sched/coredump.h | 1 +
> include/uapi/linux/prctl.h | 2 +
> kernel/fork.c | 1 +
> kernel/sys.c | 23 +++++++
> mm/ksm.c | 111 ++++++++++++++++++++++++++-------
> mm/mmap.c | 7 +++
> 8 files changed, 142 insertions(+), 27 deletions(-)
>
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 5a716bdcba05..9d85e5589474 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2591,6 +2591,7 @@ int gmap_mark_unmergeable(void)
> int ret;
> VMA_ITERATOR(vmi, mm, 0);
>
> + clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
Okay, that should keep the existing mechanism working (but user space can
still mess it up). Might be worth a comment:

	/*
	 * Make sure to disable KSM (if enabled for the whole process or
	 * individual VMAs). Note that nothing currently hinders user space
	 * from re-enabling it.
	 */
> for_each_vma(vmi, vma) {
> /* Copy vm_flags to avoid partial modifications in ksm_madvise */
> vm_flags = vma->vm_flags;
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index 7e232ba59b86..f24f9faf1561 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -18,20 +18,29 @@
> #ifdef CONFIG_KSM
> int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> unsigned long end, int advice, unsigned long *vm_flags);
> -int __ksm_enter(struct mm_struct *mm);
> -void __ksm_exit(struct mm_struct *mm);
> +
> +int ksm_add_mm(struct mm_struct *mm);
> +void ksm_add_vma(struct vm_area_struct *vma);
> +void ksm_add_vmas(struct mm_struct *mm);
> +
> +int __ksm_enter(struct mm_struct *mm, int flag);
> +void __ksm_exit(struct mm_struct *mm, int flag);
>
> static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> {
> + if (test_bit(MMF_VM_MERGE_ANY, &oldmm->flags))
> + return ksm_add_mm(mm);
ksm_fork() runs before copying any VMAs. Copying the bit should be
sufficient.
Would it be possible to rework it to something like:

	if (test_bit(MMF_VM_MERGE_ANY, &oldmm->flags))
		set_bit(MMF_VM_MERGE_ANY, &mm->flags);
	if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
		return __ksm_enter(mm);
	return 0;

IOW, not exporting ksm_add_mm() and not passing a flag to
__ksm_enter() -- it would simply set MMF_VM_MERGEABLE?
I remember proposing that enabling MMF_VM_MERGE_ANY would simply also
enable MMF_VM_MERGEABLE.
> if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
> - return __ksm_enter(mm);
> + return __ksm_enter(mm, MMF_VM_MERGEABLE);
> return 0;
> }
>
> static inline void ksm_exit(struct mm_struct *mm)
> {
> - if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
> - __ksm_exit(mm);
> + if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> + __ksm_exit(mm, MMF_VM_MERGE_ANY);
> + else if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
> + __ksm_exit(mm, MMF_VM_MERGEABLE);
Can we do

	if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
		__ksm_exit(mm);

and simply let __ksm_exit() clear both bits?
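For illustration, a minimal sketch of what I have in mind (untested, and
assuming __ksm_exit() loses its flag parameter again):

	static inline void ksm_exit(struct mm_struct *mm)
	{
		if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
			__ksm_exit(mm);
	}

with __ksm_exit() clearing both flags on the easy_to_free path:

	clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
	clear_bit(MMF_VM_MERGEABLE, &mm->flags);

That way, MMF_VM_MERGEABLE remains the single "mm is on the KSM mm list"
indicator, and MMF_VM_MERGE_ANY is purely policy.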
> }
>
> /*
> @@ -53,6 +62,10 @@ void folio_migrate_ksm(struct folio *newfolio, struct folio *folio);
>
> #else /* !CONFIG_KSM */
>
[...]
> #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 1312a137f7fb..759b3f53e53f 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -290,4 +290,6 @@ struct prctl_mm_map {
> #define PR_SET_VMA 0x53564d41
> # define PR_SET_VMA_ANON_NAME 0
>
> +#define PR_SET_MEMORY_MERGE 67
> +#define PR_GET_MEMORY_MERGE 68
> #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index f68954d05e89..1520697cf6c7 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -686,6 +686,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> if (vma_iter_bulk_store(&vmi, tmp))
> goto fail_nomem_vmi_store;
>
> + ksm_add_vma(tmp);
Is this really required? dup_mmap() copies vm_flags from the parent VMAs,
so the relevant VMAs should already have VM_MERGEABLE set.
> mm->map_count++;
> if (!(tmp->vm_flags & VM_WIPEONFORK))
> retval = copy_page_range(tmp, mpnt);
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 495cd87d9bf4..9bba163d2d04 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -15,6 +15,7 @@
> #include <linux/highuid.h>
> #include <linux/fs.h>
> #include <linux/kmod.h>
> +#include <linux/ksm.h>
> #include <linux/perf_event.h>
> #include <linux/resource.h>
> #include <linux/kernel.h>
> @@ -2661,6 +2662,28 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> case PR_SET_VMA:
> error = prctl_set_vma(arg2, arg3, arg4, arg5);
> break;
> +#ifdef CONFIG_KSM
> + case PR_SET_MEMORY_MERGE:
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (arg2) {
> + int err = ksm_add_mm(me->mm);
> +
> + if (!err)
> + ksm_add_vmas(me->mm);
> + } else {
> + clear_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
Okay, so disabling doesn't actually unshare anything: pages that were
already merged stay merged.
> + }
> + mmap_write_unlock(me->mm);
> + break;
> + case PR_GET_MEMORY_MERGE:
> + if (arg2 || arg3 || arg4 || arg5)
> + return -EINVAL;
> +
> + error = !!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + break;
> +#endif
> default:
> error = -EINVAL;
> break;
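For reference, usage from user space would then look something like this
(hypothetical example, not part of the patch; PR_* values taken from the
uapi hunk above):

	#include <stdio.h>
	#include <sys/prctl.h>

	#ifndef PR_SET_MEMORY_MERGE
	#define PR_SET_MEMORY_MERGE	67
	#define PR_GET_MEMORY_MERGE	68
	#endif

	int main(void)
	{
		/* Opt the whole process in to KSM merging. */
		if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0))
			perror("PR_SET_MEMORY_MERGE");

		/* PR_GET_MEMORY_MERGE requires arg2..arg5 to be 0. */
		printf("merge any: %d\n",
		       prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0));
		return 0;
	}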
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d7bd28199f6c..ab95ae0f9def 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -534,10 +534,33 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
> return (ret & VM_FAULT_OOM) ? -ENOMEM : 0;
> }
>
> +static bool vma_ksm_compatible(struct vm_area_struct *vma)
> +{
> + if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE | VM_PFNMAP |
> + VM_IO | VM_DONTEXPAND | VM_HUGETLB |
> + VM_MIXEDMAP))
> + return false; /* just ignore the advice */
> +
> + if (vma_is_dax(vma))
> + return false;
> +
> +#ifdef VM_SAO
> + if (vma->vm_flags & VM_SAO)
> + return false;
> +#endif
> +#ifdef VM_SPARC_ADI
> + if (vma->vm_flags & VM_SPARC_ADI)
> + return false;
> +#endif
> +
> + return true;
> +}
> +
> static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
> unsigned long addr)
> {
> struct vm_area_struct *vma;
> +
unrelated change
> if (ksm_test_exit(mm))
> return NULL;
> vma = vma_lookup(mm, addr);
> @@ -1065,6 +1088,7 @@ static int unmerge_and_remove_all_rmap_items(void)
>
> mm_slot_free(mm_slot_cache, mm_slot);
> clear_bit(MMF_VM_MERGEABLE, &mm->flags);
> + clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
> mmdrop(mm);
> } else
> spin_unlock(&ksm_mmlist_lock);
> @@ -2495,6 +2519,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>
> mm_slot_free(mm_slot_cache, mm_slot);
> clear_bit(MMF_VM_MERGEABLE, &mm->flags);
> + clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
> mmap_read_unlock(mm);
> mmdrop(mm);
> } else {
> @@ -2571,6 +2596,63 @@ static int ksm_scan_thread(void *nothing)
> return 0;
> }
>
> +static void __ksm_add_vma(struct vm_area_struct *vma)
> +{
> + unsigned long vm_flags = vma->vm_flags;
> +
> + if (vm_flags & VM_MERGEABLE)
> + return;
> +
> + if (vma_ksm_compatible(vma)) {
> + vm_flags |= VM_MERGEABLE;
> + vm_flags_reset(vma, vm_flags);
> + }
> +}
> +
> +/**
> + * ksm_add_vma - Mark vma as mergeable
"if compatible"
> + *
> + * @vma: Pointer to vma
> + */
> +void ksm_add_vma(struct vm_area_struct *vma)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> +
> + if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> + __ksm_add_vma(vma);
> +}
> +
> +/**
> + * ksm_add_vmas - Mark all vma's of a process as mergeable
> + *
> + * @mm: Pointer to mm
> + */
> +void ksm_add_vmas(struct mm_struct *mm)
I'd suggest calling this
> +{
> + struct vm_area_struct *vma;
> +
> + VMA_ITERATOR(vmi, mm, 0);
> + for_each_vma(vmi, vma)
> + __ksm_add_vma(vma);
> +}
> +
> +/**
> + * ksm_add_mm - Add mm to mm ksm list
> + *
> + * @mm: Pointer to mm
> + *
> + * Returns 0 on success, otherwise error code
> + */
> +int ksm_add_mm(struct mm_struct *mm)
> +{
> + if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> + return -EINVAL;
> + if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
> + return -EINVAL;
> +
> + return __ksm_enter(mm, MMF_VM_MERGE_ANY);
> +}
> +
> int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> unsigned long end, int advice, unsigned long *vm_flags)
> {
> @@ -2579,28 +2661,13 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>
> switch (advice) {
> case MADV_MERGEABLE:
> - /*
> - * Be somewhat over-protective for now!
> - */
> - if (*vm_flags & (VM_MERGEABLE | VM_SHARED | VM_MAYSHARE |
> - VM_PFNMAP | VM_IO | VM_DONTEXPAND |
> - VM_HUGETLB | VM_MIXEDMAP))
> - return 0; /* just ignore the advice */
> -
> - if (vma_is_dax(vma))
> + if (vma->vm_flags & VM_MERGEABLE)
> return 0;
> -
> -#ifdef VM_SAO
> - if (*vm_flags & VM_SAO)
> + if (!vma_ksm_compatible(vma))
> return 0;
> -#endif
> -#ifdef VM_SPARC_ADI
> - if (*vm_flags & VM_SPARC_ADI)
> - return 0;
> -#endif
>
> if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
> - err = __ksm_enter(mm);
> + err = __ksm_enter(mm, MMF_VM_MERGEABLE);
> if (err)
> return err;
> }
> @@ -2626,7 +2693,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> }
> EXPORT_SYMBOL_GPL(ksm_madvise);
>
> -int __ksm_enter(struct mm_struct *mm)
> +int __ksm_enter(struct mm_struct *mm, int flag)
> {
> struct ksm_mm_slot *mm_slot;
> struct mm_slot *slot;
> @@ -2659,7 +2726,7 @@ int __ksm_enter(struct mm_struct *mm)
> list_add_tail(&slot->mm_node, &ksm_scan.mm_slot->slot.mm_node);
> spin_unlock(&ksm_mmlist_lock);
>
> - set_bit(MMF_VM_MERGEABLE, &mm->flags);
> + set_bit(flag, &mm->flags);
> mmgrab(mm);
>
> if (needs_wakeup)
> @@ -2668,7 +2735,7 @@ int __ksm_enter(struct mm_struct *mm)
> return 0;
> }
>
> -void __ksm_exit(struct mm_struct *mm)
> +void __ksm_exit(struct mm_struct *mm, int flag)
> {
> struct ksm_mm_slot *mm_slot;
> struct mm_slot *slot;
> @@ -2700,7 +2767,7 @@ void __ksm_exit(struct mm_struct *mm)
>
> if (easy_to_free) {
> mm_slot_free(mm_slot_cache, mm_slot);
> - clear_bit(MMF_VM_MERGEABLE, &mm->flags);
> + clear_bit(flag, &mm->flags);
> mmdrop(mm);
> } else if (mm_slot) {
> mmap_write_lock(mm);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 740b54be3ed4..483e182e0b9d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -46,6 +46,7 @@
> #include <linux/pkeys.h>
> #include <linux/oom.h>
> #include <linux/sched/mm.h>
> +#include <linux/ksm.h>
>
> #include <linux/uaccess.h>
> #include <asm/cacheflush.h>
> @@ -2213,6 +2214,8 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> /* vma_complete stores the new vma */
> vma_complete(&vp, vmi, vma->vm_mm);
>
> + ksm_add_vma(new);
> +
Splitting a VMA shouldn't modify VM_MERGEABLE (the new VMA inherits the
flags of the old one), so I assume this is not required?
> /* Success. */
> if (new_below)
> vma_next(vmi);
> @@ -2664,6 +2667,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> if (file && vm_flags & VM_SHARED)
> mapping_unmap_writable(file->f_mapping);
> file = vma->vm_file;
> + ksm_add_vma(vma);
> expanded:
> perf_event_mmap(vma);
>
> @@ -2936,6 +2940,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> goto mas_store_fail;
>
> mm->map_count++;
> + ksm_add_vma(vma);
> out:
> perf_event_mmap(vma);
> mm->total_vm += len >> PAGE_SHIFT;
> @@ -3180,6 +3185,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> if (vma_link(mm, new_vma))
> goto out_vma_link;
> *need_rmap_locks = false;
> + ksm_add_vma(new_vma);
Copying shouldn't modify VM_MERGEABLE, so I think this is not required?
> }
> validate_mm_mt(mm);
> return new_vma;
> @@ -3356,6 +3362,7 @@ static struct vm_area_struct *__install_special_mapping(
> vm_stat_account(mm, vma->vm_flags, len >> PAGE_SHIFT);
>
> perf_event_mmap(vma);
> + ksm_add_vma(vma);
IIUC, special mappings will never be considered a reasonable target for
KSM (especially because at least VM_DONTEXPAND is always set).
I think you can just drop this call.
--
Thanks,
David / dhildenb