From: David Hildenbrand <david@redhat.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	tglx@linutronix.de, linux-crypto@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	Carlos O'Donell <carlos@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
	Christian Brauner <brauner@kernel.org>,
	David Hildenbrand <dhildenb@redhat.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v21 1/4] mm: add VM_DROPPABLE for designating always lazily freeable mappings
Date: Mon, 8 Jul 2024 22:21:09 +0200
Message-ID: <75d6c45d-deea-464d-b0fd-b36e5d73b898@redhat.com>
In-Reply-To: <Zov6SZZCKrqmigua@zx2c4.com>

On 08.07.24 16:40, Jason A. Donenfeld wrote:
> Hi David, Linus,
> 
> Below is what I understand the suggestions about the UX to be. The full
> commit is in https://git.zx2c4.com/linux-rng/log/ but here's the part
> we've been discussing. I've held off on David's suggestion changing
> "DROPPABLE" to "VOLATILE" to give Linus some time to wake up on the west
> coast and voice his preference for "DROPPABLE". But the rest is in
> place.
> 
> Jason
> 
> diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> index a246e11988d5..e89d00528f2f 100644
> --- a/include/uapi/linux/mman.h
> +++ b/include/uapi/linux/mman.h
> @@ -17,6 +17,7 @@
>   #define MAP_SHARED	0x01		/* Share changes */
>   #define MAP_PRIVATE	0x02		/* Changes are private */
>   #define MAP_SHARED_VALIDATE 0x03	/* share + validate extension flags */
> +#define MAP_DROPPABLE	0x08		/* Zero memory under memory pressure. */
>   
>   /*
>    * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
> diff --git a/mm/madvise.c b/mm/madvise.c
> index a77893462b92..cba5bc652fc4 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1068,13 +1068,16 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
>   		new_flags |= VM_WIPEONFORK;
>   		break;
>   	case MADV_KEEPONFORK:
> +		if (vma->vm_flags & VM_DROPPABLE)
> +			return -EINVAL;
>   		new_flags &= ~VM_WIPEONFORK;
>   		break;
>   	case MADV_DONTDUMP:
>   		new_flags |= VM_DONTDUMP;
>   		break;
>   	case MADV_DODUMP:
> -		if (!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL)
> +		if ((!is_vm_hugetlb_page(vma) && new_flags & VM_SPECIAL) ||
> +		    (vma->vm_flags & VM_DROPPABLE))
>   			return -EINVAL;
>   		new_flags &= ~VM_DONTDUMP;
>   		break;
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 30b51cdea89d..b87b3d8cc9cc 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -485,7 +485,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
>   
>   	if (newflags == oldflags || (oldflags & VM_SPECIAL) ||
>   	    is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) ||
> -	    vma_is_dax(vma) || vma_is_secretmem(vma))
> +	    vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE))
>   		/* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
>   		goto out;
>   
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 83b4682ec85c..b3d38179dd42 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1369,6 +1369,34 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>   			pgoff = 0;
>   			vm_flags |= VM_SHARED | VM_MAYSHARE;
>   			break;
> +		case MAP_DROPPABLE:
> +			/*
> +			 * A locked or stack area makes no sense to be droppable.
> +			 *
> +			 * Also, since droppable pages can just go away at any time
> +			 * it makes no sense to copy them on fork or dump them.
> +			 *
> +			 * And don't attempt to combine with hugetlb for now.
> +			 */
> +			if (flags & (MAP_LOCKED | MAP_HUGETLB))
> +			        return -EINVAL;
> +			if (vm_flags & (VM_GROWSDOWN | VM_GROWSUP))
> +			        return -EINVAL;
> +
> +			vm_flags |= VM_DROPPABLE;
> +
> +			/*
> +			 * If the pages can be dropped, then it doesn't make
> +			 * sense to reserve them.
> +			 */
> +			vm_flags |= VM_NORESERVE;

That is certainly interesting. Noting that we might not be able to 
reclaim these pages reliably in all cases: for example, when they are 
long-term pinned.

In some environments (OVERCOMMIT_NEVER), MAP_NORESERVE would never be 
effective. I wonder if we want to stick to the same behavior here ... 
In theory I agree that we can set VM_NORESERVE here unconditionally; 
it's just the corner case of "there are ways to prohibit reclaim" that 
makes me wonder.
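
For anyone wanting to check which overcommit policy a test machine is 
running before experimenting, here is a minimal sketch. It assumes the 
usual procfs location /proc/sys/vm/overcommit_memory, where 0 is the 
heuristic mode, 1 is "always" and 2 is OVERCOMMIT_NEVER. The helper name 
read_overcommit_mode() is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: read the overcommit policy
 * from a sysctl file. Returns the integer stored in the file
 * (0 = heuristic, 1 = always, 2 = never), or -1 if the file cannot be
 * opened or does not start with an integer.
 */
int read_overcommit_mode(const char *path)
{
	FILE *f;
	int mode = -1;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* The file holds a single decimal integer followed by a newline. */
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;

	fclose(f);
	return mode;
}
```

Calling read_overcommit_mode("/proc/sys/vm/overcommit_memory") and 
comparing the result against 2 tells you whether the OVERCOMMIT_NEVER 
corner case applies on that machine.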

BTW, I was just trying to understand how MADV_FREE + MAP_DROPPABLE would 
behave without any swap space around.

Did you experiment with that?

I'm reading can_reclaim_anon_pages(), and I'm wondering how well and 
how reliably that works when there is no swap configured.

Also, the comment in get_scan_count(): "If we have no swap space, do not 
bother scanning anon folios." makes me wonder if some work in that area 
is needed.
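
A rough sketch of the userspace side of such an experiment follows. It 
assumes the MAP_DROPPABLE value from the quoted patch (0x08), which other 
kernels may not define or accept. The helper name count_dropped_pages() 
is hypothetical. Generating memory pressure on a swapless system is left 
out; the sketch only shows how to detect pages that came back zeroed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Value taken from the quoted patch; an assumption on any other kernel. */
#ifndef MAP_DROPPABLE
#define MAP_DROPPABLE 0x08
#endif

/*
 * Hypothetical helper: map `pages` droppable pages, fill them with a
 * nonzero pattern, then count how many read back with a zero first byte.
 * Returns -1 if mmap() fails, for example because the running kernel
 * does not know MAP_DROPPABLE and rejects the mapping type.
 */
long count_dropped_pages(size_t pages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	size_t len, i;
	long dropped = 0;

	assert(page_size > 0);
	len = pages * (size_t)page_size;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_ANONYMOUS | MAP_DROPPABLE, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Touch every page so it is populated with nonzero data. */
	memset(p, 0xAA, len);

	/* Memory pressure would be applied here; not done in this sketch. */

	/* A page that was dropped reads back as zeroes on the next fault. */
	for (i = 0; i < pages; i++)
		if (p[i * (size_t)page_size] == 0)
			dropped++;

	munmap(p, len);
	return dropped;
}
```

Running this with swap disabled and a memory hog in between the fill and 
the scan would show whether reclaim reaches these pages at all in that 
configuration.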

-- 
Cheers,

David / dhildenb