Date: Thu, 27 Jun 2024 20:38:47 +0100
From: Lorenzo Stoakes
To: "Liam R. Howlett", Andrew Morton, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Vlastimil Babka, Matthew Wilcox, Alexander Viro, Christian Brauner, Jan Kara, Eric Biederman, Kees Cook, Suren Baghdasaryan
Subject: Re: [RFC PATCH 3/7] mm: unexport vma_expand() / vma_shrink()
Message-ID: <057aa98a-bab6-4d0c-838b-6ab8acb5bb7f@lucifer.local>
References: <8c548bb3d0286bfaef2cd5e67d7bf698967a52a1.1719481836.git.lstoakes@gmail.com>

On Thu, Jun 27, 2024 at 01:45:34PM -0400, Liam R. Howlett wrote:
> * Lorenzo Stoakes [240627 06:39]:
> > The vma_expand() and vma_shrink() functions are core VMA manipulation
> > functions which ultimately invoke VMA split/merge. In order to make these
> > testable, it is convenient to place all such core functions in a header
> > internal to mm/.
> >
>
> The sole user doesn't cause a split or merge, it relocates a vma by
> 'sliding' the window of the vma by expand/shrink with the moving of page
> tables in the middle of the slide.
>
> It slides to relocate the vma start/end and keep the vma pointer
> constant.

Yeah, sorry, I actually don't know why I said this (and I said "ultimately"
again, too!). As you say, and as I was in fact aware, this doesn't invoke
split/merge. I will put this down to me being tired when I wrote this :)

Will fix.

>
> > In addition, it is safer to abstract direct access to such functionality so
> > we can better control how other parts of the kernel use them, which
> > provides us the freedom to change how this functionality behaves as needed
> > without having to worry about how this functionality is used elsewhere.
> >
> > In order to service both these requirements, we provide abstractions for
> > the sole external user of these functions, shift_arg_pages() in fs/exec.c.
> >
> > We provide vma_expand_bottom() and vma_shrink_top() functions which better
> > match the semantics of what shift_arg_pages() is trying to accomplish by
> > explicitly wrapping the safe expansion of the bottom of a VMA and the
> > shrinking of the top of a VMA.
> >
> > As a result, we place the vma_shrink() and vma_expand() functions into
> > mm/internal.h to unexport them from use by any other part of the kernel.
>
> There is no point to have vma_shrink() have a wrapper since this is the
> only place it's ever used. So we're wrapping a function that's only
> called once.

Yeah, that was a sketchy part of this change. I feel the vma_expand() case
is a lot more defensible; the vma_shrink() one, well, I expected I might get
some feedback on anyway :)

This was obviously an attempt to abstract these away from fs/ in some
vaguely sensible fashion while retaining functionality.

>
> I'd rather a vma_relocate() do everything in this function than wrap
> them. The only other thing it does is the page table moving and freeing
> - which we have to do in the vma code. We'd expose something we want no
> one to use - but we already have two of those here.

Right, I think I was trying to avoid exposing _the whole thing_, as it's so
specific and not so nice to make available, but at the same time it is
perhaps the only reasonable way forward that avoids the vma_shrink()
micro-wrapper.

So yeah, will rework with a vma_relocate() or similar. As you say, we can't
really get away from exposing something nasty here.
>
> >
> > Signed-off-by: Lorenzo Stoakes
> > ---
> >  fs/exec.c          | 26 +++++--------------
> >  include/linux/mm.h |  9 +++----
> >  mm/internal.h      |  6 +++++
> >  mm/mmap.c          | 65 ++++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 82 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 40073142288f..1cb3bf323e0f 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -700,25 +700,14 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> >  	unsigned long length = old_end - old_start;
> >  	unsigned long new_start = old_start - shift;
> >  	unsigned long new_end = old_end - shift;
> > -	VMA_ITERATOR(vmi, mm, new_start);
> > +	VMA_ITERATOR(vmi, mm, 0);
> >  	struct vm_area_struct *next;
> >  	struct mmu_gather tlb;
> > +	int ret;
> >
> > -	BUG_ON(new_start > new_end);
> > -
> > -	/*
> > -	 * ensure there are no vmas between where we want to go
> > -	 * and where we are
> > -	 */
> > -	if (vma != vma_next(&vmi))
> > -		return -EFAULT;
> > -
> > -	vma_iter_prev_range(&vmi);
> > -	/*
> > -	 * cover the whole range: [new_start, old_end)
> > -	 */
> > -	if (vma_expand(&vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
> > -		return -ENOMEM;
> > +	ret = vma_expand_bottom(&vmi, vma, shift, &next);
> > +	if (ret)
> > +		return ret;
> >
> >  	/*
> >  	 * move the page tables downwards, on failure we rely on
> > @@ -730,7 +719,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> >
> >  	lru_add_drain();
> >  	tlb_gather_mmu(&tlb, mm);
> > -	next = vma_next(&vmi);
> > +
> >  	if (new_end > old_start) {
> >  		/*
> >  		 * when the old and new regions overlap clear from new_end.
> > @@ -749,9 +738,8 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> >  	}
> >  	tlb_finish_mmu(&tlb);
> >
> > -	vma_prev(&vmi);
> >  	/* Shrink the vma to just the new range */
> > -	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
> > +	return vma_shrink_top(&vmi, vma, shift);
> >  }
> >
> >  /*
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 4d2b5538925b..e3220439cf75 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3273,11 +3273,10 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
> >
> >  /* mmap.c */
> >  extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
> > -extern int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > -		      unsigned long start, unsigned long end, pgoff_t pgoff,
> > -		      struct vm_area_struct *next);
> > -extern int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > -		      unsigned long start, unsigned long end, pgoff_t pgoff);
> > +extern int vma_expand_bottom(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +			     unsigned long shift, struct vm_area_struct **next);
> > +extern int vma_shrink_top(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +			  unsigned long shift);
> >  extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
> >  extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
> >  extern void unlink_file_vma(struct vm_area_struct *);
> > diff --git a/mm/internal.h b/mm/internal.h
> > index c8177200c943..f7779727bb78 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1305,6 +1305,12 @@ static inline struct vm_area_struct
> >  				  vma_policy(vma), new_ctx, anon_vma_name(vma));
> >  }
> >
> > +int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +	       unsigned long start, unsigned long end, pgoff_t pgoff,
> > +	       struct vm_area_struct *next);
> > +int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +	       unsigned long start, unsigned long end, pgoff_t pgoff);
> > +
> >  enum {
> >  	/* mark page accessed */
> >  	FOLL_TOUCH = 1 << 16,
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index e42d89f98071..574e69a04ebe 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3940,6 +3940,71 @@ void mm_drop_all_locks(struct mm_struct *mm)
> >  	mutex_unlock(&mm_all_locks_mutex);
> >  }
> >
> > +/*
> > + * vma_expand_bottom() - Expands the bottom of a VMA downwards. An error will
> > + *                       arise if there is another VMA in the expanded range, or
> > + *                       if the expansion fails. This function leaves the VMA
> > + *                       iterator, vmi, positioned at the newly expanded VMA.
> > + * @vmi: The VMA iterator.
> > + * @vma: The VMA to modify.
> > + * @shift: The number of bytes by which to expand the bottom of the VMA.
> > + * @next: Output parameter, pointing at the VMA immediately succeeding the newly
> > + *        expanded VMA.
> > + *
> > + * Returns: 0 on success, an error code otherwise.
> > + */
> > +int vma_expand_bottom(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +		      unsigned long shift, struct vm_area_struct **next)
> > +{
> > +	unsigned long old_start = vma->vm_start;
> > +	unsigned long old_end = vma->vm_end;
> > +	unsigned long new_start = old_start - shift;
> > +	unsigned long new_end = old_end - shift;
> > +
> > +	BUG_ON(new_start > new_end);
> > +
> > +	vma_iter_set(vmi, new_start);
> > +
> > +	/*
> > +	 * ensure there are no vmas between where we want to go
> > +	 * and where we are
> > +	 */
> > +	if (vma != vma_next(vmi))
> > +		return -EFAULT;
> > +
> > +	vma_iter_prev_range(vmi);
> > +
> > +	/*
> > +	 * cover the whole range: [new_start, old_end)
> > +	 */
> > +	if (vma_expand(vmi, vma, new_start, old_end, vma->vm_pgoff, NULL))
> > +		return -ENOMEM;
> > +
> > +	*next = vma_next(vmi);
> > +	vma_prev(vmi);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * vma_shrink_top() - Reduce an existing VMA's memory area by shift bytes from
> > + * the top of the VMA.
> > + * @vmi: The VMA iterator, must be positioned at the VMA.
> > + * @vma: The VMA to modify.
> > + * @shift: The number of bytes by which to shrink the VMA.
> > + *
> > + * Returns: 0 on success, an error code otherwise.
> > + */
> > +int vma_shrink_top(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > +		   unsigned long shift)
> > +{
> > +	if (shift >= vma->vm_end - vma->vm_start)
> > +		return -EINVAL;
> > +
> > +	return vma_shrink(vmi, vma, vma->vm_start, vma->vm_end - shift,
> > +			  vma->vm_pgoff);
> > +}
> > +
> >  /*
> >   * initialise the percpu counter for VM
> >   */
> > --
> > 2.45.1
> >