From: Dennis Zhou <dennis@kernel.org>
To: Yosry Ahmed <yosryahmed@google.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
Christoph Lameter <cl@linux.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Uladzislau Rezki <urezki@gmail.com>,
Christoph Hellwig <hch@infradead.org>,
Lorenzo Stoakes <lstoakes@gmail.com>
Subject: Re: [PATCH] percpu: clean up all mappings when pcpu_map_pages() fails
Date: Thu, 21 Mar 2024 09:57:05 -0700 [thread overview]
Message-ID: <Zfxm4e42VaAH2eMJ@V92F7Y9K0C> (raw)
In-Reply-To: <CAJD7tkYRnCkP39gOHtfpxTpz_ntx47dz48pFes9dHunnmy7Xtw@mail.gmail.com>
On Tue, Mar 19, 2024 at 01:49:17PM -0700, Yosry Ahmed wrote:
> On Tue, Mar 19, 2024 at 1:32 PM Dennis Zhou <dennis@kernel.org> wrote:
> >
> > Hi Yosry,
> >
> > On Tue, Mar 19, 2024 at 01:08:26PM -0700, Yosry Ahmed wrote:
> > > On Mon, Mar 11, 2024 at 12:43 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > >
> > > > In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
> > > > __pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
> > > > were created, but not on the CPU where __pcpu_map_pages() fails.
> > > >
> > > > __pcpu_map_pages() and __pcpu_unmap_pages() are wrappers around
> > > > vmap_pages_range_noflush() and vunmap_range_noflush(). All other callers
> > > > of vmap_pages_range_noflush() call vunmap_range_noflush() when mapping
> > > > fails, except pcpu_map_pages(). The reason could be that partial
> > > > mappings may be left behind from a failed mapping attempt.
> > > >
> > > > Call __pcpu_unmap_pages() for the failed CPU as well in
> > > > pcpu_map_pages().
> > > >
> > > > This was found by code inspection, no failures or bugs were observed.
> > > >
> > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > >
> > > Any thoughts about this change? Should I resend next week after the
> > > merge window?
> > >
> >
> > Sorry for the delay.
> >
> > I'm looking at the code from mm/kmsan/hooks.c kmsan_ioremap_page_range().
> > It seems like __vunmap_range_noflush() is called on error for
> > successfully mapped pages similar to how it's being done in percpu-vm.c.
>
> You picked an unconventional example to compare against :)
>
> >
> > I haven't read in depth the expectations of vmap_pages_range_noflush()
> > but on first glance it doesn't seem like percpu is operating out of the
> > ordinary?
>
> I was looking at vm_map_ram(), vmap(), and __vmalloc_area_node(). They
> all call vmap_pages_range()-> vmap_pages_range_noflush().
>
> When vmap_pages_range() fails:
> - vm_map_ram() calls
> vm_unmap_ram()->free_unmap_vmap_area()->vunmap_range_noflush().
> - vmap() calls vunmap()->remove_vm_area()->free_unmap_vmap_area()->
> vunmap_range_noflush().
> - __vmalloc_area_node() calls
> vfree()->remove_vm_area()->free_unmap_vmap_area()->
> vunmap_range_noflush().
>
Okay so I had a moment to read it more closely. If we're mapping > 1
pages, and one of the latter pages fails. Then we could leave some
mappings installed.
@Andrew, I think this makes sense. Would you please be able to pick this
up? I'm not running a tree this window. I will try to send out the percpu
hotplug changes I've been forward porting for a while now and try to get
that in a branch for-6.10.
Acked-by: Dennis Zhou <dennis@kernel.org>
> I think it is needed to clean up any leftover partial mappings. I am
> not sure about the kmsan example though.
>
Yeah the kmsan example seems like it could be wrong for the same reason,
but I haven't inspected that more closely.
Thanks,
Dennis
> Adding vmalloc reviewers here as well here.
> >
> >
> > > > ---
> > > >
> > > > Perhaps the reason __pcpu_unmap_pages() is not currently being called
> > > > for the failed CPU is that the size and alignment requirements make sure
> > > > we never leave any partial mappings behind? I have no idea. Nonetheless,
> > > > I think we want this change as that could be fragile, and is
> > > > inconsistent with other callers.
> > > >
> > > > ---
> > > > mm/percpu-vm.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> > > > index 2054c9213c433..cd69caf6aa8d8 100644
> > > > --- a/mm/percpu-vm.c
> > > > +++ b/mm/percpu-vm.c
> > > > @@ -231,10 +231,10 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk,
> > > > return 0;
> > > > err:
> > > > for_each_possible_cpu(tcpu) {
> > > > - if (tcpu == cpu)
> > > > - break;
> > > > __pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
> > > > page_end - page_start);
> > > > + if (tcpu == cpu)
> > > > + break;
> > > > }
> > > > pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
> > > > return err;
> > > > --
> > > > 2.44.0.278.ge034bb2e1d-goog
> > > >
prev parent reply other threads:[~2024-03-21 16:57 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-11 19:43 Yosry Ahmed
2024-03-19 20:08 ` Yosry Ahmed
2024-03-19 20:32 ` Dennis Zhou
2024-03-19 20:49 ` Yosry Ahmed
2024-03-21 16:57 ` Dennis Zhou [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zfxm4e42VaAH2eMJ@V92F7Y9K0C \
--to=dennis@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=hch@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lstoakes@gmail.com \
--cc=tj@kernel.org \
--cc=urezki@gmail.com \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox