linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Baoquan He <bhe@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, stephen.s.brennan@oracle.com,
	urezki@gmail.com, hch@infradead.org
Subject: Re: [PATCH RFC 3/3] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
Date: Thu, 24 Nov 2022 17:52:34 +0800	[thread overview]
Message-ID: <Y38+4tspZxRJj73p@MiWiFi-R3L-srv> (raw)
In-Reply-To: <Y34fIzV7UnquyS1r@casper.infradead.org>

On 11/23/22 at 01:24pm, Matthew Wilcox wrote:
> On Wed, Nov 23, 2022 at 11:38:54AM +0800, Baoquan He wrote:
> > On 11/18/22 at 08:01am, Matthew Wilcox wrote:
> > > On Wed, Nov 09, 2022 at 11:35:34AM +0800, Baoquan He wrote:
> > > > Currently, vread() can read out vmalloc areas which is associated with
> > > > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > > > interface because it doesn't allocate a vm_struct. Then in vread(),
> > > > these areas will be skipped.
> > > > 
> > > > Here, add a new function vb_vread() to read out areas managed by
> > > > vmap_block specifically. Then recognize vm_map_ram areas via vmap->flags
> > > > and handle  them respectively.
> > > 
> > > i don't understand how this deals with the original problem identified,
> > > that the vread() can race with an unmap.
> > 
> > Thanks for checking.
> > 
> > I wrote a paragraph, then realized I misunderstood your concern. You are
> > saying the comment from Uladzislau about my original draft patch, right?
> > Paste the link of Uladzislau's reply here in case other people want to
> > know the background:
> > https://lore.kernel.org/all/Y1uKSmgURNEa3nQu@pc636/T/#u
> > 
> > When Stephen raised the issue originally, I posted a draft patch as
> > below trying to fix it:
> > https://lore.kernel.org/all/Y1pHTj2wuhoWmeV3@MiWiFi-R3L-srv/T/#u
> > 
> > In above draft patch, I tried to differentiate normal vmalloc area and
> > vm_map_ram area with the fact that vmalloc area is associated with a
> > vm_struct, while vm_map_ram area has ->vm as NULL. And I thought their
> > only difference is normal vmalloc area has guard page, so its size need
> > consider the guard page; while vm_map_ram area has no guard page, only
> > consider its own actual size. Uladzislau's comment reminded me I was
> > wrong. And the things we need handle are beyond that.
> > 
> > Currently there are three kinds of vmalloc areas in kernel:
> > 
> > 1) normal vmalloc areas, associated with a vm_struct, this is allocated 
> > in __get_vm_area_node(). When freeing, it set ->vm to NULL
> > firstly, then unmap and free vmap_area, see remove_vm_area().
> > 
> > 2) areas allocated via vm_map_ram() and size is larger than
> > VMAP_MAX_ALLOC. The entire area is not associated with vm_struct, and
> > freed at one time in vm_unmap_ram() with unmapping and freeing vmap_area;
> > 
> > 3) areas allocated via vm_map_ram(), then delegate to vb_alloc() when
> > size <= VMAP_MAX_ALLOC. Its vmap_area is allocated at one time with
> > VMAP_BLOCK_SIZE big, and split and used later through vb_alloc(), freed
> > via vb_free(). When the entire area is dirty, it will be unmapped and
> > freed.
> > 
> > Based on above facts, we need add flags to differentiate the normal
> > vmalloc area from the vm_map_ram area, namely area 1) and 2). And we
> > also need flags to differentiate the area 2) and 3). Because area 3) are
> > pieces of a entire vmap_area, vb_free() will unmap the piece of area and
> > set the part dirty, but the entire vmap_area will kept there. So when we
> > will read area 3), we need take vb->lock and only read out the still
> > mapped part, but not dirty or free part of the vmap_area.
> 
> I don't think you understand the problem.
> 
> Task A:			Task B:		Task C:
> p = vm_map_ram()
> 			vread(p);
> 			... preempted ...
> vm_unmap_ram(p);
> 					q = vm_map_ram();
> 			vread continues



> 
> If C has reused the address space allocated by A, task B is now reading
> the memory mapped by task C instead of task A.  If it hasn't, it's now
> trying to read from unmapped, and quite possibly freed memory.  Which
> might have been allocated by task D.

Hmm, it may not be like that.

Firstly, I would remind that vread() takes vmap_area_lock during the
whole reading process. Before this patchset, the vread() and other area
manipulation will have below status:
1) __get_vm_area_node() could only finish insert_vmap_area(), then free
the vmap_area_lock, then warting;
2) __get_vm_area_node() finishs setup_vmalloc_vm()
  2.1) doing mapping but not finished;
  2.2) clear_vm_uninitialized_flag() is called after mapping is done;
3) remove_vm_area() is called to set -> = NULL, then free vmap_area_lock;

Task A:			   Task B:		     Task C:
p = __get_vm_area_node()
remove_vm_area(p);
			   vread(p);

			   vread end 
					     q = __get_vm_area_node();

So, as you can see, the checking "if (!va->vm)" in vread() will filter
out vmap_area:
a) areas if only insert_vmap_area() is called, but ->vm is still NULL; 
b) areas if remove_vm_area() is called to clear ->vm to NULL;
c) areas created through vm_map_ram() since its ->vm is always NULL;

Means vread() will read out vmap_area:
d) areas if setup_vmalloc_vm() is called;
  1) mapping is done on areas, e.g clear_vm_uninitialized_flag() is
       called;
  2) mapping is being handled, just after returning from setup_vmalloc_vm();


******* after this patchset applied:

Task A:			Task B:		Task C:
p = vm_map_ram()
vm_unmap_ram(p);
			vread(p);
                         vb_vread()
			vread end 

					q = vm_map_ram();

With this patchset applied, other than normal areas, for the
vm_map_ram() areas:
1) In vm_map_ram(), set vmap_area->flags = VMAP_RAM when vmap_area_lock
   is taken; In vm_unmap_ram(), clear it wiht "va->flags &= ~VMAP_RAM"
   when vmap_area_lock is taken;
2) If vmap_block, set va->flags = VMAP_RAM|VMAP_BLOCK; And set
   vmap_block->used_map to track the used region, filter out the dirty
   and free region;
3) In vb_vread(), we take vb->lock to avoid reading out dirty regions.

Please help point out what is wrong or I missed.

> 
> Unless there's some kind of reference count so that B knows that both
> the address range and the underlying memory can't be freed while it's
> in the middle of the vread(), this is just unsafe.
> 



  reply	other threads:[~2022-11-24  9:52 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-09  3:35 [PATCH RFC 0/3] " Baoquan He
2022-11-09  3:35 ` [PATCH RFC 1/3] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block Baoquan He
2022-11-09  3:35 ` [PATCH RFC 2/3] mm/vmalloc.c: add flags to mark vm_map_ram area Baoquan He
2022-11-09  3:35 ` [PATCH RFC 3/3] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
2022-11-10  0:59   ` Stephen Brennan
2022-11-10 10:23     ` Baoquan He
2022-11-10 18:48       ` Stephen Brennan
2022-11-14 10:06         ` Baoquan He
2022-11-18  8:01   ` Matthew Wilcox
2022-11-23  3:38     ` Baoquan He
2022-11-23 13:24       ` Matthew Wilcox
2022-11-24  9:52         ` Baoquan He [this message]
2022-11-30 13:06           ` Uladzislau Rezki
2022-12-01  4:46             ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y38+4tspZxRJj73p@MiWiFi-R3L-srv \
    --to=bhe@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=stephen.s.brennan@oracle.com \
    --cc=urezki@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox