* [RFC] remove page_table_lock in anon_vma_prepare @ 2009-06-05 14:35 Minchan Kim 2009-06-05 18:26 ` Hugh Dickins 0 siblings, 1 reply; 5+ messages in thread From: Minchan Kim @ 2009-06-05 14:35 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, Minchan Kim, Hugh Dickins, Rik van Riel, Nick Piggin As I looked over the page_table_lock, it related to page table not anon_vma I think anon_vma->lock can protect race against threads. Do I miss something ? If I am right, we can remove unnecessary page_table_lock holding in anon_vma_prepare. We can get performance benefit. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Rik van Riel <riel@redhat.com> Cc: Nick Piggin <npiggin@suse.de> --- mm/rmap.c | 3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index b5c6e12..65b4877 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -113,14 +113,11 @@ int anon_vma_prepare(struct vm_area_struct *vma) } spin_lock(&anon_vma->lock); - /* page_table_lock to protect against threads */ - spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { vma->anon_vma = anon_vma; list_add_tail(&vma->anon_vma_node, &anon_vma->head); allocated = NULL; } - spin_unlock(&mm->page_table_lock); spin_unlock(&anon_vma->lock); if (unlikely(allocated)) -- 1.5.6.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] remove page_table_lock in anon_vma_prepare 2009-06-05 14:35 [RFC] remove page_table_lock in anon_vma_prepare Minchan Kim @ 2009-06-05 18:26 ` Hugh Dickins 2009-06-07 15:16 ` Minchan Kim 0 siblings, 1 reply; 5+ messages in thread From: Hugh Dickins @ 2009-06-05 18:26 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, linux-mm, linux-kernel, Hugh Dickins, Rik van Riel, Nick Piggin On Fri, 5 Jun 2009, Minchan Kim wrote: > As I looked over the page_table_lock, it related to page table not anon_vma > > I think anon_vma->lock can protect race against threads. > Do I miss something ? > > If I am right, we can remove unnecessary page_table_lock holding > in anon_vma_prepare. We can get performance benefit. > > Signed-off-by: Minchan Kim <minchan.kim@gmail.com> > Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> > Cc: Rik van Riel <riel@redhat.com> > Cc: Nick Piggin <npiggin@suse.de> No, NAK to this one. Look above the context shown in the patch: anon_vma = find_mergeable_anon_vma(vma); allocated = NULL; if (!anon_vma) { anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) return -ENOMEM; allocated = anon_vma; } spin_lock(&anon_vma->lock); So if find_mergeable_anon_vma failed to find a suitable neighbouring vma to share with, we'll have got the anon_vma from anon_vma_alloc(). Two threads could perfectly well do that concurrently (mmap_sem is held only for reading), each allocating a separate fresh anon_vma, then they'd each do spin_lock(&anon_vma->lock), but on _different_ anon_vmas, so wouldn't exclude each other at all: we need a common lock to exclude that race, and abuse page_table_lock for the purpose. (As I expect you've noticed, we used not to bother with the spin_lock on anon_vma->lock when we'd freshly allocated the anon_vma, it looks as if it's unnecessary. But in fact Nick and Linus found there's a subtle reason why it is necessary even then - hopefully the git log explains it, or I could look up the mails if you want, but at this moment the details escape me. And do we need the page_table_lock even when find_mergeable_anon_vma succeeds? That also looks as if it's unnecessary, but I've the ghost of a memory that it's needed even for that case: I seem to remember that there can be a benign race where find_mergeable_anon_vma called by concurrent threads could actually return different anon_vmas. That also is something I don't want to think too deeply into at this instant, but beg me if you wish!) Hugh > --- > mm/rmap.c | 3 --- > 1 files changed, 0 insertions(+), 3 deletions(-) > > diff --git a/mm/rmap.c b/mm/rmap.c > index b5c6e12..65b4877 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -113,14 +113,11 @@ int anon_vma_prepare(struct vm_area_struct *vma) > } > spin_lock(&anon_vma->lock); > > - /* page_table_lock to protect against threads */ > - spin_lock(&mm->page_table_lock); > if (likely(!vma->anon_vma)) { > vma->anon_vma = anon_vma; > list_add_tail(&vma->anon_vma_node, &anon_vma->head); > allocated = NULL; > } > - spin_unlock(&mm->page_table_lock); > > spin_unlock(&anon_vma->lock); > if (unlikely(allocated)) > -- > 1.5.6.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] remove page_table_lock in anon_vma_prepare 2009-06-05 18:26 ` Hugh Dickins @ 2009-06-07 15:16 ` Minchan Kim 2009-06-07 16:28 ` Hugh Dickins 0 siblings, 1 reply; 5+ messages in thread From: Minchan Kim @ 2009-06-07 15:16 UTC (permalink / raw) To: Hugh Dickins Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel, Nick Piggin Hi, Hugh. On Sat, Jun 6, 2009 at 3:26 AM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote: > On Fri, 5 Jun 2009, Minchan Kim wrote: > >> As I looked over the page_table_lock, it related to page table not anon_vma >> >> I think anon_vma->lock can protect race against threads. >> Do I miss something ? >> >> If I am right, we can remove unnecessary page_table_lock holding >> in anon_vma_prepare. We can get performance benefit. >> >> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> >> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> >> Cc: Rik van Riel <riel@redhat.com> >> Cc: Nick Piggin <npiggin@suse.de> > > No, NAK to this one. Look above the context shown in the patch: > > anon_vma = find_mergeable_anon_vma(vma); > allocated = NULL; > if (!anon_vma) { > anon_vma = anon_vma_alloc(); > if (unlikely(!anon_vma)) > return -ENOMEM; > allocated = anon_vma; > } > spin_lock(&anon_vma->lock); > > So if find_mergeable_anon_vma failed to find a suitable neighbouring > vma to share with, we'll have got the anon_vma from anon_vma_alloc(). > > Two threads could perfectly well do that concurrently (mmap_sem is > held only for reading), each allocating a separate fresh anon_vma, > then they'd each do spin_lock(&anon_vma->lock), but on _different_ > anon_vmas, so wouldn't exclude each other at all: we need a common > lock to exclude that race, and abuse page_table_lock for the purpose. Indeed! I have missed it until now. In fact, I expected whoever expert like you point me out. > (As I expect you've noticed, we used not to bother with the spin_lock > on anon_vma->lock when we'd freshly allocated the anon_vma, it looks > as if it's unnecessary. But in fact Nick and Linus found there's a > subtle reason why it is necessary even then - hopefully the git log > explains it, or I could look up the mails if you want, but at this > moment the details escape me. Hmm. I didn't follow up that at that time. After you noticed me, I found that. commit d9d332e0874f46b91d8ac4604b68ee42b8a7a2c6 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Sun Oct 19 10:32:20 2008 -0700 anon_vma_prepare: properly lock even newly allocated entries It's subtle race so I can't digest it fully but I can understand that following as. If we don't hold lock at fresh anon_vma, it can be removed and reallocated by other threads since other cpu's can find it, free, reallocate before first thread which call anon_vma_prepare adds anon_vma to list after vma->anon_vma = anon_vma I hope my above explanation is right :) > And do we need the page_table_lock even when find_mergeable_anon_vma > succeeds? That also looks as if it's unnecessary, but I've the ghost > of a memory that it's needed even for that case: I seem to remember > that there can be a benign race where find_mergeable_anon_vma called > by concurrent threads could actually return different anon_vmas. > That also is something I don't want to think too deeply into at > this instant, but beg me if you wish!) Unfortunately I can't found this issue mail or changelog. Hugh. Could you explain this issue more detail in your convenient time ? I don't mind you ignore me. I don't want you to be busy from me. :) I always thanks for your kind explanation and learns lots of thing from you. :) Thanks again. -- Kinds regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] remove page_table_lock in anon_vma_prepare 2009-06-07 15:16 ` Minchan Kim @ 2009-06-07 16:28 ` Hugh Dickins 2009-06-07 23:50 ` Minchan Kim 0 siblings, 1 reply; 5+ messages in thread From: Hugh Dickins @ 2009-06-07 16:28 UTC (permalink / raw) To: Minchan Kim Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel, Rik van Riel, Nick Piggin [-- Attachment #1: Type: TEXT/PLAIN, Size: 4143 bytes --] On Mon, 8 Jun 2009, Minchan Kim wrote: > On Sat, Jun 6, 2009 at 3:26 AM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote: > > On Fri, 5 Jun 2009, Minchan Kim wrote: > > > (As I expect you've noticed, we used not to bother with the spin_lock > > on anon_vma->lock when we'd freshly allocated the anon_vma, it looks > > as if it's unnecessary. But in fact Nick and Linus found there's a > > subtle reason why it is necessary even then - hopefully the git log Actually, Linus put a lot of his git comment into the comment above anon_vma_prepare(); but it doesn't pin down the case Nick identified as well as Nick's original mail. > > explains it, or I could look up the mails if you want, but at this > > moment the details escape me. > > Hmm. I didn't follow up that at that time. > > After you noticed me, I found that. > commit d9d332e0874f46b91d8ac4604b68ee42b8a7a2c6 > Author: Linus Torvalds <torvalds@linux-foundation.org> > Date: Sun Oct 19 10:32:20 2008 -0700 > > anon_vma_prepare: properly lock even newly allocated entries > > It's subtle race so I can't digest it fully but I can understand that > following as. > > If we don't hold lock at fresh anon_vma, it can be removed and > reallocated by other threads since other cpu's can find it, free, > reallocate before first thread which call anon_vma_prepare adds > anon_vma to list after vma->anon_vma = anon_vma > > I hope my above explanation is right :) Not really: I don't think there was a risk of it getting freed at that point, but there was a risk of its list head getting dereferenced before we'd initialized it. Here's a link to Nick's 16oct08 linux-mm mail on the subject, then you can follow the thread from there. In brief, IIRC, Nick found a race which he proposed to fix with barriers, but in the end we were all much happier just taking the anon_vma lock in all cases. http://marc.info/?l=linux-mm&m=122413030612659&w=2 > > > And do we need the page_table_lock even when find_mergeable_anon_vma > > succeeds? That also looks as if it's unnecessary, but I've the ghost > > of a memory that it's needed even for that case: I seem to remember > > that there can be a benign race where find_mergeable_anon_vma called > > by concurrent threads could actually return different anon_vmas. > > That also is something I don't want to think too deeply into at > > this instant, but beg me if you wish!) > > Unfortunately I can't found this issue mail or changelog. > Hugh. Could you explain this issue more detail in your convenient time ? Sure, I remembered it once I went to bed that night, it's an easy one; wasn't ever discussed on list, just something I'd been aware of. Remember that anon_vma_prepare() gets called at fault time, when we have only down_read of mmap_sem, so there may well be concurrent faults. find_mergeable_anon_vma looks at the vma on either side of our faulting vma, to see if the neighbouring vma already has an anon_vma, which we'd be wise to use if that vma could plausibly be merged with our vma later e.g. mprotect may have temporarily split ours from the next, but another mprotect may make them mergeable - it would be a pity to be prevented from merging them just because we'd already attached distinct anon_vmas. But, as I said, there may well be concurrent faults, on ours and on neighbouring vmas: so one call to find_mergeable_anon_vma on our vma may find that the next vma has no anon_vma yet, but the prev has one, so it returns the prev's anon_vma; but a racing fault on the next vma immediately gives it an anon_vma, and a racing fault on our vma finds that, so its find_mergeable_anon_vma returns the next's anon_vma. So the two faults on our vma could both be in anon_vma_prepare(), doing the spin_lock(&anon_vma->lock) on find_mergeable_anon_vma's anon_vma, but those could still be different anon_vmas: but if both lock the page_table_lock, we can be sure to catch that case. When I said the race was benign, I meant that it doesn't matter in such a case which one we choose; but we don't want to choose both! Hugh ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] remove page_table_lock in anon_vma_prepare 2009-06-07 16:28 ` Hugh Dickins @ 2009-06-07 23:50 ` Minchan Kim 0 siblings, 0 replies; 5+ messages in thread From: Minchan Kim @ 2009-06-07 23:50 UTC (permalink / raw) To: Hugh Dickins Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel, Nick Piggin On Mon, Jun 8, 2009 at 1:28 AM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote: > On Mon, 8 Jun 2009, Minchan Kim wrote: >> On Sat, Jun 6, 2009 at 3:26 AM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote: >> > On Fri, 5 Jun 2009, Minchan Kim wrote: >> >> > (As I expect you've noticed, we used not to bother with the spin_lock >> > on anon_vma->lock when we'd freshly allocated the anon_vma, it looks >> > as if it's unnecessary. But in fact Nick and Linus found there's a >> > subtle reason why it is necessary even then - hopefully the git log > > Actually, Linus put a lot of his git comment into the comment above > anon_vma_prepare(); but it doesn't pin down the case Nick identified > as well as Nick's original mail. > >> > explains it, or I could look up the mails if you want, but at this >> > moment the details escape me. >> >> Hmm. I didn't follow up that at that time. >> >> After you noticed me, I found that. >> commit d9d332e0874f46b91d8ac4604b68ee42b8a7a2c6 >> Author: Linus Torvalds <torvalds@linux-foundation.org> >> Date: Sun Oct 19 10:32:20 2008 -0700 >> >> anon_vma_prepare: properly lock even newly allocated entries >> >> It's subtle race so I can't digest it fully but I can understand that >> following as. >> >> If we don't hold lock at fresh anon_vma, it can be removed and >> reallocated by other threads since other cpu's can find it, free, >> reallocate before first thread which call anon_vma_prepare adds >> anon_vma to list after vma->anon_vma = anon_vma >> >> I hope my above explanation is right :) > > Not really: I don't think there was a risk of it getting freed at > that point, but there was a risk of its list head getting dereferenced > before we'd initialized it. > > Here's a link to Nick's 16oct08 linux-mm mail on the subject, then you > can follow the thread from there. In brief, IIRC, Nick found a race > which he proposed to fix with barriers, but in the end we were all > much happier just taking the anon_vma lock in all cases. > > http://marc.info/?l=linux-mm&m=122413030612659&w=2 Huge long. Thanks for searching it for me. I will read the thread and digest it. ;-) >> >> > And do we need the page_table_lock even when find_mergeable_anon_vma >> > succeeds? That also looks as if it's unnecessary, but I've the ghost >> > of a memory that it's needed even for that case: I seem to remember >> > that there can be a benign race where find_mergeable_anon_vma called >> > by concurrent threads could actually return different anon_vmas. >> > That also is something I don't want to think too deeply into at >> > this instant, but beg me if you wish!) >> >> Unfortunately I can't found this issue mail or changelog. >> Hugh. Could you explain this issue more detail in your convenient time ? > > Sure, I remembered it once I went to bed that night, it's an easy one; > wasn't ever discussed on list, just something I'd been aware of. > > Remember that anon_vma_prepare() gets called at fault time, when we > have only down_read of mmap_sem, so there may well be concurrent faults. > > find_mergeable_anon_vma looks at the vma on either side of our faulting > vma, to see if the neighbouring vma already has an anon_vma, which we'd > be wise to use if that vma could plausibly be merged with our vma later > e.g. mprotect may have temporarily split ours from the next, but another > mprotect may make them mergeable - it would be a pity to be prevented > from merging them just because we'd already attached distinct anon_vmas. Absolutely. > But, as I said, there may well be concurrent faults, on ours and on > neighbouring vmas: so one call to find_mergeable_anon_vma on our vma > may find that the next vma has no anon_vma yet, but the prev has one, > so it returns the prev's anon_vma; but a racing fault on the next > vma immediately gives it an anon_vma, and a racing fault on our vma > finds that, so its find_mergeable_anon_vma returns the next's anon_vma. > > So the two faults on our vma could both be in anon_vma_prepare(), > doing the spin_lock(&anon_vma->lock) on find_mergeable_anon_vma's > anon_vma, but those could still be different anon_vmas: but if > both lock the page_table_lock, we can be sure to catch that case. I can understand it completely. Thanks for quick replay and good explanation. I expect this thread can help other some day. :) > > When I said the race was benign, I meant that it doesn't matter in > such a case which one we choose; but we don't want to choose both! > > Hugh -- Kinds regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-06-07 22:59 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-06-05 14:35 [RFC] remove page_table_lock in anon_vma_prepare Minchan Kim 2009-06-05 18:26 ` Hugh Dickins 2009-06-07 15:16 ` Minchan Kim 2009-06-07 16:28 ` Hugh Dickins 2009-06-07 23:50 ` Minchan Kim
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox