From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 May 2023 16:12:05 -0400
From: Peter Xu
Howlett" , Lorenzo Stoakes , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mike Rapoport , Mark Rutland , Alexander Viro , Christian Brauner Subject: Re: [PATCH v2] mm: userfaultfd: avoid passing an invalid range to vma_merge() Message-ID: References: <20230515193232.67552-1-lstoakes@gmail.com> <20abea10-5307-498b-b9df-8f0b2fed1701@lucifer.local> <20230516164907.5bf2qe7nn5bo7iaa@revolver> MIME-Version: 1.0 In-Reply-To: <20230516164907.5bf2qe7nn5bo7iaa@revolver> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Stat-Signature: 4xm8cfakbmnrm16qp66ybzok39kud6h1 X-Rspam-User: X-Rspamd-Queue-Id: F03F018000A X-Rspamd-Server: rspam07 X-HE-Tag: 1684267929-993797 X-HE-Meta: U2FsdGVkX1+06wue2Ipn7DEBqUVM1iAaXY/qSVuQQzRWnXawh863WJsknIbYp9Jrp4LB7caC0x9O0QQdGH2zgylHLwthJM0zB6bYuWoQMnP6LklJ94IJYecIcqvwvQ+s64M8goiKbBHScPHp0XeRTUw3R6UNGRtoEco1xFeuZEALLw9J+V1icO9kBBy2y7gTtKRZSZR0z7fxk/5NEOQxMyBKYHSmYwh67fiicftPtnQfG0a8QKUbssQR/JFIVKg2Lhd9dAteyNuAM3TZfKsQC5F27RDrOoKSidx3h9qsbBhnXYQXmv6M5+4lD7NVzdmDPSVZvxAXhySJ9AYx+SfWw5DYargHApM1dk2BPD+afj0Sj2AGjV3SVV7YP3vlC+ZaD1sc6GVhr5CCjVe1N8T3zSXtDWZ/iP4/zN+7rzr6dtHYSeHnBgE3tyARWvIo173cSTmwpZKTZVFf0dm4X8eZ0J4GW3QD4ha1yvfjK0zJZblGuEQ4eYKBdLEwm00y7xM028gXdIuC2P3NzKCpqWxHPCNMRyTS/0HcZKO5E0ZgmxWg8xr3VPalzJGWOM0vAiEa1HrNB70zi62r+zdoKUIpPBBZ7A870ov9oJrE2qmqOXT05GrLC+wFathvo/ED3SOVZH4ayNHcMThcysgvaD0sTicQK3S9mgCOPVf3rVzegogPkshmLaA4LSGHJbbhK+hUSKvEjPZiJTb5yY7/J/T18/QDVFJ6Wni2Z5/4MRSgfrPQ/WhpSW3i4NvIns7yEaSyA2KoSrR2OD61ItkdAJHo3GvbsK5jOKGJv/UIEbqpBtUh72b8CopyIHHRuA0NxxQjTK3tfZtKgPiwD0u0VVmHMAWWuEucJ9TLKAjk9NQ3GjBOHwLmWeFQHzZ87A7ygkBJJccEnath31ae/6JdxB8MqE4afiOtZ+IEQv/cBsZWmafaNHkCmCey+EORW9cWuf7j9AoTDb0boZgwhXNY0zj DWTIyBLg 9KmeIjVNTk7h6a5Ift2OZLlRAwYwHc4AgIZfr9ojvJntHYOBpQ2wbOZSXvckXG9abIOV7o9WrEmnGz684lpyL7Qv6FkAR5IwFL/AcGthNh2D/tHhCboEUMXo2V0551xy5piNi/jOAp+Kb2yDJ40LmStsQjwcWk1zbako7cAfcA7Azg4DxoKr8ocqVhR58CyciT3lVo3r0bA/jrna9uxCsT6ovfJHAQ+PqU5H9rQhIqfWfqUHP2kp/0R1IY9iWwrPw32nTfyQyZl6Ml7jHVIFr3gkIAyhlKX1OPchx0XbRvmd6zpzp1u1cvpqgC6dfcPQs8ucMEF7PGRSz11LKI8ukD3AZ936xY0u3vYxg3svWpnrhJ+O3jjF43RpY9SLlF5KtWoh1ICSm1+1upStZ5Y4xnMsGATtjZdXU58uv4yvPoj/9DU6lSQ5KclEYmwWmkbG5bRHlVSOq19/KAkySsEi/ehBaiX+Zxues12HdJkNJ0FEIB9g+0E9V8dPdRz1T/sAZQjig9Yx6xWbcG1gUahCH7bi/0I34tuwBH6Mylgvwi2QIGd6xmRZf9W76m33LCkfShCy9 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi, Liam, On Tue, May 16, 2023 at 12:49:07PM -0400, Liam R. Howlett wrote: > * Peter Xu [230516 11:06]: > > ... > > > > > This seems like you're relying on:- > > > > > > > > *** > > > > CCCCCNNNNN -> CCNNNNNNNN > > > > I had a closer look today, I still think this patch is not really the right > > one. The split/merge order is something we use everywhere and I am not > > convinced it must change as drastic. At least so far it still seems to me > > if we do with what current patch proposed we can have vma fragmentations. > > > > I think I see what you meant, but here I think it's a legal case where we > > should have PPPP rather than CCCC (PPPPPNNNN --> PPNNNNNNNN). 
> > To be explicit, for register I think it _should_ be the case 0 where we cannot merge (note: the current code is indeed wrong though, see below):
> >
> > ****
> > PPPPPPNNNNNN
> > cannot merge
> >
> > While for the unregister case here it's case 4:
> >
> > ****
> > PPPPPPNNNNNN
> > might become
> > PPNNNNNNNNNN
> > case 4 below
> >
> > Here the problem is not that we need to do split then merge (I think it'll have the problem of vma defragmentation here), the problem is we simply passed in the wrong "prev" vma pointer, IMHO.  I've patches attached showing what I meant.
> >
> > I checked the original commit from Andrea and I found that it _was_ correct:
> >
> > commit 86039bd3b4e6a1129318cbfed4e0a6e001656635
> > Author: Andrea Arcangeli
> > Date:   Fri Sep 4 15:46:31 2015 -0700
> >
> >     userfaultfd: add new syscall to provide memory externalization
> >
> > I had a feeling that it's broken during the recent rework on vma (or maybe even not that close), but I'm not yet sure and need to further check.
>
> I believe this was 69dbe6daf1041e32e003f966d71f70f20c63af53, which is my work on this area.  Any patches will need to be backported to before the vma iterator for 6.1

Thanks for confirming this.

> I think keeping the asserts and ensuring we are calling vma_merge() with arguments that make sense is safer.

Yes, definitely.

> > > > By specifying parameters that are compatible with N even though you're only partially spanning C?
> > > >
> > > > This is crazy, and isn't how this should be used.  vma_merge() is not supposed to do partial merges.  If it works (presumably it does) this is not by design unless I've lost my mind and I (and others) have somehow not noticed this??
> > > >
> > > > I think you're right that now we'll end up with more fragmentation, but what you're suggesting is not how vma_merge() is supposed to work.
> > > >
> > > > As I said above, giving vma_merge() invalid parameters is very dangerous as you could end up merging over empty ranges in theory (and could otherwise have corruption).
> > > >
> > > > I guess we should probably be passing 0 to the last parameter in split_vma() here then to ensure we do a merge pass too.  Will experiment with this.
> > > >
> > > > I'm confused as to how the remove from case 8 is not proceeding.  I'll look into this some more...
> > > >
> > > > Happy to be corrected if I'm misconstruing this!
> > >
> > > OK, so I wrote a small program to do perform exactly this case [0] and it seems that the outcome is the same before and after this patch - vma_merge() is clearly rejecting the case 8 merge (phew!) and in both instances you end up with 3 VMAs.
> > >
> > > So this patch doesn't change this behaviour and everything is as it was before.  Ideally we'd let it go for another pass, so maybe we should change the split to add a new VMA _afterwards_.  Will experiment with that, separately.
> > >
> > > But looks like the patch is good as it is.
> > >
> > > (if you notice something wrong with the repro, etc. do let me know!)
> > >
> > > [0]: https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e
> >
> > I think your test case is fine, but... no, this is still not expected.  The vma should just merge.
> >
> > So I have another closer look on this specific issue, it seems we have a long standing bug on pgoff calculation here when passing that to vma_merge().
> > I've got another patch attached to show what I meant.
> >
> > To summarize: now I've got two patches attached.
> >
> > Patch 1 is something I'd like to propose to replace this patch.  It fixes the incorrect use of vma_merge(), so it should also eliminate the assertion being triggered (I still think this is a regression, but I need to check, which I will do later; I'm not super familiar with the maple tree work, maybe you or Liam can quickly spot it).
> >
> > Patch 2 fixes the long standing issue of the vma not being able to merge in the above case; with patch 2 applied it should start merging all right.
> >
> > Please have a look, thanks.
> >
> > ---8<---
> > From 6bc39028bba246394bb0bafdaeaab7b8dfd1694f Mon Sep 17 00:00:00 2001
> > From: Peter Xu
> > Date: Tue, 16 May 2023 09:03:22 -0400
> > Subject: [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma
> >
> > It seems vma merging on the uffd paths is broken for both register and unregister: right now we can feed wrong parameters to vma_merge(), which was caught by the recent patch that moved the asserts upwards in vma_merge():
> >
> > https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> >
> > The problem is that the current code doesn't fix up "prev" for the case where the "start" address falls within the "prev" vma.  In that case "prev" should point to the current vma rather than the previous one when fed to vma_merge().
> >
> > This will eliminate the report and make sure the vma_merge() calls become legal again.
> >
> > Signed-off-by: Peter Xu
> > ---
> >  fs/userfaultfd.c | 27 +++++++++++++++++++++------
> >  1 file changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 0fd96d6e39ce..7eb88bc74d00 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1458,10 +1458,17 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> >  	BUG_ON(!found);
> >
> >  	vma_iter_set(&vmi, start);
> > -	prev = vma_prev(&vmi);
> > +	vma = vma_find(&vmi, end);
> > +	if (!vma)
> > +		goto out_unlock;
> > +
> > +	if (vma->vm_start < start)
> > +		prev = vma;
> > +	else
> > +		prev = vma_prev(&vmi);
> >
> >  	ret = 0;
> > -	for_each_vma_range(vmi, vma, end) {
> > +	do {
>
> The iterator may be off by one, depending on if vma_prev() is called or not.
>
> Perhaps:
> 	prev = vma_prev(&vmi); /* could be wrong, or null */
> 	vma = vma_find(&vmi, end);
> 	if (!vma)
> 		goto out_unlock;
>
> 	if (vma->vm_start < start)
> 		prev = vma;
>
> 	now we know we are at vma with the iterator..
> 	ret = 0
> 	do {
> 	...

Will do, thanks.  I think I got trapped similarly when I was looking at xarray months ago, where xarray has a similar off-by-one side effect on its iterator.  Do you think it would make sense to have something documenting such side effects for the maple tree (or the current vma API)?  Or maybe such documentation already exists and I just didn't know about it?
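
To spell that out, the register-side loop head would then look roughly as below.  This is only a sketch combining Liam's suggested ordering with the loop from patch 1, with comments marking where (to my understanding) the vma iterator points after each call; it is not a tested patch:

	vma_iter_set(&vmi, start);
	prev = vma_prev(&vmi);		/* steps the iterator back; may be NULL,
					 * and may not be the right prev yet */
	vma = vma_find(&vmi, end);	/* steps forward again to the first vma
					 * overlapping [start, end) */
	if (!vma)
		goto out_unlock;

	if (vma->vm_start < start)
		prev = vma;		/* start cuts into vma, so vma itself is
					 * the "prev" to hand to vma_merge() */

	/* the iterator now sits on vma, matching what the loop below expects */
	ret = 0;
	do {
		...
	} for_each_vma_range(vmi, vma, end);
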
> >  		cond_resched();
> >
> >  		BUG_ON(!vma_can_userfault(vma, vm_flags));
> > @@ -1517,7 +1524,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> >  	skip:
> >  		prev = vma;
> >  		start = vma->vm_end;
> > -	}
> > +	} for_each_vma_range(vmi, vma, end);
> >
> >  out_unlock:
> >  	mmap_write_unlock(mm);
> > @@ -1624,9 +1631,17 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> >  	BUG_ON(!found);
> >
> >  	vma_iter_set(&vmi, start);
> > -	prev = vma_prev(&vmi);
> > +	vma = vma_find(&vmi, end);
> > +	if (!vma)
> > +		goto out_unlock;
> > +
> > +	if (vma->vm_start < start)
> > +		prev = vma;
> > +	else
> > +		prev = vma_prev(&vmi);
> > +
> >  	ret = 0;
> > -	for_each_vma_range(vmi, vma, end) {
> > +	do {
> >  		cond_resched();
> >
> >  		BUG_ON(!vma_can_userfault(vma, vma->vm_flags));
> > @@ -1692,7 +1707,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> >  	skip:
> >  		prev = vma;
> >  		start = vma->vm_end;
> > -	}
> > +	} for_each_vma_range(vmi, vma, end);
> >
> >  out_unlock:
> >  	mmap_write_unlock(mm);
> > --
> > 2.39.1
> >
> > From bf61f3c937e9e2ab96ab2bed0005cb7dc74db231 Mon Sep 17 00:00:00 2001
> > From: Peter Xu
> > Date: Tue, 16 May 2023 09:39:38 -0400
> > Subject: [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible
> >
> > We used to not pass in the pgoff correctly when registering/unregistering uffd regions, which caused incorrect behavior on vma merging.
> >
> > For example, when we have:
> >
> >   vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)
> >
> > Then someone unregisters uffd on range (5-9), it should become:
> >
> >   vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)
> >
> > But with the current code it's:
> >
> >   vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)
> >
> > This patch allows such merges to happen correctly.
> >
> > This behavior seems to have existed since the first day of uffd, so keep this just as a performance optimization and do not copy stable.
> >
> > Cc: Andrea Arcangeli
> > Cc: Mike Rapoport (IBM)
> > Signed-off-by: Peter Xu
> > ---
> >  fs/userfaultfd.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 7eb88bc74d00..891048b4799f 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1332,6 +1332,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> >  	bool basic_ioctls;
> >  	unsigned long start, end, vma_end;
> >  	struct vma_iterator vmi;
> > +	pgoff_t pgoff;
> >
> >  	user_uffdio_register = (struct uffdio_register __user *) arg;
> >
> > @@ -1489,8 +1490,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> >  		vma_end = min(end, vma->vm_end);
> >
> >  		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
> > +		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>
> I don't think this is safe.  You are telling vma_merge() something that is not true and will result in can_vma_merge_before() passing.  I mean, sure it will become true after you split (unless you can't?), but I don't know if you can just merge a VMA that doesn't pass can_vma_merge_before(), even for a short period?

I must admit I'm not really that familiar with the vma code yet, so I could be missing something obvious.  My reasoning for why this pgoff looks fine comes from two parts:

1) It's documented in vma_merge() in that:

     * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
     * figure out ...

   So fundamentally this pgoff is part of the mapping request, paired with all the rest of the information.
   AFAICT it means it must match what "addr" describes in the VA address space, which is why I think offsetting it makes sense here.  It also matches my understanding of how the pgoff is used in the vma_merge() code.  (A small worked example is appended at the end of this mail.)

2) Uffd is nothing special in this regard, namely:

   mbind_range():
	pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
	merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
			   vma->anon_vma, vma->vm_file, pgoff, new_pol,
			   vma->vm_userfaultfd_ctx, anon_vma_name(vma));

   mlock_fixup():
	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
	*prev = vma_merge(vmi, mm, *prev, start, end, newflags, vma->anon_vma,
			  vma->vm_file, pgoff, vma_policy(vma),
			  vma->vm_userfaultfd_ctx, anon_vma_name(vma));

   mprotect_fixup():
	pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
	*pprev = vma_merge(vmi, mm, *pprev, start, end, newflags, vma->anon_vma,
			   vma->vm_file, pgoff, vma_policy(vma),
			   vma->vm_userfaultfd_ctx, anon_vma_name(vma));

I had a feeling that it's just something overlooked in the initial uffd proposal, but maybe I missed something important?

> >  		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> > -				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> > +				 vma->anon_vma, vma->vm_file, pgoff,
> >  				 vma_policy(vma),
> >  				 ((struct vm_userfaultfd_ctx){ ctx }),
> >  				 anon_vma_name(vma));
> > @@ -1570,6 +1572,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> >  	unsigned long start, end, vma_end;
> >  	const void __user *buf = (void __user *)arg;
> >  	struct vma_iterator vmi;
> > +	pgoff_t pgoff;
> >
> >  	ret = -EFAULT;
> >  	if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
> > @@ -1677,8 +1680,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> >  			uffd_wp_range(vma, start, vma_end - start, false);
> >
> >  		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
> > +		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> >  		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> > -				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> > +				 vma->anon_vma, vma->vm_file, pgoff,
> >  				 vma_policy(vma),
> >  				 NULL_VM_UFFD_CTX, anon_vma_name(vma));
> >  		if (prev) {
> > --
> > 2.39.1
> > ---8<---
> >
> > --
> > Peter Xu
>

--
Peter Xu
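
P.S. A worked example of the pgoff arithmetic discussed above, with made-up numbers (4K pages, file-backed mapping), to show why the offset matters for the merge-with-next check:

/*
 * Hypothetical layout:
 *
 *   vma:  vm_start = 0x100000, vm_end = 0x10a000, vm_pgoff = 0   (file pages 0-9)
 *   next: vm_start = 0x10a000, vm_end = 0x114000, vm_pgoff = 10  (file pages 10-19)
 *
 * Unregistering uffd on [0x105000, 0x10a000) asks vma_merge() about the range
 * start = 0x105000 .. end = 0x10a000 (pglen = 5 pages).  The pgoff describing
 * that range is:
 *
 *   pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT)
 *         = 0 + (0x5000 >> 12)
 *         = 5
 *
 * The "merge with next" path in vma_merge() effectively requires
 *
 *   next->vm_pgoff == pgoff + pglen   ->   10 == 5 + 5   (true, can merge)
 *
 * whereas passing vma->vm_pgoff (= 0) unmodified gives 10 == 0 + 5, which is
 * false, so the newly split piece never merges with 'next' -- the
 * fragmentation patch 2 describes.
 */
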