From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77655C77B7D for ; Wed, 17 May 2023 13:50:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0B1FD900004; Wed, 17 May 2023 09:50:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0631D900003; Wed, 17 May 2023 09:50:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E6DA5900004; Wed, 17 May 2023 09:50:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id D35DB900003 for ; Wed, 17 May 2023 09:50:39 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 866281602D5 for ; Wed, 17 May 2023 13:50:39 +0000 (UTC) X-FDA: 80799882198.24.DA83219 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf29.hostedemail.com (Postfix) with ESMTP id 543ED12000E for ; Wed, 17 May 2023 13:50:35 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=MAw9vLQh; spf=pass (imf29.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1684331436; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=TI0VZH+WSVrN44Kmcp1GvtC02E21cPAJ7dTSUpnucn8=; b=QJTmQ9PtMA3PrpjiPnq6EVnzrgU3DCAQd+KuABa7Mw2XXhrewBfPgUhPunikgpw03EYQg6 AM9VeUJ5M5DUvmKnEcXmw0N4VJEh4g9vZf3Q6lotZV3EQzUhRf9feGImktCvBWxRN9CDoD dsoYPEwI2zWoaYEpTHR09itc7OAhhLE= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=MAw9vLQh; spf=pass (imf29.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1684331436; a=rsa-sha256; cv=none; b=8kgpZ0CmYVcpf9aUCxTpHAe+xCyPbACeSoNbx4ouux4Zj0M+pRcVuVYLviwP3oYQ/5ie3Q eTtoAzWF71V41QDo25ktGELMQO5z4JVfLLIyf9lTpgOSsBG9H9ARrNn2g88bgJKZB5ZrqQ iiaq3aPedd5WBz8Dabr7llQWYsbO3dQ= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684331435; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=TI0VZH+WSVrN44Kmcp1GvtC02E21cPAJ7dTSUpnucn8=; b=MAw9vLQhPYQs0gaNnmleeGeT6uQ6XAPT6/WXDS5YJPAVhzwHQGA32FWRvbTXfQ096Eexxd T6y1smYkoqq6MQsTh5ey2hmR31LS2sVuzTE6ekYkkFK2jUJnow592rDyCg5u3B8Fq3JojF YrbA24jRUGRTwBTcNSY5bKfeazjXLfg= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-20-9hPtHtFcN2O2Zr5ptfE11Q-1; Wed, 17 May 2023 09:50:34 -0400 X-MC-Unique: 9hPtHtFcN2O2Zr5ptfE11Q-1 Received: by mail-qv1-f71.google.com with SMTP id 6a1803df08f44-62387ccd3bdso1031636d6.1 for ; Wed, 17 May 2023 06:50:33 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684331433; x=1686923433; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=TI0VZH+WSVrN44Kmcp1GvtC02E21cPAJ7dTSUpnucn8=; b=kwSwHNPAR1sJ7JOuWHihkXKi3FDT7jwE6EFIf0cznFUY1BkUXa7KHdQok1SojJXGSU mmUcsCNsZIXA9XBrTQQxg0TJhGPVXI8afs4LlEuMdJ82IjR3/5NabptFym25Q+EpvstV ttiI9m1d1i+UiBPPC2vDyraukSiQa6A3T+Tya39FdfsaIMvCoqvm1xvgjdCT/KUAE7ar Fr2neX9QwSxLalzkLxb2Gqiakis47yadOPyEDCXk/JcgVK7Em7n2R4sYETlNR0NMt4TC jMx69O14UO2q6G6cFUKQ6AYs00oxokBiwrGvmeWUV/4X5hikQgCS/k5t8M3+Fud1uanW RcmA== X-Gm-Message-State: AC+VfDws3B3mcVUV4WRvUkQb30ohJF3h0FV5lnl4cZqUZmjcVdp7ztXf ju5gk2duZb/KbY955en+EwBFLAU8DSpevBayZN5JLKr/2fnlMksBJ0q/QvEJK3jawmd4lW9FlU8 1O0H/OUbX8FM= X-Received: by 2002:a05:6214:20ef:b0:532:141d:3750 with SMTP id 15-20020a05621420ef00b00532141d3750mr5081497qvk.2.1684331433431; Wed, 17 May 2023 06:50:33 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4uif3GRV4sMIoeC771Xly3pxTm+CaxTaWQYnbtcEJVhqzuAylyTOfP3N5RygXm6HOL/k0lLg== X-Received: by 2002:a05:6214:20ef:b0:532:141d:3750 with SMTP id 15-20020a05621420ef00b00532141d3750mr5081439qvk.2.1684331433047; Wed, 17 May 2023 06:50:33 -0700 (PDT) Received: from x1n (bras-base-aurron9127w-grc-62-70-24-86-62.dsl.bell.ca. [70.24.86.62]) by smtp.gmail.com with ESMTPSA id l18-20020a0ce512000000b006215c5bb2e9sm4911247qvm.70.2023.05.17.06.50.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 May 2023 06:50:32 -0700 (PDT) Date: Wed, 17 May 2023 09:50:31 -0400 From: Peter Xu To: "Liam R. Howlett" , Lorenzo Stoakes , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton , Mike Rapoport , Mark Rutland , Alexander Viro , Christian Brauner Subject: Re: [PATCH v2] mm: userfaultfd: avoid passing an invalid range to vma_merge() Message-ID: References: <20230515193232.67552-1-lstoakes@gmail.com> <20abea10-5307-498b-b9df-8f0b2fed1701@lucifer.local> <20230516164907.5bf2qe7nn5bo7iaa@revolver> <20230516225251.xwmz2oyebo7k56ys@revolver> MIME-Version: 1.0 In-Reply-To: <20230516225251.xwmz2oyebo7k56ys@revolver> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 543ED12000E X-Stat-Signature: nccw68xsmgujgcxx15zxn97a5wj3fe7x X-HE-Tag: 1684331435-151791 X-HE-Meta: U2FsdGVkX18HJlXTJZBZ/Rgn36jl7U/oyzECRsZdXQgsxdFM6D8sBa6jnK3kbXuOd0oM1hPnHHQUZamRo2mh72MhH5yXxOiPMJ+9VmfKctEcf5hOmtx4GDFImvhZ6SpwDyTUwM1m1hljDAaNvyHgo3KO/g74nHOltpdq31VutiwBnxzLIncH5NKEEH8uL4aSrZC9bXgI7xBpDQciYLa9c6OmeWogMNxIhkZ6fUJunBXEyDH0BSk97AcjL53/pnkE882sEKC6ATwermJtHFBhD9tFtRjWPA7w7q4QupaaKHb8wO5uqJNc+mSQ8zlKjM11q+6Yccz8lXYkSpuNe+PkKD65J9kah6j5VpFd8h67S/lkN8GY/9JqqlNxHrose6qsyuMpFSTS89RP3eqx7GgIfv06u8vCCQuDE6k5Le/I3Gvx4nEJ89V7M1vKyxKr+vFOoIkhbaaAkmRUx7qpIoz6cj7sqYoAtxC3UX59qlAcwP+ZdGsKczvTGwXJJQYU8YRY1ov5UpxDZMxlrTMsh8+/ZzKxf1/UN+z17FfNjIvtayqPBfXcvBU33ImilF2mJtdyNP5YlQ8kQKAh+48tDMEEsRKs2yELVj+EFwaAbbIBbMh9j4JMj8zEqAkvWvzu8intNgWa5XkDnI7jjYybuA5TJ53nGClIcp8FL2Sm2nABM1kz34sBKNjP0fF+yo028pdH6NM0igJprW3eLNJ+QcgfCWNbjULF5F446h4e9mmNP2i21BsTGuuWsQNVIwqM40SpJFPZyqDPCqoD3i1CcTHhpv+M2H6HbZ7esXjwnv5BxJjsx9cxfuzCPE0sOiSZXShYY9VGtjhy2iSCa6w0dtN8EFTTZniP7Rme8C/BfgxFN4kgEkKm5oxegamDgUZnk4Sm7oB2vtAkhJn9W9DP2TEDXVRr68pIYp6MgKeniN3ds/GjesExct3So/yXRvOALQEq64ruV00VHQg/KC5alTG 1Il+OBow BM2feq4mVUJgh8hOt8rctnX4jtLnzsJhp1cvEz7VUngT96vi6+LkfUfUn6lrWes0TpCkazhOuZ3HsfUnyqbjnvELQCrsqIfXBqXv79DjHXMplCfzqsAlNx6CY5boP3g36zGsFfQz/R18Dre0hEe9ZVtyLO+EPa/BSMVEy7Uj47eLdqcUL+Wql4h3SLpVq/oZ2TzAfEEKXY2Va6PPgjbDJ3axf/fGnxPOtQXJjqjLya/JYltsiyg2PK0iH/6XpRw2LQcSbyYsN+ewQ+W/g8XGhwQvIl+94zC0wYB7UWM0qe99ml5FfsmrphMaQc+VYwI/vcIfIUtgd9ncoOiVv3RaUQ3QbhOFpXP3ywM7EgImQAnMsiJYnAG8HtKV+vxHdnC8F1EZXc3ZYcFnXv95NJSn5LR+wrzdgyNsCFMmZfxIofN/abe5iLKiGs39WU5OkmLBMCxALC+W8OkImE8v0+Fz85si0eq/LzAlju2yQ+gHYTdWjoZM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 16, 2023 at 06:52:51PM -0400, Liam R. Howlett wrote: > * Peter Xu [230516 16:12]: > ... > > > > > > > > > Signed-off-by: Peter Xu > > > > --- > > > > fs/userfaultfd.c | 27 +++++++++++++++++++++------ > > > > 1 file changed, 21 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > > > index 0fd96d6e39ce..7eb88bc74d00 100644 > > > > --- a/fs/userfaultfd.c > > > > +++ b/fs/userfaultfd.c > > > > @@ -1458,10 +1458,17 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > > > > BUG_ON(!found); > > > > > > > > vma_iter_set(&vmi, start); > > > > - prev = vma_prev(&vmi); > > > > + vma = vma_find(&vmi, end); > > > > + if (!vma) > > > > + goto out_unlock; > > > > + > > > > + if (vma->vm_start < start) > > > > + prev = vma; > > > > + else > > > > + prev = vma_prev(&vmi); > > > > > > > > ret = 0; > > > > - for_each_vma_range(vmi, vma, end) { > > > > + do { > > > > > > The iterator may be off by one, depending on if vma_prev() is called or > > > not. > > > > > > Perhaps: > > > prev = vma_prev(&vmi); /* could be wrong, or null */ > > > vma = vma_find(&vmi, end); > > > if (!vma) > > > goto out_unlock; > > > > > > if (vma->vm_start < start) > > > prev = vma; > > > > > > now we know we are at vma with the iterator.. > > > ret = 0 > > > do{ > > > ... > > > > Will do, thanks. > > > > I think I got trapped similarly when I was looking at xarray months ago > > where xarray also had similar side effects to have off-by-one the iterator > > behavior. > > > > Do you think it'll make sense to have something describing such side > > effects for maple tree (or the current vma api), or.. maybe there's already > > some but I just didn't really know? > > Well, it's an iterator so I though a position was implied. But I think > the documentation is lacking on the vma_iterator interface and I should > fix that. Thanks. > > ... > > > > > From: Peter Xu > > > > Date: Tue, 16 May 2023 09:39:38 -0400 > > > > Subject: [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible > > > > > > > > We used to not pass in the pgoff correctly when register/unregister uffd > > > > regions, it caused incorrect behavior on vma merging. > > > > > > > > For example, when we have: > > > > > > > > vma1(range 0-9, with uffd), vma2(range 10-19, no uffd) > > > > > > > > Then someone unregisters uffd on range (5-9), it should become: > > > > > > > > vma1(range 0-4, with uffd), vma2(range 5-19, no uffd) > > > > > > > > But with current code it's: > > > > > > > > vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd) > > > > > > > > This patch allows such merge to happen correctly. > > > > > > > > This behavior seems to have existed since the 1st day of uffd, keep it just > > > > as a performance optmization and not copy stable. > > > > > > > > Cc: Andrea Arcangeli > > > > Cc: Mike Rapoport (IBM) > > > > Signed-off-by: Peter Xu > > > > --- > > > > fs/userfaultfd.c | 8 ++++++-- > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > > > index 7eb88bc74d00..891048b4799f 100644 > > > > --- a/fs/userfaultfd.c > > > > +++ b/fs/userfaultfd.c > > > > @@ -1332,6 +1332,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > > > > bool basic_ioctls; > > > > unsigned long start, end, vma_end; > > > > struct vma_iterator vmi; > > > > + pgoff_t pgoff; > > > > > > > > user_uffdio_register = (struct uffdio_register __user *) arg; > > > > > > > > @@ -1489,8 +1490,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > > > > vma_end = min(end, vma->vm_end); > > > > > > > > new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; > > > > + pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); > > > > > > I don't think this is safe. You are telling vma_merge() something that > > > is not true and will result in can_vma_merge_before() passing. I mean, > > > sure it will become true after you split (unless you can't?), but I > > > don't know if you can just merge a VMA that doesn't pass > > > can_vma_merge_before(), even for a short period? > > > > I must admit I'm not really that handy yet to vma codes, so I could miss > > something obvious. > > > > My reasoning comes from two parts that this pgoff looks all fine: > > > > 1) It's documented in vma_merge() in that: > > > > * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name), > > * figure out ... > > > > So fundamentally this pgoff is part of the mapping request paired with > > all the rest of the information. AFAICT it means it must match with what > > "addr" is describing in VA address space. That's why I think offseting > > it makes sense here. > > > > It also matches my understanding in vma_merge() code on how the pgoff is > > used. > > > > 2) Uffd is nothing special in this regard, namely: > > > > mbind_range(): > > > > pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT); > > merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags, > > vma->anon_vma, vma->vm_file, pgoff, new_pol, > > vma->vm_userfaultfd_ctx, anon_vma_name(vma)); > > > > mlock_fixup(): > > > > pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); > > *prev = vma_merge(vmi, mm, *prev, start, end, newflags, > > vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), > > vma->vm_userfaultfd_ctx, anon_vma_name(vma)); > > > > mprotect_fixup(): > > > > pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); > > *pprev = vma_merge(vmi, mm, *pprev, start, end, newflags, > > vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), > > vma->vm_userfaultfd_ctx, anon_vma_name(vma)); > > > > I had a feeling that it's just something overlooked in the initial proposal > > of uffd, but maybe I missed something important? > > I think you are correct. It's worth noting that all of these skip > splitting if merging succeeds. Yes, IIUC that's what we want because vma_merge() just handles everything there (including split, or say, vma range adjustments) if any !NULL returned. > > We know it won't match case 1-4 (we have a current vma). We then pass > in vma_end = min(end, vma->vm_end); Case 4 seems still possible and should be the case that mentioned in the patch 2, iiuc. But yes I think vma_end calculation is needed, afaik it is to cover the last iteration, where that's the only place possible that we may operate on "end" (where < vma->vm_end) rather than "vma->vm_end". It actually pairs with the initial "start" adjustment to me. > > vma_lookup() will only be called if end == vma->vm_end, so next will not > be set (and found) unless it is adjacent to the current vma and the vma > in question does not need to be split anyways. > > I also see that we use pgoff+pglen in the check, which avoids my concern > above. Right. It seems so far all concerns are more or less ruled out. I'll prepare a formal patchset, we can continue the discussion there. Thanks, -- Peter Xu