From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4DDFC04AA7 for ; Wed, 20 Sep 2023 16:12:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 326A66B0184; Wed, 20 Sep 2023 12:12:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2D5B46B0186; Wed, 20 Sep 2023 12:12:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 19E896B018F; Wed, 20 Sep 2023 12:12:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 0C46B6B0184 for ; Wed, 20 Sep 2023 12:12:02 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id D5BFC14032E for ; Wed, 20 Sep 2023 16:12:01 +0000 (UTC) X-FDA: 81257467242.10.3198906 Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com [209.85.128.48]) by imf01.hostedemail.com (Postfix) with ESMTP id 9071C4002C for ; Wed, 20 Sep 2023 16:11:59 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=bZQfFaMw; spf=pass (imf01.hostedemail.com: domain of jannh@google.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695226319; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zAzaV/SRyN9p4hGa3qLm9e9pe4b6sS/pBR+spEyhDRk=; b=Y+VOnFDeZB52t4GM9WQtxMKoEFy4WLwKbPFQDbFBaJpnVqORYUkPbgPhxEveHp3q4nZlAd Bzh2wPFA9txJWsSnxIxaIq/A54q46D+Ly2j+9UHdhWT0+ldQiqtFQEkJ66tbO2UsKkiiQQ Z9EEKdi05HtDBh3bVKjFQl7WMufJKnQ= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695226319; a=rsa-sha256; cv=none; b=163e13gjdSi9SCXQXcKnfW/wlCnT4L5+nDhe1Np28xrXVOm3jiFNZJsHvJnH2mmvKCansj PYmDnChO/HHS8toWa3BSIql3SgjJac4qCb7AuCmYyxpEoJIksZmpfXWwbibuq2qJGaAnhc o2p8RWq036QJ1FnG+8V131zFun/XyPQ= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=bZQfFaMw; spf=pass (imf01.hostedemail.com: domain of jannh@google.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-4051039701eso108245e9.1 for ; Wed, 20 Sep 2023 09:11:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1695226318; x=1695831118; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=zAzaV/SRyN9p4hGa3qLm9e9pe4b6sS/pBR+spEyhDRk=; b=bZQfFaMwa+NMWOyVlpuwbwabHwtlTeejkdRPuP//vLchJ00hhCs1U074ALTel5AzBo d2Q4whY1rXe2JD5JxzMAi9NKm+4BOKk20DjyRXkGJoBOYxpM66gBJjy+bXJv+dO1stBe pO1aZsCkkqmYY7zuMBL3WaSpH6KYvBesSfLCRlMA8NaIoaMBB1nyyoQRmSpHrmeihz+L xziHnpJFOuq/GbcJcHWD2yYDG9nlYFHbxO7iz/cDdeS+l0G//9uvvMh7qoYorrIBw3P5 LDdWbXJ0GOeJ2GitrggjZQIaqZM4jXaDNZqliKub0BpeT+GvgwSGQLqy8Y40QkOV5hgM hURg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695226318; x=1695831118; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zAzaV/SRyN9p4hGa3qLm9e9pe4b6sS/pBR+spEyhDRk=; b=JZ5mXH4yAGpcPJGLDWqMG5U5s9TpKKMIj8gwGV+aQ6hin5pbR4rtS653lmg3ijR13Q 1KsU3MElElFYlAaehPgRMb9ggT30x7TsUf943vTlIdjYoJWRSayMZqp8m1ChHGMr2ZEm KqWX8Y9OifrZncdEjdQPQrCGnSpog8yBEkf0G+oYVeE9BtZLEwkwiW4npe0BxLXk7L1f JcyuGtHLeGRQ1lBKU+1WSVIGvW+8QU9qphszF5vo7fQ84ULgL+S4RVdfbK8BBI/gfd1A Y0+tD/SDzdy+afzV360IXMxf8bn16oZcWYoOy3LyuvUbE8Dej4FP2HZlLjMrWK8siTrP HUJA== X-Gm-Message-State: AOJu0Yx5R/UslO2LbGeIkzOvSHjpeGGdwz7MD/B/r6od2ewzN3YdZ2fu BdjMazaos5jhC2cpn++nwUPYwmy0ToAFb686wfYAWQ== X-Google-Smtp-Source: AGHT+IHy0hvFkvPj2pc9M2fqH7uFFxdkVHYrAu/hwKzERNMYWLTY0Kzs43mFyS5OI/qFiRhyWmIsIIv0zWuuaq9PecI= X-Received: by 2002:a05:600c:3217:b0:3f7:3e85:36a with SMTP id r23-20020a05600c321700b003f73e85036amr88893wmp.7.1695226317753; Wed, 20 Sep 2023 09:11:57 -0700 (PDT) MIME-Version: 1.0 References: <20230914152620.2743033-1-surenb@google.com> <20230914152620.2743033-3-surenb@google.com> In-Reply-To: From: Jann Horn Date: Wed, 20 Sep 2023 18:11:19 +0200 Message-ID: Subject: Re: [PATCH 2/3] userfaultfd: UFFDIO_REMAP uABI To: Suren Baghdasaryan Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com, lokeshgidra@google.com, peterx@redhat.com, david@redhat.com, hughd@google.com, mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org, willy@infradead.org, Liam.Howlett@oracle.com, zhangpeng362@huawei.com, bgeffon@google.com, kaleshsingh@google.com, ngeoffray@google.com, jdduke@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: gpber4of6ogsn7mnbnwn1sjncbkzuhka X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 9071C4002C X-Rspam-User: X-HE-Tag: 1695226319-463135 X-HE-Meta: U2FsdGVkX18A61j3dgO9Imq4Yt0W7jeZbvHEOCJRFc8UwT58gNitiOnIi5Jo9vVwNk28D7y2xxsUsT3jZRg3wq5raD+yt1h5X5xmR6VrQJRw51S/ten+D20qg8YVADEGE/aXuJ6N7ToxC2+CT6uSboIUjCwqWtCG+DKjNSWumJWiOK3BEZRv7vZWyRZjIfoEoL8oiN3YIO3FlHt5rHm+dbo9EQuN1XtIP9/7UG48FleOzjdEBFfB/lf1xjpWpNJ+8vmySGco0VkyhZtDsWh/9xxHYwxbeShIGIs2srpBdmehwMNIQQiCahPJZlm2GTU8re2DCs/EfxBtt57SG55mstTEFDMxm8f/14LX/WUE/L51+QEitDKhqQ7ckqMADXTdaxyOp0OYSPOCCC2XRXFg4I3HAQAW664TQNPUIGhQjLA46svp1dTh4EzLS/6tmYZ7zn/zUMtL6LQDowKS9d8Pd9VHYE13BNwmVvHDLcNWrvuPl4s26wwILLt1xu9gQAZVOzKF1qoWny6neP40cMWmzkQZEa+9TlYGbtjULyoA981gLKLDgigxAufbtJLMtkbSC11Qj8X4UzximwbUBMRfEzbFEgATUgo9tSduOzsdCsOzG0e2A7BKoNHk3R1zq3VTRhNHY+WIWWe9Z/q+bEEUZJp/BM2yOer3+ktm+hVM2dEpQ1HtOfBWpLHpjAouaFL1eR7v5JTtTPFY9rAmT7cWfBBuBFRNNOue3gncmeaYyLA7P84Z6W4UUJ+ajTMUShq13ydlM5rtk+gYZ1tb1lpRFBcOHytSpwzgG1Cirik3UZuah1R+GwNvDBItVLkWqVQeT1vyQH/U6PUsIh30zNQ137Rk3Q8UkCrxsuC9agbYXUuYzJVuXouMLs1UYv+qlE0q1ou5FrahS9oNb+Km/GLHbo7Ij/FYxcvEpooZUKJ1qaCQKhdcjpPZmioVobCWMm71sSeY09x2QBde7vqD6bF QSsdA9lL gCYZiOdvDu06kwBnpZ6+LKW4Uuh2yeE6RCiWO4Woj4cSMW52E6k3MlN9nkUxzXELQcLtW0hYTupu11t/h4SNBn5FWaWKbbEPhVPFATCuqVwWSShkc6qbUPssoQiIswSywQKcfBov+rGfySjfy9y2naPxe8/XsJZ0Qe551qYfuPuQaLzh0hQC+HAy2MMV+Dc6t3nvb7UJLKMylawBUn2TP9p/WSDZU5X8B+IcutYzWIIaHS54WYWPlMKnsduSZGqZsLuFecDMkYQxymy0IgsOJuUD3aKRQ6R2F7CTUanWOK0y5OU5KZC6oi21KBrL+Nd18Xk+5 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000015, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Sep 20, 2023 at 3:49=E2=80=AFAM Suren Baghdasaryan wrote: > On Tue, Sep 19, 2023 at 4:51=E2=80=AFPM Jann Horn wrot= e: > > On Wed, Sep 20, 2023 at 1:08=E2=80=AFAM Suren Baghdasaryan wrote: > > > On Thu, Sep 14, 2023 at 7:28=E2=80=AFPM Jann Horn = wrote: > > > > On Thu, Sep 14, 2023 at 5:26=E2=80=AFPM Suren Baghdasaryan wrote: > > > > > From: Andrea Arcangeli > > > > > > > > > > This implements the uABI of UFFDIO_REMAP. > > > > > > > > > > Notably one mode bitflag is also forwarded (and in turn known) by= the > > > > > lowlevel remap_pages method. > > [...] > > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > [...] > > > > > +int remap_pages_huge_pmd(struct mm_struct *dst_mm, > > > > > + struct mm_struct *src_mm, > > > > > + pmd_t *dst_pmd, pmd_t *src_pmd, > > > > > + pmd_t dst_pmdval, > > > > > + struct vm_area_struct *dst_vma, > > > > > + struct vm_area_struct *src_vma, > > > > > + unsigned long dst_addr, > > > > > + unsigned long src_addr) > > > > > +{ > > > > > + pmd_t _dst_pmd, src_pmdval; > > > > > + struct page *src_page; > > > > > + struct anon_vma *src_anon_vma, *dst_anon_vma; > > > > > + spinlock_t *src_ptl, *dst_ptl; > > > > > + pgtable_t pgtable; > > > > > + struct mmu_notifier_range range; > > > > > + > > > > > + src_pmdval =3D *src_pmd; > > > > > + src_ptl =3D pmd_lockptr(src_mm, src_pmd); > > > > > + > > > > > + BUG_ON(!pmd_trans_huge(src_pmdval)); > > > > > + BUG_ON(!pmd_none(dst_pmdval)); > > > > > > > > Why can we assert that pmd_none(dst_pmdval) is true here? Can we no= t > > > > have concurrent faults (or userfaultfd operations) populating that > > > > PMD? > > > > > > IIUC dst_pmdval is a copy of the value from dst_pmd, so that local > > > copy should not change even if some concurrent operation changes > > > dst_pmd. We can assert that it's pmd_none because we checked for that > > > before calling remap_pages_huge_pmd. Later on we check if dst_pmd > > > changed from under us (see pmd_same(*dst_pmd, dst_pmdval) check) and > > > retry if that happened. > > > > Oh, right, I don't know what I was thinking when I typed that. > > > > But now I wonder about the check directly above that: What does this > > code do for swap PMDs? It looks like that might splat on the > > BUG_ON(!pmd_trans_huge(src_pmdval)). All we've checked on the path to > > here is that the virtual memory area is aligned, that the destination > > PMD is empty, and that pmd_trans_huge_lock() succeeded; but > > pmd_trans_huge_lock() explicitly permits swap PMDs (which is the > > swapped-out version of transhuge PMDs): > > > > static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, > > struct vm_area_struct *vma) > > { > > if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pm= d)) > > return __pmd_trans_huge_lock(pmd, vma); > > else > > return NULL; > > } > > Yeah... Ok, I think I'm missing a check for pmd_trans_huge(*src_pmd) > after we lock it with pmd_trans_huge_lock(src_pmd, src_vma). And we > can remove the above BUG_ON(). Would that address your concern? Sounds good. It'll end up splitting huge swap entries but I guess the extra code for moving huge swap entries might not be worth it.