From: Suren Baghdasaryan
Date: Thu, 21 Sep 2023 18:57:42 -0700
Subject: Re: [PATCH 2/3] userfaultfd: UFFDIO_REMAP uABI
To: David Hildenbrand
Cc: Matthew Wilcox, akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
 brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com,
 lokeshgidra@google.com, peterx@redhat.com, hughd@google.com,
 mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org,
 Liam.Howlett@oracle.com, jannh@google.com, zhangpeng362@huawei.com,
 bgeffon@google.com, kaleshsingh@google.com, ngeoffray@google.com,
 jdduke@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kernel-team@android.com
References: <20230914152620.2743033-1-surenb@google.com>
 <20230914152620.2743033-3-surenb@google.com>
 <354f2508-74d5-2723-502c-32d009f77a3e@redhat.com>
In-Reply-To: <354f2508-74d5-2723-502c-32d009f77a3e@redhat.com>

On Thu, Sep 21, 2023 at 11:17 AM David Hildenbrand wrote:
>
> On 21.09.23 20:04, Suren Baghdasaryan wrote:
> > On Thu, Sep 14, 2023 at 6:45 PM David Hildenbrand wrote:
> >>
> >> On 14.09.23 20:43, David Hildenbrand wrote:
> >>> On 14.09.23 20:11, Matthew Wilcox wrote:
> >>>> On Thu, Sep 14, 2023 at 08:26:12AM -0700, Suren Baghdasaryan wrote:
> >>>>> +++ b/include/linux/userfaultfd_k.h
> >>>>> @@ -93,6 +93,23 @@ extern int mwriteprotect_range(struct mm_struct *dst_mm,
> >>>>>  extern long uffd_wp_range(struct vm_area_struct *vma,
> >>>>>                            unsigned long start, unsigned long len, bool enable_wp);
> >>>>>
> >>>>> +/* remap_pages */
> >>>>> +extern void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2);
> >>>>> +extern void double_pt_unlock(spinlock_t *ptl1, spinlock_t *ptl2);
> >>>>> +extern ssize_t remap_pages(struct mm_struct *dst_mm,
> >>>>> +                           struct mm_struct *src_mm,
> >>>>> +                           unsigned long dst_start,
> >>>>> +                           unsigned long src_start,
> >>>>> +                           unsigned long len, __u64 flags);
> >>>>> +extern int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> >>>>> +                                struct mm_struct *src_mm,
> >>>>> +                                pmd_t *dst_pmd, pmd_t *src_pmd,
> >>>>> +                                pmd_t dst_pmdval,
> >>>>> +                                struct vm_area_struct *dst_vma,
> >>>>> +                                struct vm_area_struct *src_vma,
> >>>>> +                                unsigned long dst_addr,
> >>>>> +                                unsigned long src_addr);
> >>>>
> >>>> Drop the 'extern' markers from function declarations.
> >>>>
> >>>>> +int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> >>>>> +                         struct mm_struct *src_mm,
> >>>>> +                         pmd_t *dst_pmd, pmd_t *src_pmd,
> >>>>> +                         pmd_t dst_pmdval,
> >>>>> +                         struct vm_area_struct *dst_vma,
> >>>>> +                         struct vm_area_struct *src_vma,
> >>>>> +                         unsigned long dst_addr,
> >>>>> +                         unsigned long src_addr)
> >>>>> +{
> >>>>> +	pmd_t _dst_pmd, src_pmdval;
> >>>>> +	struct page *src_page;
> >>>>> +	struct anon_vma *src_anon_vma, *dst_anon_vma;
> >>>>> +	spinlock_t *src_ptl, *dst_ptl;
> >>>>> +	pgtable_t pgtable;
> >>>>> +	struct mmu_notifier_range range;
> >>>>> +
> >>>>> +	src_pmdval = *src_pmd;
> >>>>> +	src_ptl = pmd_lockptr(src_mm, src_pmd);
> >>>>> +
> >>>>> +	BUG_ON(!pmd_trans_huge(src_pmdval));
> >>>>> +	BUG_ON(!pmd_none(dst_pmdval));
> >>>>> +	BUG_ON(!spin_is_locked(src_ptl));
> >>>>> +	mmap_assert_locked(src_mm);
> >>>>> +	mmap_assert_locked(dst_mm);
> >>>>> +	BUG_ON(src_addr & ~HPAGE_PMD_MASK);
> >>>>> +	BUG_ON(dst_addr & ~HPAGE_PMD_MASK);
> >>>>> +
> >>>>> +	src_page = pmd_page(src_pmdval);
> >>>>> +	BUG_ON(!PageHead(src_page));
> >>>>> +	BUG_ON(!PageAnon(src_page));
> >>>>
> >>>> Better to add a src_folio = page_folio(src_page);
> >>>> and then folio_test_anon() here.
> >>>>
> >>>>> +	if (unlikely(page_mapcount(src_page) != 1)) {
> >>>>
> >>>> Brr, this is going to miss PTE mappings of this folio. I think you
> >>>> actually want folio_mapcount() instead, although it'd be more efficient
> >>>> to look at folio->_entire_mapcount == 1 and _nr_pages_mapped == 0.
> >>>> Not sure what a good name for that predicate would be.
> >>>
> >>> We have
> >>>
> >>>  * It only works on non shared anonymous pages because those can
> >>>  * be relocated without generating non linear anon_vmas in the rmap
> >>>  * code.
> >>>  *
> >>>  * It provides a zero copy mechanism to handle userspace page faults.
> >>>  * The source vma pages should have mapcount == 1, which can be
> >>>  * enforced by using madvise(MADV_DONTFORK) on src vma.
> >>>
> >>> Use PageAnonExclusive(). As long as KSM is not involved and you don't
> >>> use fork(), that flag should be good enough for that use case here.
> >>>
> >> ... and similarly don't do any of that swapcount stuff and only check if
> >> the swap pte is anon exclusive.
> >
> > I'm preparing v2 and this is the only part left for me to address but
> > I'm not clear how. David, could you please clarify how I should be
> > checking swap pte to be exclusive without swapcount?
>
> If you have a real swp pte (not a non-swap pte like migration entries)
> you should be able to just use pte_swp_exclusive().

Got it. Thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
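For reference, the quoted code comment says the mapcount == 1 requirement on the source vma "can be enforced by using madvise(MADV_DONTFORK)". Below is a minimal userspace sketch of that one step, assuming Linux with glibc. The helper name map_dontfork_region() is made up for illustration and is not part of the patch. The sketch does not exercise UFFDIO_REMAP itself; it only shows how a source range could be kept out of fork().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch): map `len` bytes of private
 * anonymous memory and mark the range MADV_DONTFORK, so a later fork()
 * does not copy the mapping into the child and the pages stay mapped by
 * this process only.
 *
 * Returns the mapping on success, NULL on failure (errno is set by
 * mmap() or madvise()).
 */
static void *map_dontfork_region(size_t len)
{
	void *p;

	assert(len > 0);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	if (madvise(p, len, MADV_DONTFORK) != 0) {
		/* Undo the mapping so the caller never sees a half-set-up range. */
		munmap(p, len);
		return NULL;
	}

	return p;
}
```

A caller would use the returned range as the source side and release it with munmap(). Note that this only covers the fork() case; per the discussion above, KSM would still need to be out of the picture for the anon-exclusive check to be sufficient.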