References: <20230914152620.2743033-1-surenb@google.com> <20230914152620.2743033-3-surenb@google.com>
From: Suren Baghdasaryan
Date: Thu, 21 Sep 2023 18:04:30 +0000
Subject: Re: [PATCH 2/3] userfaultfd: UFFDIO_REMAP uABI
To: David Hildenbrand
Cc: Matthew Wilcox, akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
 brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com,
 lokeshgidra@google.com, peterx@redhat.com, hughd@google.com,
 mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org,
 Liam.Howlett@oracle.com, jannh@google.com, zhangpeng362@huawei.com,
 bgeffon@google.com, kaleshsingh@google.com, ngeoffray@google.com,
 jdduke@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kernel-team@android.com

On Thu, Sep 14, 2023 at 6:45 PM David Hildenbrand wrote:
>
> On 14.09.23 20:43, David Hildenbrand wrote:
> > On 14.09.23 20:11, Matthew Wilcox wrote:
> >> On Thu, Sep 14, 2023 at 08:26:12AM -0700, Suren Baghdasaryan wrote:
> >>> +++ b/include/linux/userfaultfd_k.h
> >>> @@ -93,6 +93,23 @@ extern int mwriteprotect_range(struct mm_struct *dst_mm,
> >>>  extern long uffd_wp_range(struct vm_area_struct *vma,
> >>>                            unsigned long start, unsigned long len, bool enable_wp);
> >>>
> >>> +/* remap_pages */
> >>> +extern void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2);
> >>> +extern void double_pt_unlock(spinlock_t *ptl1, spinlock_t *ptl2);
> >>> +extern ssize_t remap_pages(struct mm_struct *dst_mm,
> >>> +                           struct mm_struct *src_mm,
> >>> +                           unsigned long dst_start,
> >>> +                           unsigned long src_start,
> >>> +                           unsigned long len, __u64 flags);
> >>> +extern int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> >>> +                                struct mm_struct *src_mm,
> >>> +                                pmd_t *dst_pmd, pmd_t *src_pmd,
> >>> +                                pmd_t dst_pmdval,
> >>> +                                struct vm_area_struct *dst_vma,
> >>> +                                struct vm_area_struct *src_vma,
> >>> +                                unsigned long dst_addr,
> >>> +                                unsigned long src_addr);
> >>
> >> Drop the 'extern' markers from function declarations.
> >>
> >>> +int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> >>> +                         struct mm_struct *src_mm,
> >>> +                         pmd_t *dst_pmd, pmd_t *src_pmd,
> >>> +                         pmd_t dst_pmdval,
> >>> +                         struct vm_area_struct *dst_vma,
> >>> +                         struct vm_area_struct *src_vma,
> >>> +                         unsigned long dst_addr,
> >>> +                         unsigned long src_addr)
> >>> +{
> >>> +	pmd_t _dst_pmd, src_pmdval;
> >>> +	struct page *src_page;
> >>> +	struct anon_vma *src_anon_vma, *dst_anon_vma;
> >>> +	spinlock_t *src_ptl, *dst_ptl;
> >>> +	pgtable_t pgtable;
> >>> +	struct mmu_notifier_range range;
> >>> +
> >>> +	src_pmdval = *src_pmd;
> >>> +	src_ptl = pmd_lockptr(src_mm, src_pmd);
> >>> +
> >>> +	BUG_ON(!pmd_trans_huge(src_pmdval));
> >>> +	BUG_ON(!pmd_none(dst_pmdval));
> >>> +	BUG_ON(!spin_is_locked(src_ptl));
> >>> +	mmap_assert_locked(src_mm);
> >>> +	mmap_assert_locked(dst_mm);
> >>> +	BUG_ON(src_addr & ~HPAGE_PMD_MASK);
> >>> +	BUG_ON(dst_addr & ~HPAGE_PMD_MASK);
> >>> +
> >>> +	src_page = pmd_page(src_pmdval);
> >>> +	BUG_ON(!PageHead(src_page));
> >>> +	BUG_ON(!PageAnon(src_page));
> >>
> >> Better to add a src_folio = page_folio(src_page);
> >> and then folio_test_anon() here.
> >>
> >>> +	if (unlikely(page_mapcount(src_page) != 1)) {
> >>
> >> Brr, this is going to miss PTE mappings of this folio.
> >> I think you actually want folio_mapcount() instead, although it'd be
> >> more efficient to look at folio->_entire_mapcount == 1 and
> >> _nr_pages_mapped == 0. Not sure what a good name for that predicate
> >> would be.
> >
> > We have
> >
> >    * It only works on non shared anonymous pages because those can
> >    * be relocated without generating non linear anon_vmas in the rmap
> >    * code.
> >    *
> >    * It provides a zero copy mechanism to handle userspace page faults.
> >    * The source vma pages should have mapcount == 1, which can be
> >    * enforced by using madvise(MADV_DONTFORK) on src vma.
> >
> > Use PageAnonExclusive(). As long as KSM is not involved and you don't
> > use fork(), that flag should be good enough for that use case here.
> >
> ... and similarly don't do any of that swapcount stuff and only check if
> the swap pte is anon exclusive.

I'm preparing v2, and this is the only part left for me to address, but
I'm not clear on how. David, could you please clarify how I should check
that a swap PTE is exclusive without using swapcount?

>
> --
> Cheers,
>
> David / dhildenb
>