Date: Tue, 3 Sep 2024 17:23:38 -0400
From: Peter Xu
To: Yan Zhao
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Gavin Shan,
 Catalin Marinas, x86@kernel.org, Ingo Molnar, Andrew Morton,
 Paolo Bonzini, Dave Hansen, Thomas Gleixner, Alistair Popple,
 kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 Sean Christopherson, Oscar Salvador, Jason Gunthorpe, Borislav Petkov,
 Zi Yan, Axel Rasmussen, David Hildenbrand, Will Deacon, Kefeng Wang,
 Alex Williamson
Subject: Re: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries
References: <20240826204353.2228736-1-peterx@redhat.com>
 <20240826204353.2228736-8-peterx@redhat.com>

On Mon, Sep 02, 2024 at 03:58:38PM +0800, Yan Zhao wrote:
> On Mon, Aug 26, 2024 at 04:43:41PM -0400, Peter Xu wrote:
> > Teach the fork code to properly copy pfnmaps for pmd/pud levels.  Pud is
> > much easier, the write bit needs to be persisted though for writable and
> > shared pud mappings like PFNMAP ones, otherwise a follow up write in either
> > parent or child process will trigger a write fault.
> >
> > Do the same for pmd level.
> >
> > Signed-off-by: Peter Xu
> > ---
> >  mm/huge_memory.c | 29 ++++++++++++++++++++++++++---
> >  1 file changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index e2c314f631f3..15418ffdd377 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1559,6 +1559,24 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	pgtable_t pgtable = NULL;
> >  	int ret = -ENOMEM;
> >
> > +	pmd = pmdp_get_lockless(src_pmd);
> > +	if (unlikely(pmd_special(pmd))) {
> > +		dst_ptl = pmd_lock(dst_mm, dst_pmd);
> > +		src_ptl = pmd_lockptr(src_mm, src_pmd);
> > +		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> > +		/*
> > +		 * No need to recheck the pmd, it can't change with write
> > +		 * mmap lock held here.
> > +		 *
> > +		 * Meanwhile, making sure it's not a CoW VMA with writable
> > +		 * mapping, otherwise it means either the anon page wrongly
> > +		 * applied special bit, or we made the PRIVATE mapping be
> > +		 * able to wrongly write to the backend MMIO.
> > +		 */
> > +		VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
> > +		goto set_pmd;
> > +	}
> > +
> >  	/* Skip if can be re-fill on fault */
> >  	if (!vma_is_anonymous(dst_vma))
> >  		return 0;
> > @@ -1640,7 +1658,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	pmdp_set_wrprotect(src_mm, addr, src_pmd);
> >  	if (!userfaultfd_wp(dst_vma))
> >  		pmd = pmd_clear_uffd_wp(pmd);
> > -	pmd = pmd_mkold(pmd_wrprotect(pmd));
> > +	pmd = pmd_wrprotect(pmd);
> > +set_pmd:
> > +	pmd = pmd_mkold(pmd);
> >  	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
> >
> >  	ret = 0;
> > @@ -1686,8 +1706,11 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	 * TODO: once we support anonymous pages, use
> >  	 * folio_try_dup_anon_rmap_*() and split if duplicating fails.
> >  	 */
> > -	pudp_set_wrprotect(src_mm, addr, src_pud);
> > -	pud = pud_mkold(pud_wrprotect(pud));
> > +	if (is_cow_mapping(vma->vm_flags) && pud_write(pud)) {
> > +		pudp_set_wrprotect(src_mm, addr, src_pud);
> > +		pud = pud_wrprotect(pud);
> > +	}
> Do we need the logic to clear dirty bit in the child as that in
> __copy_present_ptes()?  (and also for the pmd's case).
>
> e.g.
> if (vma->vm_flags & VM_SHARED)
> 	pud = pud_mkclean(pud);

Yeah, good question.  I remember I thought about that when initially
working on these lines, but I forgot the details, or maybe I simply tried
to stick with the current code base, as the dirty bit used to be kept even
in the child here.

I'd expect there's only performance differences, but still sounds like I'd
better leave that to whoever knows the best on the implications, then draft
it as a separate patch but only when needed.

Thanks,

-- 
Peter Xu
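
For readers following the dirty-bit question above, here is a minimal
userspace sketch of the decision being discussed. It is a toy model and not
kernel code: struct toy_entry, toy_copy_entry() and the clean_shared switch
are hypothetical names invented for illustration. Only the VM_SHARED and
VM_MAYWRITE values and the is_cow_mapping() test are meant to mirror
include/linux/mm.h. With clean_shared set to false the model follows the
quoted pud hunk (write-protect only for CoW mappings, dirty bit kept);
with it set to true it also applies the pud_mkclean() step Yan suggests.

```c
#include <stdbool.h>

/* Flag values as in include/linux/mm.h; the rest is a toy model. */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Hypothetical stand-in for a huge pud/pmd entry: only the bits discussed. */
struct toy_entry {
	bool write;
	bool young;
	bool dirty;
};

/* Same test as the kernel helper: private mapping that may become writable. */
static bool is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/*
 * Model of the entry installed in the child at fork time.
 * clean_shared == false: behaviour of the quoted patch (dirty bit kept).
 * clean_shared == true:  additionally clear dirty for VM_SHARED mappings,
 *                        as suggested in the review (assumption, not merged code).
 */
static struct toy_entry toy_copy_entry(unsigned long vm_flags,
				       struct toy_entry *src,
				       bool clean_shared)
{
	struct toy_entry child = *src;

	if (is_cow_mapping(vm_flags) && child.write) {
		src->write = false;	/* models pudp_set_wrprotect() on the parent */
		child.write = false;	/* models pud_wrprotect() on the copy */
	}
	if (clean_shared && (vm_flags & VM_SHARED))
		child.dirty = false;	/* models pud_mkclean() */
	child.young = false;		/* models pud_mkold() */
	return child;
}
```

Calling toy_copy_entry(VM_SHARED | VM_MAYWRITE, &parent, false) on a
writable, dirty parent entry leaves both sides writable and the child dirty;
passing true instead only changes the child's dirty bit, which matches the
expectation in the reply that the difference is limited to that bit.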