Date: Tue, 10 Sep 2024 08:16:10 -0400
From: Peter Xu
To: Yan Zhao
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Gavin Shan, Catalin Marinas, x86@kernel.org, Ingo Molnar,
    Paolo Bonzini, Dave Hansen, Thomas Gleixner, Alistair Popple,
    kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    Sean Christopherson, Oscar Salvador, Jason Gunthorpe,
    Borislav Petkov, Zi Yan, Axel Rasmussen, David Hildenbrand,
    Will Deacon, Kefeng Wang, Alex Williamson
Subject: Re: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries
References: <20240826204353.2228736-1-peterx@redhat.com>
    <20240826204353.2228736-8-peterx@redhat.com>
    <20240909152546.4ef47308e560ce120156bc35@linux-foundation.org>
    <20240909161539.aa685e3eb44cdc786b8c05d2@linux-foundation.org>

On Tue, Sep 10, 2024 at 10:52:01AM +0800, Yan Zhao wrote:
> Hi Peter,

Hi, Yan,

> Not sure if I missed anything.
>
> It looks that before this patch, pmd/pud are always write protected without
> checking "is_cow_mapping(vma->vm_flags) && pud_write(pud)".
> pud_wrprotect() clears dirty bit by moving the dirty value to the software
> bit.
>
> And I have a question that why previously pmd/pud are always write
> protected.

IIUC this is a separate question - the move of the dirty bit in
pud_wrprotect() is there to avoid wrongly creating shadow stack mappings.
For this discussion I think that's extra complexity and can be put aside;
the dirty bit will get recovered in pud_clear_saveddirty() later, so it's
not the same as pud_mkclean().

AFAIU the pmd/pud paths don't consider is_cow_mapping() because normally
we will not duplicate pgtables in fork() for most shared file mappings
(!CoW). Please refer to vma_needs_copy(), and the comment before it
finally returns false. It's not strictly is_cow_mapping(), as we're
checking anon_vma there, but it's mostly that; the difference is that
MAP_PRIVATE file mappings where no CoW has happened yet are skipped as
well (if CoW had happened, anon_vma would already be there).

There are some outliers, e.g. userfault protected vmas, or
pfnmaps/mixedmaps. Userfault & mixedmap are not involved in this series
at all, so let's discuss pfnmaps.

It means fork() can still copy pgtables for pfnmap vmas, and that's
relevant to this series: before this series pfnmaps only existed at the
pte level, hence IMO the is_cow_mapping() check must exist at the pte
level as you described, because it needs to take care of those properly.
Note that the pte processing also checks pte_write() to make sure it's a
COWed page, not e.g. a RO page cache page / pfnmap / ...

Meanwhile, since pfnmaps wouldn't appear in pmd/pud, I think it's fair
that the pmd/pud paths assume that any huge mapping they see must be
MAP_PRIVATE, otherwise the whole copy_page_range() would already have
been skipped. IOW I think they only need to process COWed pages here,
and those pages need the write bit removed in both parent and child at
fork().
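
To make the vma_needs_copy() logic above a bit more concrete, here is a
small user-space model of it. This is only a sketch, not the kernel code:
struct vma_model, its fields and vma_needs_copy_model() are made up for
illustration, and the VM_* values are placeholders rather than the real
bit definitions.

```c
#include <stdbool.h>

/* Placeholder flag bits for this sketch only, not the kernel's values. */
#define VM_PFNMAP   0x1u
#define VM_MIXEDMAP 0x2u

/* Hypothetical stand-in for the few vma properties that matter here. */
struct vma_model {
	unsigned int vm_flags;
	bool has_anon_vma;	/* models src_vma->anon_vma != NULL */
	bool uffd_wp;		/* models userfaultfd_wp(dst_vma) */
};

/* Should fork() copy the page tables of src into the child (dst)? */
static inline bool vma_needs_copy_model(const struct vma_model *dst,
					const struct vma_model *src)
{
	if (dst->uffd_wp)
		return true;	/* outlier: userfault write-protected */
	if (src->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
		return true;	/* outlier: pfnmaps / mixedmaps */
	if (src->has_anon_vma)
		return true;	/* CoW has already happened in this vma */
	return false;		/* everything else is refilled by faults */
}
```

In this model a shared file mapping, or a MAP_PRIVATE file mapping that
has not been CoWed yet, falls through to the final "return false", while
a pfnmap vma is always copied.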

After this series, pfnmaps can appear in the form of pmd/pud, so the
previous assumption stops holding true, as we'll still always copy
pfnmaps during fork(). My guess at the reason is that most drivers map
pfnmap vmas only during mmap(), which means there is normally no fault()
handler at all for those pfns.

In this case we also need to identify whether the page is COWed, using
the "is_cow_mapping() && pxx_write()" check newly added in this series
(added to the pud path, while for the pmd path I used a WARN_ON_ONCE
instead).

If we don't do that then e.g. for a VM_SHARED pfnmap vma, after fork()
we'll wrongly observe write protected entries. The change here makes
sure VM_SHARED mappings properly keep the write bits on pmds/puds.

Hope that explains.

Thanks,

-- 
Peter Xu
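
P.S. A rough user-space sketch of the pud-level decision described
above, in case it helps. It is a model under stated assumptions, not the
patch itself: is_cow_mapping() follows the logic of the kernel helper of
the same name, but struct entry_model and the two fork_copy_pud_*()
functions are hypothetical names, and the VM_* values should be treated
as illustrative.

```c
#include <stdbool.h>

/* Illustrative flag bits for this sketch. */
#define VM_SHARED   0x08u
#define VM_MAYWRITE 0x20u

/* A mapping is CoW when it is private and may become writable. */
static inline bool is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Hypothetical stand-in for a huge entry: only the write bit is modeled. */
struct entry_model {
	bool write;
};

/* Before the series: a huge entry was assumed CoW, so always wrprotect. */
static inline struct entry_model fork_copy_pud_old(struct entry_model e)
{
	e.write = false;
	return e;
}

/* With the series: wrprotect only a writable entry in a CoW mapping. */
static inline struct entry_model fork_copy_pud_new(unsigned long vm_flags,
						   struct entry_model e)
{
	if (is_cow_mapping(vm_flags) && e.write)
		e.write = false;
	return e;
}
```

With a writable entry in a VM_SHARED mapping, the old model drops the
write bit while the new one keeps it; for a private writable mapping
both models drop it.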