Date: Mon, 9 Sep 2024 20:08:16 -0400
From: Peter Xu
To: Andrew Morton
Cc: Yan Zhao, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Gavin Shan, Catalin Marinas, x86@kernel.org, Ingo Molnar,
	Paolo Bonzini, Dave Hansen, Thomas Gleixner, Alistair Popple,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Sean Christopherson, Oscar Salvador, Jason Gunthorpe,
	Borislav Petkov, Zi Yan, Axel Rasmussen, David Hildenbrand,
	Will Deacon, Kefeng Wang, Alex Williamson
Subject: Re: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries
References: <20240826204353.2228736-1-peterx@redhat.com>
	<20240826204353.2228736-8-peterx@redhat.com>
	<20240909152546.4ef47308e560ce120156bc35@linux-foundation.org>
	<20240909161539.aa685e3eb44cdc786b8c05d2@linux-foundation.org>
In-Reply-To: <20240909161539.aa685e3eb44cdc786b8c05d2@linux-foundation.org>

On Mon, Sep 09, 2024 at 04:15:39PM -0700, Andrew Morton wrote:
> On Mon, 9 Sep 2024 18:43:22 -0400 Peter Xu wrote:
>
> > > > > Do we need the logic to clear dirty bit in the child as that in
> > > > > __copy_present_ptes()?  (and also for the pmd's case).
> > > > >
> > > > > e.g.
> > > > > 	if (vma->vm_flags & VM_SHARED)
> > > > > 		pud = pud_mkclean(pud);
> > > >
> > > > Yeah, good question.  I remember I thought about that when initially
> > > > working on these lines, but I forgot the details, or maybe I simply tried
> > > > to stick with the current code base, as the dirty bit used to be kept even
> > > > in the child here.
> > > >
> > > > I'd expect there's only performance differences, but still sounds like I'd
> > > > better leave that to whoever knows the best on the implications, then draft
> > > > it as a separate patch but only when needed.
> > >
> > > Sorry, but this vaguensss simply leaves me with nowhere to go.
> > >
> > > I'll drop the series - let's revisit after -rc1 please.
> >
> > Andrew, would you please explain why it needs to be dropped?
> >
> > I meant in the reply that I think we should leave that as is, and I think
> > so far nobody in real life should care much on this bit, so I think it's
> > fine to leave the dirty bit as-is.
> >
> > I still think whoever has a better use of the dirty bit and would like to
> > change the behavior should find the use case and work on top, but only if
> > necessary.
>
> Well.  "I'd expect there's only performance differences" means to me
> "there might be correctness issues, I don't know".  Is it or is it not
> merely a performance thing?

There should be no correctness issue pending.  It can only be about
performance, and AFAIU this patch shouldn't change performance either,
because it doesn't change how the dirty bit is processed (it behaves just
like before this patch), let alone correctness (with regard to dirty
bits).

I can provide some more details.

The question we're discussing is "whether we should clear the dirty bit in
the child for a pgtable entry when it's VM_SHARED".  Yan observed that we
don't do the same thing across pte/pmd/pud, which is true.

Before this patch:

  - For pte:     we clear the dirty bit in the child on copy if VM_SHARED
  - For pmd/pud: we never clear the dirty bit in the child on copy

Clearing the dirty bit for VM_SHARED in the child at the pte level dates
back to the first commit in git history, so we have been doing it for 19
years.

That makes sense to me, because clearing the dirty bit in a pte normally
requires a SetDirty on the folio, e.g. in the unmap path:

	if (pte_dirty(pteval))
		folio_mark_dirty(folio);

Hence a cleared dirty bit in the child should avoid some extra overhead
when the pte maps file cache: a clean pte can at least help us avoid calls
into e.g. the mapping's dirty_folio() function (which should normally
check folio_test_set_dirty() again anyway; the parent pte still has the
dirty bit set, so we won't miss setting the folio dirty):

	folio_mark_dirty():
		if (folio_test_reclaim(folio))
			folio_clear_reclaim(folio);
		return mapping->a_ops->dirty_folio(mapping, folio);

However, there's the other side: when the dirty bit is missing, I _think_
it also means that when the child writes to the cleaned pte, the MMU has
to set the dirty bit (e.g. on archs with hardware-managed dirty bits),
which is slower than if we hadn't cleared it... and with software-emulated
dirty bits it could even require a page fault, IIUC.

In short, personally I don't know what's best to do, keeping or removing
the dirty bit, even though it's safe either way: there are pros and cons
to each decision.  That's why I said I'm not sure which is the best way.
I have a feeling that most people didn't even notice this, and we've kept
running this code for the past 19 years just fine..

OTOH, we don't do the same for pmds/puds (where we always persist the
dirty bit in the child), and I didn't check whether that's intended, or
why.  The reasoning would be similar to the pte discussion above, or there
may be more that I overlooked.

So again, the safest approach here, in terms of the dirty bit, is to keep
what we did before.
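
To make the pre-patch behavior and the dirty_folio() argument above a bit
more concrete, here is a minimal userspace sketch.  It is only a toy
model: every toy_* type and function is made up for illustration and none
of them is kernel API.  It also assumes, purely for the sake of the
comparison, that unmapping a dirty entry reaches the folio-dirtying hook
in the same way at every level.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_level { TOY_PTE, TOY_PMD, TOY_PUD };

struct toy_entry {
	enum toy_level level;
	bool dirty;
};

struct toy_folio {
	bool dirty;
	int dirty_folio_calls;	/* how often the (toy) mapping hook ran */
};

/* Copy one entry into the child at fork time, following the list above. */
struct toy_entry toy_copy_entry(struct toy_entry parent, bool vm_shared)
{
	struct toy_entry child = parent;

	/* Only the pte level cleans the child entry for shared mappings. */
	if (parent.level == TOY_PTE && vm_shared)
		child.dirty = false;
	/* pmd/pud: the dirty bit is carried over untouched. */
	return child;
}

/* Unmap one entry: a dirty entry must pass its dirtiness on to the folio. */
void toy_unmap_entry(struct toy_entry *e, struct toy_folio *folio)
{
	if (e->dirty) {
		folio->dirty_folio_calls++;
		folio->dirty = true;
	}
	e->dirty = false;
}

/*
 * Fork a dirty, shared entry at the given level, then unmap the child and
 * the parent.  Returns how many times the toy hook was reached.
 */
int toy_fork_then_unmap(enum toy_level level)
{
	struct toy_folio folio = { false, 0 };
	struct toy_entry parent = { level, true };
	struct toy_entry child = toy_copy_entry(parent, true);

	toy_unmap_entry(&child, &folio);
	toy_unmap_entry(&parent, &folio);
	assert(folio.dirty);	/* the parent's dirty bit is never lost */
	return folio.dirty_folio_calls;
}
```

In this model, toy_fork_then_unmap(TOY_PTE) reaches the hook once (only
for the parent), while the pmd/pud cases reach it twice.  The folio ends
up dirty in every case, which is why the difference is only about
overhead and not about correctness.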
This patch, as of now, does exactly that: it leaves the dirty bit handling
unchanged.

IOW, if I need to repost, I'll repost exactly the same thing (with the
fixup I sent later, which is already in mm-unstable).

Thanks,

-- 
Peter Xu