From: Lance Yang
Date: Thu, 9 May 2024 16:21:39 +0800
Subject: Re: [PATCH v4 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
To: Jason Gunthorpe
Cc: Zi Yan, Alistair Popple, akpm@linux-foundation.org, willy@infradead.org, sj@kernel.org, maskray@google.com, ryan.roberts@arm.com, david@redhat.com, 21cnbao@gmail.com, mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com, shy828301@gmail.com, xiehuan09@gmail.com, libang.li@antgroup.com, wangkefeng.wang@huawei.com, songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Baolin Wang
In-Reply-To: <20240508163526.GM4650@nvidia.com>
References: <20240501042700.83974-1-ioworker0@gmail.com> <20240501042700.83974-3-ioworker0@gmail.com> <0077A412-0AF1-4022-8F49-EE77AE601ECB@nvidia.com> <10BA9EED-A2BB-44C2-B80A-26527CDFFA50@nvidia.com> <1B2017A4-A252-4C1F-9608-D43ECEAD53B1@nvidia.com> <20240508155253.GK4650@nvidia.com> <30469615-2DDC-467E-A810-5EE8E1CFCB43@nvidia.com> <20240508163526.GM4650@nvidia.com>

Hey Zi and Jason,

Thanks a lot for reaching out!

On Thu, May 9, 2024 at 12:35 AM Jason Gunthorpe wrote:
>
> On Wed, May 08, 2024 at 12:22:08PM -0400, Zi Yan wrote:
> > On 8 May 2024, at 11:52, Jason Gunthorpe wrote:
> >
> > > On Wed, May 08, 2024 at 10:56:34AM -0400, Zi Yan wrote:
> > >
> > >> Lance is improving try_to_unmap_one() to support unmapping PMD THP as a whole,
> > >> so he moves split_huge_pmd_address() inside while (page_vma_mapped_walk(&pvmw))
> > >> and after mmu_notifier_invalidate_range_start() as split_huge_pmd_locked()
> > >> and does not include the mmu notifier ops inside split_huge_pmd_address().

IMO, it might be reasonable to exclude the mmu notifier ops from
split_huge_pmd_locked(). IIUC, before acquiring the PTL, callers need to tear
down the secondary mappings via mmu_notifier_invalidate_range_start() with
the range aligned to HPAGE_PMD_SIZE.

> > >> I wonder if that could cause issues, since the mmu_notifier_invalidate_range_start()
> > >> before the while loop only has range of the original address and
> > >> split huge pmd can affect the entire PMD address range and these two ranges
> > >> might not be the same.

As Baolin mentioned [1] before:
"For a PMD mapped THP, I think the address is already THP size alignment
returned from vma_address(&folio->page, vma)."

Given this, perhaps we don't need to re-align the input address after
starting the pagewalk? IMO, if any corner cases arise, we should catch them
by using VM_WARN_ON_ONCE() in split_huge_pmd_locked().

Zi, what do you think?

[1] https://lore.kernel.org/linux-mm/cc9fd23f-7d87-48a7-a737-acbea8e95fb7@linux.alibaba.com/

> > >
> > > That does not sound entirely good..
> > >
> > > I suppose it depends on what split does, if the MM page table has the
> > > same translation before and after split then perhaps no invalidation
> > > is even necessary.
> >
> > Before split, it is a PMD mapping to a PMD THP (order-9). After split,
> > they are 512 PTEs mapping to the same THP. Unless the secondary TLB
> > does not support PMD mapping and use 512 PTEs instead, it seems to
> > be an issue from my understanding.
>
> I may not recall fully, but I don't think any secondaries are
> so sensitive to the PMD/PTE distinction.. At least the ones using
> hmm_range_fault() are not.
>
> When the PTE eventually comes up for invalidation then the secondary
> should wipe out any granule they may have captured.
>
> Though, perhaps KVM should be checked carefully.
>
> > In terms of two mmu_notifier ranges, first is in the split_huge_pmd_address()[1]
> > and second is in try_to_unmap_one()[2]. When try_to_unmap_one() is unmapping
> > a subpage in the middle of a PMD THP, the former notifies about the PMD range
> > change due to one PMD split into 512 PTEs and the latter only needs to notify
> > about the invalidation of the unmapped PTE. I do not think the latter can
> > replace the former, although a potential optimization can be that the latter
> > can be removed as it is included in the range of the former.
>
> I think we probably don't need both, either size might be fine, but
> the larger size is definitely fine..
>
> > Regarding Lance's current code change, is it OK to change mmu_notifier range
> > after mmu_notifier_invalidate_range_start()?
>
> No, it cannot be changed during a start/stop transaction.

Understood, I will keep that in mind - thanks!

Thanks again for clarifying!
Lance

>
> Jason
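
To make the range question above concrete, here is a minimal user-space
sketch of the arithmetic involved. It is not kernel code and not part of the
patch: the SKETCH_* constants and sketch_* helpers are hypothetical names
chosen for illustration, and the sizes assume 4 KiB base pages with 2 MiB
PMD mappings (the real HPAGE_PMD_SIZE depends on the architecture and
configuration). It only shows that a PMD spans 512 PTEs, how an address
rounds down to the PMD boundary, and how one could test whether a notifier
range opened earlier covers the range that a PMD split touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages and 2 MiB PMD-sized huge pages. */
#define SKETCH_PAGE_SIZE      (UINT64_C(1) << 12)
#define SKETCH_HPAGE_PMD_SIZE (UINT64_C(1) << 21)
#define SKETCH_HPAGE_PMD_MASK (~(SKETCH_HPAGE_PMD_SIZE - 1))

/* Half-open address range [start, end). Hypothetical type, not a kernel struct. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/* True if addr sits exactly on a PMD boundary. */
static inline bool sketch_pmd_aligned(uint64_t addr)
{
	return (addr & ~SKETCH_HPAGE_PMD_MASK) == 0;
}

/* The whole PMD-sized range containing addr, i.e. what a PMD split affects. */
static inline struct sketch_range sketch_pmd_range(uint64_t addr)
{
	struct sketch_range r;

	r.start = addr & SKETCH_HPAGE_PMD_MASK;
	r.end = r.start + SKETCH_HPAGE_PMD_SIZE;
	assert(r.end > r.start); /* guard against wraparound at the top of the address space */
	return r;
}

/* True if outer fully contains inner. */
static inline bool sketch_range_covers(struct sketch_range outer,
				       struct sketch_range inner)
{
	return outer.start <= inner.start && inner.end <= outer.end;
}
```

Under these assumptions, a range covering a single 4 KiB subpage in the
middle of a huge page does not cover sketch_pmd_range() of that address,
while a range that starts at a PMD-aligned address and spans
SKETCH_HPAGE_PMD_SIZE does. That is the condition a VM_WARN_ON_ONCE()-style
check would be asserting.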