From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lance Yang <ioworker0@gmail.com>
Date: Wed, 5 Jun 2024 22:20:39 +0800
Subject: Re: [PATCH v6 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
To: David Hildenbrand
Cc: akpm@linux-foundation.org, willy@infradead.org, sj@kernel.org,
	baolin.wang@linux.alibaba.com, maskray@google.com, ziy@nvidia.com,
	ryan.roberts@arm.com, 21cnbao@gmail.com, mhocko@suse.com,
	fengwei.yin@intel.com, zokeefe@google.com, shy828301@gmail.com,
	xiehuan09@gmail.com, libang.li@antgroup.com, wangkefeng.wang@huawei.com,
	songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240521040244.48760-1-ioworker0@gmail.com> <20240521040244.48760-3-ioworker0@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Hi David,

On Wed, Jun 5, 2024 at 8:46 PM David Hildenbrand wrote:
>
> On 21.05.24 06:02, Lance Yang wrote:
> > In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> > folios, start the pagewalk first, then call split_huge_pmd_address() to
> > split the folio.
> >
> > Since TTU_SPLIT_HUGE_PMD will no longer be performed immediately, we might
> > encounter a PMD-mapped THP missing the mlock in the VM_LOCKED range during
> > the page walk. It's probably necessary to mlock this THP to prevent it from
> > being picked up during page reclaim.
> >
> > Suggested-by: David Hildenbrand
> > Suggested-by: Baolin Wang
> > Signed-off-by: Lance Yang
> > ---
>
> [...]
>
> again, sorry for the late review.

No worries at all, thanks for taking time to review!

>
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index ddffa30c79fb..08a93347f283 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1640,9 +1640,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >       if (flags & TTU_SYNC)
> >               pvmw.flags = PVMW_SYNC;
> >
> > -     if (flags & TTU_SPLIT_HUGE_PMD)
> > -             split_huge_pmd_address(vma, address, false, folio);
> > -
> >       /*
> >        * For THP, we have to assume the worse case ie pmd for invalidation.
> >        * For hugetlb, it could be much worse if we need to do pud
> > @@ -1668,20 +1665,35 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >       mmu_notifier_invalidate_range_start(&range);
> >
> >       while (page_vma_mapped_walk(&pvmw)) {
> > -             /* Unexpected PMD-mapped THP? */
> > -             VM_BUG_ON_FOLIO(!pvmw.pte, folio);
> > -
> >               /*
> >                * If the folio is in an mlock()d vma, we must not swap it out.
> >                */
> >               if (!(flags & TTU_IGNORE_MLOCK) &&
> >                   (vma->vm_flags & VM_LOCKED)) {
> >                       /* Restore the mlock which got missed */
> > -                     if (!folio_test_large(folio))
> > +                     if (!folio_test_large(folio) ||
> > +                         (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
> >                               mlock_vma_folio(folio, vma);
>
> Can you elaborate why you think this would be required? If we would have
> performed the split_huge_pmd_address() beforehand, we would still be
> left with a large folio, no?

Yep, there would still be a large folio, but it wouldn't be PMD-mapped.
After Fengwei's series[1], the kernel supports mlock for PTE-mapped large
folios, but there are a few scenarios where we don't mlock a large folio, such
as when it crosses a VM_LOCKED VMA boundary.

-                     if (!folio_test_large(folio))
+                     if (!folio_test_large(folio) ||
+                         (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))

And this check is just future-proofing and likely unnecessary. If we encounter
a PMD-mapped THP missing the mlock for some reason, we can mlock this
THP to prevent it from being picked up during page reclaim, since it is fully
mapped and doesn't cross the VMA boundary, IIUC.

What do you think?
I would appreciate any suggestions regarding this check ;)

[1] https://lore.kernel.org/all/20230918073318.1181104-3-fengwei.yin@intel.com/T/#mdab40248cf3705581d8bfb64e1ebf2d9cd81c095

>
> >                               goto walk_done_err;
> >                       }
> >
> > +             if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
> > +                     /*
> > +                      * We temporarily have to drop the PTL and start once
> > +                      * again from that now-PTE-mapped page table.
> > +                      */
> > +                     split_huge_pmd_locked(vma, range.start, pvmw.pmd, false,
> > +                                           folio);
>
> Using range.start here is a bit weird. Wouldn't this be pvmw.address?
> [did not check]

Hmm... we may adjust range.start before the page walk, but pvmw.address
is not adjusted. At least for now, pvmw.address seems better. Will adjust as
you suggested.

>
> > +                     pvmw.pmd = NULL;
> > +                     spin_unlock(pvmw.ptl);
> > +                     pvmw.ptl = NULL;
>
> Would we want a
>
> page_vma_mapped_walk_restart() that is exactly for that purpose?

Nice, let's add page_vma_mapped_walk_restart() for that purpose :)

Thanks again for your time!
Lance

>
> > +                     flags &= ~TTU_SPLIT_HUGE_PMD;
> > +                     continue;
> > +             }
> > +
> > +             /* Unexpected PMD-mapped THP? */
> > +             VM_BUG_ON_FOLIO(!pvmw.pte, folio);
>
> --
> Cheers,
>
> David / dhildenb
>
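[Editor's note: the page_vma_mapped_walk_restart() helper agreed on above amounts to dropping the page-table lock and clearing the pmd/pte cursor so the next page_vma_mapped_walk() iteration starts over at the same address. The sketch below models that state reset in plain userspace C with stub types; it is an illustration of the idea, not the kernel implementation in mm/page_vma_mapped.c, and all type and field names here are simplified stand-ins.]

```c
#include <assert.h>
#include <stddef.h>

/* Stub types standing in for the kernel's spinlock_t, pmd_t and pte_t. */
typedef int spinlock_t;
typedef unsigned long pmd_t;
typedef unsigned long pte_t;

static void spin_unlock(spinlock_t *ptl) { (void)ptl; /* no-op stub */ }

/* Minimal stand-in for the kernel's struct page_vma_mapped_walk. */
struct page_vma_mapped_walk {
	unsigned long address;	/* walk position, preserved across a restart */
	pmd_t *pmd;
	pte_t *pte;
	spinlock_t *ptl;
};

/*
 * Sketch of the restart helper: release the held page-table lock and
 * forget the pmd/pte cursor, so the next page_vma_mapped_walk() call
 * re-walks the (now PTE-mapped) page table from the same address.
 */
static void page_vma_mapped_walk_restart(struct page_vma_mapped_walk *pvmw)
{
	if (pvmw->ptl)
		spin_unlock(pvmw->ptl);
	pvmw->ptl = NULL;
	pvmw->pmd = NULL;
	pvmw->pte = NULL;
}
```

In the try_to_unmap_one() loop above, this would replace the open-coded `pvmw.pmd = NULL; spin_unlock(pvmw.ptl); pvmw.ptl = NULL;` sequence after split_huge_pmd_locked().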