From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B138C54E68 for ; Thu, 21 Mar 2024 14:55:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BFF786B007B; Thu, 21 Mar 2024 10:55:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BB0F76B0082; Thu, 21 Mar 2024 10:55:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A77FA6B0083; Thu, 21 Mar 2024 10:55:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 9723B6B007B for ; Thu, 21 Mar 2024 10:55:36 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 6B8031C194A for ; Thu, 21 Mar 2024 14:55:36 +0000 (UTC) X-FDA: 81921345072.17.3FC08F9 Received: from mail-yb1-f179.google.com (mail-yb1-f179.google.com [209.85.219.179]) by imf24.hostedemail.com (Postfix) with ESMTP id 8ABAB180021 for ; Thu, 21 Mar 2024 14:55:34 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=MBxVpVVO; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf24.hostedemail.com: domain of ioworker0@gmail.com designates 209.85.219.179 as permitted sender) smtp.mailfrom=ioworker0@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1711032934; a=rsa-sha256; cv=none; b=zsYFbRdJpdqEIHw3ttXCC41P65atcPF4U3WCIyCd1NJkK2htU/BPdV5K2ZU4sHZEOF3SVa 7fHXMddujSViYoB+t1NfBOvuiNSUUTEkzpnL73oln0E2oocOQtJdDV1YipXRtNZuj8sLUz V4wN5RN0SzjodudkNPyGqzvCyJr8fyo= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=MBxVpVVO; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf24.hostedemail.com: domain of ioworker0@gmail.com designates 209.85.219.179 as permitted sender) smtp.mailfrom=ioworker0@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1711032934; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=O9okbR8OL96sSf2IB4vvfZKNVvRdwRzTGD+ZoPzW3rc=; b=ehM7tQbCSB+ySZNBUL9B3Cv/+bzvrUFhlpOQfdiE+goCVVs/EmeEYf7xsjOkVD1xRA8FqW FT/52IGnh7hagjdcqD6w4jM2ppTEnyQexxRg9f2FfPbcDiSVjrGSzAGW1bLQH6Jdo/yRb5 CSrYxlnV64akbtiNprG4qnikd5/kI4I= Received: by mail-yb1-f179.google.com with SMTP id 3f1490d57ef6-db4364ecd6aso1022714276.2 for ; Thu, 21 Mar 2024 07:55:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1711032933; x=1711637733; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=O9okbR8OL96sSf2IB4vvfZKNVvRdwRzTGD+ZoPzW3rc=; b=MBxVpVVOzwXlUoYHgQ02XrK3AkiGJheY1SXENBp3/XQF3nXqxTFi+iQHq2UIxb7qV7 whaiJizdBTss2zaC1AkShnX1pcWStEWHZtqhPw5FjfMetdzX7EO4lXddFJshAbMVG17A rtg5oE6s3SCNKR1D4uvJNcViHWsTzdhkmySeMXuUU1Tto1JNTkmAMUYTgoBUo7c/oAiq S+WyEZj8WKToLt+pvT+MpsOq+gkUFWSXAEpeUrGpyUVTr+mkkd40UyIuVZyB6f8dvE4s w9tTFQoIm9qZkEJIiENpMOeTjBO9qOKX+QWtzgm4+CIoBQjx+3G13XK2C+qT9z9eBAvD 8ogw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711032933; x=1711637733; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=O9okbR8OL96sSf2IB4vvfZKNVvRdwRzTGD+ZoPzW3rc=; b=P9ZPco8NOsgX8QA5w/wXDyNkQ67COEnRp2FS2OidPlZdYTHxhBjfb5nggwwmBeaE3r OryptIehMRPJ3Cp5OKdDiA0XcVshGO4MT+D/hhQfJWCD94kz7vPoDo2rREYaaYe7fW2h K/r4CbOZXsD694084F8H+jx55lJxzmcp2PE1Ebz39UleQ/B3548Y30otvS/lCjJQUzSD /RYB8Y25r3QLEKolahuNVKqBiHgcY2nNfYoyHaflMN1xikmoZOyVHg/IgbJbQEOiWBVG VYjpdevvpTpVX04JiFzGEyArWfi7wXLqih1MnDz1I1f9XZLcRuN8ws/lzOnwMiB9lu5r ZH1Q== X-Forwarded-Encrypted: i=1; AJvYcCXvNJ9gSFtu4eEJXSy4jW8K8mQc0nOZaas4mWmon2xO9h3Sw3yVLT3BczHyQMX72UFIof93ibncfbCV9CfCRgvoFVI= X-Gm-Message-State: AOJu0Yzrv3/nTnAEp2BGiRxgtlUJ/1oJYAQRE4tgWOJN/KBGaVLmkLAP sv42L+oiL4g5osDQIfEbIjybEJr4VXBIzzbbHrXdD1ZIFzcBNdn5bggZrDIOLk111fvzNyWXlyY 5SjURY6nZoP+Kh/mMDwO7pi5FiQ8= X-Google-Smtp-Source: AGHT+IHJvcwqqVXTtvQyBSUp28DwTVgCQkYFl2W4XIt5+pd5oRbNxiwwujUARlkIt7BMX3SDfNCgMzs2G/n87mYiRI0= X-Received: by 2002:a25:c504:0:b0:dc2:279f:f7e with SMTP id v4-20020a25c504000000b00dc2279f0f7emr1975834ybe.10.1711032933307; Thu, 21 Mar 2024 07:55:33 -0700 (PDT) MIME-Version: 1.0 References: <20240311150058.1122862-1-ryan.roberts@arm.com> <20240311150058.1122862-7-ryan.roberts@arm.com> <7ba06704-2090-4eb2-9534-c4d467cc085a@arm.com> <269375a4-78a3-4c22-8e6e-570368a2c053@arm.com> In-Reply-To: <269375a4-78a3-4c22-8e6e-570368a2c053@arm.com> From: Lance Yang Date: Thu, 21 Mar 2024 22:55:21 +0800 Message-ID: Subject: Re: [PATCH v4 6/6] mm: madvise: Avoid split during MADV_PAGEOUT and MADV_COLD To: Ryan Roberts Cc: Barry Song <21cnbao@gmail.com>, Andrew Morton , David Hildenbrand , Matthew Wilcox , Huang Ying , Gao Xiang , Yu Zhao , Yang Shi , Michal Hocko , Kefeng Wang , Chris Li , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 8ABAB180021 X-Stat-Signature: ut4j13ftdac6qzcjipnmtei4x6pwa94s X-HE-Tag: 1711032934-304703 X-HE-Meta: U2FsdGVkX196ZSGLYJpHJGBno6wPiXZd1M7de1ZMrBPLJRMDvgECGnRBhA/IkwCj9S+XGmZwBQgLRK+KU7MZLp1NBceUw6hK17Z0t1fyrEmwx+rQS0WkpkClr+9lXKSVUccCs9IF2j82Zi6k85Bc0A4LbSlnbcdG/JyqepvFCJJyVCZtkX29vTyUoN+6ANxuxeZ2CLD4LEyjmcR6vVG7cNlENo0znrl8+HugLaCVD18Q8POwImp96iHZrSg6DA+YLuyq5Yq/i0X14tW+EqEsZ0/6I1E95EiwnJtZDj0zUXaAiBI989kk0/w1COWaaqBUTjXuyLWf6n39H9MdT9zzUo62NI+8IMJvmO+YAmTrgEAVDl1y/X4bmJb27nw7Z4WjN3JS6TpiQjWUWlQjsxVoBwUKaJmmvjiuHfvNYl8LRAxaTeKlWtcYX55yD4+JmPjZogC24qILVFiO63ghCITSe9LVfIYZXEk2EkSWmZIqLfawDDW+BC1Q9rsI38BXNXuCoqYS+XInKivMI21skbWhmQJQtYpfhu6w7hxx24D4S3UTpHtvkFt422kcCeg7lP5WQm7BqLt48ahLRLRdbW7/2vIDK9zEAWIGSR1EIsyV6oF6mzNIwnj6AObAPHVcYEOaYqbqEFv35Nom4mkpU9h2bkDPo6lrKpuIq8MkEuvEf8eH7SgP42nYrfUJTLWwNJJZUNJNAcHEAg7bcrJ2dtBIrF/coUM+6EIY3GUVUibROn+t+286lAvSZflAK4Y8e7B7bxfwgk7Xa/Jdhspgl6WO4rjfkuFfkEEU0bPcYq7FQV8TTF2GNjTp8fh/ek9GgpFqzxopAJzgIzq4DhNSdlwOUsHHzpuwtmBf8GbhMzIrplMfNFh0Ivz+tGKdH6RXvlk8PMbblMCpAifcau8m5jyIvnkp/gVsq07DurAaGcTUNnSHXJFMGq704CpAuK2UzrwWiavW1OsJKetcuJsMWks NzXwBip9 //5XsquT7n+dPVZC/TRnRa4t5VYIN5zO/DpuhQJBWufAePKI7+QqFjVu9+70D5ynHwLn6TwgspZQ4wqHQv1zLjpM0C+mTFrND1evrWb+9HbDUO/Ao5bT4+OZSBL4GAUGnyDks5XwOJ+IpYgRY+AIPSgQecuPlrM/0aQLHMBQGCacxlu+1ldCPXomIUfilX9w+rqp3QDHw5sYmgyGXByFK3AsBGkfgHmJ9IaWXiugDX+M6jK2M0MxtdnjwZ4buGFAyXLHQ1zOOlqWCzULarKpC3LPl4qAwkZQVffxGqwR2maqwoPMhjZXxtOLrRbG/P2RICguYtyw8ci5aKIzg1QfC13VVrYFjbgtol6l+lCIloy8vIAm31U9z4q1+nrB5A72QEba275FMVDpw5//JiVjN0tubO1bA5ouWJ0vKVs+FzrOHHTx86UzEP3wc/2u1qiko7iwSQXuZqEShSWlgPZ5+zsS+wqKQE/mXKC3UJgfu0cKqS91yw+w4ElpxcEDYkhiT11bCADJPzOORuO8JI3PZ+Ch3uhKgYoizFKAw9unZ3dqgLkN7PJBo3Od3RuW/+fGFuA5ZcwOwj9n0U2fybDE/yUt+ZqUJtYXj+GZQ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Mar 21, 2024 at 9:38=E2=80=AFPM Ryan Roberts = wrote: > > >>>>>>>> - VM_BUG_ON_FOLIO(folio_test_large(folio), folio); > >>>>>>>> - > >>>>>>>> - if (!pageout && pte_young(ptent)) { > >>>>>>>> - ptent =3D ptep_get_and_clear_full(mm, ad= dr, pte, > >>>>>>>> - tlb->ful= lmm); > >>>>>>>> - ptent =3D pte_mkold(ptent); > >>>>>>>> - set_pte_at(mm, addr, pte, ptent); > >>>>>>>> - tlb_remove_tlb_entry(tlb, pte, addr); > >>>>>>>> + if (!pageout) { > >>>>>>>> + for (; nr !=3D 0; nr--, pte++, addr +=3D= PAGE_SIZE) { > >>>>>>>> + if (ptep_test_and_clear_young(vm= a, addr, pte)) > >>>>>>>> + tlb_remove_tlb_entry(tlb= , pte, addr); > >>>>> > >>>>> IIRC, some of the architecture(ex, PPC) don't update TLB with set_p= te_at and > >>>>> tlb_remove_tlb_entry. So, didn't we consider remapping the PTE with= old after > >>>>> pte clearing? > >>>> > >>>> Sorry Lance, I don't understand this question, can you rephrase? Are= you saying > >>>> there is a good reason to do the original clear-mkold-set for some a= rches? > >>> > >>> IIRC, some of the architecture(ex, PPC) don't update TLB with > >>> ptep_test_and_clear_young() > >>> and tlb_remove_tlb_entry(). > > Afraid I'm still struggling with this comment. Do you mean to say that po= werpc > invalidates the TLB entry as part of the call to ptep_test_and_clear_youn= g()? So > tlb_remove_tlb_entry() would be redundant here, and likely cause performa= nce > degradation on that architecture? I just thought that using ptep_test_and_clear_young() instead of ptep_get_and_clear_full() + pte_mkold() might not be correct. However, it's most likely that I was mistaken :( I also have a question. Why aren't we using ptep_test_and_clear_young() in madvise_cold_or_pageout_pte_range(), but instead ptep_get_and_clear_full() + pte_mkold() as we did previously. /* * Some of architecture(ex, PPC) don't update TLB * with set_pte_at and tlb_remove_tlb_entry so for * the portability, remap the pte with old|clean * after pte clearing. */ According to this comment from madvise_free_pte_range. IIUC, we need to call ptep_get_and_clear_full() to clear the PTE, and then remap the PTE with old|clean. Thanks, Lance > > IMHO, ptep_test_and_clear_young() really shouldn't be invalidating the TL= B > entry, that's what ptep_clear_flush_young() is for. > > But I do see that for some cases of the 32-bit ppc, there appears to be a= flush: > > #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG > static inline int __ptep_test_and_clear_young(struct mm_struct *mm, > unsigned long addr, pte_t *= ptep) > { > unsigned long old; > old =3D pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, 0); > if (old & _PAGE_HASHPTE) > flush_hash_entry(mm, ptep, addr); <<<<<<<< > > return (old & _PAGE_ACCESSED) !=3D 0; > } > #define ptep_test_and_clear_young(__vma, __addr, __ptep) \ > __ptep_test_and_clear_young((__vma)->vm_mm, __addr, __ptep) > > Is that what you are describing? Does any anyone know why flush_hash_entr= y() is > called? I'd say that's a bug in ppc and not a reason not to use > ptep_test_and_clear_young() in the common code! > > Thanks, > Ryan > > > >> > >> Err, I assumed tlb_remove_tlb_entry() meant "invalidate the TLB entry = for this > >> address please" - albeit its deferred and batched. I'll look into this= . > >> > >>> > >>> In my new patch[1], I use refresh_full_ptes() and > >>> tlb_remove_tlb_entries() to batch-update the > >>> access and dirty bits. > >> > >> I want to avoid the per-pte clear-modify-set approach, because this do= esn't > >> perform well on arm64 when using contpte mappings; it will cause the c= ontpe > >> mapping to be unfolded by the first clear that touches the contpte blo= ck, then > >> refolded by the last set to touch the block. That's expensive. > >> ptep_test_and_clear_young() doesn't suffer that problem. > > > > Thanks for explaining. I got it. > > > > I think that other architectures will benefit from the per-pte clear-mo= dify-set > > approach. IMO, refresh_full_ptes() can be overridden by arm64. > > > > Thanks, > > Lance > >> > >>> > >>> [1] https://lore.kernel.org/linux-mm/20240316102952.39233-1-ioworker0= @gmail.com > >>> > >>> Thanks, > >>> Lance > >>> > >>>> > >>>>> > >>>>> Thanks, > >>>>> Lance > >>>>> > >>>>> > >>>>> > >>>>>>>> + } > >>>>>>> > >>>>>>> This looks so smart. if it is not pageout, we have increased pte > >>>>>>> and addr here; so nr is 0 and we don't need to increase again in > >>>>>>> for (; addr < end; pte +=3D nr, addr +=3D nr * PAGE_SIZE) > >>>>>>> > >>>>>>> otherwise, nr won't be 0. so we will increase addr and > >>>>>>> pte by nr. > >>>>>> > >>>>>> Indeed. I'm hoping that Lance is able to follow a similar pattern = for > >>>>>> madvise_free_pte_range(). > >>>>>> > >>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> } > >>>>>>>> > >>>>>>>> /* > >>>>>>>> -- > >>>>>>>> 2.25.1 > >>>>>>>> > >>>>>>> > >>>>>>> Overall, LGTM, > >>>>>>> > >>>>>>> Reviewed-by: Barry Song > >>>>>> > >>>>>> Thanks! > >>>>>> > >>>>>> > >>>> > >> >