From: Yu Zhao
Date: Wed, 12 Jul 2023 11:03:28 -0600
Subject: Re: [RFC PATCH v2 2/3] mm: handle large folio when large folio in VM_LOCKED VMA range
To: Yin Fengwei
Cc: hughd@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com
References: <20230712060144.3006358-1-fengwei.yin@intel.com> <20230712060144.3006358-3-fengwei.yin@intel.com> <6cc5a915-a28c-983f-9b32-6040f033970b@intel.com>
In-Reply-To: <6cc5a915-a28c-983f-9b32-6040f033970b@intel.com>

On Wed, Jul 12, 2023 at 12:44 AM Yin Fengwei wrote:
>
>
> On 7/12/23 14:23, Yu Zhao wrote:
> > On Wed, Jul 12, 2023 at 12:02 AM Yin Fengwei wrote:
> >>
> >> If large folio is in the range of VM_LOCKED VMA, it should be
> >> mlocked to avoid being picked by page reclaim. Which may split
> >> the large folio and then mlock each pages again.
> >>
> >> Mlock this kind of large folio to prevent them being picked by
> >> page reclaim.
> >>
> >> For the large folio which cross the boundary of VM_LOCKED VMA,
> >> we'd better not to mlock it. So if the system is under memory
> >> pressure, this kind of large folio will be split and the pages
> >> ouf of VM_LOCKED VMA can be reclaimed.
> >>
> >> Signed-off-by: Yin Fengwei
> >> ---
> >>  mm/internal.h | 11 ++++++++---
> >>  mm/rmap.c     | 34 +++++++++++++++++++++++++++-------
> >>  2 files changed, 35 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/mm/internal.h b/mm/internal.h
> >> index c7dd15d8de3ef..776141de2797a 100644
> >> --- a/mm/internal.h
> >> +++ b/mm/internal.h
> >> @@ -643,7 +643,8 @@ static inline void mlock_vma_folio(struct folio *folio,
> >>          * still be set while VM_SPECIAL bits are added: so ignore it then.
> >>          */
> >>         if (unlikely((vma->vm_flags & (VM_LOCKED|VM_SPECIAL)) == VM_LOCKED) &&
> >> -           (compound || !folio_test_large(folio)))
> >> +           (compound || !folio_test_large(folio) ||
> >> +           folio_in_range(folio, vma, vma->vm_start, vma->vm_end)))
> >>                 mlock_folio(folio);
> >>  }
> >
> > This can be simplified:
> > 1. remove the compound parameter
> Yes. There is not difference here for pmd mapping of THPs and pte mappings of THPs
> if the only condition need check is whether the folio is within VMA range or not.
>
> But let me add Huge for confirmation.
>
>
> > 2. make the if
> >         if (unlikely((vma->vm_flags & (VM_LOCKED|VM_SPECIAL)) == VM_LOCKED) &&
> >             folio_within_vma())
> >                 mlock_folio(folio);
> !folio_test_large(folio) was kept here by purpose. For normal 4K page, don't need
> to call folio_within_vma() which is heavy for normal 4K page.

I suspected you would think so -- I don't think it would make any
measurable difference (for systems with mostly large folios, it would
actually be extra work). Since we have many places like this one,
probably we could wrap folio_test_large() into folio_within_vma() and
call it large_folio_within_vma(), if you feel it's necessary.

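To make the wrapper idea concrete, here is a rough userspace sketch in C.
Everything in it is a stand-in: the two structs, SKETCH_PAGE_SIZE and the
range check are simplified assumptions rather than the kernel's
definitions, and large_folio_within_vma() is only the name floated above.
Treating a small folio as trivially "within" is also an assumption about
how such a wrapper would behave.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the model */

/* Stand-in types: hypothetical and far simpler than the kernel's. */
struct vm_area_struct {
	unsigned long vm_start;	/* inclusive */
	unsigned long vm_end;	/* exclusive */
};

struct folio {
	unsigned long addr;	/* assumed user address of the first page */
	unsigned long nr_pages;	/* 1 for a normal 4K page */
};

static bool folio_test_large(const struct folio *folio)
{
	return folio->nr_pages > 1;
}

/* Simplified model: the whole folio lies inside [vm_start, vm_end). */
static bool folio_within_vma(const struct folio *folio,
			     const struct vm_area_struct *vma)
{
	assert(vma->vm_start <= vma->vm_end);
	return folio->addr >= vma->vm_start &&
	       folio->addr + folio->nr_pages * SKETCH_PAGE_SIZE <= vma->vm_end;
}

/* Hypothetical wrapper: only large folios pay for the range check. */
static bool large_folio_within_vma(const struct folio *folio,
				   const struct vm_area_struct *vma)
{
	return !folio_test_large(folio) || folio_within_vma(folio, vma);
}
```

With a wrapper of this shape, callers would not need to open-code the
!folio_test_large() test next to each range check.
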
> >> @@ -651,8 +652,12 @@ void munlock_folio(struct folio *folio);
> >>  static inline void munlock_vma_folio(struct folio *folio,
> >>                         struct vm_area_struct *vma, bool compound)
> >
> > Remove the compound parameter here too.
> >
> >>  {
> >> -       if (unlikely(vma->vm_flags & VM_LOCKED) &&
> >> -           (compound || !folio_test_large(folio)))
> >> +       /*
> >> +        * To handle the case that a mlocked large folio is unmapped from VMA
> >> +        * piece by piece, allow munlock the large folio which is partially
> >> +        * mapped to VMA.
> >> +        */
> >> +       if (unlikely(vma->vm_flags & VM_LOCKED))
> >>                 munlock_folio(folio);
> >>  }
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index 2668f5ea35342..455f415d8d9ca 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -803,6 +803,14 @@ struct folio_referenced_arg {
> >>         unsigned long vm_flags;
> >>         struct mem_cgroup *memcg;
> >>  };
> >> +
> >> +static inline bool should_restore_mlock(struct folio *folio,
> >> +               struct vm_area_struct *vma, bool pmd_mapped)
> >> +{
> >> +       return !folio_test_large(folio) ||
> >> +                       pmd_mapped || folio_within_vma(folio, vma);
> >> +}
> >
> > This is just folio_within_vma() :)
> >
> >>  /*
> >>   * arg: folio_referenced_arg will be passed
> >>   */
> >> @@ -816,13 +824,25 @@ static bool folio_referenced_one(struct folio *folio,
> >>         while (page_vma_mapped_walk(&pvmw)) {
> >>                 address = pvmw.address;
> >>
> >> -               if ((vma->vm_flags & VM_LOCKED) &&
> >> -                   (!folio_test_large(folio) || !pvmw.pte)) {
> >> -                       /* Restore the mlock which got missed */
> >> -                       mlock_vma_folio(folio, vma, !pvmw.pte);
> >> -                       page_vma_mapped_walk_done(&pvmw);
> >> -                       pra->vm_flags |= VM_LOCKED;
> >> -                       return false; /* To break the loop */
> >> +               if (vma->vm_flags & VM_LOCKED) {
> >> +                       if (should_restore_mlock(folio, vma, !pvmw.pte)) {
> >> +                               /* Restore the mlock which got missed */
> >> +                               mlock_vma_folio(folio, vma, !pvmw.pte);
> >> +                               page_vma_mapped_walk_done(&pvmw);
> >> +                               pra->vm_flags |= VM_LOCKED;
> >> +                               return false; /* To break the loop */
> >> +                       } else {
> >
> > There is no need for "else", or just
> >
> >   if (!folio_within_vma())
> >     goto dec_pra_mapcount;
> I tried not to use goto as much as possible. I suppose you mean:
>
>     if (!should_restore_lock())
>         goto dec_pra_mapcount; (I may use continue here. :)).

should_restore_lock() is just folio_within_vma() -- see the comment above.

"continue" looks good to me too (prefer not to add more indents to the
functions below).

>     mlock_vma_folio();
>     page_vma_mapped_walk_done()
>     ...
>
> Right?

Right.
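
For reference, the control-flow shape agreed on above could look roughly
like the following userspace sketch. It models only the early "continue";
the types, the SKETCH_VM_LOCKED bit and sketch_referenced_one() are
hypothetical stand-ins, and whatever bookkeeping belongs on the skip path
(the dec_pra_mapcount label mentioned above) is deliberately not modeled
because the thread elides it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_VM_LOCKED 0x1UL	/* stand-in flag bit, not the kernel's value */

struct sketch_mapping {
	bool within_vma;	/* stand-in for folio_within_vma() on this mapping */
	bool young;		/* stand-in for a referenced PTE */
};

struct sketch_result {
	int referenced;		/* mappings counted as referenced */
	bool mlock_restored;	/* walk ended early to restore the mlock */
};

/* Returns false when the walk stops early, as in the quoted hunk. */
static bool sketch_referenced_one(unsigned long vm_flags,
				  const struct sketch_mapping *maps, size_t n,
				  struct sketch_result *res)
{
	assert(res != NULL);

	for (size_t i = 0; i < n; i++) {
		if (vm_flags & SKETCH_VM_LOCKED) {
			if (!maps[i].within_vma)
				continue;	/* early skip, so no else branch */

			/* Stands in for mlock_vma_folio() and walk_done(). */
			res->mlock_restored = true;
			return false;
		}

		if (maps[i].young)
			res->referenced++;
	}

	return true;
}
```

The point of the shape is that the mlock-restore path stays at one
indentation level instead of sitting inside an if/else pair.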