From: Yu Zhao
Date: Fri, 7 Jul 2023 22:35:43 -0600
Subject: Re: [RFC PATCH 0/3] support large folio for mlock
To: Matthew Wilcox
Cc: "Yin, Fengwei", David Hildenbrand, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, shy828301@gmail.com, akpm@linux-foundation.org
References: <20230707165221.4076590-1-fengwei.yin@intel.com> <4bb39d6e-a324-0d85-7d44-8e8a37a1cfec@redhat.com> <436cd29f-44a6-7636-5015-377051942137@intel.com>

On Fri, Jul 7, 2023 at 10:02 PM Matthew Wilcox wrote:
>
> On Sat, Jul 08, 2023 at 11:52:23AM +0800, Yin, Fengwei wrote:
> > > Oh, I agree, there are always going to be circumstances where we realise
> > > we've made a bad decision and can't (easily) undo it.  Unless we have a
> > > per-page pincount, and I Would Rather Not Do That.  But we should _try_
> > > to do that because it's the right model -- that's what I meant by "Tell
> > > me why I'm wrong"; what scenarios do we have where a user temporarily
> > > mlocks (or mprotects or ...) a range of memory, but wants that memory
> > > to be aged in the LRU exactly the same way as the adjacent memory that
> > > wasn't mprotected?
> > for manpage of mlock():
> >        mlock(), mlock2(), and mlockall() lock part or all of the calling
> >        process's virtual address space into RAM, preventing that memory
> >        from being paged to the swap area.
> >
> > So my understanding is it's OK to let the memory mlocked to be aged with
> > the adjacent memory which is not mlocked. Just make sure they are not
> > paged out to swap.
>
> Right, it doesn't break anything; it's just a similar problem to
> internal fragmentation.  The pages of the folio which aren't mlocked
> will also be locked in RAM and never paged out.

I don't think this is the case: since partially locking a
non-pmd-mappable large folio is a nop, it remains on one of the
evictable LRUs. The rmap walk by folio_referenced() should already be
able to find the VMA and the PTEs mapping the unlocked portion. So
page reclaim should be able to correctly age the unlocked portion
even though the folio contains a locked portion too.

And when it tries to reclaim the entire folio, it first tries to split
it into a list of base folios in shrink_folio_list(), and if that
succeeds, it walks the rmap of each base folio on that list to unmap
(not age). That unmap is done without TTU_IGNORE_MLOCK, so it should
correctly call mlock_vma_folio() on the locked base folios and bail
out. And finally those locked base folios are put back on the
unevictable list.

> > One question for implementation detail:
> >   If the large folio cross VMA boundary can not be split, how do we
> >   deal with this case? Retry in syscall till it's split successfully?
> >   Or return error (and what ERRORS should we choose) to user space?
>
> I would be tempted to allocate memory & copy to the new mlocked VMA.
> The old folio will go on the deferred_list and be split later, or its
> valid parts will be written to swap and then it can be freed.
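
To make the reclaim sequence described above easier to follow, here is
a toy Python model of it. This is a sketch, not kernel code: the
function names only mirror the kernel helpers mentioned in the text,
the Page and Folio classes and the LRU strings are invented for
illustration, and the model assumes the split always succeeds.

```python
from dataclasses import dataclass


@dataclass
class Page:
    index: int
    mlocked: bool             # mapped by a VM_LOCKED VMA (assumption of the model)
    referenced: bool = False  # stand-in for the PTE accessed bit


@dataclass
class Folio:
    pages: list
    lru: str = "evictable"    # a partially mlocked large folio stays evictable


def folio_referenced(folio):
    """Toy rmap walk: count and clear accessed bits on every mapped page."""
    count = 0
    for page in folio.pages:
        if page.referenced:
            count += 1
            page.referenced = False
    return count


def split_folio(folio):
    """Split a large folio into single-page base folios (always succeeds here)."""
    return [Folio([page]) for page in folio.pages]


def try_to_unmap(base):
    """Unmap without TTU_IGNORE_MLOCK: bail out if the mapping is mlocked."""
    if base.pages[0].mlocked:
        # Models mlock_vma_folio() followed by putback on the unevictable list.
        base.lru = "unevictable"
        return False
    return True


def shrink_folio_list(folio):
    """Split first, then try to unmap and reclaim each base folio."""
    reclaimed, kept = [], []
    for base in split_folio(folio):
        if try_to_unmap(base):
            base.lru = "reclaimed"
            reclaimed.append(base)
        else:
            kept.append(base)
    return reclaimed, kept
```

With a four-page folio whose first two pages are mlocked, aging still
sees accesses to the unlocked pages, and shrink_folio_list() in this
model reclaims pages 2 and 3 while pages 0 and 1 end up on the
unevictable list.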