From: Yang Shi
Date: Wed, 1 Nov 2023 11:11:24 -0700
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
To: Ryan Roberts
Cc: David Hildenbrand, Andrew Morton, Matthew Wilcox, Yin Fengwei,
 Yu Zhao, Catalin Marinas, Anshuman Khandual, "Huang, Ying", Zi Yan,
 Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov", John Hubbard,
 David Rientjes, Vlastimil Babka, Hugh Dickins, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <148676a4-8267-42de-a3ad-a3734e3f4bd9@arm.com>
References: <20230929114421.3761121-1-ryan.roberts@arm.com>
 <6d89fdc9-ef55-d44e-bf12-fafff318aef8@redhat.com>
 <7a3a2d49-528d-4297-ae19-56aa9e6c59c6@arm.com>
 <148676a4-8267-42de-a3ad-a3734e3f4bd9@arm.com>

On Wed, Nov 1, 2023 at 7:02 AM Ryan Roberts wrote:
>
> On 31/10/2023 18:29, Yang Shi wrote:
> > On Tue, Oct 31, 2023 at 4:55 AM Ryan Roberts wrote:
> >>
> >> On 31/10/2023 11:50, Ryan Roberts wrote:
> >>> On 06/10/2023 21:06, David Hildenbrand wrote:
> >>> [...]
> >>>>
> >>>> Change 2: sysfs interface.
> >>>>
> >>>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
> >>>> agree.
> >>>>
> >>>> What we expose there and how, is TBD. Again, not a friend of "orders" and
> >>>> bitmaps at all. We can do better if we want to go down that path.
> >>>>
> >>>> Maybe we should take a look at hugetlb, and how they added support for multiple
> >>>> sizes. What *might* make sense could be (depending on which values we actually
> >>>> support!)
> >>>>
> >>>>
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
> >>>>
> >>>> Each one would contain an "enabled" and "defrag" file. We want something minimal
> >>>> first? Start with the "enabled" option.
> >>>>
> >>>>
> >>>> enabled: always [global] madvise never
> >>>>
> >>>> Initially, we would set it for PMD-sized THP to "global" and for everything else
> >>>> to "never".
> >>>
> >>> Hi David,
> >>>
> >>> I've just started coding this, and it occurs to me that I might need a small
> >>> clarification here; the existing global "enabled" control is used to drive
> >>> decisions for both anonymous memory and (non-shmem) file-backed memory. But the
> >>> proposed new per-size "enabled" is implicitly only controlling anon memory (for
> >>> now).
> >>>
> >>> 1) Is this potentially confusing for the user? Should we rename the per-size
> >>> controls to "anon_enabled"? Or is it preferable to just keep it vague for now so
> >>> we can reuse the same control for file-backed memory in future?
> >>>
> >>> 2) The global control will continue to drive the file-backed memory decision
> >>> (for now), even when hugepages-2048kB/enabled != "global"; agreed?
> >>>
> >>> Thanks,
> >>> Ryan
> >>>
> >>
> >> Also, an implementation question:
> >>
> >> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
> >> (although it does honour MADV_NOHUGEPAGE and the prctl); it will return true
> >> regardless. Is that by design? I couldn't fathom any reasoning from the commit log:
> > The enabled="never" is for anonymous VMAs, DAX VMAs are typically file VMAs.
>
> That's not quite true; enabled="never" is honoured for non-DAX/non-shmem file
> VMAs (for collapse via CONFIG_READ_ONLY_THP_FOR_FS and more recently for

When READ_ONLY_THP_FOR_FS was implemented, file THPs could only be
collapsed by khugepaged, and khugepaged is started iff enabled !=
"never". So READ_ONLY_THP_FOR_FS has to honor it. Unfortunately there
are some confusing exceptions... But anyway DAX is not in the same
class.

> anything that implements huge_fault() - see
> 7a81751fcdeb833acc858e59082688e3020bfe12).

IIUC this commit just gives the VMAs which implement huge_fault() a
chance to handle the fault. Currently only DAX VMAs implement
huge_fault() in the vanilla kernel, AFAICT.

>
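
To make the per-size "enabled" semantics quoted above concrete, here is
a rough userspace model of how a per-size value of "global" could be
resolved against the top-level control for an anonymous VMA. This is
only a sketch under stated assumptions, not kernel code: the enum, the
function names and the two boolean madvise flags are all made up for
illustration, and it deliberately says nothing about file-backed or DAX
VMAs, which are the open question in this thread.

```c
#include <stdbool.h>

/* Hypothetical model of the proposed per-size "enabled" values. */
enum thp_policy { THP_NEVER, THP_MADVISE, THP_ALWAYS, THP_GLOBAL };

/*
 * Resolve a per-size setting against the top-level control.
 * "global" defers to the top-level value; anything else is used as-is.
 * Assumption: top_level itself is never THP_GLOBAL.
 */
static enum thp_policy effective_policy(enum thp_policy per_size,
                                        enum thp_policy top_level)
{
    return per_size == THP_GLOBAL ? top_level : per_size;
}

/*
 * Would an anonymous VMA be eligible for this size?
 * Assumption: MADV_NOHUGEPAGE always wins, as it does today.
 */
static bool anon_size_allowed(enum thp_policy per_size,
                              enum thp_policy top_level,
                              bool vma_madv_hugepage,
                              bool vma_madv_nohugepage)
{
    if (vma_madv_nohugepage)
        return false;

    switch (effective_policy(per_size, top_level)) {
    case THP_ALWAYS:
        return true;
    case THP_MADVISE:
        return vma_madv_hugepage;
    default:
        return false;
    }
}
```

With this model, hugepages-2048kB/enabled = "global" simply follows the
top-level control, while any other per-size value overrides it for
anonymous memory only.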