From: Yang Shi
Date: Tue, 31 Oct 2023 11:29:34 -0700
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
To: Ryan Roberts
Cc: David Hildenbrand, Andrew Morton, Matthew Wilcox, Yin Fengwei, Yu Zhao,
 Catalin Marinas, Anshuman Khandual, "Huang, Ying", Zi Yan, Luis Chamberlain,
 Itaru Kitayama, "Kirill A. Shutemov", John Hubbard, David Rientjes,
 Vlastimil Babka, Hugh Dickins, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
References: <20230929114421.3761121-1-ryan.roberts@arm.com>
 <6d89fdc9-ef55-d44e-bf12-fafff318aef8@redhat.com>
 <7a3a2d49-528d-4297-ae19-56aa9e6c59c6@arm.com>

On Tue, Oct 31, 2023 at 4:55 AM Ryan Roberts wrote:
>
> On 31/10/2023 11:50, Ryan Roberts wrote:
> > On 06/10/2023 21:06, David Hildenbrand wrote:
> > [...]
> >>
> >> Change 2: sysfs interface.
> >>
> >> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
> >> agree.
> >>
> >> What we expose there and how, is TBD. Again, not a friend of "orders" and
> >> bitmaps at all. We can do better if we want to go down that path.
> >>
> >> Maybe we should take a look at hugetlb, and how they added support for multiple
> >> sizes. What *might* make sense could be (depending on which values we actually
> >> support!)
> >>
> >>
> >> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
> >>
> >> Each one would contain an "enabled" and "defrag" file. We want something minimal
> >> first? Start with the "enabled" option.
> >>
> >>
> >> enabled: always [global] madvise never
> >>
> >> Initially, we would set it for PMD-sized THP to "global" and for everything else
> >> to "never".
> >
> > Hi David,
> >
> > I've just started coding this, and it occurs to me that I might need a small
> > clarification here; the existing global "enabled" control is used to drive
> > decisions for both anonymous memory and (non-shmem) file-backed memory. But the
> > proposed new per-size "enabled" is implicitly only controlling anon memory (for
> > now).
> >
> > 1) Is this potentially confusing for the user? Should we rename the per-size
> > controls to "anon_enabled"? Or is it preferable to just keep it vague for now so
> > we can reuse the same control for file-backed memory in future?
> >
> > 2) The global control will continue to drive the file-backed memory decision
> > (for now), even when hugepages-2048kB/enabled != "global"; agreed?
> >
> > Thanks,
> > Ryan
> >
>
> Also, an implementation question:
>
> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
> (although it does honour MADV_NOHUGEPAGE and the prctl); it will return true
> regardless. Is that by design? I couldn't fathom any reasoning from the commit log:

The enabled="never" setting is for anonymous VMAs; DAX VMAs are typically file VMAs.

>
> bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>                         bool smaps, bool in_pf, bool enforce_sysfs)
> {
>         if (!vma->vm_mm)                /* vdso */
>                 return false;
>
>         /*
>          * Explicitly disabled through madvise or prctl, or some
>          * architectures may disable THP for some mappings, for
>          * example, s390 kvm.
>          * */
>         if ((vm_flags & VM_NOHUGEPAGE) ||
>             test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>                 return false;
>         /*
>          * If the hardware/firmware marked hugepage support disabled.
>          */
>         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
>                 return false;
>
>         /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
>         if (vma_is_dax(vma))
>                 return in_pf;  <<<<<<<<
>
>         ...
> }
>
>
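
For anyone following along, the order of the checks in the excerpt quoted
above can be modelled in plain userspace C. This is only a sketch, not the
kernel function: struct vma_model, vma_check_model() and the later_checks
parameter are hypothetical names made up for illustration, and later_checks
merely stands in for whatever the "..." in the excerpt elides. It shows that,
as quoted, a DAX VMA returns in_pf before the elided remainder is reached,
while MADV_NOHUGEPAGE and the prctl are still honoured first.

```c
/* Userspace model of the quoted excerpt; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the quoted checks read. */
struct vma_model {
	bool has_mm;         /* models vma->vm_mm being non-NULL */
	bool nohugepage;     /* models VM_NOHUGEPAGE set in vm_flags */
	bool mm_disable_thp; /* models MMF_DISABLE_THP (set via prctl) */
	bool is_dax;         /* models vma_is_dax(vma) */
};

/*
 * Mirrors the order of the checks in the quoted excerpt.
 * later_checks is a placeholder (an assumption of this sketch) for the
 * result of the part of the function elided with "..." in the quote.
 */
static bool vma_check_model(const struct vma_model *vma, bool hw_unsupported,
			    bool in_pf, bool later_checks)
{
	assert(vma != NULL);

	if (!vma->has_mm)                           /* vdso */
		return false;

	if (vma->nohugepage || vma->mm_disable_thp) /* madvise or prctl */
		return false;

	if (hw_unsupported)                         /* hardware/firmware */
		return false;

	if (vma->is_dax)                            /* decided here, early */
		return in_pf;

	return later_checks;
}
```

With a DAX VMA and in_pf set, the model returns true whatever later_checks
says; with MADV_NOHUGEPAGE or the prctl set, it returns false before the DAX
check is reached.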