Date: Mon, 17 Apr 2023 22:29:59 -0700 (PDT)
From: Hugh Dickins
To: Luis Chamberlain
cc: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org,
    brauner@kernel.org, linux-mm@kvack.org, p.raghav@samsung.com,
    da.gomez@samsung.com, a.manzanares@samsung.com, dave@stgolabs.net,
    yosryahmed@google.com, keescook@chromium.org, patches@lists.linux.dev,
    linux-kernel@vger.kernel.org, David Hildenbrand
Subject: Re: [PATCH v2 5/6] shmem: update documentation
In-Reply-To: <20230309230545.2930737-6-mcgrof@kernel.org>
References: <20230309230545.2930737-1-mcgrof@kernel.org>
    <20230309230545.2930737-6-mcgrof@kernel.org>

On Thu, 9 Mar 2023, Luis Chamberlain wrote:
> Update the docs to reflect a bit better why some folks prefer tmpfs
> over ramfs and clarify a bit more about the difference between brd
> ramdisks.
>
> While at it, add THP docs for tmpfs, both the mount options and the
> sysfs file.

Okay: the original canonical reference for THP options on tmpfs has been
Documentation/admin-guide/mm/transhuge.rst. You're right that they would
be helpful here too: IIRC (but I might well be confusing with our Google
tree) we used to have them documented in both places, but grew tired of
keeping the two in synch. You're volunteering to do so! so please check
now that they tell the same story.

But nowadays, "man 5 tmpfs" is much more important (and that might give
you a hint for what needs to be done after this series goes into 6.4-rc -
and I wonder if there are tmpfs manpage updates needed from Christian for
idmapped too? or already taken care of?).

There's a little detail we do need you to remove, indicated below.

>
> Reviewed-by: Christian Brauner
> Reviewed-by: David Hildenbrand
> Signed-off-by: Luis Chamberlain
> ---
>  Documentation/filesystems/tmpfs.rst | 57 +++++++++++++++++++++++++----
>  1 file changed, 49 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 0408c245785e..1ec9a9f8196b 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -13,14 +13,25 @@ everything stored therein is lost.
>
>  tmpfs puts everything into the kernel internal caches and grows and
>  shrinks to accommodate the files it contains and is able to swap
> -unneeded pages out to swap space. It has maximum size limits which can
> -be adjusted on the fly via 'mount -o remount ...'
> -
> -If you compare it to ramfs (which was the template to create tmpfs)
> -you gain swapping and limit checking. Another similar thing is the RAM
> -disk (/dev/ram*), which simulates a fixed size hard disk in physical
> -RAM, where you have to create an ordinary filesystem on top. Ramdisks
> -cannot swap and you do not have the possibility to resize them.
> +unneeded pages out to swap space, and supports THP.
> +
> +tmpfs extends ramfs with a few userspace configurable options listed and
> +explained further below, some of which can be reconfigured dynamically on the
> +fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
> +filesystem can be resized but it cannot be resized to a size below its current
> +usage. tmpfs also supports POSIX ACLs, and extended attributes for the
> +trusted.* and security.* namespaces. ramfs does not use swap and you cannot
> +modify any parameter for a ramfs filesystem. The size limit of a ramfs
> +filesystem is how much memory you have available, and so care must be taken if
> +used so to not run out of memory.
> +
> +An alternative to tmpfs and ramfs is to use brd to create RAM disks
> +(/dev/ram*), which allows you to simulate a block device disk in physical RAM.
> +To write data you would just then need to create an regular filesystem on top
> +this ramdisk. As with ramfs, brd ramdisks cannot swap. brd ramdisks are also
> +configured in size at initialization and you cannot dynamically resize them.
> +Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
> +block layer at all.
>
>  Since tmpfs lives completely in the page cache and on swap, all tmpfs
>  pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
> @@ -85,6 +96,36 @@ mount with such options, since it allows any user with write access to
>  use up all the memory on the machine; but enhances the scalability of
>  that instance in a system with many CPUs making intensive use of it.
>
> +tmpfs also supports Transparent Huge Pages which requires a kernel
> +configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for
> +your system (has_transparent_hugepage(), which is architecture specific).
> +The mount options for this are:
> +
> +======  ============================================================
> +huge=0  never: disables huge pages for the mount
> +huge=1  always: enables huge pages for the mount
> +huge=2  within_size: only allocate huge pages if the page will be
> +        fully within i_size, also respect fadvise()/madvise() hints.
> +huge=3  advise: only allocate huge pages if requested with
> +        fadvise()/madvise()

You're taking the source too literally there. Minor point is that there
is no fadvise() for this, to date anyway. Major point is: have you tried
mounting tmpfs with huge=0 etc? I did propose "huge=0" and "huge=1" years
ago, but those "never" went in, it's "always" been the named options.
Please remove those misleading numbers, it's "huge=never" etc. (Old Google
internal trees excepted: and trying to wean people off "huge=1" internally
makes me a bit touchy when seeing those numbers above!)

> +======  ============================================================
> +
> +There is a sysfs file which you can also use to control system wide THP
> +configuration for all tmpfs mounts, the file is:
> +
> +/sys/kernel/mm/transparent_hugepage/shmem_enabled
> +
> +This sysfs file is placed on top of THP sysfs directory and so is registered
> +by THP code. It is however only used to control all tmpfs mounts with one
> +single knob. Since it controls all tmpfs mounts it should only be used either
> +for emergency or testing purposes. The values you can set for shmem_enabled are:
> +
> +==  ============================================================
> +-1  deny: disables huge on shm_mnt and all mounts, for
> +    emergency use
> +-2  force: enables huge on shm_mnt and all mounts, w/o needing
> +    option, for testing

Likewise here, please delete the invalid "-1" and "-2" notations,
-1 and -2 are just #defines for use in the kernel source.

And the description above is not quite accurate: it is very hard to
describe shmem_enabled, partly because it combines two different things.
It's partly the "huge=" mount option for any "internal mount", those
things like SysV SHM and memfd and i915 and shared-anonymous: the shmem
which has no user-visible mount to hold the option. But also these
"deny" and "force" overrides affecting *all* internal and visible mounts.

Hugh

> +==  ============================================================
>
>  tmpfs has a mount option to set the NUMA memory allocation policy for
>  all files in that instance (if CONFIG_NUMA is enabled) - which can be
> --
> 2.39.1
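
To illustrate the naming point made in the review (named values only, no
"huge=0" or "-1" style numbers), here is a minimal Python sketch. It is not
part of the patch or of the kernel tree: the helper names
huge_mount_option() and parse_shmem_enabled() are hypothetical, and the
parser assumes the usual sysfs convention in which the active choice is
shown in square brackets.

```python
# Hypothetical helpers for illustration only; not from the patch or the kernel.

# Named values for the tmpfs "huge=" mount option, as stated in the review.
HUGE_MOUNT_VALUES = ("never", "always", "within_size", "advise")

# shmem_enabled takes the same names plus the two global overrides.
SHMEM_ENABLED_VALUES = HUGE_MOUNT_VALUES + ("deny", "force")


def huge_mount_option(value: str) -> str:
    """Build a 'huge=<name>' option string, rejecting numeric forms like '1'."""
    if value not in HUGE_MOUNT_VALUES:
        raise ValueError(
            f"invalid huge= value {value!r}; expected one of {HUGE_MOUNT_VALUES}"
        )
    return f"huge={value}"


def parse_shmem_enabled(text: str) -> str:
    """Return the active setting from the contents of shmem_enabled.

    Assumption: the file lists every choice on one line and brackets the
    active one, e.g. "always within_size advise [never] deny force".
    """
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            name = word[1:-1]
            if name not in SHMEM_ENABLED_VALUES:
                raise ValueError(f"unexpected shmem_enabled value {name!r}")
            return name
    raise ValueError("no bracketed (active) value found")
```

A caller could pass the string read from
/sys/kernel/mm/transparent_hugepage/shmem_enabled to parse_shmem_enabled(),
and use huge_mount_option("within_size") when assembling mount options, so
that a numeric value such as "1" is rejected before it reaches mount.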