Re: [PATCH] tmpfs: use ida to get inode number

linux-mm.kvack.org archive mirror

From: Matthew Wilcox <willy@infradead.org>
To: "zhengbin (A)" <zhengbin13@huawei.com>
Cc: Hugh Dickins <hughd@google.com>,
	viro@zeniv.linux.org.uk, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, houtao1@huawei.com,
	yi.zhang@huawei.com, "J. R. Okajima" <hooanon05g@gmail.com>
Subject: Re: [PATCH] tmpfs: use ida to get inode number
Date: Fri, 22 Nov 2019 14:13:27 -0800
Message-ID: <20191122221327.GW20752@bombadil.infradead.org>
In-Reply-To: <5423a199-eefb-0a02-6e86-1f6210939c11@huawei.com>

On Fri, Nov 22, 2019 at 09:23:30AM +0800, zhengbin (A) wrote:
> On 2019/11/22 3:53, Hugh Dickins wrote:
> > On Thu, 21 Nov 2019, zhengbin (A) wrote:
> >> On 2019/11/21 12:52, Hugh Dickins wrote:
> >>> Just a rushed FYI without looking at your patch or comments.
> >>>
> >>> Internally (in Google) we do rely on good tmpfs inode numbers more
> >>> than on those of other get_next_ino() filesystems, and carry a patch
> >>> to mm/shmem.c for it to use 64-bit inode numbers (and separate inode
> >>> number space for each superblock) - essentially,
> >>>
> >>> 	ino = sbinfo->next_ino++;
> >>> 	/* Avoid 0 in the low 32 bits: might appear deleted */
> >>> 	if (unlikely((unsigned int)ino == 0))
> >>> 		ino = sbinfo->next_ino++;
> >>>
> >>> Which I think would be faster, and need less memory, than IDA.
> >>> But whether that is of general interest, or of interest to you,
> >>> depends upon how prevalent 32-bit executables built without
> >>> __FILE_OFFSET_BITS=64 still are these days.
> >> So what does Google think about this? Inode numbers would exceed 32 bits,
> >> but 32-bit executables cannot handle that?
> > Google is free to limit what executables are run on its machines,
> > and how they are built, so little problem here.
> >
> > A general-purpose 32-bit Linux distribution does not have that freedom,
> > does not want to limit what the user runs.  But I thought that by now
> > they (and all serious users of 32-bit systems) were building their own
> > executables with _FILE_OFFSET_BITS=64 (I was too generous with the
> > underscores yesterday); and I thought that defined __USE_FILE_OFFSET64,
> > and that typedef'd ino_t to be __ino64_t.  And the 32-bit kernel would
> > have __ARCH_WANT_STAT64, which delivers st_ino as unsigned long long.
> >
> > So I thought that a modern, professional 32-bit executable would be
> > dealing in 64-bit inode numbers anyway.  But I am not a system builder,
> > so perhaps I'm being naive.  And of course some users may have to support
> > some old userspace, or apps that assign inode numbers to "int" or "long"
> > or whatever.  I have no insight into the extent of that problem.
> 
> So how do we solve this problem?
> 
> 1. tmpfs uses an IDA or some other data structure.
> 
> 2. tmpfs uses 64-bit inode numbers, with a separate inode number space for each superblock.
> 
> 3. Do nothing; if somebody hits this bug, let them solve it themselves.
> 
> 4. Change last_ino to 64-bit in get_next_ino(), so other filesystems would be OK too; but this was rejected before.

5. Extend the sbitmap API to allow for growing the bitmap.  I had a
look at doing that, and it looks hard.  There are a lot of things which
are set up at initialisation and changing them mid-use seems tricky.
Ccing Jens in case he has an opinion.

6. Create a percpu IDA.  This doesn't seem too hard.  We need a percpu
pointer to an IDA leaf (128 bytes, so 1024 bits), and a percpu integer
which is the current base for this CPU.  At allocation time, find and
set the first free bit in the leaf, and add on the current base.
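
A minimal userspace sketch of that allocation step may make it more concrete.
It is not kernel code: the names cpu_ida_sim, cpu_ida_sim_set_leaf() and
cpu_ida_sim_alloc() are hypothetical, the "percpu" state is a plain array
indexed by a CPU number, and the leaf is a private 1024-bit bitmap rather than
an entry in an XArray.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LEAF_BITS   1024                /* one 128-byte leaf */
#define LEAF_WORDS  (LEAF_BITS / 64)
#define NR_CPUS_SIM 4                   /* hypothetical CPU count */

/* Hypothetical per-CPU state: one leaf bitmap plus the base ID it covers. */
struct cpu_ida_sim {
	uint64_t leaf[LEAF_WORDS];      /* bit set means ID in use */
	unsigned int base;              /* ID represented by bit 0 */
};

static struct cpu_ida_sim cpu_ida[NR_CPUS_SIM];

/* Give a CPU a fresh, empty leaf that covers IDs [base, base + 1024). */
static void cpu_ida_sim_set_leaf(unsigned int cpu, unsigned int base)
{
	assert(cpu < NR_CPUS_SIM);
	memset(cpu_ida[cpu].leaf, 0, sizeof(cpu_ida[cpu].leaf));
	cpu_ida[cpu].base = base;
}

/* Find and set the first free bit, then add the base; -1 if the leaf is full. */
static int64_t cpu_ida_sim_alloc(unsigned int cpu)
{
	struct cpu_ida_sim *c;

	assert(cpu < NR_CPUS_SIM);
	c = &cpu_ida[cpu];
	for (unsigned int w = 0; w < LEAF_WORDS; w++) {
		if (c->leaf[w] == UINT64_MAX)
			continue;       /* this word has no free bit */
		for (unsigned int b = 0; b < 64; b++) {
			uint64_t mask = (uint64_t)1 << b;

			if (!(c->leaf[w] & mask)) {
				c->leaf[w] |= mask;
				return (int64_t)c->base + w * 64 + b;
			}
		}
	}
	return -1;
}
```

In this sketch the common case touches only the CPU's own leaf; the -1 return
is the point at which a real implementation would have to go to the shared
structure for another leaf.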

If the percpu leaf is full, set the XA_MARK_1 bit on the entry in
the XArray.  Then look for any leaves which have both the XA_MARK_0
and XA_MARK_1 bits set; if there is one, claim it by clearing the
XA_MARK_1 bit.  If not, kzalloc a new one and find a free spot for it
in the underlying XArray.
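
The hand-over of a full leaf could be modelled roughly as below.  This is a
userspace sketch under stated assumptions, not the XArray API: a small fixed
table stands in for the XArray, two booleans stand in for the marks, and the
reading that XA_MARK_0 means "leaf has free bits" while XA_MARK_1 means "no
CPU currently holds this leaf" is an interpretation of the paragraph above.
The names leaf_slot, leaf_replace() and leaf_note_free() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LEAVES 8    /* hypothetical fixed table standing in for the XArray */

struct leaf_slot {
	bool present;   /* a leaf is stored at this index */
	bool has_free;  /* stands in for XA_MARK_0 (assumed: leaf has free bits) */
	bool unowned;   /* stands in for XA_MARK_1 (assumed: no CPU holds it) */
};

static struct leaf_slot slots[MAX_LEAVES];

/*
 * Called when a CPU's current leaf (at full_idx) has no free bit left.
 * Releases that leaf and returns the index of the leaf the caller now
 * owns, or -1 if the table has no room for a new one.
 */
static int leaf_replace(int full_idx)
{
	assert(full_idx >= 0 && full_idx < MAX_LEAVES);
	assert(slots[full_idx].present);

	slots[full_idx].has_free = false;
	slots[full_idx].unowned = true;         /* "set XA_MARK_1" */

	/* Prefer an existing leaf that has free bits and no owner. */
	for (int i = 0; i < MAX_LEAVES; i++) {
		if (slots[i].present && slots[i].has_free && slots[i].unowned) {
			slots[i].unowned = false; /* claim: "clear XA_MARK_1" */
			return i;
		}
	}

	/* Otherwise create a new, empty leaf in the first unused slot. */
	for (int i = 0; i < MAX_LEAVES; i++) {
		if (!slots[i].present) {
			slots[i].present = true;
			slots[i].has_free = true;
			slots[i].unowned = false;
			return i;
		}
	}
	return -1;
}

/* Freeing any ID inside leaf idx means that leaf has a free bit again. */
static void leaf_note_free(int idx)
{
	assert(idx >= 0 && idx < MAX_LEAVES && slots[idx].present);
	slots[idx].has_free = true;
}
```

With 1024-bit leaves, the base for the leaf at index i would be i * 1024, so
claiming a partly freed leaf is what lets old IDs be reused instead of the
number space only ever growing.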

Freeing an ID is simply ida_free().  That will involve changing the
users of get_next_ino() to call put_ino(), or something.

This should generally result in similar contention between threads as
the current scheme -- accessing a shared resource every 1024 allocations.
Maybe more often as we try to avoid leaving gaps in the data structure,
or maybe less as we reuse IDs.
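
For comparison, the batching behaviour of the current scheme can be sketched
in userspace as follows.  This is modelled on the general shape of
get_next_ino() (a per-CPU counter refilled from a shared counter in batches
of 1024) but is not the kernel source; next_ino_sim(), shared_hits and the
array standing in for percpu data are hypothetical, and the sketch is only
meant to be driven from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>

#define INO_BATCH   1024        /* IDs handed to a CPU per shared access */
#define NR_CPUS_SIM 4           /* hypothetical CPU count */

static atomic_uint shared_last_ino;             /* the contended resource */
static unsigned int cpu_last_ino[NR_CPUS_SIM];  /* stand-in for percpu data */
static unsigned long shared_hits;               /* counts shared accesses */

static unsigned int next_ino_sim(unsigned int cpu)
{
	unsigned int res;

	assert(cpu < NR_CPUS_SIM);
	res = cpu_last_ino[cpu];

	/* Local batch exhausted (or never filled): take a new batch. */
	if ((res & (INO_BATCH - 1)) == 0) {
		res = atomic_fetch_add(&shared_last_ino, INO_BATCH);
		shared_hits++;
	}

	res++;
	if (res == 0)           /* never hand out inode number 0 */
		res++;
	cpu_last_ino[cpu] = res;
	return res;
}
```

Calling next_ino_sim(0) 1024 times touches shared_last_ino once; the 1025th
call touches it again.  That one-in-1024 ratio is the contention level the
percpu IDA would be trying to match.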

(I've tried to explain what I want here, but appreciate it may be
inscrutable.  I can try to explain more, or maybe I should just write
the code myself.)


Thread overview: 15+ messages
2019-11-20 14:23 zhengbin
2019-11-20 15:45 ` Matthew Wilcox
2019-11-21  2:36   ` zhengbin (A)
2019-11-21  4:52     ` Hugh Dickins
2019-11-21  6:45       ` zhengbin (A)
2019-11-21 19:53         ` Hugh Dickins
2019-11-22  1:23           ` zhengbin (A)
2019-11-22 22:13             ` Matthew Wilcox [this message]
2019-11-23  2:16               ` zhengbin (A)
2019-11-23  2:33                 ` Matthew Wilcox
2019-11-23  4:54                   ` Al Viro
2019-12-01  8:44               ` zhengbin (A)
2019-11-21 11:40       ` J. R. Okajima
2019-11-21 20:07         ` Hugh Dickins
2019-11-21  4:31   ` Al Viro
