linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: sjenning@linux.vnet.ibm.com, dan.magenheimer@oracle.com,
	devel@linuxdriverproject.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, ngupta@vflare.org, minchan@kernel.org,
	mgorman@suse.de, fschmaus@gmail.com, andor.daam@googlemail.com,
	ilendir@googlemail.com
Subject: Re: [PATCH 1/8] mm: cleancache: lazy initialization to allow tmem backends to build/run as modules
Date: Wed, 30 Jan 2013 10:57:18 -0500	[thread overview]
Message-ID: <20130130155718.GB1272@konrad-lan.dumpdata.com> (raw)
In-Reply-To: <20121116151049.244bb8f4.akpm@linux-foundation.org>

On Fri, Nov 16, 2012 at 03:10:49PM -0800, Andrew Morton wrote:
> On Wed, 14 Nov 2012 13:57:05 -0500
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
> > From: Dan Magenheimer <dan.magenheimer@oracle.com>
> > 
> > With the goal of allowing tmem backends (zcache, ramster, Xen tmem) to be
> > built/loaded as modules rather than built-in and enabled by a boot parameter,
> > this patch provides "lazy initialization", allowing backends to register to
> > cleancache even after filesystems were mounted. Calls to init_fs and
> > init_shared_fs are remembered as fake poolids but no real tmem_pools created.
> > On backend registration the fake poolids are mapped to real poolids and
> > respective tmem_pools.
> 
> What is your merge plan/path for this work?
> 
> >
> > ...
> >
> > + * When no backend is registered all calls to init_fs and init_shard_fs
> 
> "init_shared_fs"
> 
> > + * are registered and fake poolids are given to the respective
> > + * super block but no tmem_pools are created. When a backend
> > + * registers with cleancache the previous calls to init_fs and
> > + * init_shared_fs are executed to create tmem_pools and set the
> > + * respective poolids. While no backend is registered all "puts",
> > + * "gets" and "flushes" are ignored or fail.
> 
> The comment could use all 80 cols..
> 
> >
> > ...
> >
> >  struct cleancache_ops cleancache_register_ops(struct cleancache_ops *ops)
> >  {
> >  	struct cleancache_ops old = cleancache_ops;
> > +	int i;
> >  
> >  	cleancache_ops = *ops;
> > -	cleancache_enabled = 1;
> > +
> > +	backend_registered = true;
> > +	for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
> > +		if (fs_poolid_map[i] == FS_NO_BACKEND)
> > +			fs_poolid_map[i] = (*cleancache_ops.init_fs)(PAGE_SIZE);
> > +		if (shared_fs_poolid_map[i] == FS_NO_BACKEND)
> > +			shared_fs_poolid_map[i] = (*cleancache_ops.init_shared_fs)
> > +					(uuids[i], PAGE_SIZE);
> > +	}
> >  	return old;
> >  }
> 
> I never noticed before - this function returns a large structure by
> value.  That's really really unusual in the kernel.  I see no problem
> with it per-se, but it might generate awful code.
> 
> Also, this function has no locking and is blatantly racy.
> 
> >  EXPORT_SYMBOL(cleancache_register_ops);
> > @@ -61,15 +91,38 @@ EXPORT_SYMBOL(cleancache_register_ops);
> >  /* Called by a cleancache-enabled filesystem at time of mount */
> >  void __cleancache_init_fs(struct super_block *sb)
> >  {
> > -	sb->cleancache_poolid = (*cleancache_ops.init_fs)(PAGE_SIZE);
> > +	int i;
> > +
> > +	for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
> > +		if (fs_poolid_map[i] == FS_UNKNOWN) {
> > +			sb->cleancache_poolid = i + FAKE_FS_POOLID_OFFSET;
> > +			if (backend_registered)
> > +				fs_poolid_map[i] = (*cleancache_ops.init_fs)(PAGE_SIZE);
> > +			else
> > +				fs_poolid_map[i] = FS_NO_BACKEND;
> > +			break;
> > +		}
> > +	}
> >  }
> >  EXPORT_SYMBOL(__cleancache_init_fs);
> 
> This also looks wildly racy.
> 
> >  /* Called by a cleancache-enabled clustered filesystem at time of mount */
> >  void __cleancache_init_shared_fs(char *uuid, struct super_block *sb)
> >  {
> > -	sb->cleancache_poolid =
> > -		(*cleancache_ops.init_shared_fs)(uuid, PAGE_SIZE);
> > +	int i;
> > +
> > +	for (i = 0; i < MAX_INITIALIZABLE_FS; i++) {
> > +		if (shared_fs_poolid_map[i] == FS_UNKNOWN) {
> > +			sb->cleancache_poolid = i + FAKE_SHARED_FS_POOLID_OFFSET;
> > +			uuids[i] = uuid;
> > +			if (backend_registered)
> > +				shared_fs_poolid_map[i] = (*cleancache_ops.init_shared_fs)
> > +						(uuid, PAGE_SIZE);
> > +			else
> > +				shared_fs_poolid_map[i] = FS_NO_BACKEND;
> > +			break;
> > +		}
> > +	}
> >  }
> >  EXPORT_SYMBOL(__cleancache_init_shared_fs);
> 
> Again, a huge mess if two threads execute this concurrently.
> 
> > @@ -99,6 +152,19 @@ static int cleancache_get_key(struct inode *inode,
> >  }
> >  
> >  /*
> > + * Returns a pool_id that is associated with a given fake poolid.
> 
> Was there a comment anywhere which tells the reader what a "fake poolid" is?
> 
> > + */
> > +static int get_poolid_from_fake(int fake_pool_id)
> > +{
> > +	if (fake_pool_id >= FAKE_SHARED_FS_POOLID_OFFSET)
> > +		return shared_fs_poolid_map[fake_pool_id -
> > +			FAKE_SHARED_FS_POOLID_OFFSET];
> > +	else if (fake_pool_id >= FAKE_FS_POOLID_OFFSET)
> > +		return fs_poolid_map[fake_pool_id - FAKE_FS_POOLID_OFFSET];
> > +	return FS_NO_BACKEND;
> > +}
> > +
> > +/*
> >   * "Get" data from cleancache associated with the poolid/inode/index
> >   * that were specified when the data was put to cleanache and, if
> >   * successful, use it to fill the specified page with data and return 0.
> > @@ -109,17 +175,26 @@ int __cleancache_get_page(struct page *page)
> >  {
> >  	int ret = -1;
> >  	int pool_id;
> > +	int fake_pool_id;
> >  	struct cleancache_filekey key = { .u.key = { 0 } };
> >  
> > +	if (!backend_registered) {
> > +		cleancache_failed_gets++;
> > +		goto out;
> > +	}
> 
> Races everywhere...
> 
> >  	VM_BUG_ON(!PageLocked(page));
> > -	pool_id = page->mapping->host->i_sb->cleancache_poolid;
> > -	if (pool_id < 0)
> > +	fake_pool_id = page->mapping->host->i_sb->cleancache_poolid;
> > +	if (fake_pool_id < 0)
> >  		goto out;
> > +	pool_id = get_poolid_from_fake(fake_pool_id);
> >  
> >  	if (cleancache_get_key(page->mapping->host, &key) < 0)
> >  		goto out;
> >  
> > -	ret = (*cleancache_ops.get_page)(pool_id, key, page->index, page);
> > +	if (pool_id >= 0)
> > +		ret = (*cleancache_ops.get_page)(pool_id,
> > +				key, page->index, page);
> >  	if (ret == 0)
> >  		cleancache_succ_gets++;
> >  	else
> > @@ -138,12 +213,23 @@ EXPORT_SYMBOL(__cleancache_get_page);
> >  void __cleancache_put_page(struct page *page)
> >  {
> >  	int pool_id;
> > +	int fake_pool_id;
> >  	struct cleancache_filekey key = { .u.key = { 0 } };
> >  
> > +	if (!backend_registered) {
> > +		cleancache_puts++;
> > +		return;
> > +	}
> 
> More..
> 
> >  	VM_BUG_ON(!PageLocked(page));
> > -	pool_id = page->mapping->host->i_sb->cleancache_poolid;
> > +	fake_pool_id = page->mapping->host->i_sb->cleancache_poolid;
> > +	if (fake_pool_id < 0)
> > +		return;
> > +
> > +	pool_id = get_poolid_from_fake(fake_pool_id);
> > +
> >  	if (pool_id >= 0 &&
> > -	      cleancache_get_key(page->mapping->host, &key) >= 0) {
> > +		cleancache_get_key(page->mapping->host, &key) >= 0) {
> >  		(*cleancache_ops.put_page)(pool_id, key, page->index, page);
> >  		cleancache_puts++;
> >  	}
> >
> > ...
> >
> 
> I don't understand the timing flow here, nor the existing constraints
> on what can be done and when, but....
> 
> The whole thing looks really hacky?  Why do we need to remember all
> this stuff for later on?  What prevents us from simply synchonously
> doing whatever we need to do when someone wants to register a backend?
> 
> Maybe a little ascii time chart/flow diagram would help.

This patch hopefully answers the questions and comments. It has a bit
of a a), then b), then c) type chart to illustrate the issue of backends
registered asynchronously.

  reply	other threads:[~2013-01-30 15:57 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-14 18:57 [PATCH v2] enable all tmem backends to be built and loaded " Konrad Rzeszutek Wilk
2012-11-14 18:57 ` [PATCH 1/8] mm: cleancache: lazy initialization to allow tmem backends to build/run " Konrad Rzeszutek Wilk
2012-11-16 23:10   ` Andrew Morton
2013-01-30 15:57     ` Konrad Rzeszutek Wilk [this message]
2012-11-14 18:57 ` [PATCH 2/8] mm: frontswap: " Konrad Rzeszutek Wilk
2012-11-16 23:16   ` Andrew Morton
2012-11-19  0:53     ` Bob Liu
2012-11-19 22:25       ` Andrew Morton
2012-11-27 21:26         ` Konrad Rzeszutek Wilk
2013-01-30 15:55           ` Konrad Rzeszutek Wilk
2012-11-14 18:57 ` [PATCH 3/8] frontswap: Make frontswap_init use a pointer for the ops Konrad Rzeszutek Wilk
2012-11-14 18:57 ` [PATCH 4/8] cleancache: Make cleancache_init " Konrad Rzeszutek Wilk
2012-11-14 18:57 ` [PATCH 5/8] staging: zcache2+ramster: enable ramster to be built/loaded as a module Konrad Rzeszutek Wilk
2012-11-14 18:57 ` [PATCH 6/8] staging: zcache2+ramster: enable zcache2 " Konrad Rzeszutek Wilk
2012-11-14 18:57 ` [PATCH 7/8] xen: tmem: enable Xen tmem shim " Konrad Rzeszutek Wilk
2012-11-14 18:57 ` [PATCH 8/8] xen/tmem: Remove the subsys call Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130130155718.GB1272@konrad-lan.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=andor.daam@googlemail.com \
    --cc=dan.magenheimer@oracle.com \
    --cc=devel@linuxdriverproject.org \
    --cc=fschmaus@gmail.com \
    --cc=ilendir@googlemail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=ngupta@vflare.org \
    --cc=sjenning@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox