linux-mm.kvack.org archive mirror
From: Glauber Costa <glommer@parallels.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Glauber Costa <glommer@openvz.org>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Thelen <gthelen@google.com>,
	kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH v4 08/31] list: add a new LRU list type
Date: Tue, 30 Apr 2013 20:01:41 +0400
Message-ID: <517FEAE5.2010809@parallels.com>
In-Reply-To: <20130430151854.GH6415@suse.de>

>> +
>> +struct list_lru {
>> +	spinlock_t		lock;
>> +	struct list_head	list;
>> +	long			nr_items;
>> +};
> 
> Is there ever a circumstance where nr_items is negative? If so, what
> does that mean?
> 

No, but we would like to be able to detect a negative count and BUG on it (which we actually do).


> 
>> +int list_lru_add(struct list_lru *lru, struct list_head *item);
>> +int list_lru_del(struct list_lru *lru, struct list_head *item);
>> +
> 
> However, these are bool and the return value determines if the item was
> really added to the list or not. It fails if the item is already part of
> a list and it would be very nice to have a comment explaining why it's
> not a bug if this happens because it feels like it would be a lookup and
> insertion race. Maybe it's clear later in the series why this is ok but
> it's not very obvious at this point.
> 

I actually don't know.
I would appreciate Dave's comments on this one. Dave?

>> +static inline long list_lru_count(struct list_lru *lru)
>> +{
>> +	return lru->nr_items;
>> +}
>> +
>> +typedef enum lru_status
>> +(*list_lru_walk_cb)(struct list_head *item, spinlock_t *lock, void *cb_arg);
>> +
>> +typedef void (*list_lru_dispose_cb)(struct list_head *dispose_list);
>> +
>> +long list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate,
>> +		   void *cb_arg, long nr_to_walk);
>> +
> 
> Is nr_to_walk ever negative?
> 

It shouldn't be, but in this case we don't BUG on it.

>> +long list_lru_dispose_all(struct list_lru *lru, list_lru_dispose_cb dispose);
>> +
>> +#endif /* _LRU_LIST_H */
>> diff --git a/lib/Makefile b/lib/Makefile
>> index af79e8c..40a6d4a 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
>>  	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
>>  	 proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
>>  	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
>> -	 earlycpio.o percpu-refcount.o
>> +	 earlycpio.o percpu-refcount.o list_lru.o
>>  
>>  obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
>>  lib-$(CONFIG_MMU) += ioremap.o
>> diff --git a/lib/list_lru.c b/lib/list_lru.c
>> new file mode 100644
>> index 0000000..937ee87
>> --- /dev/null
>> +++ b/lib/list_lru.c
>> @@ -0,0 +1,118 @@
>> +/*
>> + * Copyright (c) 2010-2012 Red Hat, Inc. All rights reserved.
>> + * Author: David Chinner
>> + *
>> + * Generic LRU infrastructure
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/list_lru.h>
>> +
>> +int
>> +list_lru_add(
>> +	struct list_lru	*lru,
>> +	struct list_head *item)
>> +{
>> +	spin_lock(&lru->lock);
>> +	if (list_empty(item)) {
>> +		list_add_tail(item, &lru->list);
>> +		lru->nr_items++;
>> +		spin_unlock(&lru->lock);
>> +		return 1;
>> +	}
>> +	spin_unlock(&lru->lock);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(list_lru_add);
>> +
>> +int
>> +list_lru_del(
>> +	struct list_lru	*lru,
>> +	struct list_head *item)
>> +{
>> +	spin_lock(&lru->lock);
>> +	if (!list_empty(item)) {
>> +		list_del_init(item);
>> +		lru->nr_items--;
>> +		spin_unlock(&lru->lock);
>> +		return 1;
>> +	}
>> +	spin_unlock(&lru->lock);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(list_lru_del);
>> +
>> +long
>> +list_lru_walk(
> 
> It returns long but never actually returns a negative number.
> 
>> +	struct list_lru *lru,
>> +	list_lru_walk_cb isolate,
>> +	void		*cb_arg,
>> +	long		nr_to_walk)
>> +{
>> +	struct list_head *item, *n;
>> +	long removed = 0;
>> +
>> +	spin_lock(&lru->lock);
>> +restart:
>> +	list_for_each_safe(item, n, &lru->list) {
>> +		enum lru_status ret;
>> +
>> +		if (nr_to_walk-- < 0)
>> +			break;
>> +
>> +		ret = isolate(item, &lru->lock, cb_arg);
>> +		switch (ret) {
>> +		case LRU_REMOVED:
>> +			lru->nr_items--;
>> +			removed++;
>> +			break;
>> +		case LRU_ROTATE:
>> +			list_move_tail(item, &lru->list);
>> +			break;
>> +		case LRU_SKIP:
>> +			break;
>> +		case LRU_RETRY:
>> +			goto restart;
> 
> Are the two users of LRU_RETRY guaranteed to eventually make progress
> or can this infinite loop? It feels like the behaviour of LRU_RETRY is
> not very desirable. Once an object that returns LRU_RETRY is isolated on
> the list then it looks like XFS can stall for 100 ticks (bit arbitrary)
> each time it tries and maybe do this forever.
> 
> The inode.c user looks like it could race where some other process is
> reallocating the buffers between each time this isolate callback tries to
> isolate it. Granted, it may accidentally break out because the spinlock
> is contended and it returns LRU_SKIP so maybe no one will hit the problem
> but if it's ok with LRU_SKIP then maybe a LRU_RETRY_ONCE would also be
> suitable and use that in fs/inode.c?
> 
> Either hypothetical situation would require that the list you are trying
> to walk is very small but maybe memcg will hit that problem. If there is
> a guarantee of forward progress then a comment would be nice.
> 

I have to look into this issue further, which I plan to do soon, but
can't today. Meanwhile, this is really a lot more under Dave's umbrella,
so Dave, if you would do us the honor =)
