From: Alexey Lyahkov <alexey.lyashkov@gmail.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Theodore Ts'o <tytso@mit.edu>, Andrew Perepechko <anserper@ya.ru>,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Bernd Schubert <bernd.schubert@fastmail.fm>,
Will Huck <will.huckk@gmail.com>,
linux-ext4@vger.kernel.org, linux-mm@kvack.org
Subject: Re: page eviction from the buddy cache
Date: Thu, 25 Apr 2013 21:37:07 +0300 [thread overview]
Message-ID: <7398CEE9-AF68-4A2A-82E4-940FADF81F97@gmail.com> (raw)
In-Reply-To: <20130425143056.GF2144@suse.de>
Mel,
On Apr 25, 2013, at 17:30, Mel Gorman wrote:
> On Wed, Apr 24, 2013 at 10:26:50AM -0400, Theodore Ts'o wrote:
>> On Tue, Apr 23, 2013 at 03:00:08PM -0700, Andrew Morton wrote:
>>> That should fix things for now. Although it might be better to just do
>>>
>>> mark_page_accessed(page); /* to SetPageReferenced */
>>> lru_add_drain(); /* to SetPageLRU */
>>>
>>> Because a) this was too early to decide that the page is
>>> super-important and b) the second touch of this page should have a
>>> mark_page_accessed() in it already.
>>
>> The question is do we really want to put lru_add_drain() into the ext4
>> file system code? That seems to pushing some fairly mm-specific
>> knowledge into file system code. I'll do this if I have to do, but
>> wouldn't be better if this was pushed into mark_page_accessed(), or
>> some other new API was exported by the mm subsystem?
>>
>
> I don't think we want to push lru_add_drain() into the ext4 code. It's
> too specific of knowledge just to work around pagevecs. Before we rework
> how pagevecs select what LRU to place a page, can we make sure that fixing
> that will fix the problem?
>
what is "that"? puting lru_add_drain() in ext4 core? sure that is fixes problem with many small reads during large write.
originally i have put shake_page() in ext4 code, but that have call lru_add_drain_all() so to exaggerated.
Index: linux-stage/fs/ext4/mballoc.c
===================================================================
--- linux-stage.orig/fs/ext4/mballoc.c 2013-03-19 10:55:52.000000000 +0200
+++ linux-stage/fs/ext4/mballoc.c 2013-03-19 10:59:02.000000000 +0200
@@ -900,8 +900,11 @@ static int ext4_mb_init_cache(struct pag
incore = data;
}
}
- if (likely(err == 0))
+ if (likely(err == 0)) {
SetPageUptodate(page);
+ /* make sure it's in active list */
+ mark_page_accessed(page);
+ }
out:
if (bh) {
@@ -957,6 +960,8 @@ int ext4_mb_init_group(struct super_bloc
page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
if (page) {
BUG_ON(page->mapping != inode->i_mapping);
+ /* move to lru - should be lru_add_drain() */
+ shake_page(page, 0);
ret = ext4_mb_init_cache(page, NULL);
if (ret) {
unlock_page(page);
@@ -986,6 +991,8 @@ int ext4_mb_init_group(struct super_bloc
unlock_page(page);
} else if (page) {
BUG_ON(page->mapping != inode->i_mapping);
+ /* move to lru - should be lru_add_drain() */
+ shake_page(page, 0);
ret = ext4_mb_init_cache(page, bitmap);
if (ret) {
unlock_page(page);
@@ -1087,6 +1094,7 @@ repeat_load_buddy:
if (page) {
BUG_ON(page->mapping != inode->i_mapping);
if (!PageUptodate(page)) {
+ shake_page(page, 0);
ret = ext4_mb_init_cache(page, NULL);
if (ret) {
unlock_page(page);
@@ -1118,6 +1126,7 @@ repeat_load_buddy:
if (page) {
BUG_ON(page->mapping != inode->i_mapping);
if (!PageUptodate(page)) {
+ shake_page(page, 0);
ret = ext4_mb_init_cache(page, e4b->bd_bitmap);
if (ret) {
unlock_page(page);
@@ -2500,6 +2509,8 @@ static int ext4_mb_init_backend(struct s
* not in the inode hash, so it should never be found by iget(), but
* this will avoid confusion if it ever shows up during debugging. */
sbi->s_buddy_cache->i_ino = EXT4_BAD_INO;
+ sbi->s_buddy_cache->i_state = I_NEW;
+// mapping_set_unevictable(sbi->s_buddy_cache->i_mapping);
EXT4_I(sbi->s_buddy_cache)->i_disksize = 0;
for (i = 0; i < ngroups; i++) {
desc = ext4_get_group_desc(sb, i, NULL);
additional i_state = I_NEW need to prevent kill page cache from sysctl -w vm.drop_caches=3
> Andrew, can you try the following patch please? Also, is there any chance
> you can describe in more detail what the workload does?
lustre OSS node + IOR with file size twice more then OSS memory.
> If it fails to boot,
> remove the second that calls lru_add_drain_all() and try again.
well, i will try.
>
> The patch looks deceptively simple, a downside from is is that workloads that
> call mark_page_accessed() frequently will contend more on the zone->lru_lock
> than it did previously. Moving lru_add_drain() to the ext4 could would
> suffer the same contention problem.
NO, isn't. we have call lru_add_drain() in new page allocation case, but mark_page_accessed called without differences - is page in page cache already or it's new allocated - so we have very small zone->lru_lock contention.
>
> Thanks.
>
> ---8<---
> mm: pagevec: Move inactive pages to active lists even if on a pagevec
>
> If a page is on a pagevec aimed at the inactive list then two subsequent
> calls to mark_page_acessed() will still not move it to the active list.
> This can cause a page to be reclaimed sooner than is expected. This
> patch detects if an inactive page is not on the LRU and drains the
> pagevec before promoting it.
>
> Not-signed-off
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 8a529a0..eac64fe 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -437,7 +437,18 @@ void activate_page(struct page *page)
> void mark_page_accessed(struct page *page)
> {
> if (!PageActive(page) && !PageUnevictable(page) &&
> - PageReferenced(page) && PageLRU(page)) {
> + PageReferenced(page)) {
> + /* Page could be in pagevec */
> + if (!PageLRU(page))
> + lru_add_drain();
> +
> + /*
> + * Weeeee, using in_atomic() like this is a hand-grenade.
> + * Patch is for debugging purposes only, do not merge this.
> + */
> + if (!PageLRU(page) && !in_atomic())
> + lru_add_drain_all();
> +
> activate_page(page);
> ClearPageReferenced(page);
> } else if (!PageReferenced(page)) {
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-04-25 18:37 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <51504A40.6020604@ya.ru>
[not found] ` <20130327150743.GC14900@thunk.org>
2013-03-27 19:24 ` Hugh Dickins
2013-03-28 5:34 ` Alexey Lyahkov
2013-04-04 1:24 ` Will Huck
2013-04-04 4:51 ` Alexey Lyahkov
2013-04-20 21:18 ` Bernd Schubert
2013-04-20 23:57 ` Theodore Ts'o
2013-04-22 12:14 ` Alexey Lyahkov
2013-04-23 12:02 ` Bernd Schubert
2013-04-23 12:27 ` Theodore Ts'o
2013-04-23 19:57 ` Hugh Dickins
2013-04-23 22:00 ` Andrew Morton
2013-04-23 22:31 ` Hugh Dickins
2013-04-24 14:26 ` Theodore Ts'o
2013-04-24 21:41 ` Andrew Morton
2013-04-25 8:18 ` Alexey Lyahkov
2013-04-25 14:30 ` Mel Gorman
2013-04-25 18:37 ` Alexey Lyahkov [this message]
2013-04-25 22:40 ` Mel Gorman
2013-04-26 6:03 ` Alexey Lyahkov
2013-04-22 12:18 ` Alexey Lyahkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7398CEE9-AF68-4A2A-82E4-940FADF81F97@gmail.com \
--to=alexey.lyashkov@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=anserper@ya.ru \
--cc=bernd.schubert@fastmail.fm \
--cc=hughd@google.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=tytso@mit.edu \
--cc=will.huckk@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox