linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: "Pavel.Tatashin@microsoft.com" <Pavel.Tatashin@microsoft.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH 2/2] mm: zero remaining unavailable struct pages
Date: Wed, 19 Sep 2018 01:54:40 +0000	[thread overview]
Message-ID: <20180919015440.GA2581@hori1.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <20180917132605.eln6tlc6hf7vfjy2@gabell>

On Mon, Sep 17, 2018 at 09:26:07AM -0400, Masayoshi Mizuma wrote:
> On Fri, Aug 31, 2018 at 02:55:36AM +0000, Naoya Horiguchi wrote:
> > On Wed, Aug 29, 2018 at 11:16:30AM -0400, Masayoshi Mizuma wrote:
> > > Hi Horiguchi-san and Pavel
> > > 
> > > Thank you for your comments!
> > > The Pavel's additional patch looks good to me, so I will add it to this series.
> > > 
> > > However, unfortunately, the movable_node option has something wrong yet...
> > > When I offline the memory which belongs to movable zone, I got the following
> > > warning. I'm trying to debug it.
> > > 
> > > I try to describe the issue as following. 
> > > If you have any comments, please let me know.
> > > 
> > > WARNING: CPU: 156 PID: 25611 at mm/page_alloc.c:7730 has_unmovable_pages+0x1bf/0x200
> > > RIP: 0010:has_unmovable_pages+0x1bf/0x200
> > > ...
> > > Call Trace:
> > >  is_mem_section_removable+0xd3/0x160
> > >  show_mem_removable+0x8e/0xb0
> > >  dev_attr_show+0x1c/0x50
> > >  sysfs_kf_seq_show+0xb3/0x110
> > >  seq_read+0xee/0x480
> > >  __vfs_read+0x36/0x190
> > >  vfs_read+0x89/0x130
> > >  ksys_read+0x52/0xc0
> > >  do_syscall_64+0x5b/0x180
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > RIP: 0033:0x7fe7b7823f70
> > > ...
> > > 
> > > I added a printk to catch the unmovable page.
> > > ---
> > > @@ -7713,8 +7719,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> > >                  * is set to both of a memory hole page and a _used_ kernel
> > >                  * page at boot.
> > >                  */
> > > -               if (found > count)
> > > +               if (found > count) {
> > > +                       pr_info("DEBUG: %s zone: %lx page: %lx pfn: %lx flags: %lx found: %ld count: %ld \n",
> > > +                               __func__, zone, page, page_to_pfn(page), page->flags, found, count);
> > >                         goto unmovable;
> > > +               }
> > > ---
> > > 
> > > Then I got the following. The page (PFN: 0x1c0ff130d) flag is 
> > > 0xdfffffc0040048 (uptodate|active|swapbacked)
> > > 
> > > ---
> > > DEBUG: has_unmovable_pages zone: 0xffff8c0ffff80380 page: 0xffffea703fc4c340 pfn: 0x1c0ff130d flags: 0xdfffffc0040048 found: 1 count: 0 
> > > ---
> > > 
> > > And I got the owner from /sys/kernel/debug/page_owner.
> > > 
> > > Page allocated via order 0, mask 0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)
> > > PFN 7532909325 type Movable Block 14712713 type Movable Flags 0xdfffffc0040048(uptodate|active|swapbacked)
> > >  __alloc_pages_nodemask+0xfc/0x270
> > >  alloc_pages_vma+0x7c/0x1e0
> > >  handle_pte_fault+0x399/0xe50
> > >  __handle_mm_fault+0x38e/0x520
> > >  handle_mm_fault+0xdc/0x210
> > >  __do_page_fault+0x243/0x4c0
> > >  do_page_fault+0x31/0x130
> > >  page_fault+0x1e/0x30
> > > 
> > > The page is allocated as anonymous page via page fault.
> > > I'm not sure, but lru flag should be added to the page...?
> > 
> > There is a small window of no PageLRU flag just after page allocation
> > until the page is linked to some LRU list.
> > This kind of unmovability is transient, so retrying can work.
> > 
> > I guess that this warning seems to be visible since commit 15c30bc09085
> > ("mm, memory_hotplug: make has_unmovable_pages more robust")
> > which turned off the optimization based on the assumption that pages
> > under ZONE_MOVABLE are always movable.
> > I think that it helps developers find the issue that permanently
> > unmovable pages are accidentally located in ZONE_MOVABLE zone.
> > But even ZONE_MOVABLE zone could have transiently unmovable pages,
> > so the reported warning seems to me a false charge and should be avoided.
> > Doing lru_add_drain_all()/drain_all_pages() before has_unmovable_pages()
> > might be helpful?
> 
> Thanks you for your proposal! And sorry for delayed responce.
> 
> lru_add_drain_all()/drain_all_pages() might be helpful, but it 
> seems that the window is not very small because I tried to do
> offline some times, and every offline failed...

OK, so this doesn't work, thank you for trying.

> 
> I have another idea. I found that if the page is belonged to
> Movable zone and it has Uptodate flag, the page will go lru
> soon, so I think we can pass the page.
> Does the idea make sence? As far as I tested it, it works well.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 52d9efe8c9fb..ecf87bec8ac6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7758,6 +7758,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>                 if (__PageMovable(page))
>                         continue;
> 
> +               if ((zone_idx(zone) == ZONE_MOVABLE) && PageUptodate(page))
> +                       continue;
> +

We have many call sites calling SetPageUptodate (many are from filesystems,)
so I'm concerned that some caller might set PageUptodate on non-LRU pages.
Could you explain a little more how/why this check is a clear separation b/w
movable pages and unmovable pages?
(Filesystem metadata is never allocated from ZONE_MOVABLE?)

Thanks,
Naoya Horiguchi

  reply	other threads:[~2018-09-19  1:57 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-23 18:25 [PATCH 1/2] Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" Masayoshi Mizuma
2018-08-23 18:25 ` [PATCH 2/2] mm: zero remaining unavailable struct pages Masayoshi Mizuma
2018-08-27 23:33   ` Pasha Tatashin
2018-08-29 15:16     ` Masayoshi Mizuma
2018-08-31  2:55       ` Naoya Horiguchi
2018-09-17 13:26         ` Masayoshi Mizuma
2018-09-19  1:54           ` Naoya Horiguchi [this message]
2018-09-19 18:15             ` Masayoshi Mizuma
2018-08-24  0:03 ` [PATCH 1/2] Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" Naoya Horiguchi
2018-08-24  8:29   ` Michal Hocko
2018-08-27 12:31     ` Masayoshi Mizuma
2018-08-27 13:29       ` Pasha Tatashin
2018-08-27 23:18 ` Pasha Tatashin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180919015440.GA2581@hori1.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=Pavel.Tatashin@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=msys.mizuma@gmail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox