From: Hirokazu Takahashi <taka@valinux.co.jp>
To: marcelo.tosatti@cyclades.com
Cc: iwamoto@valinux.co.jp, haveblue@us.ibm.com, akpm@osdl.org,
linux-mm@kvack.org, piggin@cyberone.com.au, arjanv@redhat.com,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] memory defragmentation to satisfy high order allocations
Date: Mon, 04 Oct 2004 03:35:59 +0900 (JST)
Message-ID: <20041004.033559.71092746.taka@valinux.co.jp>
In-Reply-To: <20041003140723.GD4635@logos.cnet>
Hi, Marcelo
> > > 2)
> > > At migrate_onepage you add anonymous pages which aren't swap allocated
> > > to the swap cache
> > > + /*
> > > + * Put the page in a radix tree if it isn't in the tree yet.
> > > + */
> > > +#ifdef CONFIG_SWAP
> > > + if (PageAnon(page) && !PageSwapCache(page))
> > > + if (!add_to_swap(page, GFP_KERNEL)) {
> > > + unlock_page(page);
> > > + return ERR_PTR(-ENOSPC);
> > > + }
> > > +#endif /* CONFIG_SWAP */
> > >
> > > Why's that? You can copy anonymous pages without adding them to swap (thats
> > > what the patch I posted does).
> >
> > The reason is to guarantee that any anonymous page can be migrated anytime.
> > I want to block newly occurred accesses to the page during the migration
> > because it can't be migrated if there remain some references on it by
> > system calls, direct I/O and page faults.
>
> It would be nice if we could block pte faults in a way such to not need
> adding each anonymous page to swap. It can be too costly if you have a lot memory
> and it makes the whole operation dependable on swap size (if you dont have enough
> swap, you're dead).
>
> Maybe hold mm->page_table_lock (might be too costly in terms of CPU time, but since
> migration is not a common operation anyway), or create a semaphore?
I think the problem with holding mm->page_table_lock is that it
doesn't allow the migration code to block. The semaphore approach
would be better.

I have another idea: each anonymous page can release its swap entry
after it has been migrated. This can be done with
remove_exclusive_swap_page(), provided the page is forcibly remapped
into the same address spaces by touch_unmapped_address(), which I wrote.
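
To illustrate why releasing the swap entry right after migration helps,
here is a minimal userspace sketch, not the actual patch. The types
toy_swap and anon_page and the toy_* functions are hypothetical
stand-ins that only model swap slot accounting; they are not kernel
interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: swap space is just a counter of free slots. */
struct toy_swap {
    int free_slots;
};

/* Hypothetical model of an anonymous page. */
struct anon_page {
    bool in_swap_cache; /* page currently owns a swap slot */
    bool mapped;        /* page is mapped into its address spaces */
};

/* Stand-in for add_to_swap(): fails with -ENOSPC when swap is full. */
static int toy_add_to_swap(struct toy_swap *s, struct anon_page *p)
{
    if (s->free_slots == 0)
        return -ENOSPC;
    s->free_slots--;
    p->in_swap_cache = true;
    return 0;
}

/* Stand-in for dropping the swap entry once the page is mapped again. */
static void toy_release_swap(struct toy_swap *s, struct anon_page *p)
{
    if (p->in_swap_cache && p->mapped) {
        p->in_swap_cache = false;
        s->free_slots++;
    }
}

/* Migrate up to n pages one by one; returns how many were migrated. */
static int toy_migrate_all(struct toy_swap *s, struct anon_page *pages,
                           int n, bool detach)
{
    int done = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (!pages[i].in_swap_cache && toy_add_to_swap(s, &pages[i]) != 0)
            break;               /* out of swap: cannot continue */
        pages[i].mapped = false; /* unmapped while the copy would happen */
        pages[i].mapped = true;  /* forcibly remapped afterwards */
        if (detach)
            toy_release_swap(s, &pages[i]);
        done++;
    }
    return done;
}
```

In this model, with one free slot and detach enabled, every page can be
migrated in turn; with detach disabled, the loop stops as soon as the
slots run out.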
> > Your approach will work fine on most of anonymous pages, which aren't
> > heavily accessed. I think it will be enough for memory defragmentation.
>
> Yes...
>
> > > 3) At migrate_page_common you assume additional page references
> > > (page_migratable returning -EAGAIN) means the code should try to writeout
> > > the page.
> > >
> > > Is that assumption always valid?
> >
> > -EAGAIN means that the page may require to be written back
>
> But why is it needed to writeout pages? We shouldnt need to. At least
> from what I can understand.
The migration code allows each filesystem to implement its own
migration method or simply use migrate_page_buffer() or
migrate_page_common().

migrate_page_common() is the default if a filesystem doesn't
implement anything. It is the most generic function, and it tries
to write back pages only if they are dirty and have buffers.
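
The dispatch described above could be modelled roughly as follows. This
is a simplified userspace sketch, not the patch itself: toy_page,
toy_fs_ops and the toy_* functions are hypothetical names, and the
"writeback" is only a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page state relevant to this decision. */
struct toy_page {
    bool dirty;
    bool has_buffers;
    int  writebacks; /* number of simulated writebacks */
};

/* Hypothetical per-filesystem operations table. */
struct toy_fs_ops {
    /* Optional hook; NULL means "use the generic default". */
    int (*migrate_page)(struct toy_page *from, struct toy_page *to);
};

/* Generic default: write back only pages that are dirty AND have buffers. */
static int toy_migrate_page_common(struct toy_page *from, struct toy_page *to)
{
    if (from->dirty && from->has_buffers) {
        from->writebacks++;  /* simulated writeback */
        from->dirty = false; /* page is clean afterwards */
    }
    *to = *from;             /* simulated copy of the page state */
    return 0;
}

/* Call the filesystem's own method if it has one, else the default. */
static int toy_migrate(const struct toy_fs_ops *ops,
                       struct toy_page *from, struct toy_page *to)
{
    assert(from != NULL && to != NULL);
    if (ops != NULL && ops->migrate_page != NULL)
        return ops->migrate_page(from, to);
    return toy_migrate_page_common(from, to);
}
```

A dirty page without buffers is copied as it is in this model, without
any writeback.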
> > or
> > just to wait for a while since the page is just referred by system call
> > or pagefault handler.
>
> I'm not sure if making that assumption is always valid.
>
> Kernel code can have an additional count on the page meaning "this page is pinned,
> dont move it". At least that should be valid.
Yes, I know. I have checked all of the code.

AIO event buffers are pinned, so the memory-hotplug team plans to
allocate the pages for event buffers from non-hotpluggable memory
regions.

Pages in sendfile() might also be pinned for a while when there are
network problems. I think there may be some workarounds. The easiest
is just to wait for the timeout; another is to change sendfile() to
copy the pages in advance.

Pages for NFS might likewise be pinned by network problems. One idea
is to restrict NFS to allocating pages from a specific memory region,
so that all memory except that region can be hot-removed. It is also
possible to implement a complete migrate_page method that can handle
stuck pages.

If the migration code is used for memory defragmentation, pinned pages
must be avoided. I think that can be done with the non-blocking mode.
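
As a rough illustration of the difference between the two modes, the
sketch below models a page with extra references. It is an assumption
about how a "block" flag might behave, not the real code: pin_page,
toy_page_migratable() and toy_try_migrate() are hypothetical, and
waiting is simulated by a retry loop in which transient references go
away.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the reference counts on a page. */
struct pin_page {
    int refcount;       /* all references currently held */
    int expected_refs;  /* references the migration code accounts for */
    int transient_refs; /* extra references that will be dropped soon */
};

/* 0 if the page can be moved now, -EAGAIN if someone else holds it. */
static int toy_page_migratable(const struct pin_page *p)
{
    return p->refcount > p->expected_refs ? -EAGAIN : 0;
}

/*
 * block == true:  retry up to max_tries times (stands in for sleeping).
 * block == false: give up on the first -EAGAIN so that the caller,
 *                 e.g. defragmentation, can pick another page.
 */
static int toy_try_migrate(struct pin_page *p, bool block, int max_tries)
{
    assert(max_tries >= 0);
    for (int i = 0; i < max_tries; i++) {
        int ret = toy_page_migratable(p);

        if (ret == 0)
            return 0;
        if (!block)
            return ret;
        /* Simulate a short-lived reference being dropped while we wait. */
        if (p->transient_refs > 0) {
            p->transient_refs--;
            p->refcount--;
        }
    }
    return -EAGAIN;
}
```

In this model a page held only by a short-lived reference succeeds in
blocking mode after a retry, while a truly pinned page fails in both
modes; the non-blocking caller simply learns that sooner.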
> Any piece of code which holds a reference on a page for a long
> time is going to be a pain for the algorithm right?
>
> > > 4)
> > > About implementing a nonblocking version of it. The easier way, it
> > > seems to me, is to pass a "block" argument to generic_migrate_page() and
> > > use that.
> >
> > Yes.
>
> OK. I'll try to implement it this week (plus the radix_tree_replace
> tag thingie).
Thank you for that.
Hirokazu Takahashi.