From: "Michal Nazarewicz" <mina86@mina86.com>
To: Marek Szyprowski <m.szyprowski@samsung.com>, Mel Gorman <mel@csn.ul.ie>
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-media@vger.kernel.org, linux-mm@kvack.org,
	linaro-mm-sig@lists.linaro.org,
	Kyungmin Park <kyungmin.park@samsung.com>,
	Russell King <linux@arm.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Daniel Walker <dwalker@codeaurora.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Jesse Barker <jesse.barker@linaro.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Shariq Hasnain <shariq.hasnain@linaro.org>,
	Chunsang Jeong <chunsang.jeong@linaro.org>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Benjamin Gaignard <benjamin.gaignard@linaro.org>
Subject: Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
Date: Mon, 30 Jan 2012 16:41:22 +0100
Message-ID: <op.v8wlu8ws3l0zgt@mpn-glaptop>
In-Reply-To: <20120130111522.GE25268@csn.ul.ie>

On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman <mel@csn.ul.ie> wrote:

> On Thu, Jan 26, 2012 at 10:00:44AM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> @@ -139,3 +139,27 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>  	return ret ? 0 : -EBUSY;
>>  }
>> +
>> +/* must hold zone->lock */
>> +void update_pcp_isolate_block(unsigned long pfn)
>> +{
>> +	unsigned long end_pfn = pfn + pageblock_nr_pages;
>> +	struct page *page;
>> +
>> +	while (pfn < end_pfn) {
>> +		if (!pfn_valid_within(pfn)) {
>> +			++pfn;
>> +			continue;
>> +		}
>> +

> There is a potential problem here that you need to be aware of.
> set_pageblock_migratetype() is called from start_isolate_page_range().
> I do not think there is a guarantee that pfn + pageblock_nr_pages is
> not in a different block of MAX_ORDER_NR_PAGES. If that is right then
> your options are to add a check like this;
>
> if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn))
> 	break;
>
> or else ensure that end_pfn is always MAX_ORDER_NR_PAGES aligned and in
> the same block as pfn and relying on the caller to have called
> pfn_valid.

	pfn = round_down(pfn, pageblock_nr_pages);
	end_pfn = pfn + pageblock_nr_pages;

should do the trick as well, right?  move_freepages_block() seems to be
doing the same thing.
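
For illustration, here is a minimal userspace sketch of that rounding.  The
names prefixed with demo_ and the value 1024 are assumptions made up for this
example; the real pageblock_nr_pages depends on the architecture and
configuration, and the kernel provides its own round_down().

```c
#include <assert.h>

/* Hypothetical value for illustration only; the real pageblock_nr_pages
 * depends on the architecture and kernel configuration. */
#define DEMO_PAGEBLOCK_NR_PAGES 1024UL

/* Userspace stand-in for the kernel's round_down(); only valid for
 * power-of-two alignments. */
static unsigned long demo_round_down(unsigned long x, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

struct demo_pfn_range {
	unsigned long start;	/* first pfn of the pageblock */
	unsigned long end;	/* one past the last pfn of the pageblock */
};

/* Clamp the scan to the single pageblock that contains pfn, so the
 * loop never walks past the end of that block. */
static struct demo_pfn_range demo_pageblock_range(unsigned long pfn)
{
	struct demo_pfn_range r;

	r.start = demo_round_down(pfn, DEMO_PAGEBLOCK_NR_PAGES);
	r.end = r.start + DEMO_PAGEBLOCK_NR_PAGES;
	return r;
}
```

With these assumed values, a pfn anywhere inside a block yields the same
[start, end) range, so the scan cannot cross into a neighbouring block.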

>> +		page = pfn_to_page(pfn);
>> +		if (PageBuddy(page)) {
>> +			pfn += 1 << page_order(page);
>> +		} else if (page_count(page) == 0) {
>> +			set_page_private(page, MIGRATE_ISOLATE);
>> +			++pfn;
>
> This is dangerous for two reasons. If the page_count is 0, it could
> be because the page is in the process of being freed and is not
> necessarily on the per-cpu lists yet and you cannot be sure if the
> contents of page->private are important. Second, there is nothing to
> prevent another CPU allocating this page from its per-cpu list while
> the private field is getting updated from here which might lead to
> some interesting races.
>
> I recognise that what you are trying to do is respond to Gilad's
> request that you really check if an IPI here is necessary. I think what
> you need to do is check if a page with a count of 0 is encountered
> and if it is, then a draining of the per-cpu lists is necessary. To
> address Gilad's concerns, be sure to only do this once per attempt at
> CMA rather than for every page encountered with a count of 0 to avoid a
> storm of IPIs.

It's actually more than that.

This is the same issue that I first fixed with a change to the
free_pcppages_bulk() function[1].  At the time of posting, you said you'd like
me to try to find a different solution which would not involve paying the price
of calling get_pageblock_migratetype().  Later I also realised that this
solution is not enough.

[1] http://article.gmane.org/gmane.linux.kernel.mm/70314

My next attempt was to drain the PCP lists while holding zone->lock[2], but that
quickly proved to be a broken approach when Marek started testing it on an SMP
system.

[2] http://article.gmane.org/gmane.linux.kernel.mm/72016

This patch is yet another attempt at solving this old issue.  Even though it has
a potential race condition, we came to the conclusion that the actual chances of
it causing any problems are slim.  Various stress tests did not, in fact, show
the race to be an issue.

The problem is that if a page is on a PCP list and its underlying pageblock's
migrate type is changed to MIGRATE_ISOLATE, the page (i) will still remain on the
PCP list, and thus someone can allocate it, and (ii) when removed from the PCP
list, will be put on the freelist of the migrate type it had prior to the change.

(i) is actually not such a big issue, since the next thing that happens after
isolation is migration, so all the pages will get freed.  (ii) is the actual
problem, and if [1] is not an acceptable solution, I really don't have a good fix
for it.
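
To make (ii) concrete, here is a toy userspace model, not kernel code.  It
assumes the migrate type is cached in the page when it is put on a PCP list
(as the set_page_private() call in the patch suggests) and that a drain trusts
that cached value.  Every demo_ name is hypothetical, and demo_drain_recheck()
is only my reading of the idea behind [1], i.e. looking up the pageblock's type
again at drain time.

```c
#include <assert.h>

enum demo_migratetype { DEMO_MOVABLE, DEMO_ISOLATE, DEMO_NR_TYPES };

struct demo_page {
	enum demo_migratetype cached_type;	/* stands in for page_private() */
};

struct demo_zone {
	enum demo_migratetype block_type;	/* current type of the pageblock */
	unsigned int nr_free[DEMO_NR_TYPES];	/* pages on each freelist */
};

/* Put a page on the PCP list: remember the pageblock's type as of now. */
static void demo_free_to_pcp(const struct demo_zone *z, struct demo_page *p)
{
	p->cached_type = z->block_type;
}

/* Drain trusting the cached type: a stale value sends the page to the
 * freelist of the type the pageblock had before isolation. */
static enum demo_migratetype demo_drain_from_pcp(struct demo_zone *z,
						 const struct demo_page *p)
{
	assert(p->cached_type < DEMO_NR_TYPES);
	z->nr_free[p->cached_type]++;
	return p->cached_type;
}

/* Drain that re-reads the pageblock's current type instead, at the cost
 * of an extra lookup for every page drained. */
static enum demo_migratetype demo_drain_recheck(struct demo_zone *z,
						const struct demo_page *p)
{
	(void)p;	/* cached type deliberately ignored */
	z->nr_free[z->block_type]++;
	return z->block_type;
}
```

In this model, freeing a page while the block is DEMO_MOVABLE, switching the
block to DEMO_ISOLATE and then calling demo_drain_from_pcp() puts the page on
the movable freelist, whereas demo_drain_recheck() puts it on the isolate one.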

One thing that comes to mind is calling drain_all_pages() prior to acquiring
zone->lock in set_migratetype_isolate().  This is, however, prone to races, since
after the drain and before zone->lock is acquired, pages might get moved
back to the PCP list.

Draining the PCP lists after acquiring zone->lock is not possible because
smp_call_function_many() cannot be called with interrupts disabled, and changing
spin_lock_irqsave() to spin_lock() followed by local_irq_save() causes a
deadlock (that's what [2] attempted to do).

Any suggestions are welcome!

>> +		} else {
>> +			++pfn;
>> +		}
>> +	}
>> +}

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--
