From: Mel Gorman <mel@csn.ul.ie>
To: Frans Pop <elendil@planet.nl>
Cc: David Rientjes <rientjes@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
"Rafael J. Wysocki" <rjw@sisk.pl>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Kernel Testers List <kernel-testers@vger.kernel.org>,
Pekka Enberg <penberg@cs.helsinki.fi>,
Reinette Chatre <reinette.chatre@intel.com>,
Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>,
Karol Lewandowski <karol.k.lewandowski@gmail.com>,
Mohamed Abbas <mohamed.abbas@intel.com>,
"John W. Linville" <linville@tuxdriver.com>,
linux-mm@kvack.org
Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn
Date: Mon, 12 Oct 2009 14:43:28 +0100
Message-ID: <20091012134328.GB8200@csn.ul.ie>
In-Reply-To: <200910120110.28061.elendil@planet.nl>
On Mon, Oct 12, 2009 at 01:10:25AM +0200, Frans Pop wrote:
> Sorry for going quiet on this issue for a few days, but I have been
> spending *a lot* of time on it. I've done what amounts to 5 bisection
> rounds at ~20 minutes per iteration and in total over 80 boots.
>
> The problem with my first bisection was that there are *at least two*
> changes at the root of this issue, both committed between .30 and .31-rc1.
> Because of this a normal bisection will not lead to a reliable result and
> even with my last effort I can only narrow it down to two different areas,
> and not 100% to specific commits.
>
Thanks very much for your detailed work on this.
> The two identified areas are:
> 1) a wireless merge which causes the SKB errors to appear in the first
> place, but not always;
> 2) an mm merge which makes the SKB errors occur *much* quicker; IMHO this
> is the change that also causes the regressions reported by Pekka and
> Karol.
>
> So below my results. The issue is both complex and subtle. Now it's up to
> you, domain experts for both mm *and* wireless/networking, to make sense of
> it all and come up with suggestions on how to proceed.
>
> I've improved my test and it's now a lot more reliable, but there are still
> timing influences.
The timing influences are probably because kswapd is working from the
time memory gets full. High-order allocation failures cause it to start
reclaiming at that order, so it is always a race to see whether it can do
its work before an atomic allocation fails.
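Whether an order-2 GFP_ATOMIC request wins that race is decided by an order-aware watermark check. The sketch below is a simplified Python model loosely following zone_watermark_ok() in mm/page_alloc.c of the 2.6.31 era; it omits lowmem_reserve and other per-zone details, and the flag bit values and the 128-page watermark are made up for illustration. It also assumes that GFP_ATOMIC callers receive both the ALLOC_HIGH and ALLOC_HARDER relaxations because they can neither sleep nor enter direct reclaim.

```python
# Illustrative bit values only; not the kernel's real constants.
ALLOC_HIGH = 0x1
ALLOC_HARDER = 0x2


def watermark_ok(nr_free, order, mark, alloc_flags=0):
    """Simplified order-aware watermark check (sketch, not kernel code).

    nr_free[o] is the number of free blocks of order o in the zone,
    order is the requested order and mark is the watermark in pages.
    """
    # Total free pages, minus the pages this request would consume.
    free_pages = sum(n << o for o, n in enumerate(nr_free))
    free_pages -= (1 << order) - 1

    min_free = mark
    if alloc_flags & ALLOC_HIGH:
        min_free -= min_free // 2   # __GFP_HIGH callers may dig deeper
    if alloc_flags & ALLOC_HARDER:
        min_free -= min_free // 4   # e.g. callers that cannot sleep
    if free_pages <= min_free:
        return False

    # Pages sitting in blocks smaller than the request are useless to it,
    # so discount them order by order while halving the required reserve.
    for o in range(order):
        free_pages -= nr_free[o] << o
        min_free >>= 1
        if free_pages <= min_free:
            return False
    return True


# 1000 free pages, but all of them in order-0 and order-1 blocks.
fragmented = [600, 200, 0, 0]
print(watermark_ok(fragmented, 0, 128))                             # True
print(watermark_ok(fragmented, 2, 128, ALLOC_HIGH | ALLOC_HARDER))  # False
```

The order-2 atomic request fails with 1000 pages free; kswapd then has to reclaim until enough order-2 or larger blocks exist, and the next burst of atomic allocations may arrive before it has done so.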
> Also, because this is all merge-window stuff, I'm
> hitting quite a few minor and major regressions between commits that can
> affect tests.
>
> Please study the information below carefully. I know it's long, but I think
> this issue justifies that.
>
Agreed. I'll be looking at commits in both wireless and mm, but obviously
anything I say about wireless needs to be taken with a generous pinch of
salt.
> On Monday 05 October 2009, Frans Pop wrote:
> > This looks conclusive. I tested .30 and .32-rc3 from clean reboots and
> > only starting gitk. I only started music playing in the background
> > (amarok) from an NFS share to ensure network activity.
> >
> > With .32-rc3 I got 4 SKB allocation errors while starting the *second*
> > gitk instance. And the system was completely frozen with music stopped
> > until gitk finished loading.
>
> With .32-rc3, .31.1 and vanilla .31 I will get multiple SKB allocation
> errors the *first time* I run the test, *every* time.
>
So this is still a current problem; it has not been fixed by accident in
the meantime.
> > With .30 I was able to start *three* gitk's (which meant 2 of them got
> > (partially) swapped out) without any allocation errors. And with the
> > system remaining relatively responsive. There was a short break in the
> > music while I started the 2nd instance, but it just continued playing
> > afterwards. There was also some mild latency in the mouse cursor, but
> > nothing like the full desktop freeze I get with .32-rc3.
>
> With both .30.2 and vanilla .30 I have *never* been able to get any SKB
> allocation errors. No matter how often I repeat the test.
>
> So, the start and end position are 100% reproducible. Problem is that this
> changes during the bisection. At some point the test will pass (no SKB
> errors) the first time I run it, but fail on the second or third
> attempt.
> Apparently at some point memory must already be fragmented (or higher
> orders already used up) to some extent for the errors to trigger.
>
That is a reasonable assessment. It could be because:
1. Something in the intervening commits greatly increases the number of
GFP_ATOMIC allocations that occur. It's a pity that the allocator
tracepoints are not available in those kernels; they would have made
investigating this theory easier.
2. kswapd no longer reclaims high-order pages as well as it used to, due
to changes in kswapd itself or in lumpy reclaim.
3. Fragmentation avoidance has been broken in some subtle manner.
I think 3 is particularly unlikely and expect it to be 1 or 2.
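Hypothesis 3, and Frans's observation that memory must already be fragmented, both rest on the fact that an order-2 request needs four physically contiguous, naturally aligned pages. A toy Python model (one boolean per page frame; zones, migrate types and per-CPU lists are ignored, and the function name is hypothetical) shows how plenty of free memory can still leave no order-2 block:

```python
def aligned_free_blocks(free, max_order):
    """Count naturally aligned, entirely free blocks of each order.

    free is a list of booleans, one per page frame (True = free). A block
    of order n spans 2**n pages and must start at a multiple of 2**n, as in
    a buddy allocator. Counts overlap between orders: one free order-2
    block also counts as two order-1 blocks.
    """
    counts = {}
    for order in range(max_order + 1):
        size = 1 << order
        counts[order] = sum(
            1
            for start in range(0, len(free) - size + 1, size)
            if all(free[start:start + size])
        )
    return counts


# Half of 16 pages are free, yet no order-2 (4-page) request can succeed.
print(aligned_free_blocks([True, True, False, False] * 4, 2))  # {0: 8, 1: 4, 2: 0}
# Adjacent free pages that straddle an alignment boundary do not help either.
print(aligned_free_blocks([False, True, True, False] * 4, 2))  # {0: 8, 1: 0, 2: 0}
```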
> TEST METHOD
> -----------
> As a normal bisection (I tried 3 times...) did not lead anywhere, I had to
> think of an alternative approach. I decided to start by manually selecting
> merges by Linus into mainline. The advantage is that that makes the
> bisection linear and makes it a lot easier to see patterns.
> After narrowing down to a specific merge, I bisected (again semi-manually)
> inside that merge.
>
> Because I suspected there were multiple changes involved, I deliberately
> tried to find two points:
> - where do I first start seeing SKB errors at all, even if it is only at
> the second or third try;
> - where do I start getting SKB errors reliably on the first try.
>
> I worked from "good" to "bad", i.e. I started at .30. The merges were not
> chosen completely randomly. From the first 3 bisections I strongly
> suspected the first 'net-next' merge and the first 'akpm' merge, but I did
> make sure to confirm that suspicion.
>
A very good approach.
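Frans's two questions ("errors at all" and "errors on the first try") amount to bisecting the same linear list of merges with two different predicates. A minimal Python sketch, assuming each predicate is monotone along the list (which, given the timing noise, is exactly what the repeated runs are needed to establish). The results are replayed from the overview of results later in this message; in a real run the predicate would build, boot and test the kernel instead.

```python
def first_true(items, pred):
    """Binary search for the index of the first item where pred is True.

    Assumes pred is monotone over items (all False, then all True) and
    returns len(items) if it is never True.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(items[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


def errors_at_all(result):
    """'-' marks a run with SKB errors, '+' a clean run, '|' a reboot."""
    return "-" in result[1]


def errors_first_try(result):
    return result[1].startswith("-")


# (merge, test results) in commit order, from the overview of results.
RESULTS = [
    ("v2.6.30-2330-gdb8e7f1", "+++"),
    ("v2.6.30-4127-g0fa2133", "+++"),
    ("v2.6.30-5398-g2ed0e21", "+-"),
    ("v2.6.30-5517-g609106b", "+-"),
    ("v2.6.30-5927-gf83b1e6", "++-"),
    ("v2.6.30-6111-g517d086", "-|-"),
]

print(RESULTS[first_true(RESULTS, errors_at_all)][0])     # v2.6.30-5398-g2ed0e21
print(RESULTS[first_true(RESULTS, errors_first_try)][0])  # v2.6.30-6111-g517d086
```

A plain bisection uses a single good/bad predicate, which is one way to see why it misled when two separate changes are involved.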
> TEST DESCRIPTION
> ----------------
> The test I've ended up using is:
> 1) clean boot
> 2) start music in amarok from NFS share; use very long song to avoid file
> changes and thus ensure a fluent stream of network data during the test
> 3) start 'gitk v2.6.29..master &' - to use up some memory
> 4) start first 'gitk master &' - after this all normal memory is as good as
> used up, with minor swap; this never resulted in SKB errors
> 5) start second 'gitk master &' - this causes heavy swapping (>700 MB) and
> is the real test
> 6) if there were no SKB errors after 5), kill the gitk processes and repeat
> steps 3) to 5). I've done this up to 4 times in some cases
> 7) if the results are not clear or when there is doubt later, repeat from
> step 1) with same kernel
>
> Memory after initial 'gitk v2.6.29..master &':
>              total       used       free     shared    buffers     cached
> Mem:       2030776    1153008     877768          0      41572     333968
> -/+ buffers/cache:     777468    1253308
> Swap:      2097144          0    2097144
>
> Memory after first 'gitk master &':
>              total       used       free     shared    buffers     cached
> Mem:       2030776    1979040      51736          0      35684     238420
> -/+ buffers/cache:    1704936     325840
> Swap:      2097144      21876    2075268
>
> Memory after second 'gitk master &' (with .30.2):
>              total       used       free     shared    buffers     cached
> Mem:       2030776    2011608      19168          0      21836      92336
> -/+ buffers/cache:    1897436     133340
> Swap:      2097144     776160    1320984
>
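The "-/+ buffers/cache" row of this (older procps) free output is derived from the Mem: row: buffers and page cache are subtracted from "used" and added to "free". A small Python check of that arithmetic against the three snapshots above (all values in KiB; the helper name is hypothetical):

```python
def buffers_cache_row(used, free, buffers, cached):
    """Return the (used, free) pair of free(1)'s '-/+ buffers/cache' row."""
    return used - buffers - cached, free + buffers + cached


# (used, free, buffers, cached) from the Mem: rows of the three snapshots.
snapshots = {
    "after initial gitk": (1153008, 877768, 41572, 333968),
    "after first gitk": (1979040, 51736, 35684, 238420),
    "after second gitk": (2011608, 19168, 21836, 92336),
}

for label, row in snapshots.items():
    used, free = buffers_cache_row(*row)
    print(f"{label}: {used} used, {free} free ({free / 1024:.0f} MiB)")
```

The results match the printed rows and show that after the second gitk only about 130 MiB is free or easily reclaimable, while swap use has grown to 776160 KiB (roughly 758 MiB).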
> OVERVIEW OF RESULTS
> -------------------
> Below I list the most relevant merges and commits. Note that they are
> listed in commit order; my kernel version shows the order of testing.
>
> For the commits I tested the test results are listed on the next line.
> The first number on that line consists of the test series + the iteration
> (and also identifies the kernel I used).
> A "+" means I got no SKB errors, a "-" that I did get them. A "|" means I
> rebooted for a second series of tests.
>
> v2.6.30-2330-gdb8e7f1 'x86-fixes-for-linus' of linux-2.6-tip
> 1.1 +++ iwlagn sw-error during first test
> v2.6.30-4127-g0fa2133 'merge' of powerpc (last merge before net-next-2.6)
> 1.2 +++
> v2.6.30-5398-g2ed0e21 net-next-2.6 (mega-merge!)
> 1.4 +- system reboot fails after testing
> v2.6.30-5517-g609106b 'merge' of powerpc
> 1.3 +- system reboot fails after testing
> v2.6.30-5927-gf83b1e6 'for-linus' of linux1394-2.6 (last merge before akpm)
> 2.2 ++-
> v2.6.30-6111-g517d086 'akpm'
> 2.1 -|-
>
> BISECTION OF net-next-2.6 MERGE
> -------------------------------
> Note that this merge was based not on .30 vanilla, but partly on
> v2.6.30-rc1 and partly on v2.6.30-rc6.
> I think this had an influence on the latencies I saw (i.e. because some
> post-rc6 bug fixes were not present it changes the general behavior of the
> system during the swapping). For example: with v2.6.30-4127-g0fa2133 the
> system remained more responsive (smaller music skips) than with
> v2.6.30-rc1-1219-g82d0481.
>
> I started again by testing merges, this time those by David.
>
> v2.6.30-rc1-1219-g82d0481 'master' of wireless-next-2.6
> 1.5 ++++ bad latencies
The bad latencies might imply that there are a lot more allocations
going on than there used to be. Then again, they may have been caused by
a wireless bug that was later fixed.
> v2.6.30-rc6-660-gbb803cf 'master' of net-2.6
> v2.6.30-rc6-808-g45ea4ea 'master' of wireless-next-2.6
> v2.6.30-rc6-850-gc649c0e 'master' of net-2.6
> v2.6.30-rc6-922-g3f1f39c 'linux-2.6.31.y' of wimax
> v2.6.30-rc6-999-gb2f8f75 'master' of net-2.6
> v2.6.30-rc6-1028-ga8c617e 'net-next' of lksctp-dev
> 1.7 ++++|++++|++++
> I went back to this one twice because the bisection inside the
> next merge (see below) did not give a clear result.
> v2.6.30-rc6-1103-gb1bc81a 'master' of wireless-next-2.6
> 1.8 +-
> v2.6.30-rc6-1224-g84503dd 'master' of wireless-next-2.6
> 1.6 +-
>
> So the problem started in the v2.6.30-rc6-1103-gb1bc81a merge.
> I was unable to narrow it down to an exact commit; AFAICT the remaining
> ones (between v2.6.30-rc6-1028-g8fc0fee and v2.6.30-rc6-1032-g7ba10a8) are
> uninteresting. But it *must* be in this area!
>
> For a good overview of the area, use 'gitk 3f1f39c4..b1bc81a0'.
>
> v2.6.30-rc6-1028-g8fc0fee cfg80211: use key size constants
> 1.11 ++++
> v2.6.30-rc6-1031-g1bb5633 iwmc3200wifi: fix printk format
> 1.14 +++- not quite conclusive...
> v2.6.30-rc6-1032-g7ba10a8 mac80211: fix transposed min/max CW values
> 1.13 -
> This is a bugfix for aa837ee1d from an earlier merge! Could this maybe
> influence the test results in between? There are various SKB related
> changes there, for example: dfbf97f3..e5b9215e.
Maybe. Your commit IDs differ from what I see; perhaps your tree has been
shuffled around a bit. After some digging around in this general area,
though, I found this patch:
4752c93c30 iwlcore: Allow skb allocation from tasklet
This patch increases the number of GFP_ATOMIC allocations that can occur
by allocating with GFP_ATOMIC in some cases and GFP_KERNEL in others.
Previously, only GFP_KERNEL was used, and I hadn't realised this
allocation method was so recent. Problems of this sort have cropped up
before, and while later changes suppress some of these warnings, I
believe this is a strong candidate for where the allocation failures
started appearing.
> v2.6.30-rc6-1037-g2c5b9e5 wireless: libertas: fix unaligned accesses
> 1.12 +-
> v2.6.30-rc6-1044-g729e9c7 cfg80211: fix for duplicate userspace replies
> 1.10 +-
> v2.6.30-rc6-1075-gc587de0 iwlwifi: unify station management
> 1.9 ++-|+-
> v2.6.30-rc6-1076-gd14d444 iwl3945: port allow skb allocation in tasklet
> I thought this was a prime candidate, but as you can see several commits
> before failed too. Still worth looking at I think!
>
Your commit IDs differ from what I see, but it's the merge commit at
b1bc81a0ef86b86fa410dd303d84c8c7bd09a64d. I agree that the last commit
(d14d44407b9f06e3cf967fcef28ccb780caf0583) could make the problem worse
because it extends the use of GFP_ATOMIC to another driver.
> BISECTION of akpm (mm) MERGE
> ----------------------------
> So here I went looking for "where does the test start failing on the first
> try". Again, I was unable to narrow it down to a single commit.
>
> For a good overview of the area, use 'gitk f83b1e61..517d0869'.
>
> v2.6.30-5466-ga1dd268 mm: use alloc_pages_exact in alloc_large_system_hash
> 2.3 +-
> v2.6.30-5478-ge9bb35d mm: setup_per_zone_inactive_ratio - fix comment and..
> 2.5 +-
> v2.6.30-5486-g35282a2 migration: only migrate_prep() once per move_pages()
> 2.6 -|+|- not quite conclusive...
> v2.6.30-5492-gbce7394 page-allocator: reset wmark_min and inactive ratio..
> 2.4 -|-
>
While I didn't spot anything too out of the ordinary here, these commits
landed shortly after a number of other page allocator patches. One small
thing I noticed is that kswapd now gets woken up less often than it did
previously. Generally I wouldn't have expected that to make a difference,
but it's possible that kswapd is no longer being woken up to reclaim at a
higher order as it was before. I have a patch for this below. It'd be
nice if you could apply it and see whether fewer allocation failures
occur on current mainline.
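The concern is about which order kswapd ends up reclaiming for. A toy Python model of that bookkeeping, assuming (based on a reading of the 2.6.31-era code, where the state is pgdat->kswapd_max_order) that each wakeup records the largest order requested since kswapd last ran and that kswapd consumes and resets that value. The class and method names are hypothetical, and the model ignores watermarks, zones and timing:

```python
class ToyKswapd:
    """Hypothetical model of kswapd's 'highest order requested' state."""

    def __init__(self):
        self.max_order = 0  # largest order requested since the last run

    def wakeup(self, order):
        # Called from the allocator slow path when an allocation struggles.
        self.max_order = max(self.max_order, order)

    def run(self):
        """Return the order kswapd would balance for, then reset it."""
        order, self.max_order = self.max_order, 0
        return order


k = ToyKswapd()
k.wakeup(0)
k.wakeup(2)         # the failing order-2 atomic request woke kswapd too
print(k.run())      # 2: reclaim aims to free order-2 blocks

k.wakeup(0)         # the order-2 request failed without a wakeup
print(k.run())      # 0: kswapd only restores order-0 watermarks
```

If fewer wakeups mean the high-order requests are no longer the ones recorded, kswapd can finish its work while order-2 blocks remain scarce, which would fit the faster onset of failures after the mm merge.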
> WHERE NEXT?
> ===========
> I think the results confirm there is definitely an issue here and that my
> test is reliable and consistent enough to show it. And as it currently is
> the only test we have...
>
> I hope that the info above is enough for the mm and wireless domain
> experts to identify likely candidates in the areas I've identified.
>
> The next step could be trying specific reverts or debug patches, either on
> top of current git, or 2.6.31, or inside the identified areas.
> I'll run anything you care to throw at me and will try to provide any
> additional info you need, but at this point it's up to you.
>
For the wireless people: in mainline, iwl_rx_replenish_now() does a
GFP_ATOMIC allocation without __GFP_NOWARN. As part of investigating
allocation failures, iwl_rx_allocate() was taught to distinguish between
a benign and a serious allocation failure, a serious one being when very
few RX buffers are left and packet loss could occur soon (see commit
f82a924cc88a5541df1d4b9d38a0968cd077a051). I think this GFP mask should
be made GFP_ATOMIC|__GFP_NOWARN so that warnings only appear when the
failure is serious; dump the stack after the warning if you need it. I
have a feeling that almost all of these warnings have been benign and are
related to GFP_ATOMIC now being used so heavily to move more expensive
allocations into the tasklet (presumably to reduce user-visible latency).
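The suggested policy can be sketched as a Python model (not driver code). The flag values, the threshold of 8 buffers and the report labels are all hypothetical; the sketch assumes, as described for commit f82a924c above, that the driver treats a failure as serious only when the number of free RX buffers has dropped to a low watermark.

```python
# Illustrative flag bits; the kernel's real GFP values differ.
GFP_ATOMIC = 0x1
GFP_NOWARN = 0x2

RX_LOW_WATERMARK = 8  # hypothetical "very few RX buffers left" threshold


def failure_reports(gfp_mask, rx_free_count, low_watermark=RX_LOW_WATERMARK):
    """Which reports an allocation failure would produce under this policy."""
    reports = []
    if not gfp_mask & GFP_NOWARN:
        # The page allocator's own warning with a stack dump, printed on
        # every failure regardless of how serious it is for the driver.
        reports.append("allocator warning")
    if rx_free_count <= low_watermark:
        # Driver-level warning: packet loss may be imminent.
        reports.append("driver: RX buffers low")
    return reports


print(failure_reports(GFP_ATOMIC, 50))               # ['allocator warning']
print(failure_reports(GFP_ATOMIC | GFP_NOWARN, 50))  # []
print(failure_reports(GFP_ATOMIC | GFP_NOWARN, 3))   # ['driver: RX buffers low']
```

With __GFP_NOWARN set, benign failures become silent and only the driver's own low-buffer report remains, which is the signal worth investigating.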
Frans, could you try the following kswapd-related patch, please? I'd be
interested in seeing whether the number of allocation failure warnings is
reduced with it. After that, could you edit
drivers/net/wireless/iwlwifi/iwl-rx.c, change the GFP_ATOMIC in
iwl_rx_replenish_now() to GFP_ATOMIC|__GFP_NOWARN, and see whether any of
the "serious" allocation failure messages appear?
Thanks again for your persistence.
==== CUT HERE ====