linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Robin H. Johnson" <robbat2@gentoo.org>
To: Kumar Abhishek <kumar.abhishek.kakkar@gmail.com>
Cc: robbat2@orbis-terrarum.net, Michal Hocko <mhocko@kernel.org>,
	linux-kernel@vger.kernel.org, robbat2@gentoo.org,
	linux-mm@kvack.org, mina86@mina86.com
Subject: Re: Regarding your thread on LKML - drm_radeon spamming alloc_contig_range [WAS: Re: PROBLEM-PERSISTS: dmesg spam: alloc_contig_range: [XX, YY) PFNs busy]
Date: Thu, 29 Jun 2017 17:47:05 +0000	[thread overview]
Message-ID: <20170629174705.GN23586@orbis-terrarum.net> (raw)
In-Reply-To: <CADK6UNEQ+WuKDRyUVPQ1RwOWCkvcU95OBh4obKj4dv62Kf5ipA@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 2605 bytes --]

CC'd back to LKML.

On Thu, Jun 29, 2017 at 06:11:00PM +0530, Kumar Abhishek wrote:
> Hi Robin,
> 
> I am an independent developer who stumbled upon your thread on the LKML
> after facing a similar issue - my kernel log being spammed by
> alloc_contig_range messages. I am running Linux on an ARM system
> (specifically the BeagleBoard-X15) and am on kernel version 4.9.33 with TI
> patches on top of it.
> 
> I am running Debian Stretch (9.0) on the system.
> 
> Here's what my stack trace looks like:
..
> 
> It's somewhat similar to your stack trace, but this here happens on an
> etnaviv GPU (Vivante GCxx).
> 
> In my case if I do 'sudo service lightdm stop', these messages stop too.
> This seems to suggest that the problem may be in the X server rather than
> the kernel? I seem to think this because I replicated this on an entirely
> different set of hardware than yours.
> 
> I just wanted to bring this to your notice, and also ask you if you managed
> to solve it for yourself.
> 
> One solution could be to demote the pr_info in alloc_contig_range to
> pr_debug or to do away with the message altogether, but this would be
> suppressing the issue instead of really knowing what it is about.
> 
> Let me know how I could further investigate this.
The problem, as far as I got diagnosed on LKML, is that some of the GPUs
have a bunch of non-fatal contiguous memory allocation requests: they
have a meaningful fallback path on the allocation, so 'PFNs busy' is a
false busy for their case.

However, if there was a another consumer that does NOT have a fallback,
the output would still be crucially useful.

Attached is the patch that I unsuccessfully proposed on LKML to
rate-limit the messages, with the last revision to only dump_stack() if
CONFIG_CMA_DEBUG was set.

The path that LKML wanted was to add a new parameter to suppress or at
least demote the failure message, and update all of the callers: but it
means that many of the indirect callers need that added parameter as
well.

mm/cma.c:cma_alloc this call can suppress the error, you can see it retry.
mm/hugetlb.c: These callers should get the error message.

The error message DOES still have a good general use in notifying you
that something is going wrong. There was noticeable performance slowdown
in my case when it was trying hard to allocate.

-- 
Robin Hugh Johnson
E-Mail     : robbat2@orbis-terrarum.net
Home Page  : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ#       : 30269588 or 41961639
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85

[-- Attachment #1.2: 000-despam-pfn-busy.patch --]
[-- Type: text/x-diff, Size: 1047 bytes --]

commit 808c209dc82ce79147122ca78e7047bc74a16149
Author: Robin H. Johnson <robbat2@gentoo.org>
Date:   Wed Nov 30 10:32:57 2016 -0800

    mm: ratelimit & trace PFNs busy.
    
    Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
	Acked-by: Michal Nazarewicz <mina86@mina86.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6de9440e3ae2..3c28ec3d18f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7289,8 +7289,16 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	/* Make sure the range is really isolated. */
 	if (test_pages_isolated(outer_start, end, false)) {
-		pr_info("%s: [%lx, %lx) PFNs busy\n",
-			__func__, outer_start, end);
+		static DEFINE_RATELIMIT_STATE(ratelimit_pfn_busy,
+					DEFAULT_RATELIMIT_INTERVAL,
+					DEFAULT_RATELIMIT_BURST);
+		if (__ratelimit(&ratelimit_pfn_busy)) {
+			pr_info("%s: [%lx, %lx) PFNs busy\n",
+				__func__, outer_start, end);
+			if (IS_ENABLED(CONFIG_CMA_DEBUG))
+				dump_stack();
+		}
+
 		ret = -EBUSY;
 		goto done;
 	}

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 1113 bytes --]

       reply	other threads:[~2017-06-29 17:47 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CADK6UNEQ+WuKDRyUVPQ1RwOWCkvcU95OBh4obKj4dv62Kf5ipA@mail.gmail.com>
2017-06-29 17:47 ` Robin H. Johnson [this message]
2017-06-30  8:46   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170629174705.GN23586@orbis-terrarum.net \
    --to=robbat2@gentoo.org \
    --cc=kumar.abhishek.kakkar@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mina86@mina86.com \
    --cc=robbat2@orbis-terrarum.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox