From: Waiman Long <longman@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>, Roman Gushchin <guro@fb.com>,
Vlastimil Babka <vbabka@suse.cz>,
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
Jann Horn <jannh@google.com>, Song Liu <songliubraving@fb.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Rafael Aquini <aquini@redhat.com>,
Waiman Long <longman@redhat.com>
Subject: [PATCH] mm/vmstat: Reduce zone lock hold time when reading /proc/pagetypeinfo
Date: Tue, 22 Oct 2019 12:21:56 -0400
Message-ID: <20191022162156.17316-1-longman@redhat.com>

The pagetypeinfo_showfree_print() function prints out the number of
free blocks for each of the page orders and migrate types. The current
code just iterates over each of the free lists to get the counts. There
are bug reports about hard lockup panics when reading the
/proc/pagetypeinfo file just because it took too long to iterate all
the free lists within a zone while holding the zone lock with IRQs
disabled.

Given that /proc/pagetypeinfo is readable by all, the possibility of
any user crashing a system simply by reading /proc/pagetypeinfo is a
security problem that needs to be addressed.

There is a free_area structure associated with each page order. There
is also a nr_free count within the free_area for all the different
migration types combined. Tracking the number of free list entries
for each migration type will probably add some overhead to the fast
paths, like moving pages from one migration type to another, which may
not be desirable.

We can actually skip iterating the list of one of the migration types
and use nr_free to compute the missing count. Since MIGRATE_MOVABLE
is usually the largest one on large memory systems, this is the one
to be skipped. Since the printing order is migration-type => order, we
will have to store the counts in an internal 2D array before printing
them out.
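
As an illustration only, the subtraction described above can be modelled
in userspace C. This is a simplified sketch, not the kernel code: the
names area_model, count_free, NR_ORDERS, NR_TYPES and the T_* constants
are hypothetical stand-ins, and reading a stored list length stands in
for walking a real free list.

```c
#include <assert.h>

/* Hypothetical stand-ins; the kernel's real values depend on the config. */
#define NR_ORDERS 11	/* mirrors the MAX_ORDER value quoted in this message */
enum { T_UNMOVABLE, T_MOVABLE, T_RECLAIMABLE, NR_TYPES };

/* Simplified model of one free_area. */
struct area_model {
	unsigned long list_len[NR_TYPES];	/* entries on each free list */
	unsigned long nr_free;			/* combined count, all types */
};

/*
 * Fill nfree[order][type]. Every list except T_MOVABLE is "walked"
 * (modelled here by reading its length); the T_MOVABLE count is then
 * derived by subtracting the walked total from nr_free.
 */
static void count_free(const struct area_model areas[NR_ORDERS],
		       unsigned long nfree[NR_ORDERS][NR_TYPES])
{
	for (int order = 0; order < NR_ORDERS; order++) {
		unsigned long nr_total = 0;

		for (int t = 0; t < NR_TYPES; t++) {
			if (t == T_MOVABLE)
				continue;	/* never touch the largest list */
			nfree[order][t] = areas[order].list_len[t];
			nr_total += nfree[order][t];
		}
		/* The derivation assumes nr_free covers all walked entries. */
		assert(areas[order].nr_free >= nr_total);
		nfree[order][T_MOVABLE] = areas[order].nr_free - nr_total;
	}
}
```

Because the counts are gathered order by order but printed one
migration type per row, the sketch keeps them in a 2D array and leaves
the printing to a separate pass, as the patch does.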

Even after skipping the MIGRATE_MOVABLE pages, we may still hold the
zone lock for too long, blocking other zone lock waiters from running.
This can be problematic for systems with a large amount of memory.
So a check is added to temporarily release the lock and reschedule if
more than 64k list entries have been iterated for each order. With
a MAX_ORDER of 11, the worst case will be iterating about 700k list
entries before releasing the lock.

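
The "about 700k" figure can be checked with a small userspace sketch.
The helper names below (should_release_lock,
max_entries_without_release) are hypothetical and exist only for this
illustration; the threshold and the order count are the values quoted
in this message.

```c
#include <assert.h>

#define NR_ORDERS		11		/* MAX_ORDER value quoted above */
#define RESCHED_THRESHOLD	(1UL << 16)	/* 64k entries, as in the patch */

/* Nonzero when one order's walk counted enough entries to drop the lock. */
static int should_release_lock(unsigned long nr_total)
{
	return nr_total > RESCHED_THRESHOLD;
}

/*
 * Largest number of entries that can be walked without the check ever
 * firing: every order sits exactly at the threshold.
 */
static unsigned long max_entries_without_release(void)
{
	return NR_ORDERS * RESCHED_THRESHOLD;
}
```

With these values the bound is 11 * 65536 = 720896 entries. Note that
the sketch assumes no single order exceeds the threshold; since the
check runs once per order after that order's lists have been walked,
a longer list would still be walked in full before the lock is dropped.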
Signed-off-by: Waiman Long <longman@redhat.com>
---
mm/vmstat.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 41 insertions(+), 10 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 6afc892a148a..40c9a1494709 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1373,23 +1373,54 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
pg_data_t *pgdat, struct zone *zone)
{
int order, mtype;
+ unsigned long nfree[MAX_ORDER][MIGRATE_TYPES];
- for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
- seq_printf(m, "Node %4d, zone %8s, type %12s ",
- pgdat->node_id,
- zone->name,
- migratetype_names[mtype]);
- for (order = 0; order < MAX_ORDER; ++order) {
+ lockdep_assert_held(&zone->lock);
+ lockdep_assert_irqs_disabled();
+
+ /*
+ * MIGRATE_MOVABLE is usually the largest one in large memory
+ * systems. We skip iterating that list. Instead, we compute it by
+ * subtracting the total of the rest from free_area->nr_free.
+ */
+ for (order = 0; order < MAX_ORDER; ++order) {
+ unsigned long nr_total = 0;
+ struct free_area *area = &(zone->free_area[order]);
+
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
unsigned long freecount = 0;
- struct free_area *area;
struct list_head *curr;
- area = &(zone->free_area[order]);
-
+ if (mtype == MIGRATE_MOVABLE)
+ continue;
list_for_each(curr, &area->free_list[mtype])
freecount++;
- seq_printf(m, "%6lu ", freecount);
+ nfree[order][mtype] = freecount;
+ nr_total += freecount;
}
+ nfree[order][MIGRATE_MOVABLE] = area->nr_free - nr_total;
+
+ /*
+ * If we have already iterated more than 64k of list
+ * entries, we might have held the zone lock for too long.
+ * Temporarily release the lock and reschedule before
+ * continuing so that other lock waiters have a chance
+ * to run.
+ */
+ if (nr_total > (1 << 16)) {
+ spin_unlock_irq(&zone->lock);
+ cond_resched();
+ spin_lock_irq(&zone->lock);
+ }
+ }
+
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
+ seq_printf(m, "Node %4d, zone %8s, type %12s ",
+ pgdat->node_id,
+ zone->name,
+ migratetype_names[mtype]);
+ for (order = 0; order < MAX_ORDER; ++order)
+ seq_printf(m, "%6lu ", nfree[order][mtype]);
seq_putc(m, '\n');
}
}
--
2.18.1