From: Michal Hocko <mhocko@kernel.org>
To: Wei Wang <wei.w.wang@intel.com>
Cc: linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-mm@kvack.org, mst@redhat.com, mawilcox@microsoft.com,
akpm@linux-foundation.org, virtio-dev@lists.oasis-open.org,
david@redhat.com, cornelia.huck@de.ibm.com,
mgorman@techsingularity.net, aarcange@redhat.com,
amit.shah@redhat.com, pbonzini@redhat.com,
liliang.opensource@gmail.com, yang.zhang.wz@gmail.com,
quan.xu@aliyun.com
Subject: Re: [PATCH v13 4/5] mm: support reporting free page blocks
Date: Thu, 3 Aug 2017 11:11:51 +0200
Message-ID: <20170803091151.GF12521@dhcp22.suse.cz>
In-Reply-To: <1501742299-4369-5-git-send-email-wei.w.wang@intel.com>
On Thu 03-08-17 14:38:18, Wei Wang wrote:
> This patch adds support to walk through the free page blocks in the
> system and report them via a callback function. Some page blocks may
> leave the free list after the report function returns, so it is the
> caller's responsibility to either detect or prevent the use of such
> pages.
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> ---
> include/linux/mm.h | 7 ++++
> include/linux/mmzone.h | 5 +++
> mm/page_alloc.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 121 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 46b9ac5..24481e3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1835,6 +1835,13 @@ extern void free_area_init_node(int nid, unsigned long * zones_size,
> unsigned long zone_start_pfn, unsigned long *zholes_size);
> extern void free_initmem(void);
>
> +#if IS_ENABLED(CONFIG_VIRTIO_BALLOON)
> +extern void walk_free_mem_block(void *opaque1,
> + unsigned int min_order,
> + void (*visit)(void *opaque2,
> + unsigned long pfn,
> + unsigned long nr_pages));
> +#endif
Is the ifdef necessary? Sure, only the virtio balloon driver will use this
currently, but this looks like generic functionality, not specific to
virtio at all, so the ifdef is rather confusing.
> /*
> * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
> * into the buddy system. The freed pages will be poisoned with pattern
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fc14b8b..59eacf2 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -83,6 +83,11 @@ static inline bool is_migrate_movable(int mt)
> for (order = 0; order < MAX_ORDER; order++) \
> for (type = 0; type < MIGRATE_TYPES; type++)
>
> +#define for_each_migratetype_order_decend(min_order, order, type) \
> + for (order = MAX_ORDER - 1; order < MAX_ORDER && order >= min_order; \
> + order--) \
> + for (type = 0; type < MIGRATE_TYPES; type++)
> +
Is there going to be any other user outside of mm/page_alloc.c? If not
then do not export this.
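
A minimal userspace sketch of the quoted macro kept private to a single .c
file, as suggested above. It also shows that the descending loop only
terminates for min_order == 0 because the order variable is unsigned and
wraps past zero. The MAX_ORDER and MIGRATE_TYPES values and the
count_pairs() helper are assumptions made for illustration, not taken
from the patch.

```c
#include <assert.h>

#define MAX_ORDER	11	/* assumed value, for illustration only */
#define MIGRATE_TYPES	6	/* assumed value, for illustration only */

/* Same shape as the quoted macro, but local to this file. */
#define for_each_migratetype_order_decend(min_order, order, type) \
	for (order = MAX_ORDER - 1; order < MAX_ORDER && order >= min_order; \
	     order--) \
		for (type = 0; type < MIGRATE_TYPES; type++)

/* Hypothetical helper: count the (order, type) pairs the loop visits. */
static unsigned int count_pairs(unsigned int min_order)
{
	unsigned int order;	/* unsigned: wrapping below 0 ends the loop */
	unsigned int type;
	unsigned int visited = 0;

	assert(min_order < MAX_ORDER);
	for_each_migratetype_order_decend(min_order, order, type)
		visited++;
	return visited;
}
```

With a signed order and min_order == 0, the condition order >= min_order
would never become false on its own, so the unsigned type is load-bearing
here.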
> extern int page_group_by_mobility_disabled;
>
> #define NR_MIGRATETYPE_BITS (PB_migrate_end - PB_migrate + 1)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6d30e91..b90b513 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4761,6 +4761,115 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
> show_swap_cache_info();
> }
>
> +#if IS_ENABLED(CONFIG_VIRTIO_BALLOON)
> +
> +/*
> + * Heuristically get a free page block in the system.
> + *
> + * It is possible that pages from the page block are used immediately after
> + * report_free_page_block() returns. It is the caller's responsibility to
> + * either detect or prevent the use of such pages.
> + *
> + * The input parameters specify the free list to check for a free page block:
> + * zone->free_area[order].free_list[migratetype]
> + *
> + * If the caller supplied page block (i.e. **page) is on the free list, offer
> + * the next page block on the list to the caller. Otherwise, offer the first
> + * page block on the list.
> + *
> + * Return 0 when a page block is found on the caller specified free list.
> + * Otherwise, no page block is found.
> + */
> +static int report_free_page_block(struct zone *zone, unsigned int order,
> + unsigned int migratetype, struct page **page)
This is just too ugly, and actually wrong. Never hand out struct page
pointers outside of zone->lock. What I had in mind was to simply walk the
free lists of a suitable order and call the callback for each block.
Something as simple as:
	for (i = 0; i < MAX_NR_ZONES; i++) {
		struct zone *zone = &pgdat->node_zones[i];

		if (!populated_zone(zone))
			continue;

		spin_lock_irqsave(&zone->lock, flags);
		for (order = min_order; order < MAX_ORDER; ++order) {
			struct free_area *free_area = &zone->free_area[order];
			enum migratetype mt;
			struct page *page;

			if (!free_area->nr_free)
				continue;

			/* walk every migratetype list of this order */
			for (mt = 0; mt < MIGRATE_TYPES; mt++) {
				list_for_each_entry(page,
						&free_area->free_list[mt], lru) {
					pfn = page_to_pfn(page);
					visit(opaque1, pfn, 1<<order);
				}
			}
		}
		spin_unlock_irqrestore(&zone->lock, flags);
	}
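
A minimal userspace model of that walk, under loud assumptions: struct
toy_zone, struct free_block, walk_free_blocks() and count_pages() are
hypothetical names, the constants are illustrative, and a plain flag
stands in for zone->lock in a single-threaded program. The point it tries
to show is that the callback only ever receives a pfn and a page count,
never a pointer into the free list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER	11	/* illustrative values, not from the patch */
#define MIGRATE_TYPES	3
#define MAX_NR_ZONES	2

/* Hypothetical stand-in for a free block: only its first pfn is kept. */
struct free_block {
	unsigned long pfn;
	struct free_block *next;
};

/* Toy zone: 'locked' models zone->lock in this single-threaded sketch. */
struct toy_zone {
	int locked;
	struct free_block *free_list[MAX_ORDER][MIGRATE_TYPES];
};

typedef void (*visit_fn)(void *opaque, unsigned long pfn,
			 unsigned long nr_pages);

static void walk_free_blocks(struct toy_zone *zones, void *opaque,
			     unsigned int min_order, visit_fn visit)
{
	assert(visit != NULL);
	for (int i = 0; i < MAX_NR_ZONES; i++) {
		struct toy_zone *zone = &zones[i];

		zone->locked = 1;	/* spin_lock_irqsave() in the kernel */
		for (unsigned int order = min_order; order < MAX_ORDER; order++) {
			for (int mt = 0; mt < MIGRATE_TYPES; mt++) {
				struct free_block *b;

				for (b = zone->free_list[order][mt]; b; b = b->next) {
					/* only plain numbers leave the locked section */
					visit(opaque, b->pfn, 1UL << order);
				}
			}
		}
		zone->locked = 0;	/* spin_unlock_irqrestore() */
	}
}

/* Example callback: add the reported page count to *opaque. */
static void count_pages(void *opaque, unsigned long pfn,
			unsigned long nr_pages)
{
	(void)pfn;
	*(unsigned long *)opaque += nr_pages;
}
```

Because the callback runs with the lock held in this scheme, it has to be
short and must not sleep or allocate memory in a way that could recurse
into the allocator.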
[...]
> +/*
> + * Walk through the free page blocks in the system. The @visit callback is
> + * invoked to handle each free page block.
> + *
> + * Note: some page blocks may be used after the report function returns, so it
> + * is not safe for the callback to use any pages or discard data on such page
> + * blocks.
> + */
> +void walk_free_mem_block(void *opaque1,
> + unsigned int min_order,
> + void (*visit)(void *opaque2,
> + unsigned long pfn,
> + unsigned long nr_pages))
Is there any reason why there is no node id? I guess you just do not
care for your particular use case. Not that I care too much either. If
somebody wants this per node then it would be trivial to extend. I was
just wondering whether this is a deliberate decision or an omission.
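
A rough userspace sketch of what such a per-node extension could look
like, assuming a hypothetical TOY_NO_NODE value meaning "all nodes"
(modelled on the kernel's NUMA_NO_NODE) and a toy zone that only records
its node id and a free page total. None of these names come from the
patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_NODE	(-1)	/* hypothetical "all nodes" value */

struct toy_zone {
	int nid;			/* node this zone belongs to */
	unsigned long nr_free_pages;	/* hypothetical free list summary */
};

typedef void (*visit_node_fn)(void *opaque, int nid, unsigned long nr_free);

/* Visit all zones when nid == TOY_NO_NODE, otherwise only zones on nid. */
static void walk_zones_on_node(const struct toy_zone *zones, size_t nr_zones,
			       int nid, void *opaque, visit_node_fn visit)
{
	assert(visit != NULL);
	for (size_t i = 0; i < nr_zones; i++) {
		if (nid != TOY_NO_NODE && zones[i].nid != nid)
			continue;
		visit(opaque, zones[i].nid, zones[i].nr_free_pages);
	}
}

/* Example callback: accumulate free pages into *opaque. */
static void sum_free(void *opaque, int nid, unsigned long nr_free)
{
	(void)nid;
	*(unsigned long *)opaque += nr_free;
}
```

Existing callers would pass the "all nodes" value and see no change in
behaviour.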
> +{
> + struct zone *zone = NULL;
> + struct page *page = NULL;
> + unsigned int order;
> + unsigned long pfn, nr_pages;
> + int type;
> +
> + for_each_populated_zone(zone) {
> + for_each_migratetype_order_decend(min_order, order, type) {
> + while (!report_free_page_block(zone, order, type,
> + &page)) {
> + pfn = page_to_pfn(page);
> + nr_pages = 1 << order;
> + visit(opaque1, pfn, nr_pages);
> + }
> + }
> + }
> +}
> +EXPORT_SYMBOL_GPL(walk_free_mem_block);
> +
> +#endif
> +
> static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
> {
> zoneref->zone = zone;
> --
> 2.7.4
--
Michal Hocko
SUSE Labs