From: Vlastimil Babka <vbabka@suse.cz>
Date: Tue, 12 Sep 2023 15:47:45 +0200
Subject: Re: [PATCH 1/6] mm: page_alloc: remove pcppage migratetype caching
To: Johannes Weiner <hannes@cmpxchg.org>, Andrew Morton
Cc: Mel Gorman, Miaohe Lin, Kefeng Wang, Zi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
In-Reply-To: <20230911195023.247694-2-hannes@cmpxchg.org>
References: <20230911195023.247694-1-hannes@cmpxchg.org>
 <20230911195023.247694-2-hannes@cmpxchg.org>

On 9/11/23 21:41, Johannes Weiner wrote:
> The idea behind the cache is to save get_pageblock_migratetype()
> lookups during bulk freeing. A microbenchmark suggests this isn't
> helping, though. The pcp migratetype can get stale, which means that
> bulk freeing has an extra branch to check if the pageblock was
> isolated while on the pcp.
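For context while reading the diff: the cache being removed is, if I
remember the helpers correctly, this pair in mm/page_alloc.c, which
stashes the migratetype in page->index while the page sits on a
pcplist (quoting from memory, so modulo details):

static void set_pcppage_migratetype(struct page *page, int migratetype)
{
	/* Reuse the otherwise unused page->index while on the pcplist. */
	page->index = migratetype;
}

static int get_pcppage_migratetype(struct page *page)
{
	/* May be stale: the pageblock can be isolated after this was set. */
	return page->index;
}

The staleness is inherent: page->index is only written when the page
enters the pcplist, while the pageblock's real migratetype can change
(e.g. isolation) underneath it.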
> While the variance overlaps, the cache write and the branch seem to
> make this a net negative. The following test allocates and frees
> batches of 10,000 pages (~3x the pcp high marks to trigger flushing):
>
> Before:
>           8,668.48 msec task-clock          #   99.735 CPUs utilized    ( +- 2.90% )
>                 19      context-switches    #    4.341 /sec             ( +- 3.24% )
>                  0      cpu-migrations      #    0.000 /sec
>             17,440      page-faults         #    3.984 K/sec            ( +- 2.90% )
>     41,758,692,473      cycles              #    9.541 GHz              ( +- 2.90% )
>    126,201,294,231      instructions        #    5.98  insn per cycle   ( +- 2.90% )
>     25,348,098,335      branches            #    5.791 G/sec            ( +- 2.90% )
>         33,436,921      branch-misses       #    0.26% of all branches  ( +- 2.90% )
>
>          0.0869148 +- 0.0000302 seconds time elapsed  ( +- 0.03% )
>
> After:
>           8,444.81 msec task-clock          #   99.726 CPUs utilized    ( +- 2.90% )
>                 22      context-switches    #    5.160 /sec             ( +- 3.23% )
>                  0      cpu-migrations      #    0.000 /sec
>             17,443      page-faults         #    4.091 K/sec            ( +- 2.90% )
>     40,616,738,355      cycles              #    9.527 GHz              ( +- 2.90% )
>    126,383,351,792      instructions        #    6.16  insn per cycle   ( +- 2.90% )
>     25,224,985,153      branches            #    5.917 G/sec            ( +- 2.90% )
>         32,236,793      branch-misses       #    0.25% of all branches  ( +- 2.90% )
>
>          0.0846799 +- 0.0000412 seconds time elapsed  ( +- 0.05% )
>
> A side effect is that this also ensures that pages whose pageblock
> gets stolen while on the pcplist end up on the right freelist and we
> don't perform potentially type-incompatible buddy merges (or skip
> merges when we shouldn't), which is likely beneficial to long-term
> fragmentation management, although the effects would be harder to
> measure. Settle for simpler and faster code as justification here.

Makes sense to me, so

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Some notes below.

> @@ -1577,7 +1556,6 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>  			continue;
>  		del_page_from_free_list(page, zone, current_order);
>  		expand(zone, page, order, current_order, migratetype);
> -		set_pcppage_migratetype(page, migratetype);

Hm, interesting - I just noticed that __rmqueue_fallback() never did
this, AFAICS. Sounds like a pre-existing bug.

>  		trace_mm_page_alloc_zone_locked(page, order, migratetype,
>  				pcp_allowed_order(order) &&
>  				migratetype < MIGRATE_PCPTYPES);
> @@ -2145,7 +2123,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  		 * pages are ordered properly.
>  		 */
>  		list_add_tail(&page->pcp_list, list);
> -		if (is_migrate_cma(get_pcppage_migratetype(page)))
> +		if (is_migrate_cma(get_pageblock_migratetype(page)))
>  			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
>  					      -(1 << order));

This is potentially a source of overhead; I assume patch 6/6 might
change that.
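(To spell out the overhead: get_pageblock_migratetype() expands to
get_pfnblock_flags_mask(), which is roughly the following - my
condensation from memory, not the literal source. The bitmap word is
an extra load from a likely-cold cacheline, whereas the old cache read
page->index from a struct page line we touch anyway:)

/* Condensed sketch of get_pfnblock_flags_mask(); details may differ. */
static unsigned long pfnblock_flags_sketch(const struct page *page,
					   unsigned long pfn)
{
	unsigned long *bitmap = get_pageblock_bitmap(page, pfn);
	unsigned long bitidx = pfn_to_bitidx(page, pfn);
	/* NR_PAGEBLOCK_BITS per block, packed into longs; this load is the cost. */
	unsigned long word = bitmap[bitidx / BITS_PER_LONG];

	return (word >> (bitidx & (BITS_PER_LONG - 1))) & MIGRATETYPE_MASK;
}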
>  	}
> @@ -2304,19 +2282,6 @@ void drain_all_pages(struct zone *zone)
>  	__drain_all_pages(zone, false);
>  }
>  
> -static bool free_unref_page_prepare(struct page *page, unsigned long pfn,
> -				    unsigned int order)
> -{
> -	int migratetype;
> -
> -	if (!free_pages_prepare(page, order, FPI_NONE))
> -		return false;
> -
> -	migratetype = get_pfnblock_migratetype(page, pfn);
> -	set_pcppage_migratetype(page, migratetype);
> -	return true;
> -}
> -
>  static int nr_pcp_free(struct per_cpu_pages *pcp, int high, bool free_high)
>  {
>  	int min_nr_free, max_nr_free;
> @@ -2402,7 +2367,7 @@ void free_unref_page(struct page *page, unsigned int order)
>  	unsigned long pfn = page_to_pfn(page);
>  	int migratetype, pcpmigratetype;
>  
> -	if (!free_unref_page_prepare(page, pfn, order))
> +	if (!free_pages_prepare(page, order, FPI_NONE))
>  		return;
>  
>  	/*
> @@ -2412,7 +2377,7 @@ void free_unref_page(struct page *page, unsigned int order)
>  	 * get those areas back if necessary. Otherwise, we may have to free
>  	 * excessively into the page allocator
>  	 */
> -	migratetype = pcpmigratetype = get_pcppage_migratetype(page);
> +	migratetype = pcpmigratetype = get_pfnblock_migratetype(page, pfn);
>  	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
>  		if (unlikely(is_migrate_isolate(migratetype))) {
>  			free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
> @@ -2448,7 +2413,8 @@ void free_unref_page_list(struct list_head *list)
>  	/* Prepare pages for freeing */
>  	list_for_each_entry_safe(page, next, list, lru) {
>  		unsigned long pfn = page_to_pfn(page);
> -		if (!free_unref_page_prepare(page, pfn, 0)) {
> +
> +		if (!free_pages_prepare(page, 0, FPI_NONE)) {
>  			list_del(&page->lru);
>  			continue;
>  		}
> @@ -2457,7 +2423,7 @@ void free_unref_page_list(struct list_head *list)
>  		 * Free isolated pages directly to the allocator, see
>  		 * comment in free_unref_page.
>  		 */
> -		migratetype = get_pcppage_migratetype(page);
> +		migratetype = get_pfnblock_migratetype(page, pfn);
>  		if (unlikely(is_migrate_isolate(migratetype))) {
>  			list_del(&page->lru);
>  			free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);

I think after this change we should move the isolated pages handling to
the second loop below, so that we wouldn't have to call
get_pfnblock_migratetype() twice per page. Dunno yet if some later
patch does that. It would need to unlock pcp when necessary - rough
sketch at the end of this mail.

> @@ -2466,10 +2432,11 @@ void free_unref_page_list(struct list_head *list)
>  	}
>  
>  	list_for_each_entry_safe(page, next, list, lru) {
> +		unsigned long pfn = page_to_pfn(page);
>  		struct zone *zone = page_zone(page);
>  
>  		list_del(&page->lru);
> -		migratetype = get_pcppage_migratetype(page);
> +		migratetype = get_pfnblock_migratetype(page, pfn);
>  
>  		/*
>  		 * Either different zone requiring a different pcp lock or
> @@ -2492,7 +2459,7 @@ void free_unref_page_list(struct list_head *list)
>  		pcp = pcp_spin_trylock(zone->per_cpu_pageset);
>  		if (unlikely(!pcp)) {
>  			pcp_trylock_finish(UP_flags);
> -			free_one_page(zone, page, page_to_pfn(page),
> +			free_one_page(zone, page, pfn,
>  				      0, migratetype, FPI_NONE);
>  			locked_zone = NULL;
>  			continue;
> @@ -2661,7 +2628,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  		}
>  	}
>  	__mod_zone_freepage_state(zone, -(1 << order),
> -				  get_pcppage_migratetype(page));
> +				  get_pageblock_migratetype(page));
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  } while (check_new_pages(page, order));
> 
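Coming back to the free_unref_page_list() note above, the shape I have
in mind is roughly the following, folding the isolate check into the
second loop (untested sketch; the pcp/UP_flags unlock dance would need
more care than shown):

	list_for_each_entry_safe(page, next, list, lru) {
		unsigned long pfn = page_to_pfn(page);
		struct zone *zone = page_zone(page);

		list_del(&page->lru);
		/* One lookup per page instead of one in each loop. */
		migratetype = get_pfnblock_migratetype(page, pfn);

		if (unlikely(is_migrate_isolate(migratetype))) {
			/* Isolated pages bypass the pcp; drop its lock first. */
			if (locked_zone) {
				pcp_spin_unlock(pcp);
				pcp_trylock_finish(UP_flags);
				locked_zone = NULL;
			}
			free_one_page(zone, page, pfn, 0, migratetype,
				      FPI_NONE);
			continue;
		}

		/* ... existing pcp locking and free_unref_page_commit() ... */
	}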