Message-ID: <37dbd4d0-c125-6694-dec4-6322ae5b6dee@suse.cz>
Date: Wed, 13 Sep 2023 22:18:17 +0200
Subject: Re: [PATCH 6/6] mm: page_alloc: consolidate free page accounting
To: Johannes Weiner, Andrew Morton
Cc: Mel Gorman, Miaohe Lin, Kefeng Wang, Zi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230911195023.247694-1-hannes@cmpxchg.org> <20230911195023.247694-7-hannes@cmpxchg.org>
From: Vlastimil Babka
In-Reply-To: <20230911195023.247694-7-hannes@cmpxchg.org>

On 9/11/23 21:41, Johannes Weiner wrote:
> Free page accounting currently happens a bit too high up the call
> stack, where it has to deal with guard pages, compaction capturing,
> block stealing and even page isolation. This is subtle and fragile,
> and makes it difficult to hack on the code.
>
> Now that type violations on the freelists have been fixed, push the
> accounting down to where pages enter and leave the freelist.
>
> v3:
> - fix CONFIG_UNACCEPTED_MEMORY build (lkp)
> v2:
> - fix CONFIG_DEBUG_PAGEALLOC build (Mel)
>
> Signed-off-by: Johannes Weiner
>
>  /* Used for pages not on another list */
> -static inline void add_to_free_list_tail(struct page *page, struct zone *zone,
> -					 unsigned int order, int migratetype)
> +static inline void add_to_free_list(struct page *page, struct zone *zone,
> +				    unsigned int order, int migratetype,
> +				    bool tail)
>  {
>  	struct free_area *area = &zone->free_area[order];
>
> -	list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> +	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
> +		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
> +		     get_pageblock_migratetype(page), migratetype, 1 << order);

Ok, IIUC you now assume the pageblock migratetype matches the freelist
placement at all times. This is a change from the previous treatment as a
heuristic that may sometimes be imprecise. Let's assume the previous
patches handled the deterministic reasons why those would deviate (modulo
my concern about pageblocks spanning multiple zones in reply to 5/6).

But unless I'm missing something, I don't think the possible race
scenarios were dealt with? Pageblock migratetype is set under zone->lock,
but there are places that read it outside of zone->lock and then trust it
to perform the freelist placement. See for example __free_pages_ok(), or
free_unref_page() in the cases where it calls free_one_page(). These
determine the pageblock migratetype before taking zone->lock. Only for
has_isolate_pageblock() cases are we more careful, because previously
isolation was the only case where precision was needed. So I think this
warning is going to trigger?

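To make the race described above concrete, here is a minimal user-space
sketch. It is not kernel code: every toy_* name is hypothetical, the "lock"
is a plain flag, and the concurrent writer is modelled as a call that runs
between the unlocked read and the lock acquisition. It only shows why a
type sampled before taking the lock can disagree with the pageblock type
once the page is placed on a freelist.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only, not the kernel's types. */
enum toy_mt { TOY_UNMOVABLE, TOY_MOVABLE, TOY_NR_TYPES };

struct toy_zone {
	bool locked;			/* models zone->lock */
	enum toy_mt block_mt;		/* models the pageblock migratetype */
	int nr_free[TOY_NR_TYPES];	/* entries per freelist */
	int type_mismatches;		/* times the warning condition was true */
};

static void toy_lock(struct toy_zone *z)
{
	assert(!z->locked);
	z->locked = true;
}

static void toy_unlock(struct toy_zone *z)
{
	assert(z->locked);
	z->locked = false;
}

/* Models any writer that changes the pageblock type under the lock. */
static void toy_set_block_mt(struct toy_zone *z, enum toy_mt mt)
{
	toy_lock(z);
	z->block_mt = mt;
	toy_unlock(z);
}

/* Models add_to_free_list(): the caller holds the lock and passes a type. */
static void toy_add_to_free_list(struct toy_zone *z, enum toy_mt mt)
{
	assert(z->locked);
	if (z->block_mt != mt)
		z->type_mismatches++;	/* where the VM_WARN_ONCE() would fire */
	z->nr_free[mt]++;
}

/* Free path that samples the type before taking the lock. */
static void toy_free_racy(struct toy_zone *z, bool writer_interleaves,
			  enum toy_mt new_mt)
{
	enum toy_mt mt = z->block_mt;		/* unlocked read */

	if (writer_interleaves)
		toy_set_block_mt(z, new_mt);	/* the "other CPU" gets in first */

	toy_lock(z);
	toy_add_to_free_list(z, mt);		/* possibly stale type */
	toy_unlock(z);
}

/* Free path that samples the type only once the lock is held. */
static void toy_free_locked(struct toy_zone *z, bool writer_interleaves,
			    enum toy_mt new_mt)
{
	if (writer_interleaves)
		toy_set_block_mt(z, new_mt);

	toy_lock(z);
	toy_add_to_free_list(z, z->block_mt);	/* read under the lock */
	toy_unlock(z);
}
```

With a zone whose block starts as TOY_MOVABLE and a writer that switches it
to TOY_UNMOVABLE, toy_free_racy() records one mismatch and leaves the page
on the movable list, while toy_free_locked() records none. Whether
re-reading under the lock is acceptable on the real free paths is a
separate question that this sketch does not answer.
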
> +
> +	if (tail)
> +		list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> +	else
> +		list_add(&page->buddy_list, &area->free_list[migratetype]);
>  	area->nr_free++;
> +
> +	account_freepages(page, zone, 1 << order, migratetype);
>  }
>
>  /*
> @@ -757,23 +783,21 @@ static inline void __free_one_page(struct page *page,
>  	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>
>  	VM_BUG_ON(migratetype == -1);
> -	if (likely(!is_migrate_isolate(migratetype)))
> -		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> -
>  	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
>  	VM_BUG_ON_PAGE(bad_range(zone, page), page);
>
>  	while (order < MAX_ORDER) {
> -		if (compaction_capture(capc, page, order, migratetype)) {
> -			__mod_zone_freepage_state(zone, -(1 << order),
> -						  migratetype);
> +		int buddy_mt;
> +
> +		if (compaction_capture(capc, page, order, migratetype))
>  			return;
> -		}
>
>  		buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
>  		if (!buddy)
>  			goto done_merging;
>
> +		buddy_mt = get_pfnblock_migratetype(buddy, buddy_pfn);

You should assume buddy_mt equals migratetype, no? It's the same assumption
as the VM_WARN_ONCE() I've discussed?

> +
>  		if (unlikely(order >= pageblock_order)) {

Only here buddy_mt can differ and the code in this block already handles
that.

>  			/*
>  			 * We want to prevent merge between freepages on pageblock
> @@ -801,9 +825,9 @@ static inline void __free_one_page(struct page *page,
>  		 * merge with it and move up one order.
>  		 */
>  		if (page_is_guard(buddy))
> -			clear_page_guard(zone, buddy, order, migratetype);
> +			clear_page_guard(zone, buddy, order);
>  		else
> -			del_page_from_free_list(buddy, zone, order);
> +			del_page_from_free_list(buddy, zone, order, buddy_mt);

Ugh, so this will add an account_freepages() call to each iteration of the
__free_one_page() hot loop, which seems like a lot of unnecessary overhead.
As long as we are within pageblock_order the migratetype should be the
same, and thus the is_migrate_isolate() and is_migrate_cma() tests should
also return the same value, so we shouldn't need to call
__mod_zone_page_state() piecemeal like this.

>  		combined_pfn = buddy_pfn & pfn;
>  		page = page + (combined_pfn - pfn);
>  		pfn = combined_pfn;
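
The overhead concern about the merge loop can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the
toy_* names are hypothetical, every merge is assumed to stay below
pageblock_order so that all buddies share one migratetype, and a single
counter stands in for the per-zone free page statistics. It compares
touching the counter on every merge step with touching it once per free.

```c
#include <assert.h>

/* Toy stand-in for the zone's free page statistics. */
struct toy_counter {
	long nr_free_pages;	/* pages currently accounted as free */
	int updates;		/* how many times the counter was modified */
};

static void toy_account(struct toy_counter *c, long nr_pages)
{
	c->nr_free_pages += nr_pages;
	c->updates++;
}

/*
 * Piecemeal accounting: each buddy taken off its freelist is subtracted,
 * then the final merged block is added back in full.
 */
static void toy_free_piecemeal(struct toy_counter *c, unsigned int order,
			       unsigned int merges)
{
	for (unsigned int i = 0; i < merges; i++) {
		toy_account(c, -(1L << order));	/* buddy leaves the freelist */
		order++;			/* merged block is one order up */
	}
	toy_account(c, 1L << order);		/* merged block enters the freelist */
}

/*
 * Batched accounting: the buddies were already counted as free, so only
 * the newly freed pages change the total.
 */
static void toy_free_batched(struct toy_counter *c, unsigned int order,
			     unsigned int merges)
{
	(void)merges;			/* merging does not change the total */
	toy_account(c, 1L << order);
}
```

Freeing an order-0 page that merges three times, with the three buddies
(1 + 2 + 4 = 7 pages) already counted as free, ends at 8 free pages in both
variants, but the piecemeal variant modifies the counter four times and the
batched one once. The model ignores the isolate and CMA special cases,
which is exactly the simplification the same-migratetype assumption is
meant to justify.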