From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 13 Jul 2023 11:28:48 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: James Gowans <jgowans@amazon.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Jan H. Schönherr, Andrew Morton, Vlastimil Babka, Baolin Wang,
	Mel Gorman, Matthew Wilcox, Kefeng Wang, Minghao Chi
Subject: Re: [RFC] mm: compaction: suitable_migration_target checks for higher order buddies
Message-ID: <20230713152848.GA495602@cmpxchg.org>
References: <20230712155421.875491-1-jgowans@amazon.com>
In-Reply-To: <20230712155421.875491-1-jgowans@amazon.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
On Wed, Jul 12, 2023 at 05:54:21PM +0200, James Gowans wrote:
> Huge page compaction finds free target pages to which source pages can
> be migrated when compacting.
> A huge page sized and aligned block is considered a valid source of
> target pages if it passes the suitable_migration_target() test. One of
> the things which suitable_migration_target() does is to ensure that the
> entire block isn't currently free. It would be counterproductive to use
> an already fully free huge page sized block as a migration target,
> because using pages from that free huge page block would decrease the
> number of available huge pages in the system.
>
> suitable_migration_target() attempts to ensure that the supplied block
> is not currently a fully free block by checking the PageBuddy flag on
> the starting page of the huge page sized and aligned block. This
> approach is flawed: the buddy list can and does maintain buddies at a
> larger order than huge page size. For example, on a typical x86 system
> the huge page pageblock_order block is 2 MiB, but the buddy list
> MAX_ORDER block is 4 MiB. Because of this, a pageblock_order sized
> block may be free because it is part of a larger order buddy, but the
> pageblock_order sized block won't itself be on the buddy list; only the
> larger order block will be. The current suitable_migration_target()
> implementation of just checking the PageBuddy flag on the
> pageblock_order block is hence insufficient: the block will appear not
> to be free, and compaction will try to use it as a source of migration
> target pages.
>
> Enhance suitable_migration_target() to cater for this case by scanning
> up the buddy orders from the current pageblock_order page to MAX_ORDER
> to see if any of the larger page blocks have the PageBuddy flag set.
>
> In practice, incorrectly considering a page block as a suitable
> migration target doesn't actually cause the block to be broken down.
> That block is passed to isolate_freepages_block(), which will scan it
> for any pages currently on the buddy list. The assumption is that
> buddy list entries will be found because the entire block is not free.
> In the case described above, no buddy list entries will actually be
> found, because the higher order block is free; it's just unnecessary
> scanning.
>
> As such, the user visible effect of this change is only (in theory [1])
> very slightly faster huge page compaction, by avoiding scanning
> entirely free blocks for free pages. Even if the effect is negligible,
> this change better conveys what the function is attempting to do: check
> whether this page block is entirely free or not.
>
> [1] I have not actually measured whether the difference is noticeable.

This is an interesting find. But because it's working correctly right
now, this patch is a performance optimization, so numbers would help.

> @@ -1342,15 +1342,40 @@ static bool suitable_migration_source(struct compact_control *cc,
>  static bool suitable_migration_target(struct compact_control *cc,
>  						struct page *page)
>  {
> -	/* If the page is a large free page, then disallow migration */
> -	if (PageBuddy(page)) {
> +	unsigned int higher_order;
> +
> +	/*
> +	 * If the supplied page is part of a pageblock_order or larger free
> +	 * block, it is not a suitable migration target block. Detect this
> +	 * case by starting at the pageblock_order aligned page and scanning
> +	 * upwards to the MAX_ORDER aligned page, checking whether any of the
> +	 * struct pages are on the buddy list for the order of the larger
> +	 * block. Disallow migration if so.
> +	 */
> +	for (higher_order = pageblock_order; higher_order <= MAX_ORDER; ++higher_order) {
> +		struct page *higher_order_page;
> +		unsigned long higher_order_pfn = page_to_pfn(page);
> +
>  		/*
> -		 * We are checking page_order without zone->lock taken. But
> -		 * the only small danger is that we skip a potentially suitable
> -		 * pageblock, so it's not worth to check order for valid range.
> +		 * This is legal provided that struct pages are always
> +		 * initialised to at least start at MAX_ORDER alignment.
>  		 */
> -		if (buddy_order_unsafe(page) >= pageblock_order)
> -			return false;
> +		higher_order_pfn &= ~((1 << higher_order) - 1);
> +		higher_order_page = pfn_to_page(higher_order_pfn);
> +		if (PageBuddy(higher_order_page)) {
> +			/*
> +			 * We are checking page_order without zone->lock
> +			 * taken. But the only small danger is that we skip a
> +			 * potentially suitable pageblock, so it's not worth
> +			 * checking the order for a valid range.
> +			 */
> +			if (buddy_order_unsafe(higher_order_page) >= higher_order)
> +				return false;
> +			/*
> +			 * This is a buddy, but not a sufficiently large one.
> +			 * There will never be a larger one above this.
> +			 */
> +			else
> +				break;
> +		}

One thing that's unfortunate is that isolate_freepages() will still
just skip one pageblock, even if you find the buddy further away than
that. This would check the same range at least twice (or more,
depending on the distance between pageblock_order and MAX_ORDER).
Instead of returning bool, it could make sense to return the pfn where
you find the buddy, and then have isolate_freepages() skip ahead and
continue the search at the pageblock below that.

Btw, this is also fixed by my patches that add the MIGRATE_FREE
type[1]. They have isolate_freepages() check this block type instead of
PageBuddy(), and the type is set for all subblocks in a larger buddy.

[1] https://lore.kernel.org/lkml/20230418191313.268131-1-hannes@cmpxchg.org/