Date: Fri, 11 Apr 2025 11:39:06 -0400
From: Johannes Weiner
To: Vlastimil Babka
Cc: Andrew Morton, Mel Gorman, Zi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks
Message-ID: <20250411153906.GC366747@cmpxchg.org>
References: <20250313210647.1314586-1-hannes@cmpxchg.org>
 <20250313210647.1314586-6-hannes@cmpxchg.org>
 <46f1b2ab-2903-4cde-9e68-e334a0d0df22@suse.cz>
In-Reply-To: <46f1b2ab-2903-4cde-9e68-e334a0d0df22@suse.cz>

On Fri, Apr 11, 2025 at
10:19:58AM +0200, Vlastimil Babka wrote:
> On 3/13/25 22:05, Johannes Weiner wrote:
> > The previous patch added pageblock_order reclaim to kswapd/kcompactd,
> > which helps, but produces only one block at a time. Allocation stalls
> > and THP failure rates are still higher than they could be.
> >
> > To adequately reflect ALLOC_NOFRAGMENT demand for pageblocks, change
> > the watermarking for kswapd & kcompactd: instead of targeting the high
> > watermark in order-0 pages and checking for one suitable block, simply
> > require that the high watermark is entirely met in pageblocks.
>
> Hrm.

Hrm!

> > @@ -2329,6 +2329,22 @@ static enum compact_result __compact_finished(struct compact_control *cc)
> >  	if (!pageblock_aligned(cc->migrate_pfn))
> >  		return COMPACT_CONTINUE;
> >
> > +	/*
> > +	 * When defrag_mode is enabled, make kcompactd target
> > +	 * watermarks in whole pageblocks. Because they can be stolen
> > +	 * without polluting, no further fallback checks are needed.
> > +	 */
> > +	if (defrag_mode && !cc->direct_compaction) {
> > +		if (__zone_watermark_ok(cc->zone, cc->order,
> > +					high_wmark_pages(cc->zone),
> > +					cc->highest_zoneidx, cc->alloc_flags,
> > +					zone_page_state(cc->zone,
> > +							NR_FREE_PAGES_BLOCKS)))
> > +			return COMPACT_SUCCESS;
> > +
> > +		return COMPACT_CONTINUE;
> > +	}
>
> Wonder if this ever succeeds in practice. Is high_wmark_pages() even aligned
> to pageblock size? If not, and it's X pageblocks and a half, we will rarely
> have NR_FREE_PAGES_BLOCKS cover all of that? Also concurrent allocations can
> put us below high wmark quickly and then we never satisfy this?

The high watermark is not aligned, but why does it have to be? It's a
binary condition: met or not met. Compaction continues until it's met.

NR_FREE_PAGES_BLOCKS moves in pageblock_nr_pages steps. This means
it'll really work until align_up(highmark, pageblock_nr_pages), as
that's when NR_FREE_PAGES_BLOCKS snaps above the (unaligned) mark. But
that seems reasonable, no?

The allocator side is using low/min, so we have the conventional
hysteresis between consumer and producer.

For illustration, on my 2G test box, the watermarks in DMA32 look like
this:

  pages free     212057
        boost    0
        min      11164 (21.8 blocks)
        low      13955 (27.3 blocks)
        high     16746 (32.7 blocks)
        promo    19537
        spanned  456704
        present  455680
        managed  431617 (843.1 blocks)

So there are several blocks between the kblahds wakeup and sleep. The
first allocation to cut into a whole free block will decrease
NR_FREE_PAGES_BLOCKS by a whole block. But subsequent allocs that fill
the remaining space won't change that counter. So the distance between
the watermarks didn't fundamentally change (modulo block rounding).

> Doesn't then happen that with defrag_mode, in practice kcompactd basically
> always runs until scanners meet?

Tracing kcompactd calls to compaction_finished() with defrag_mode:

@[COMPACT_CONTINUE]: 6955
@[COMPACT_COMPLETE]: 19
@[COMPACT_PARTIAL_SKIPPED]: 1
@[COMPACT_SUCCESS]: 17
@wakeuprequests: 3

Of course, similar to kswapd, it might not reach the watermarks and
keep running if there is a continuous stream of allocations consuming
the blocks it's making. Hence the ratio between wakeups & continues.

But when demand stops, it'll balance the high mark and quit.

> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -6724,11 +6724,24 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
> >  	 * meet watermarks.
> >  	 */
> >  	for_each_managed_zone_pgdat(zone, pgdat, i, highest_zoneidx) {
> > +		unsigned long free_pages;
> > +
> >  		if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
> >  			mark = promo_wmark_pages(zone);
> >  		else
> >  			mark = high_wmark_pages(zone);
> > -		if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
>
> I think you just removed the only user of this _safe() function. Is the
> cpu-drift control it does no longer necessary?

Ah good catch. This should actually use zone_page_state_snapshot()
below depending on z->percpu_drift_mark.
This is active when there are enough cpus for the vmstat pcp deltas to
exceed low-min. Afaics this is still necessary, but also still
requires a lot of CPUs to matter (>212 cpus with 64G of memory).

I'll send a follow-up fix.

> > +		/*
> > +		 * In defrag_mode, watermarks must be met in whole
> > +		 * blocks to avoid polluting allocator fallbacks.
> > +		 */
> > +		if (defrag_mode)
> > +			free_pages = zone_page_state(zone, NR_FREE_PAGES_BLOCKS);
> > +		else
> > +			free_pages = zone_page_state(zone, NR_FREE_PAGES);
> > +
> > +		if (__zone_watermark_ok(zone, order, mark, highest_zoneidx,
> > +					0, free_pages))
> >  			return true;
> >  	}
> >