From: Qiliang Yuan
To: akpm@linux-foundation.org
Cc: david@kernel.org, mhocko@suse.com, vbabka@suse.cz, willy@infradead.org,
    lance.yang@linux.dev, hannes@cmpxchg.org, surenb@google.com,
    jackmanb@google.com, ziy@nvidia.com, weixugc@google.com, rppt@kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
    edumazet@google.com
Subject: Re: [PATCH v5] mm/page_alloc: boost watermarks on atomic allocation failure
Date: Wed, 21 Jan 2026 20:40:10 -0500
Message-ID: <20260122014034.223163-1-realwujing@gmail.com>
In-Reply-To: <20260121125603.47b204cc8fbe9466b25cce16@linux-foundation.org>
References: <20260121125603.47b204cc8fbe9466b25cce16@linux-foundation.org>

On Wed, 21 Jan 2026 12:56:03 -0800 Andrew Morton wrote:
> This seems sensible to me - dynamically boost reserves in response to
> sustained GFP_ATOMIC allocation failures. It's very much a networking
> thing and I expect the networking people have been looking at these
> issues for years. So let's start by cc'ing them!

Thank you for the feedback and for cc'ing the networking folks! I
appreciate your continued engagement throughout this patch series
(v1-v5).

> Obvious question, which I think was asked before: what about gradually
> decreasing those reserves when the packet storm has subsided?
>
> > v4:
> > - Introduced watermark_scale_boost and gradual decay via balance_pgdat.
>
> And there it is, but v5 removed this. Why? Or perhaps I'm misreading
> the implementation.

You're absolutely right - v4 did include a gradual decay mechanism.
The evolution from v1 to v5 was driven by community feedback, and I'd
like to explain the rationale for each major change:

**v1 → v2**: Following your and Matthew Wilcox's feedback on v1, I:
- Reduced the boost from doubling (100%) to a 50% increase
- Added a decay mechanism (5% every 5 minutes)
- Added debounce logic
- v1: https://lore.kernel.org/all/tencent_9DB6637676D639B4B7AEA09CC6A6F9E49D0A@qq.com/
- v2: https://lore.kernel.org/all/tencent_6FE67BA7BE8376AB038A71ACAD4FF8A90006@qq.com/

**v2 → v3**: Following Michal Hocko's suggestion to use
watermark_scale_factor instead of min_free_kbytes, I switched to the
watermark_boost infrastructure. This was a significant simplification
that reused existing MM subsystem patterns.
- v3: https://lore.kernel.org/all/tencent_44B556221480D8371FBC534ACCF3CE2C8707@qq.com/

**v3 → v4**: Added watermark_scale_boost and gradual decay via
balance_pgdat() to provide finer-grained control over reclaim
aggressiveness.
- v4: https://lore.kernel.org/all/tencent_D23BFCB69EA088C55AFAF89F926036743E0A@qq.com/

**v4 → v5**: Removed watermark_scale_boost, for the reasons listed
below.
- v5: https://lore.kernel.org/all/20260121065740.35616-1-realwujing@gmail.com/

1. **Natural decay exists**: The existing watermark_boost
   infrastructure already has a built-in decay path. When kswapd
   successfully reclaims memory and the zone becomes balanced,
   balance_pgdat() subtracts the boost it reclaimed against from
   zone->watermark_boost, which normally brings it back to 0. This
   happens organically without custom decay logic.

2. **Simplicity**: The v4 approach added custom watermark_scale_boost
   tracking and manual decay in balance_pgdat(). That extra complexity
   duplicated functionality already present in the kswapd reclaim path.

3. **Production validation**: In our production environment
   (high-throughput networking workloads), the natural decay via kswapd
   proved sufficient. Once memory pressure subsides and kswapd
   successfully reclaims to the high watermark, the boost is cleared
   automatically within seconds.
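
To make the natural-decay argument concrete, here is a minimal userspace
sketch of the boost lifecycle. It is a toy model, not kernel code:
struct toy_zone, toy_boost(), toy_reclaim_done() and the boost_cap field
are hypothetical names for illustration, and the "subtract the boost
seen at the start of the reclaim pass" rule is my reading of how the
existing watermark_boost path behaves.

```c
#include <assert.h>

/* Toy stand-in for the one struct zone field discussed here. */
struct toy_zone {
	unsigned long watermark_boost;	/* current boost, in pages */
	unsigned long boost_cap;	/* hypothetical upper bound, in pages */
};

/* An atomic allocation failed: raise the boost by one step, up to the cap. */
static void toy_boost(struct toy_zone *z, unsigned long step)
{
	unsigned long next;

	assert(z);
	next = z->watermark_boost + step;
	z->watermark_boost = next < z->boost_cap ? next : z->boost_cap;
}

/*
 * A reclaim pass finished successfully: drop the boost that was in
 * effect when the pass began. Boost added while the pass was running
 * survives until the next pass.
 */
static void toy_reclaim_done(struct toy_zone *z, unsigned long boost_at_start)
{
	unsigned long drop;

	assert(z);
	drop = boost_at_start < z->watermark_boost ?
	       boost_at_start : z->watermark_boost;
	z->watermark_boost -= drop;
}
```

In this model a quiet period after a storm needs only one successful
reclaim pass to return the boost to 0, which is the behaviour v5 relies
on. A storm that keeps calling toy_boost() while reclaim runs keeps part
of the boost in place.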

However, I recognize this is a trade-off. The v4 gradual decay provided
more explicit control over the decay rate. If you or the networking
maintainers feel that explicit decay control is important for packet
storm scenarios, I'm happy to reintroduce the v4 approach or explore
alternative decay strategies (e.g., time-based decay independent of
kswapd success).

> > + zone->watermark_boost = min(zone->watermark_boost +
> > + max(pageblock_nr_pages, zone_managed_pages(zone) >> 10),
>
> ">> 10" is a magic number. What is the reasoning behind choosing this
> value?

Good catch. The ">> 10" (divide by 1024) was chosen to provide a
zone-proportional boost that scales with zone size:
- For a 1GB zone: ~1MB from the proportional term, so the max() floor
  of pageblock_nr_pages (typically 2MB) applies instead
- For a 16GB zone: ~16MB boost per trigger

The rationale:

1. **Proportionality**: Larger zones experiencing atomic allocation
   pressure likely need proportionally larger safety buffers. A fixed
   pageblock_nr_pages (typically 2MB) might be insufficient for large
   zones under heavy load.

2. **Conservative scaling**: 1/1024 (~0.1%) is aggressive enough to
   help during sustained pressure but conservative enough to avoid
   over-reclaim. This was empirically tuned based on our production
   workload.

3. **Production results**: In our high-throughput networking
   environment (100Gbps+ traffic bursts), this value reduced GFP_ATOMIC
   failures by ~95% without causing excessive kswapd activity or
   impacting normal allocations.

I should document this better. I propose adding a #define:

```c
/*
 * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
 * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
 */
#define ATOMIC_BOOST_SCALE_SHIFT 10
```

Best regards,
Qiliang Yuan
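
Footnote on the ">> 10" arithmetic above: the per-trigger step can be
checked with a small userspace sketch. atomic_boost_step() and
pages_to_kib() are hypothetical helpers written for this illustration,
and the 4 KiB page size and 512-page pageblock used in the note below
are assumptions about a typical x86-64 configuration, not values taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define ATOMIC_BOOST_SCALE_SHIFT 10	/* proportional term = managed / 1024 */

/* Per-trigger boost step in pages (hypothetical helper). */
static uint64_t atomic_boost_step(uint64_t managed_pages,
				  uint64_t pageblock_pages)
{
	uint64_t scaled = managed_pages >> ATOMIC_BOOST_SCALE_SHIFT;

	/* Mirrors max(pageblock_nr_pages, zone_managed_pages(zone) >> 10). */
	return scaled > pageblock_pages ? scaled : pageblock_pages;
}

/* Convert a page count to KiB; page_shift is 12 for 4 KiB pages. */
static uint64_t pages_to_kib(uint64_t pages, unsigned int page_shift)
{
	assert(page_shift >= 10);
	return pages << (page_shift - 10);
}
```

Under those assumptions a 1 GiB zone (262144 pages) has a proportional
term of 256 pages, so the 512-page floor wins and the step is 2 MiB. A
16 GiB zone (4194304 pages) gets 4096 pages, i.e. 16 MiB. The
proportional term only takes over once a zone is larger than about
2 GiB.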