From: Yu Zhao <yuzhao@google.com>
Date: Wed, 23 Oct 2024 00:36:46 -0600
Subject: Re: [PATCH mm-unstable v1] mm/page_alloc: try not to overestimate free highatomic
To: Vlastimil Babka
Cc: Michal Hocko, Andrew Morton, David Rientjes, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Link Lin, Mel Gorman, Matt Fleming
In-Reply-To: <82e6d623-bbf3-4dd8-af32-fdfc120fc759@suse.cz>
References: <20241020051315.356103-1-yuzhao@google.com> <82e6d623-bbf3-4dd8-af32-fdfc120fc759@suse.cz>
On Tue, Oct 22, 2024 at 4:53 AM Vlastimil Babka wrote:
>
> +Cc Mel and Matt
>
> On 10/21/24 19:25, Michal Hocko wrote:
> > On Mon 21-10-24 11:10:50, Yu Zhao wrote:
> >> On Mon, Oct 21, 2024 at 2:13 AM Michal Hocko wrote:
> >> >
> >> > On Sat 19-10-24 23:13:15, Yu Zhao wrote:
> >> > > OOM kills due to vastly overestimated free highatomic reserves were
> >> > > observed:
> >> > >
> >> > >   ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
> >> > >   Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
> >> > >   Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
> >> > >
> >> > > The second line above shows that the OOM kill was due to the following
> >> > > condition:
> >> > >
> >> > >   free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
> >> > >
> >> > > And the third line shows there were no free pages in any
> >> > > MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type
> >> > > 'H'. Therefore __zone_watermark_unusable_free() overestimated free
> >> > > highatomic reserves. IOW, it underestimated the usable free memory by
> >> > > over 1GB, which resulted in the unnecessary OOM kill.
> >> >
> >> > Why doesn't unreserve_highatomic_pageblock deal with this situation?
> >>
> >> The current behavior of unreserve_highatomic_pageblock() seems WAI to
> >> me: it unreserves highatomic pageblocks that contain *free* pages so
>
> Hm I don't think it's completely WAI. The intention is that we should be
> able to unreserve the highatomic pageblocks before going OOM, and there
> seems to be an unintended corner case that if the pageblocks are fully
> exhausted, they are not reachable for unreserving.

I still think unreserving should only apply to highatomic PBs that
contain free pages. Otherwise, it seems to me that it'd be
self-defeating because:

1. Unreserving fully used highatomic PBs can't fulfill the alloc
   demand immediately.
2. More importantly, it only takes one alloc failure in
   __alloc_pages_direct_reclaim() to reset nr_reserved_highatomic to
   2MB, from as high as 1% of a zone (in this case 1GB).

IOW, it makes more sense to me that highatomic only unreserves what
it doesn't fully use each time unreserve_highatomic_pageblock() is
called, not everything it got (except the last PB).

Also, not being reachable from free_area[] isn't really a big problem.
There are ways to solve this without scanning the PB bitmap.

> The nr_highatomic is then
> also fully misleading as it prevents allocations due to a limit that does
> not reflect reality.

Right, and the comments warn about this.
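(Quoting from memory, so the exact flags check may differ between
versions, but the relevant part of __zone_watermark_unusable_free()
looks roughly like this, and the comment spells out the trade-off:)

	static inline long __zone_watermark_unusable_free(struct zone *z,
				unsigned int order, unsigned int alloc_flags)
	{
		long unusable_free = (1 << order) - 1;

		/*
		 * If the caller does not have rights to reserves below the
		 * min watermark then subtract the high-atomic reserves. This
		 * will over-estimate the size of the atomic reserve but it
		 * avoids a search.
		 */
		if (likely(!(alloc_flags & ALLOC_RESERVES)))
			unusable_free += z->nr_reserved_highatomic;

		/* (CMA handling omitted for brevity) */

		return unusable_free;
	}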
> Your patch addresses the second issue, but there's a
> cost to it when calculating the watermarks, and it would be better to
> address the root issue instead.

Theoretically, yes. And I don't think it's actually measurable
considering the paths (alloc/reclaim) we are in -- all the data
structures this patch accesses should already have been cache-hot,
due to unreserve_highatomic_pageblock(), etc.

Also, we have not agreed on the root cause yet.

> >> that those pages can become usable to others. There is nothing to
> >> unreserve when they have no free pages.
>
> Yeah there are no actual free pages to unreserve, but unreserving would fix
> the nr_highatomic overestimate and thus allow allocations to proceed.

Yes, but honestly, I think this is going to cause regressions in
highatomic allocs.

> > I do not follow. How can you have reserved highatomic pages of that size
> > without having page blocks with free memory. In other words is this an
> > accounting problem or reserves problem? This is not really clear from
> > your description.
>
> I think it's the problem of finding the highatomic pageblocks for
> unreserving them once they become full. The proper fix is not exactly
> trivial though. Either we'll have to scan for highatomic pageblocks in the
> pageblock bitmap, or track them using an additional data structure.

Assuming we want to unreserve fully used highatomic PBs, we wouldn't
need to scan for them or track them. We'd only need to track the
delta between how many we want to unreserve (full or not) and how
many we are able to do so. The first page freed in a PB that is still
marked highatomic would then try to reduce that delta by changing the
MT; see the sketch at the end of this mail.

To summarize, I think this is an estimation problem, which I would
categorize as a lesser problem than accounting problems. But it
sounds to me that you think it's a policy problem, i.e., that the
highatomic unreserving policy is wrong or not properly implemented?
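For what it's worth, here is roughly what I have in mind for that
delta -- completely untested, and the new field and the two helpers
below are made up, so take it only as a sketch of the idea:

	/* hypothetical new field in struct zone, next to nr_reserved_highatomic */
	unsigned long	nr_highatomic_deferred;

	/*
	 * Unreserve path, under zone->lock: when a highatomic pageblock has
	 * no free pages and therefore can't be found via free_area[], record
	 * the intent to unreserve instead of skipping it.
	 */
	static void defer_highatomic_unreserve(struct zone *zone)
	{
		zone->nr_highatomic_deferred += pageblock_nr_pages;
	}

	/*
	 * Free path, under zone->lock: the first page freed back into a
	 * pageblock that is still MIGRATE_HIGHATOMIC pays off the deferred
	 * amount by changing the migratetype, so neither a scan of the
	 * pageblock bitmap nor an extra tracking structure is needed.
	 */
	static void try_deferred_unreserve(struct zone *zone, struct page *page)
	{
		if (!zone->nr_highatomic_deferred ||
		    !is_migrate_highatomic_page(page))
			return;

		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
		zone->nr_highatomic_deferred -= min(pageblock_nr_pages,
						    zone->nr_highatomic_deferred);
		zone->nr_reserved_highatomic -= min(pageblock_nr_pages,
						    zone->nr_reserved_highatomic);
	}

The exact hook points, and whether MIGRATE_MOVABLE is the right target
MT, would of course need more thought.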