Message-ID: <82e6d623-bbf3-4dd8-af32-fdfc120fc759@suse.cz>
Date: Tue, 22 Oct 2024 12:53:15 +0200
Subject: Re: [PATCH mm-unstable v1] mm/page_alloc: try not to overestimate free highatomic
From: Vlastimil Babka
To: Michal Hocko, Yu Zhao
Cc: Andrew Morton, David Rientjes, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Link Lin, Mel Gorman, Matt Fleming
References: <20241020051315.356103-1-yuzhao@google.com>

+Cc Mel and Matt

On 10/21/24 19:25, Michal Hocko wrote:
> On Mon 21-10-24 11:10:50, Yu Zhao wrote:
>> On Mon, Oct 21, 2024 at 2:13 AM Michal Hocko wrote:
>> >
>> > On Sat 19-10-24 23:13:15, Yu Zhao wrote:
>> > > OOM kills due to vastly overestimated free highatomic reserves were
>> > > observed:
>> > >
>> > > ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
>> > > Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
>> > > Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
>> > >
>> > > The second line above shows that the OOM kill was due to the following
>> > > condition:
>> > >
>> > > free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
>> > >
>> > > And the third line shows there were no free pages in any
>> > > MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type
>> > > 'H'. Therefore __zone_watermark_unusable_free() overestimated free
>> > > highatomic reserves. IOW, it underestimated the usable free memory by
>> > > over 1GB, which resulted in the unnecessary OOM kill.
>> >
>> > Why doesn't unreserve_highatomic_pageblock deal with this situation?
>>
>> The current behavior of unreserve_highatomic_pageblock() seems WAI to
>> me: it unreserves highatomic pageblocks that contain *free* pages so

Hm I don't think it's completely WAI. The intention is that we should be
able to unreserve the highatomic pageblocks before going OOM, and there
seems to be an unintended corner case that if the pageblocks are fully
exhausted, they are not reachable for unreserving. The nr_highatomic is then
also fully misleading as it prevents allocations due to a limit that does
not reflect reality. Your patch addresses the second issue, but there's a
cost to it when calculating the watermarks, and it would be better to
address the root issue instead.

>> that those pages can become usable to others. There is nothing to
>> unreserve when they have no free pages.

Yeah there are no actual free pages to unreserve, but unreserving would fix
the nr_highatomic overestimate and thus allow allocations to proceed.

> I do not follow. How can you have reserved highatomic pages of that size
> without having page blocks with free memory. In other words is this an
> accounting problem or reserves problem? This is not really clear from
> your description.

I think it's the problem of finding the highatomic pageblocks for
unreserving them once they become full. The proper fix is not exactly
trivial though. Either we'll have to scan for highatomic pageblocks in the
pageblock bitmap, or track them using an additional data structure.
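
To make the accounting concrete, below is a toy userspace model of the
corner case and of the "scan for highatomic pageblocks" option. It is only a
sketch, not kernel code: every toy_* name is hypothetical, the 2048kB
pageblock size is an assumption, and the watermark check is reduced to the
single comparison quoted from the OOM report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model. All toy_* names are hypothetical, not kernel API. */
enum toy_migratetype { TOY_MOVABLE, TOY_HIGHATOMIC };

struct toy_pageblock {
	enum toy_migratetype type;
	unsigned long free_kb;		/* free memory left in this pageblock */
};

struct toy_zone {
	unsigned long free_kb;			/* total free memory in the zone */
	unsigned long min_kb;			/* min watermark */
	unsigned long reserved_highatomic_kb;	/* what the counter claims */
	unsigned long pageblock_kb;		/* assumed pageblock size */
	struct toy_pageblock *blocks;
	size_t nr_blocks;
};

/* Fill in the numbers from the OOM report; every reserved block is full. */
static void toy_init_report_zone(struct toy_zone *z,
				 struct toy_pageblock *blocks, size_t nr_blocks)
{
	z->free_kb = 1482936;
	z->min_kb = 410416;
	z->pageblock_kb = 2048;	/* assumption: 2MB pageblocks */
	z->reserved_highatomic_kb = nr_blocks * z->pageblock_kb;
	z->blocks = blocks;
	z->nr_blocks = nr_blocks;
	for (size_t i = 0; i < nr_blocks; i++) {
		blocks[i].type = TOY_HIGHATOMIC;
		blocks[i].free_kb = 0;
	}
}

/* Simplified form of the check in the report: free - reserved vs. min. */
static bool toy_watermark_ok(const struct toy_zone *z)
{
	unsigned long unusable = z->reserved_highatomic_kb;

	if (unusable > z->free_kb)
		unusable = z->free_kb;
	return z->free_kb - unusable > z->min_kb;
}

/*
 * Scan all pageblocks and unreserve the highatomic ones that have no free
 * memory left. Returns the number of pageblocks unreserved.
 */
static size_t toy_unreserve_full_blocks(struct toy_zone *z)
{
	size_t unreserved = 0;

	assert(z && z->blocks);
	for (size_t i = 0; i < z->nr_blocks; i++) {
		struct toy_pageblock *b = &z->blocks[i];

		if (b->type != TOY_HIGHATOMIC || b->free_kb != 0)
			continue;
		b->type = TOY_MOVABLE;
		/* Never let the counter underflow. */
		if (z->reserved_highatomic_kb >= z->pageblock_kb)
			z->reserved_highatomic_kb -= z->pageblock_kb;
		else
			z->reserved_highatomic_kb = 0;
		unreserved++;
	}
	return unreserved;
}
```

With 524 blocks (524 * 2048kB = 1073152kB, matching reserved_highatomic in
the report), toy_watermark_ok() is false before the scan and true after it,
because 1482936kB is above the 410416kB min once the counter drops back to
zero. The linear scan is the cost being traded against tracking the
pageblocks in an additional data structure.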