From: Vlastimil Babka
To: Yu Zhao
Cc: Michal Hocko, Andrew Morton, David Rientjes, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Link Lin, Mel Gorman, Matt Fleming
Subject: Re: [PATCH mm-unstable v1] mm/page_alloc: try not to overestimate free highatomic
Date: Wed, 23 Oct 2024 09:34:59 +0200
Message-ID: <97ccf48e-f30c-4abd-b8ff-2b5310a8b60f@suse.cz>
References: <20241020051315.356103-1-yuzhao@google.com> <82e6d623-bbf3-4dd8-af32-fdfc120fc759@suse.cz>

On 10/23/24 08:36, Yu Zhao wrote:
> On Tue, Oct 22, 2024 at 4:53 AM Vlastimil Babka wrote:
>>
>> +Cc Mel and Matt
>>
>> On 10/21/24 19:25, Michal Hocko wrote:
>> > On Mon 21-10-24 11:10:50, Yu Zhao wrote:
>> >> On Mon, Oct 21, 2024 at 2:13 AM Michal Hocko wrote:
>> >> >
>> >> > On Sat 19-10-24 23:13:15, Yu Zhao wrote:
>> >> > > OOM kills due to vastly overestimated free highatomic reserves were
>> >> > > observed:
>> >> > >
>> >> > > ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
>> >> > > Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
>> >> > > Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
>> >> > >
>> >> > > The second line above shows that the OOM kill was due to the following
>> >> > > condition:
>> >> > >
>> >> > > free (1482936kB) - reserved_highatomic (1073152kB) = 409784kB < min (410416kB)
>> >> > >
>> >> > > And the third line shows there were no free pages in any
>> >> > > MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type
>> >> > > 'H'. Therefore __zone_watermark_unusable_free() overestimated free
>> >> > > highatomic reserves. IOW, it underestimated the usable free memory by
>> >> > > over 1GB, which resulted in the unnecessary OOM kill.
>> >> >
>> >> > Why doesn't unreserve_highatomic_pageblock deal with this situation?
>> >>
>> >> The current behavior of unreserve_highatomic_pageblock() seems WAI to
>> >> me: it unreserves highatomic pageblocks that contain *free* pages so
>>
>> Hm I don't think it's completely WAI. The intention is that we should be
>> able to unreserve the highatomic pageblocks before going OOM, and there
>> seems to be an unintended corner case that if the pageblocks are fully
>> exhausted, they are not reachable for unreserving.
>
> I still think unreserving should only apply to highatomic PBs that
> contain free pages. Otherwise, it seems to me that it'd be
> self-defeating because:
> 1. Unreserving fully used highatomic PBs can't fulfill the alloc
> demand immediately.

I thought the alloc demand is only blocked on the pessimistic watermark
calculation. Usable free pages exist, but the allocation is not allowed to
use them.

> 2. More importantly, it only takes one alloc failure in
> __alloc_pages_direct_reclaim() to reset nr_reserved_highatomic to 2MB,
> from as high as 1% of a zone (in this case 1GB). IOW, it makes more
> sense to me that highatomic only unreserves what it doesn't fully use
> each time unreserve_highatomic_pageblock() is called, not everything
> it got (except the last PB).

But if the highatomic pageblocks are already full, we are not really
removing any actual highatomic reserves just by changing the migratetype
and decreasing nr_reserved_highatomic? In fact that would allow the
reserves to grow with some actual free pages in the future.

> Also not reachable from free_area[] isn't really a big problem. There
> are ways to solve this without scanning the PB bitmap.

Sure, if we agree it's the way to go.

>> The nr_highatomic is then
>> also fully misleading as it prevents allocations due to a limit that does
>> not reflect reality.
>
> Right, and the comments warn about this.

Yes, and they explain it's to avoid the cost of searching the free lists.
Your fix introduces that cost, and that's not really great for a watermark
check fast path. I'd rather move the cost to highatomic unreserving, which
is not a fast path.

>> Your patch addresses the second issue, but there's a
>> cost to it when calculating the watermarks, and it would be better to
>> address the root issue instead.
>
> Theoretically, yes. And I don't think it's actually measurable
> considering the paths (alloc/reclaim) we are in -- all the data
> structures this patch accesses should already have been cache-hot, due
> to unreserve_highatomic_pageblock(), etc.

__zone_watermark_unusable_free() will be executed from every allocation's
fast path, and not only after we recently did
unreserve_highatomic_pageblock(). AFAICS as soon as nr_reserved_highatomic
is over pageblock_nr_pages we'll unconditionally start counting precisely,
and the design wanted to avoid this.

> Also, we have not agreed on the root cause yet.
>
>> >> that those pages can become usable to others. There is nothing to
>> >> unreserve when they have no free pages.
>>
>> Yeah there are no actual free pages to unreserve, but unreserving would fix
>> the nr_highatomic overestimate and thus allow allocations to proceed.
>
> Yes, but honestly, I think this is going to cause regression in
> highatomic allocs.

I think not, as having a more realistic counter of what's actually reserved
(and not already used up) can also allow reserving new pageblocks.

>> > I do not follow. How can you have reserved highatomic pages of that size
>> > without having page blocks with free memory? In other words, is this an
>> > accounting problem or reserves problem? This is not really clear from
>> > your description.
>>
>> I think it's the problem of finding the highatomic pageblocks for
>> unreserving them once they become full. The proper fix is not exactly
>> trivial though. Either we'll have to scan for highatomic pageblocks in the
>> pageblock bitmap, or track them using an additional data structure.
>
> Assuming we want to unreserve fully used highatomic PBs, we wouldn't
> need to scan for them or track them. We'd only need to track the delta
> between how many we want to unreserve (full or not) and how many we
> are able to do so. The first page freed in a PB that's highatomic
> would need to try to reduce the delta by changing the MT.

Hm, that assumes we're adding some checks in the free fastpath. For that to
work, it also assumes that a page will be freed into a highatomic PB soon
enough after we decide we need to unreserve something. That is not so much
different from the current assumption that we'll find such a free page
already in the free list immediately.

> To summarize, I think this is an estimation problem, which I would
> categorize as a lesser problem than accounting problems. But it sounds
> to me that you think it's a policy problem, i.e., the highatomic
> unreserving policy is wrong or not properly implemented?

Yeah, I'd say not properly implemented, but that sounds like a mechanism
problem, not a policy problem, to me :)