From: Vlastimil Babka
To: David Hildenbrand, Zi Yan, linux-mm@kvack.org
Cc: Matthew Wilcox, "Kirill A . Shutemov", Mike Kravetz, Michal Hocko,
 John Hubbard, linux-kernel@vger.kernel.org, Mike Rapoport
Subject: Re: [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter.
Date: Fri, 6 Aug 2021 18:54:14 +0200
Message-ID: <3417eb98-36c8-5459-c83e-52f90e42a146@suse.cz>
In-Reply-To: <72b317e5-c78a-f0bc-fe69-f82261ec252e@redhat.com>
References: <20210805190253.2795604-1-zi.yan@sent.com>
 <40982106-0eee-4e62-7ce0-c4787b0afac4@suse.cz>
 <72b317e5-c78a-f0bc-fe69-f82261ec252e@redhat.com>

On 8/6/21 6:16 PM, David Hildenbrand wrote:
> On 06.08.21 17:36, Vlastimil Babka wrote:
>> On 8/5/21 9:02 PM, Zi Yan wrote:
>>> From: Zi Yan
>>
>>> Patch 3 restores the pfn_valid_within() check when buddy allocator can merge
>>> pages across memory sections. The check was removed when ARM64 gets rid of holes
>>> in zones, but holes can appear in zones again after this patchset.
>>
>> To me that's most unwelcome resurrection. I kinda missed it was going away and
>> now I can't even rejoice? I assume the systems that will be bumping max_order
>> have a lot of memory. Are they going to have many holes?
>> What if we just sacrificed the memory that would have a hole and don't add
>> it to buddy at all?
>
> I think the old implementation was just horrible and the description we have
> here still suffers from that old crap: "but holes can appear in zones again".
> No, it's not related to holes in zones at all. We can have MAX_ORDER -1 pages
> that are partially a hole.
>
> And to be precise, "hole" here means "there is no memmap" and not "there is a
> hole but it has a valid memmap".

Yes.

> But IIRC, we now have under SPARSEMEM always a complete memmap for a complete
> memory sections (when talking about system RAM, ZONE_DEVICE is different but we
> don't really care for now I think).
>
> So instead of introducing what we had before, I think we should look into
> something that doesn't confuse each person that stumbles over it out there. What
> does pfn_valid_within() even mean in the new context? pfn_valid() is most
> probably no longer what we really want, as we're dealing with multiple sections
> that might be online or offline; in the old world, this was different, as a
> MAX_ORDER -1 page was completely contained in a memory section that was either
> online or offline.
>
> I'd imagine something that expresses something different in the context of
> sparsemem:
>
> "Some page orders, such as MAX_ORDER -1, might span multiple memory sections.
> Each memory section has a completely valid memmap if online. Memory sections
> might either be completely online or completely offline. pfn_to_online_page()
> might succeed on one part of a MAX_ORDER - 1 page, but not on another part. But
> it will certainly be consistent within one memory section."
>
> Further, as we know that MAX_ORDER -1 and memory sections are a power of two, we
> can actually do a binary search to identify boundaries, instead of having to
> check each and every page in the range.
>
> Is what I describe the actual reason why we introduce pfn_valid_within() ? (and
> might better introduce something new, with a better fitting name?)

What I don't like is mainly the re-addition of pfn_valid_within() (or whatever
we'd call it) into __free_one_page() for performance reasons, and also to
various pfn scanners (compaction) for performance and "I must not forget to
check this, or do I?" confusion reasons. It would be really great if we could
keep a guarantee that memmap exists for MAX_ORDER blocks. I see two ways to
achieve that:

1. we create memmap for MAX_ORDER blocks, pages in sections not online are
   marked as reserved or some other state that allows us to do checks such as
   "is there a buddy? no" without accessing a missing memmap
2. smaller blocks than MAX_ORDER are not released to buddy allocator

I think 1 would be more work, but less wasteful in the end?
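
To make option 2 a bit more concrete, here is a rough userspace sketch of one
way to read it, not kernel code. It relies on the property quoted above (a
memory section is either completely online or completely offline), so probing
one pfn per section is enough to decide whether a whole block has a memmap.
All names and constants (SECTION_PAGES, NR_SECTIONS, section_online[],
model_pfn_online(), model_block_fully_online(), model_release_blocks()) are
made up for illustration; model_pfn_online() only stands in for whatever check
the real code would use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy parameters, not the kernel's values. */
#define SECTION_PAGES 8UL /* pages per memory section, a power of two */
#define NR_SECTIONS   8UL /* sections in this toy address space */

/* Toy per-section state: true means the whole section is online. */
bool section_online[NR_SECTIONS];

/* Stand-in for an "is this pfn backed by an online memmap?" check. */
bool model_pfn_online(unsigned long pfn)
{
	unsigned long nr = pfn / SECTION_PAGES;

	return nr < NR_SECTIONS && section_online[nr];
}

/*
 * Return true if every page of the block [start_pfn, start_pfn + 2^order)
 * is online. Online state is uniform within a section, so one probe per
 * section is sufficient; there is no need to test each page.
 */
bool model_block_fully_online(unsigned long start_pfn, unsigned int order)
{
	unsigned long end_pfn = start_pfn + (1UL << order);
	unsigned long pfn = start_pfn;

	while (pfn < end_pfn) {
		if (!model_pfn_online(pfn))
			return false;
		/* Jump to the first pfn of the next section. */
		pfn = (pfn / SECTION_PAGES + 1) * SECTION_PAGES;
	}
	return true;
}

/*
 * Option 2 as read here: walk the toy address space in blocks of
 * 2^block_order pages and count only the blocks that are fully online.
 * A real implementation would hand those blocks to the buddy allocator
 * and leave the rest out; this model just counts them.
 */
unsigned long model_release_blocks(unsigned int block_order)
{
	unsigned long block_pages = 1UL << block_order;
	unsigned long total_pages = NR_SECTIONS * SECTION_PAGES;
	unsigned long released = 0;
	unsigned long pfn;

	for (pfn = 0; pfn + block_pages <= total_pages; pfn += block_pages)
		if (model_block_fully_online(pfn, block_order))
			released++;

	return released;
}
```

With only section 4 offline and block_order = 4 (two sections per block), the
block covering sections 4 and 5 is skipped, so the online section 5 is wasted
along with it. That is the "wasteful" part of option 2 that option 1 would
avoid.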