From: David Hildenbrand
Organization: Red Hat
Date: Wed, 19 Jul 2023 10:14:59 +0200
Subject: Re: collision between ZONE_MOVABLE and memblock allocations
To: Michal Hocko, Mike Rapoport
Cc: Ross Zwisler, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Mel Gorman, Vlastimil Babka
Message-ID: <9770454d-f840-c7cf-314e-ce81839393e3@redhat.com>
References: <20230718220106.GA3117638@google.com> <20230719075952.GH1901145@kernel.org>

On 19.07.23 10:06, Michal Hocko wrote:
> On Wed 19-07-23 10:59:52, Mike Rapoport wrote:
>> On Wed, Jul 19, 2023 at 08:14:48AM +0200, Michal Hocko wrote:
>>> On Tue 18-07-23 16:01:06, Ross Zwisler wrote:
>>> [...]
>>>> I do think that we need to fix this collision between ZONE_MOVABLE and memmap
>>>> allocations, because this issue essentially makes the movablecore= kernel
>>>> command line parameter useless in many cases, as the ZONE_MOVABLE region it
>>>> creates will often actually be unmovable.
>>>
>>> movablecore is kind of a hack and I would be more inclined to get rid of it
>>> rather than build more into it. Could you be more specific about your
>>> use case?
>>>
>>>> Here are the options I currently see for resolution:
>>>>
>>>> 1. Change the way ZONE_MOVABLE memory is allocated so that it is allocated from
>>>> the beginning of the NUMA node instead of the end. This should fix my use case,
>>>> but again is prone to breakage in other configurations (# of NUMA nodes, other
>>>> architectures) where ZONE_MOVABLE and memblock allocations might overlap. I
>>>> think that this should be relatively straightforward and low risk, though.
>>>>
>>>> 2. Make the code which processes the movablecore= command line option aware of
>>>> the memblock allocations, and have it choose a region for ZONE_MOVABLE which
>>>> does not have these allocations. This might be done by checking for
>>>> PageReserved() as we do with offlining memory, though that will take some boot
>>>> time reordering, or we'll have to figure out the overlap in another way. This
>>>> may also result in us having two ZONE_NORMAL zones for a given NUMA node, with
>>>> a ZONE_MOVABLE section in between them. I'm not sure if this is allowed?
>>>
>>> Yes, this is no problem. Zones are allowed to be sparse.
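Option 2 above essentially reduces to an interval search: given the PFN span of a node and the ranges memblock has already handed out, pick a ZONE_MOVABLE range that overlaps none of them. The following is a minimal userspace sketch of that search only, not kernel code; `struct pfn_range` and `find_movable_range()` are hypothetical names, and it assumes the reserved ranges are sorted by start and non-overlapping. Like the current movablecore= placement at the end of the node, it prefers the highest suitable gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open page-frame range [start, end). */
struct pfn_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Pick the highest-addressed range of 'want' pages inside 'node' that
 * overlaps none of the 'nr' reserved ranges. 'reserved' must be sorted by
 * start and non-overlapping. Returns false if no gap is large enough.
 */
bool find_movable_range(struct pfn_range node,
			const struct pfn_range *reserved, size_t nr,
			uint64_t want, struct pfn_range *out)
{
	/* Top of the free gap currently being examined. */
	uint64_t gap_end = node.end;

	assert(want > 0 && node.start <= node.end);

	/* Walk the reservations from the top of the node downwards. */
	for (size_t i = nr; i-- > 0;) {
		/* Clamp the reservation to the node span. */
		uint64_t r_start = reserved[i].start > node.start ? reserved[i].start : node.start;
		uint64_t r_end = reserved[i].end < node.end ? reserved[i].end : node.end;

		if (r_end <= r_start)
			continue; /* reservation lies outside this node */

		/* Free gap between this reservation and the one above it. */
		if (gap_end - r_end >= want) {
			out->start = gap_end - want;
			out->end = gap_end;
			return true;
		}
		gap_end = r_start;
	}

	/* Remaining free gap at the bottom of the node. */
	if (gap_end - node.start >= want) {
		out->start = gap_end - want;
		out->end = gap_end;
		return true;
	}
	return false;
}
```

For a node spanning PFNs [0, 1000) with reservations at [100, 200) and [900, 950), asking for 500 pages yields [400, 900). Note that such a search can only avoid reservations that exist when it runs; anything allocated afterwards, before memblock releases its free pages to the buddy allocator, could still land in the chosen range.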
>>
>> The current initialization order is roughly
>>
>> * very early initialization with some memblock allocations
>> * determine zone locations and sizes
>> * initialize memory map
>>   - memblock_alloc(lots of memory)
>> * lots of unrelated initializations that may allocate memory
>> * release free pages from memblock to the buddy allocator
>>
>> With 2) we can make sure the memory map and early allocations won't be in
>> ZONE_MOVABLE, but we may still have reserved pages there.
>
> Yes, this will always be fragile. If the specific placement of the movable
> memory is not important and the only thing that matters is the size and
> NUMA locality, then an easier-to-maintain solution would be to simply
> offline enough memory blocks very early in the userspace bring-up and
> online them back as movable. If offlining fails, just try another
> memory block. This doesn't require any kernel code change.

As an alternative, we might use the "memmap=nn[KMG]!ss[KMG]" [1] parameter
to mark some memory as protected. That memory can then be configured as a
devdax device and onlined to ZONE_MOVABLE (dev/dax).

[1] https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap

-- 
Cheers,

David / dhildenb
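The userspace-only approach suggested above (offline memory blocks early, then online them again as movable, skipping blocks that refuse to go offline) can be sketched in C against the memory hotplug sysfs interface, where /sys/devices/system/memory/memoryN/state accepts "offline", "online" and "online_movable". This is a minimal sketch under stated assumptions, not a tested tool: `convert_to_movable()` and its helpers are hypothetical names, the sysfs root is a parameter so the logic can be exercised against a scratch directory, and the caller is assumed to have already converted the desired size into a block count (block_size_bytes in the same directory gives the block size in hex) and chosen block numbers belonging to the desired NUMA node. Against the real sysfs it must run as root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* True if <root>/memory<idx> exists; block numbers can have holes. */
static int block_exists(const char *root, unsigned idx)
{
	char path[512];
	struct stat st;

	snprintf(path, sizeof(path), "%s/memory%u", root, idx);
	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Write 'value' to <root>/memory<idx>/state; returns 0 on success. */
static int write_block_state(const char *root, unsigned idx, const char *value)
{
	char path[512];
	size_t len = strlen(value);
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "%s/memory%u/state", root, idx);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fwrite(value, 1, len, f) == len;
	/* stdio buffers the write, so a sysfs error (for example when the
	 * block cannot be offlined) only shows up when fclose() flushes. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Walk blocks from 'last' down to 'first', offlining each one and onlining
 * it again as movable, until 'want' blocks have been converted. Returns the
 * number of blocks actually converted.
 */
unsigned convert_to_movable(const char *root, unsigned first, unsigned last,
			    unsigned want)
{
	unsigned done = 0;

	assert(first <= last);
	for (unsigned idx = last + 1; idx-- > first && done < want;) {
		if (!block_exists(root, idx))
			continue; /* hole in the block numbering */
		if (write_block_state(root, idx, "offline") != 0)
			continue; /* offlining failed: just try another block */
		if (write_block_state(root, idx, "online_movable") != 0) {
			/* Best effort: bring it back online in its default zone. */
			write_block_state(root, idx, "online");
			continue;
		}
		done++;
	}
	return done;
}
```

Walking from the highest block downwards keeps the movable memory at the top of the range, similar to where movablecore= places it; a block that cannot be offlined (for example, because it holds unmovable allocations) is simply skipped rather than treated as fatal.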