From: David Hildenbrand
Organization: Red Hat
To: lsf-pc@lists.linux-foundation.org
Cc: "linux-mm@kvack.org"
Subject: [LSF/MM/BPF TOPIC] Improving alloc_contig_range()
Date: Wed, 9 Jun 2021 15:39:51 +0200

Hi,

our range allocator, alloc_contig_range(), already works fairly
reliably with MIGRATE_CMA, as used by the CMA allocator, and
ZONE_MOVABLE, as used by virtio-mem for memory hotunplug. However, there
are some things to improve, especially when allocating from one of the
kernel zones, such as ZONE_NORMAL, as used for allocating gigantic pages
and by virtio-mem for memory hotunplug.

a) MAX_ORDER (and pageblock_order) limitation

The current implementation is tightly glued to pageblock_order and
MAX_ORDER.
For example, alloc_contig_range() works fairly unreliably on
ZONE_NORMAL with granularity < MAX_ORDER - 1, because we isolate all
pageblocks in the MAX_ORDER - 1 range and any unmovable page in that
range makes us bail out. Further, when isolating a pageblock we lose
movability information, so isolating a (partially) unmovable pageblock
might be problematic, and we would like to retain the original
movability information.

As one example, virtio-mem currently uses MAX_ORDER - 1 granularity
instead of a smaller granularity such as pageblock_order, so on x86-64
it only supports (un)plug of 4 MiB chunks. We'd like to support 2 MiB
here.

As another example, a CMA area has to be aligned to MAX_ORDER - 1 due to
the current limitations. pageblock_order is still problematic on some
archs (arm64 with 64 KiB base pages), but getting rid of the MAX_ORDER
limitation feels like low-hanging fruit.

As there is interest in increasing MAX_ORDER, the problem will get worse
over time. The questions are 1) what it takes to only isolate a single
pageblock, and not all pageblocks composing a MAX_ORDER - 1 range, when
not required and 2) how to handle isolating partially unmovable
pageblocks.

b) Shrinking the slab

set_migratetype_isolate() has a nice comment "FIXME: Now, memory hotplug
doesn't call shrink_slab() by itself". IIUC, in some environments we
could significantly improve the reliability of alloc_contig_range() on
ZONE_NORMAL by shrinking the slab. The questions are 1) who should
shrink the slab and 2) when, because it obviously can temporarily harm
performance. However, memory hotunplug already temporarily harms
performance.

Ideally, we'd want to shrink the slab only on the area of interest. How
could something like that be realized?
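
To make the granularity problem from a) concrete, here is a minimal
userspace sketch of the rounding involved. It is not kernel code: the
constants assume x86-64 with 4 KiB base pages (MAX_ORDER of 11 and
pageblock_order of 9), and all helper names are made up for
illustration. It only models the statement above that every pageblock
in the surrounding MAX_ORDER - 1 range gets isolated.

```c
#include <assert.h>

/* Assumed values for x86-64 with 4 KiB base pages (illustrative only). */
#define PAGE_SHIFT       12UL
#define MAX_ORDER        11UL	/* buddy orders 0..10 */
#define PAGEBLOCK_ORDER  9UL	/* 2 MiB pageblocks */

#define MAX_ORDER_NR_PAGES  (1UL << (MAX_ORDER - 1))
#define PAGEBLOCK_NR_PAGES  (1UL << PAGEBLOCK_ORDER)

/* Convert a number of base pages to bytes. */
static unsigned long pages_to_bytes(unsigned long nr_pages)
{
	return nr_pages << PAGE_SHIFT;
}

/* Hypothetical helper: alignment (in pages) of the isolated range. */
static unsigned long isolation_align(void)
{
	return MAX_ORDER_NR_PAGES > PAGEBLOCK_NR_PAGES ?
	       MAX_ORDER_NR_PAGES : PAGEBLOCK_NR_PAGES;
}

/* Round a PFN down to a power-of-two alignment. */
static unsigned long align_down(unsigned long pfn, unsigned long align)
{
	assert(align && (align & (align - 1)) == 0);
	return pfn & ~(align - 1);
}

/* Round a PFN up to a power-of-two alignment. */
static unsigned long align_up(unsigned long pfn, unsigned long align)
{
	return align_down(pfn + align - 1, align);
}

/* Number of pageblocks isolated for a request covering [start, end). */
static unsigned long isolated_pageblocks(unsigned long start,
					 unsigned long end)
{
	unsigned long align = isolation_align();

	return (align_up(end, align) - align_down(start, align)) /
	       PAGEBLOCK_NR_PAGES;
}
```

Under these assumptions, a request for the single 2 MiB pageblock
starting at PFN 0x100200 ends up isolating two pageblocks (4 MiB), so
an unmovable page in the neighbouring pageblock is enough to make the
allocation fail.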

c) PCP handling

While we disable the PCP right now when offlining memory to avoid races
with concurrent freeing to the PCP, we don't do the same in
alloc_contig_range(); instead, we only drain the PCP once.

Disabling the PCP currently locks a mutex until the PCP is re-enabled,
which would essentially serialize alloc_contig_range(), which is
undesirable.

What would it take to make disabling the PCP scale? Does this matter in
practice, i.e., can the races actually result in significant allocation
failures, especially on ZONE_MOVABLE or MIGRATE_CMA?

d) Unification of alloc_contig_range() and memory offlining code

Both do roughly the same thing, but with some notable differences
(dissolving huge pages, retry handling, ...). What does it take to unify
both, or are there compelling reasons to not unify them?

-- 
Thanks,

David / dhildenb