From: Song Liu <song@kernel.org>
Date: Thu, 18 May 2023 13:03:28 -0700
Subject: Re: [RFC PATCH 1/5] mm: intorduce __GFP_UNMAPPED and unmapped_alloc()
To: Kent Overstreet
Cc: Mike Rapoport <rppt@kernel.org>, linux-mm@kvack.org, Andrew Morton, Dave Hansen, Peter Zijlstra, Rick Edgecombe, Thomas Gleixner, Vlastimil Babka, linux-kernel@vger.kernel.org, x86@kernel.org
References: <20230308094106.227365-1-rppt@kernel.org> <20230308094106.227365-2-rppt@kernel.org> <20230518152354.GD4967@kernel.org>
On Thu, May 18, 2023 at 12:15 PM Kent Overstreet wrote:
>
> On Thu, May 18, 2023 at 12:03:03PM -0700, Song Liu wrote:
> > On Thu, May 18, 2023 at 11:47 AM Song Liu wrote:
> > >
> > > On Thu, May 18, 2023 at 10:24 AM Kent Overstreet wrote:
> > > >
> > > > On Thu, May 18, 2023 at 10:00:39AM -0700, Song Liu wrote:
> > > > > On Thu, May 18, 2023 at 9:48 AM Kent Overstreet wrote:
> > > > > >
> > > > > > On Thu, May 18, 2023 at 09:33:20AM -0700, Song Liu wrote:
> > > > > > > I am working on patches based on the discussion in [1]. I am planning to
> > > > > > > send v1 for review in a week or so.
> > > > > >
> > > > > > Hey Song, I was reviewing that thread too,
> > > > > >
> > > > > > Are you taking a different approach based on Thomas's feedback? I think
> > > > > > he had some fair points in that thread.
> > > > >
> > > > > Yes, the API is based on Thomas's suggestion, like 90% from the discussions.
> > > > >
> > > > > > My own feeling is that the buddy allocator is our tool for allocating
> > > > > > larger variable sized physically contiguous allocations, so I'd like to
> > > > > > see something based on that - I think we could do a hybrid buddy/slab
> > > > > > allocator approach, like we have for regular memory allocations.
> > > > >
> > > > > I am planning to implement the allocator based on this (reuse
> > > > > vmap_area logic):
> > > >
> > > > Ah, you're still doing the vmap_area approach.
> > > >
> > > > Mike's approach looks like it'll be _much_ lighter weight and higher
> > > > performance, to me. vmalloc is known to be slow compared to the buddy
> > > > allocator, and with Mike's approach we're only modifying mappings once
> > > > per 2 MB chunk.
> > > >
> > > > I don't see anything in your code for sub-page sized allocations too, so
> > > > perhaps I should keep going with my slab allocator.
> > >
> > > The vmap_area approach handles sub-page allocations. In 5/5 of set [2],
> > > we showed that multiple BPF programs share the same page with some
> > > kernel text (_etext).
> > >
> > > > Could you share your thoughts on your approach vs. Mike's? I'm newer to
> > > > this area of the code than you two so maybe there's an angle I've missed
> > > > :)
> > >
> > > AFAICT, tree based solution (vmap_area) is more efficient than bitmap
> > > based solution.
>
> Tree based requires quite a bit of overhead for the rbtree pointers, and
> additional vmap_area structs.
>
> With a buddy allocator based approach, there's no additional state that
> needs to be allocated, since it all fits in struct page.
>
> > > First, for a 2MiB page with 64B chunk size, we need a bitmap of
> > > 2MiB / 64B = 32k bits = 4k bytes
> > > While the tree based solution can adapt to the number of allocations within
> > > this 2MiB page. Also, searching a free range within 4kB of bitmap may
> > > actually be slower than searching in the tree.
> > >
> > > Second, bitmap based solution cannot handle > 2MiB allocation cleanly,
> > > while tree based solution can. For example, if a big driver uses 3MiB, the
> > > tree based allocator can allocate 4MiB for it, and use the rest 1MiB for
> > > smaller allocations.
>
> We're not talking about a bitmap based solution for >= PAGE_SIZE
> allocations, the alternative is a buddy allocator - so no searching,
> just per power-of-two freelists.
>
> >
> > Missed one:
> >
> > Third, bitmap based solution requires a "size" parameter in free(). It is an
> > overhead for the user. Tree based solution doesn't have this issue.
>
> No, we can recover the size of the allocation via compound_order() -
> hasn't historically been done for alloc_pages() allocations to avoid
> setting up the state in each page for compound head/tail, but it perhaps
> should be (and is with folios, which we've generally been switching to).

If we use compound_order(), we will round up to power of 2 for all
allocations. Does this mean we will use 4MiB for a 2.1MiB allocation?

Thanks,
Song