From: Danilo Krummrich
Organization: RedHat
To: Matthew Wilcox
Cc: Peng Zhang, maple-tree@lists.infradead.org, linux-mm@kvack.org, Andrew Morton, "Liam R. Howlett", linux-kernel@vger.kernel.org, Peng Zhang, David Airlie, Boris Brezillon
Subject: Re: [PATCH v2 14/16] maple_tree: Refine mas_preallocate() node calculations
Date: Mon, 26 Jun 2023 20:37:05 +0200
Message-ID: <8cc06224-8243-e08e-d0ea-4db71ddc7745@redhat.com>
References: <20230612203953.2093911-1-Liam.Howlett@oracle.com> <20230612203953.2093911-15-Liam.Howlett@oracle.com> <26d8fbcf-d34f-0a79-9d91-8c60e66f7341@redhat.com> <43ce08db-210a-fec8-51b4-351625b3cdfb@redhat.com> <57527c36-57a1-6699-b6f0-373ba895014c@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0
MIME-Version: 1.0
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 6/26/23 16:52, Matthew Wilcox wrote:
> On Mon, Jun 26, 2023 at 04:27:54PM +0200, Danilo Krummrich wrote:
>> On 6/26/23 15:19, Matthew Wilcox wrote:
>>> On Mon, Jun 26, 2023 at 02:38:06AM +0200, Danilo Krummrich wrote:
>>>> On the other hand, unless I miss something (and if so, please let me know),
>>>> something is bogus with the API then.
>>>>
>>>> While the documentation of the Advanced API of the maple tree explicitly
>>>> claims that the user of the API is responsible for locking, this should be
>>>> limited to the bounds set by the maple tree implementation. Which means, the
>>>> user must decide for either the internal (spin-) lock or an external lock
>>>> (which possibly goes away in the future) and acquire and release it
>>>> according to the rules maple tree enforces through lockdep checks.
>>>>
>>>> Let's say one picks the internal lock. How is one supposed to ensure the
>>>> tree isn't modified using the internal lock with mas_preallocate()?
>>>>
>>>> Besides that, I think the documentation should definitely mention this
>>>> limitation and give some guidance for the locking.
>>>>
>>>> Currently, from an API perspective, I can't see how anyone not familiar with
>>>> the implementation details would be able to recognize this limitation.
>>>>
>>>> In terms of the GPUVA manager, unfortunately, it seems like I need to drop
>>>> the maple tree and go back to using a rb-tree, since it seems there is no
>>>> sane way doing a worst-case pre-allocation that does not suffer from this
>>>> limitation.
>>>
>>> I haven't been paying much attention here (too many other things going
>>> on), but something's wrong.
>>>
>>> First, you shouldn't need to preallocate. Preallocation is only there
>>
>> Unfortunately, I think we really have a case where we have to. Typically GPU
>> mappings are created in a dma-fence signalling critical path and that is
>> where such mappings need to be added to the maple tree. Hence, we can't do
>> any sleeping allocations there.
>
> OK, so there are various ways to handle this, depending on what's
> appropriate for your case.
>
> The simplest is to use GFP_ATOMIC. Essentially, you're saying to the MM
> layer "This is too hard, let me tap into the emergency reserves". It's
> mildly frowned upon, so let's see if we can do better.
>
> If you know where the allocation needs to be stored, but want it to act as
> NULL until the time is right, you can store a ZERO entry. That will read
> as NULL until you store to it. A pure overwriting store will not cause
> any memory allocation since all the implementation has to do is change
> a pointer. The XArray wraps this up nicely behind an xa_reserve() API.
> As you're discovering, the Maple Tree API isn't fully baked yet.
>

Unfortunately, GFP_ATOMIC seems to be the only option. I think storing
entries in advance would not work. Typically userspace submits a job to
the kernel issuing one or multiple requests to map and unmap memory in
an ioctl. Such a job is then put into a queue and processed
asynchronously in a dma-fence signalling critical section. Hence, at the
time we'd store entries in advance, we could have an arbitrary number of
pending jobs potentially still messing with the same address space
region.

So, the only way to go seems to be to use mas_store_gfp() with
GFP_ATOMIC directly in the fence signalling critical path. I guess
mas_store_gfp() does not BUG_ON() if it can't get atomic pages?

Also, I just saw that the tree is limited in its height
(MAPLE_HEIGHT_MAX). Do you think it could be a sane alternative to
pre-allocate with MAPLE_HEIGHT_MAX rather than to rely on atomic pages?
Or maybe a compromise of pre-allocating just a couple of nodes and then
relying on atomic pages for the rest?

FYI, we're talking about hundreds of thousands of entries to be stored
in the tree.

- Danilo
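
To make the compromise mentioned above a bit more concrete, here is a rough userspace sketch of the idea: reserve a few nodes while sleeping is still allowed, consume them in the critical path, and only then fall back to an allocation that may fail. This is a model only, not maple tree code. Every name in it (node_pool, pool_reserve(), node_get(), atomic_alloc(), the budget counter) is hypothetical, and malloc() merely stands in for the kernel allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are maple tree API. */
struct node {
	struct node *next;
};

struct node_pool {
	struct node *free;	/* nodes reserved before the critical path */
	size_t reserved;	/* number of nodes still in the pool */
};

/* Sleepable context: try to reserve 'count' nodes, return how many we got. */
static size_t pool_reserve(struct node_pool *pool, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		struct node *n = malloc(sizeof(*n));

		if (!n)
			break;
		n->next = pool->free;
		pool->free = n;
		pool->reserved++;
	}
	return i;
}

/*
 * Stand-in for a non-sleeping allocation that is allowed to fail.
 * 'budget' models how many such allocations succeed (an assumption of
 * this sketch, not how any real allocator is configured).
 */
static struct node *atomic_alloc(size_t *budget)
{
	if (*budget == 0)
		return NULL;
	(*budget)--;
	return malloc(sizeof(struct node));
}

/* Critical path: use the reserved nodes first, then the fallback. */
static struct node *node_get(struct node_pool *pool, size_t *budget)
{
	struct node *n = pool->free;

	if (n) {
		pool->free = n->next;
		pool->reserved--;
		return n;
	}
	return atomic_alloc(budget);	/* may return NULL */
}

/* Give back whatever was reserved but never used. */
static void pool_destroy(struct node_pool *pool)
{
	while (pool->free) {
		struct node *n = pool->free;

		pool->free = n->next;
		free(n);
	}
	pool->reserved = 0;
}
```

The point of the sketch is only that the caller in the critical path still has to cope with node_get() returning NULL once both the reserve and the fallback are exhausted, so the reserve reduces the pressure on atomic allocations but does not remove the failure case.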