From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v7 1/8] xarray: add xas_try_split() to split a multi-index entry.
From: David Hildenbrand <david@redhat.com>
To: Zi Yan
Cc: linux-mm@kvack.org, Andrew Morton, "Kirill A . Shutemov",
 "Matthew Wilcox (Oracle)", Ryan Roberts, Hugh Dickins, Yang Shi,
 Miaohe Lin, Kefeng Wang, Yu Zhao, John Hubbard, Baolin Wang,
 linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Tue, 18 Feb 2025 16:44:28 +0100
References: <20250211155034.268962-1-ziy@nvidia.com>
 <20250211155034.268962-2-ziy@nvidia.com>
 <0bb75517-9418-4145-8aa8-b05373010711@redhat.com>
Organization: Red Hat
User-Agent: Mozilla Thunderbird
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
On 17.02.25 23:05, Zi Yan wrote:
> On 17 Feb 2025, at 16:44, David Hildenbrand wrote:
>
>> On 11.02.25 16:50, Zi Yan wrote:
>>> It is a preparation patch for non-uniform folio split, which always splits
>>> a folio in half iteratively, and minimal xarray entry split.
>>>
>>> Currently, xas_split_alloc() and xas_split() always split all slots from a
>>> multi-index entry. They cost the same number of xa_node as the to-be-split
>>> slots. For example, to split an order-9 entry, which takes 2^(9-6)=8
>>> slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_node are
>>> needed. Instead, xas_try_split() is intended to be used iteratively to split
>>> the order-9 entry into 2 order-8 entries, then split one order-8 entry,
>>> based on the given index, to 2 order-7 entries, ..., and split one order-1
>>> entry to 2 order-0 entries. When splitting the order-6 entry and a new
>>> xa_node is needed, xas_try_split() will try to allocate one if possible.
>>> As a result, xas_try_split() would only need one xa_node instead of 8.
>>>
>>> When a new xa_node is needed during the split, xas_try_split() can try to
>>> allocate one but no more. -ENOMEM will be returned if a node cannot be
>>> allocated. -EINVAL will be returned if a sibling node is split or a
>>> cascade split happens, where two or more new nodes are needed, and these
>>> are not supported by xas_try_split().
>>>
>>> xas_split_alloc() and xas_split() split an order-9 to order-0:
>>>
>>>          ---------------------------------
>>>          |   |   |   |   |   |   |   |   |
>>>          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
>>>          |   |   |   |   |   |   |   |   |
>>>          ---------------------------------
>>>            |   |                   |   |
>>>      -------   ---               ---   -------
>>>      |           |      ...      |           |
>>>      V           V               V           V
>>> ----------- -----------     ----------- -----------
>>> | xa_node | | xa_node | ... | xa_node | | xa_node |
>>> ----------- -----------     ----------- -----------
>>>
>>> xas_try_split() splits an order-9 to order-0:
>>>
>>>          ---------------------------------
>>>          |   |   |   |   |   |   |   |   |
>>>          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
>>>          |   |   |   |   |   |   |   |   |
>>>          ---------------------------------
>>>            |
>>>            |
>>>            V
>>>        -----------
>>>        | xa_node |
>>>        -----------
>>>
>>> Signed-off-by: Zi Yan
>>> ---
>>>  Documentation/core-api/xarray.rst |  14 ++-
>>>  include/linux/xarray.h            |   7 ++
>>>  lib/test_xarray.c                 |  47 +++++++++++
>>>  lib/xarray.c                      | 136 ++++++++++++++++++++++++++----
>>>  tools/testing/radix-tree/Makefile |   1 +
>>>  5 files changed, 188 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst
>>> index f6a3eef4fe7f..c6c91cbd0c3c 100644
>>> --- a/Documentation/core-api/xarray.rst
>>> +++ b/Documentation/core-api/xarray.rst
>>> @@ -489,7 +489,19 @@ Storing ``NULL`` into any index of a multi-index entry will set the
>>>  entry at every index to ``NULL`` and dissolve the tie.  A multi-index
>>>  entry can be split into entries occupying smaller ranges by calling
>>>  xas_split_alloc() without the xa_lock held, followed by taking the lock
>>> -and calling xas_split().
>>> +and calling xas_split() or calling xas_try_split() with xa_lock. The
>>> +difference between xas_split_alloc()+xas_split() and xas_try_split() is
>>> +that xas_split_alloc() + xas_split() split the entry from the original
>>> +order to the new order in one shot uniformly, whereas xas_try_split()
>>> +iteratively splits the entry containing the index non-uniformly.
>>> +For example, to split an order-9 entry, which takes 2^(9-6)=8 slots,
>>> +assuming ``XA_CHUNK_SHIFT`` is 6, xas_split_alloc() + xas_split() need
>>> +8 xa_node. xas_try_split() splits the order-9 entry into
>>> +2 order-8 entries, then split one order-8 entry, based on the given index,
>>> +to 2 order-7 entries, ..., and split one order-1 entry to 2 order-0 entries.
>>> +When splitting the order-6 entry and a new xa_node is needed, xas_try_split()
>>> +will try to allocate one if possible. As a result, xas_try_split() would only
>>> +need 1 xa_node instead of 8.
>>>
>>>  Functions and structures
>>>  ========================
>>> diff --git a/include/linux/xarray.h b/include/linux/xarray.h
>>> index 0b618ec04115..9eb8c7425090 100644
>>> --- a/include/linux/xarray.h
>>> +++ b/include/linux/xarray.h
>>> @@ -1555,6 +1555,8 @@ int xa_get_order(struct xarray *, unsigned long index);
>>>  int xas_get_order(struct xa_state *xas);
>>>  void xas_split(struct xa_state *, void *entry, unsigned int order);
>>>  void xas_split_alloc(struct xa_state *, void *entry, unsigned int order, gfp_t);
>>> +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order,
>>> +		gfp_t gfp);
>>>  #else
>>>  static inline int xa_get_order(struct xarray *xa, unsigned long index)
>>>  {
>>> @@ -1576,6 +1578,11 @@ static inline void xas_split_alloc(struct xa_state *xas, void *entry,
>>>  		unsigned int order, gfp_t gfp)
>>>  {
>>>  }
>>> +
>>> +static inline void xas_try_split(struct xa_state *xas, void *entry,
>>> +		unsigned int order, gfp_t gfp)
>>> +{
>>> +}
>>>  #endif
>>>
>>>  /**
>>> diff --git a/lib/test_xarray.c b/lib/test_xarray.c
>>> index 6932a26f4927..598ca38a2f5b 100644
>>> --- a/lib/test_xarray.c
>>> +++ b/lib/test_xarray.c
>>> @@ -1857,6 +1857,49 @@ static void check_split_1(struct xarray *xa, unsigned long index,
>>>  	xa_destroy(xa);
>>>  }
>>>
>>> +static void check_split_2(struct xarray *xa, unsigned long index,
>>> +		unsigned int order, unsigned int new_order)
>>> +{
>>> +	XA_STATE_ORDER(xas, xa, index, new_order);
>>> +	unsigned int i, found;
>>> +	void *entry;
>>> +
>>> +	xa_store_order(xa, index, order, xa, GFP_KERNEL);
>>> +	xa_set_mark(xa, index, XA_MARK_1);
>>> +
>>> +	xas_lock(&xas);
>>> +	xas_try_halve(&xas, xa, order, GFP_KERNEL);
>>> +	if (((new_order / XA_CHUNK_SHIFT) < (order / XA_CHUNK_SHIFT)) &&
>>> +	    new_order < order - 1) {
>>> +		XA_BUG_ON(xa, !xas_error(&xas) || xas_error(&xas) != -EINVAL);
>>> +		xas_unlock(&xas);
>>> +		goto out;
>>> +	}
>>> +	for (i = 0; i < (1 << order); i += (1 << new_order))
>>> +		__xa_store(xa, index + i, xa_mk_index(index + i), 0);
>>> +	xas_unlock(&xas);
>>> +
>>> +	for (i = 0; i < (1 << order); i++) {
>>> +		unsigned int val = index + (i & ~((1 << new_order) - 1));
>>> +		XA_BUG_ON(xa, xa_load(xa, index + i) != xa_mk_index(val));
>>> +	}
>>> +
>>> +	xa_set_mark(xa, index, XA_MARK_0);
>>> +	XA_BUG_ON(xa, !xa_get_mark(xa, index, XA_MARK_0));
>>> +
>>> +	xas_set_order(&xas, index, 0);
>>> +	found = 0;
>>> +	rcu_read_lock();
>>> +	xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_1) {
>>> +		found++;
>>> +		XA_BUG_ON(xa, xa_is_internal(entry));
>>> +	}
>>> +	rcu_read_unlock();
>>> +	XA_BUG_ON(xa, found != 1 << (order - new_order));
>>> +out:
>>> +	xa_destroy(xa);
>>> +}
>>> +
>>>  static noinline void check_split(struct xarray *xa)
>>>  {
>>>  	unsigned int order, new_order;
>>> @@ -1868,6 +1911,10 @@ static noinline void check_split(struct xarray *xa)
>>>  			check_split_1(xa, 0, order, new_order);
>>>  			check_split_1(xa, 1UL << order, order, new_order);
>>>  			check_split_1(xa, 3UL << order, order, new_order);
>>> +
>>> +			check_split_2(xa, 0, order, new_order);
>>> +			check_split_2(xa, 1UL << order, order, new_order);
>>> +			check_split_2(xa, 3UL << order, order, new_order);
>>>  		}
>>>  	}
>>>  }
>>> diff --git a/lib/xarray.c b/lib/xarray.c
>>> index 116e9286c64e..c38beca77830 100644
>>> --- a/lib/xarray.c
>>> +++ b/lib/xarray.c
>>> @@ -1007,6 +1007,31 @@ static void node_set_marks(struct xa_node *node, unsigned int offset,
>>>  	}
>>>  }
>>>
>>> +static struct xa_node *__xas_alloc_node_for_split(struct xa_state *xas,
>>> +		void *entry, gfp_t gfp)
>>> +{
>>> +	unsigned int i;
>>> +	void *sibling = NULL;
>>> +	struct xa_node *node;
>>> +	unsigned int mask = xas->xa_sibs;
>>> +
>>> +	node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
>>> +	if (!node)
>>> +		return NULL;
>>> +	node->array = xas->xa;
>>> +	for (i = 0; i < XA_CHUNK_SIZE; i++) {
>>> +		if ((i & mask) == 0) {
>>> +			RCU_INIT_POINTER(node->slots[i], entry);
>>> +			sibling = xa_mk_sibling(i);
>>> +		} else {
>>> +			RCU_INIT_POINTER(node->slots[i], sibling);
>>> +		}
>>> +	}
>>> +	RCU_INIT_POINTER(node->parent, xas->xa_alloc);
>>> +
>>> +	return node;
>>> +}
>>> +
>>>  /**
>>>   * xas_split_alloc() - Allocate memory for splitting an entry.
>>>   * @xas: XArray operation state.
>>> @@ -1025,7 +1050,6 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,
>>>  		gfp_t gfp)
>>>  {
>>>  	unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
>>> -	unsigned int mask = xas->xa_sibs;
>>>
>>>  	/* XXX: no support for splitting really large entries yet */
>>>  	if (WARN_ON(xas->xa_shift + 2 * XA_CHUNK_SHIFT <= order))
>>> @@ -1034,23 +1058,9 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,
>>>  		return;
>>>
>>>  	do {
>>> -		unsigned int i;
>>> -		void *sibling = NULL;
>>> -		struct xa_node *node;
>>> -
>>> -		node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
>>> +		struct xa_node *node = __xas_alloc_node_for_split(xas, entry, gfp);
>>>  		if (!node)
>>>  			goto nomem;
>>> -		node->array = xas->xa;
>>> -		for (i = 0; i < XA_CHUNK_SIZE; i++) {
>>> -			if ((i & mask) == 0) {
>>> -				RCU_INIT_POINTER(node->slots[i], entry);
>>> -				sibling = xa_mk_sibling(i);
>>> -			} else {
>>> -				RCU_INIT_POINTER(node->slots[i], sibling);
>>> -			}
>>> -		}
>>> -		RCU_INIT_POINTER(node->parent, xas->xa_alloc);
>>>  		xas->xa_alloc = node;
>>>  	} while (sibs-- > 0);
>>>
>>> @@ -1122,6 +1132,100 @@ void xas_split(struct xa_state *xas, void *entry, unsigned int order)
>>>  	xas_update(xas, node);
>>>  }
>>>  EXPORT_SYMBOL_GPL(xas_split);
>>> +
>>> +/**
>>> + * xas_try_split() - Try to split a multi-index entry.
>>> + * @xas: XArray operation state.
>>> + * @entry: New entry to store in the array.
>>> + * @order: Current entry order.
>>> + * @gfp: Memory allocation flags.
>>> + *
>>> + * The size of the new entries is set in @xas. The value in @entry is
>>> + * copied to all the replacement entries. If and only if one xa_node needs to
>>> + * be allocated, the function will use @gfp to get one. If more xa_node are
>>> + * needed, the function returns an -EINVAL error.
>>> + *
>>> + * Context: Any context. The caller should hold the xa_lock.
>>> + */
>>> +void xas_try_split(struct xa_state *xas, void *entry, unsigned int order,
>>> +		gfp_t gfp)
>>> +{
>>> +	unsigned int sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1;
>>> +	unsigned int offset, marks;
>>> +	struct xa_node *node;
>>> +	void *curr = xas_load(xas);
>>> +	int values = 0;
>>> +
>>> +	node = xas->xa_node;
>>> +	if (xas_top(node))
>>> +		return;
>>> +
>>> +	if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
>>> +		gfp |= __GFP_ACCOUNT;
>>> +
>>> +	marks = node_get_marks(node, xas->xa_offset);
>>> +
>>> +	offset = xas->xa_offset + sibs;
>>> +	do {
>>> +		if (xas->xa_shift < node->shift) {
>>> +			struct xa_node *child = xas->xa_alloc;
>>> +			unsigned int expected_sibs =
>>> +				(1 << ((order - 1) % XA_CHUNK_SHIFT)) - 1;
>>> +
>>> +			/*
>>> +			 * No support for splitting sibling entries
>>> +			 * (horizontally) or cascade split (vertically), which
>>> +			 * requires two or more new xa_nodes.
>>> +			 * Since if one xa_node allocation fails,
>>> +			 * it is hard to free the prior allocations.
>>> +			 */
>>> +			if (sibs || xas->xa_sibs != expected_sibs) {
>>> +				xas_destroy(xas);
>>> +				xas_set_err(xas, -EINVAL);
>>> +				return;
>>> +			}
>>> +
>>> +			if (!child) {
>>> +				child = __xas_alloc_node_for_split(xas, entry,
>>> +						gfp);
>>> +				if (!child) {
>>> +					xas_destroy(xas);
>>> +					xas_set_err(xas, -ENOMEM);
>>> +					return;
>>> +				}
>>> +			}
>>
>> No expert on this, just wondering ...
>>
>> ... what is the effect if we halfway-through fail the split? Is it okay
>> to leave that "partially split" thing in place? Can callers deal with
>> that?
>
> Good question.
>

Let me rephrase: In __split_unmapped_folio(), we call xas_try_split(). If
that fails, we stop the split and effectively skip over the
__split_folio_to_order(). The folio remains unsplit (no order change:
old_order). xas_try_split() was instructed to split from old_order ->
split_order.
xas_try_split() documents that "The value in @entry is copied to all the
replacement entries.", meaning after the split, all entries will be
pointing at the folio.

Now, can it happen that xas_try_split() would ever perform a partial
split in any way, when invoked from __split_unmapped_folio(), such that
we run into the do { } while(); loop and fail with -ENOMEM after already
having performed changes -- xas_update()? Or is that simply impossible?

Maybe it's just the do { } while(); loop in there that is confusing me.
(again, no expert)

> xas_try_split() imposes what kind of split it does and is usually used to
> split from order N to order N-1:

You mean that old_order -> split_order will in the case of
__split_unmapped_folio() always be a difference of 1?

>
> 1. when N is a multiple of XA_CHUNK_SHIFT, a new xa_node is needed, so
> either child (namely xas->xa_alloc) is not NULL, meaning someone called
> xas_nomem() to allocate a xa_node before xas_try_split(), or child is
> NULL and an allocation is needed. If child is still NULL after the
> allocation, meaning we are out of memory, no split is done;
>
> 2. when N is not, no new xa_node is needed; xas_try_split() just rewrites
> existing slot values to perform the split (the code after the hunk above).
> No failure will happen. For this split, since no new xa_node is needed,
> the caller is actually allowed to split from N to a value smaller than
> N-1, as long as N-1 is >= (N - N % XA_CHUNK_SHIFT).
>
>
> Various checks make sure xas_try_split() only sees the two situations
> above:
>
> a. "xas->xa_shift < node->shift" means the split crosses XA_CHUNK_SHIFT,
> so at least 1 new xa_node is needed; the else branch only handles case 2
> above;
>
> b. for the then branch, the "if (sibs || xas->xa_sibs != expected_sibs)"
> check makes sure N is a multiple of XA_CHUNK_SHIFT and the new order
> has to be N-1. In "if (sibs || xas->xa_sibs != expected_sibs)",
> "sibs != 0" means the from order N covers more than 1 slot, so more than
> 1 new xa_node would be needed, and "xas->xa_sibs != expected_sibs" makes
> sure the new order is N-1 (you can see it from how expected_sibs is
> assigned).

Thanks!

> Let me know if you have any other questions.

Thanks for the details!

-- 
Cheers,

David / dhildenb