Date: Thu, 7 Dec 2023 10:44:14 +0100
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v6 3/5] mm/gup: Introduce memfd_pin_user_pages() for pinning memfd pages (v6)
To: "Kasireddy, Vivek", "dri-devel@lists.freedesktop.org", "linux-mm@kvack.org"
Cc: Christoph Hellwig, Daniel Vetter, Mike Kravetz, Hugh Dickins, Peter Xu, Gerd Hoffmann, "Kim, Dongwon", "Chang, Junxiao", Jason Gunthorpe
References: <20231205053509.2342169-1-vivek.kasireddy@intel.com> <20231205053509.2342169-4-vivek.kasireddy@intel.com> <5ffe2ea3-83da-4b5f-adc8-af9cd9a57cd2@redhat.com>

On 07.12.23 06:09, Kasireddy, Vivek wrote:
> Hi David,
>
>> On 05.12.23 06:35, Vivek Kasireddy wrote:
>>> For drivers that would like to longterm-pin the pages associated
>>> with a memfd, the pin_user_pages_fd() API provides an option to
>>> not only pin the pages via FOLL_PIN but also to check and migrate
>>> them if they reside in movable zone or CMA block. This API
>>> currently works with memfds but it should work with any files
>>> that belong to either shmemfs or hugetlbfs. Files belonging to
>>> other filesystems are rejected for now.
>>>
>>> The pages need to be located first before pinning them via FOLL_PIN.
>>> If they are found in the page cache, they can be immediately pinned.
>>> Otherwise, they need to be allocated using the filesystem specific
>>> APIs and then pinned.
>>>
>>> v2:
>>> - Drop gup_flags and improve comments and commit message (David)
>>> - Allocate a page if we cannot find in page cache for the hugetlbfs
>>>   case as well (David)
>>> - Don't unpin pages if there is a migration related failure (David)
>>> - Drop the unnecessary nr_pages <= 0 check (Jason)
>>> - Have the caller of the API pass in file * instead of fd (Jason)
>>>
>>> v3: (David)
>>> - Enclose the huge page allocation code with #ifdef
>> CONFIG_HUGETLB_PAGE
>>>   (Build error reported by kernel test robot)
>>> - Don't forget memalloc_pin_restore() on non-migration related errors
>>> - Improve the readability of the cleanup code associated with
>>>   non-migration related errors
>>> - Augment the comments by describing FOLL_LONGTERM like behavior
>>> - Include the R-b tag from Jason
>>>
>>> v4:
>>> - Remove the local variable "page" and instead use 3 return statements
>>>   in alloc_file_page() (David)
>>> - Add the R-b tag from David
>>>
>>> v5: (David)
>>> - For hugetlb case, ensure that we only obtain head pages from the
>>>   mapping by using __filemap_get_folio() instead of find_get_page_flags()
>>> - Handle -EEXIST when two or more potential users try to simultaneously
>>>   add a huge page to the mapping by forcing them to retry on failure
>>>
>>> v6: (Christoph)
>>> - Rename this API to memfd_pin_user_pages() to make it clear that it
>>>   is intended for memfds
>>> - Move the memfd page allocation helper from gup.c to memfd.c
>>> - Fix indentation errors in memfd_pin_user_pages()
>>> - For contiguous ranges of folios, use a helper such as
>>>   filemap_get_folios_contig() to lookup the page cache in batches
>>>
>>> Cc: David Hildenbrand
>>> Cc: Christoph Hellwig
>>> Cc: Daniel Vetter
>>> Cc: Mike Kravetz
>>> Cc: Hugh Dickins
>>> Cc: Peter Xu
>>> Cc: Gerd Hoffmann
>>> Cc: Dongwon Kim
>>> Cc: Junxiao Chang
>>> Suggested-by: Jason Gunthorpe
>>> Reviewed-by: Jason Gunthorpe (v2)
>>> Reviewed-by: David Hildenbrand (v3)
>>> Signed-off-by: Vivek Kasireddy
>>> ---
>>>  include/linux/memfd.h |   5 +++
>>>  include/linux/mm.h    |   2 +
>>>  mm/gup.c              | 102 ++++++++++++++++++++++++++++++++++++++++++
>>>  mm/memfd.c            |  34 ++++++++++++++
>>>  4 files changed, 143 insertions(+)
>>>
>>> diff --git a/include/linux/memfd.h b/include/linux/memfd.h
>>> index e7abf6fa4c52..6fc0d1282151 100644
>>> --- a/include/linux/memfd.h
>>> +++ b/include/linux/memfd.h
>>> @@ -6,11 +6,16 @@
>>>
>>>  #ifdef CONFIG_MEMFD_CREATE
>>>  extern long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int
>> arg);
>>> +extern struct page *memfd_alloc_page(struct file *memfd, pgoff_t idx);
>>>  #else
>>>  static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned int a)
>>>  {
>>>  	return -EINVAL;
>>>  }
>>> +static inline struct page *memfd_alloc_page(struct file *memfd, pgoff_t
>> idx)
>>> +{
>>> +	return ERR_PTR(-EINVAL);
>>> +}
>>>  #endif
>>>
>>>  #endif /* __LINUX_MEMFD_H */
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 418d26608ece..ac69db45509f 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -2472,6 +2472,8 @@ long get_user_pages_unlocked(unsigned long
>> start, unsigned long nr_pages,
>>>  		    struct page **pages, unsigned int gup_flags);
>>>  long pin_user_pages_unlocked(unsigned long start, unsigned long
>> nr_pages,
>>>  		    struct page **pages, unsigned int gup_flags);
>>> +long memfd_pin_user_pages(struct file *file, pgoff_t start,
>>> +			  unsigned long nr_pages, struct page **pages);
>>>
>>>  int get_user_pages_fast(unsigned long start, int nr_pages,
>>>  			unsigned int gup_flags, struct page **pages);
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 231711efa390..eb93d1ec9dc6 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -5,6 +5,7 @@
>>>  #include
>>>
>>>  #include
>>> +#include
>>>  #include
>>>  #include
>>>  #include
>>> @@ -17,6 +18,7 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>>  #include
>>>  #include
>>>
>>> @@ -3410,3 +3412,103 @@ long pin_user_pages_unlocked(unsigned long
>> start, unsigned long nr_pages,
>>>  				     &locked, gup_flags);
>>>  }
>>>  EXPORT_SYMBOL(pin_user_pages_unlocked);
>>> +
>>> +/**
>>> + * memfd_pin_user_pages() - pin user pages associated with a memfd
>>> + * @memfd:      the memfd whose pages are to be pinned
>>> + * @start:      starting memfd offset
>>> + * @nr_pages:   number of pages from start to pin
>>> + * @pages:      array that receives pointers to the pages pinned.
>>> + *              Should be at-least nr_pages long.
>>> + *
>>> + * Attempt to pin pages associated with a memfd; given that a memfd is
>> either
>>> + * backed by shmem or hugetlb, the pages can either be found in the page
>> cache
>>> + * or need to be allocated if necessary. Once the pages are located, they
>> are
>>> + * all pinned via FOLL_PIN. And, these pinned pages need to be released
>> either
>>> + * using unpin_user_pages() or unpin_user_page().
>>> + *
>>> + * It must be noted that the pages may be pinned for an indefinite amount
>>> + * of time. And, in most cases, the duration of time they may stay pinned
>>> + * would be controlled by the userspace. This behavior is effectively the
>>> + * same as using FOLL_LONGTERM with other GUP APIs.
>>> + *
>>> + * Returns number of pages pinned. This would be equal to the number of
>>> + * pages requested. If no pages were pinned, it returns -errno.
>>> + */
>>> +long memfd_pin_user_pages(struct file *memfd, pgoff_t start,
>>> +			  unsigned long nr_pages, struct page **pages)
>>> +{
>>> +	pgoff_t start_idx, end_idx = start + nr_pages - 1;
>>> +	unsigned int flags, nr_folios, i, j;
>>> +	struct folio_batch fbatch;
>>> +	struct page *page = NULL;
>>> +	struct folio *folio;
>>> +	long ret;
>>> +
>>> +	if (!nr_pages)
>>> +		return -EINVAL;
>>> +
>>> +	if (!memfd)
>>> +		return -EINVAL;
>>> +
>>> +	if (!shmem_file(memfd) && !is_file_hugepages(memfd))
>>> +		return -EINVAL;
>>> +
>>> +	flags = memalloc_pin_save();
>>> +	do {
>>> +		folio_batch_init(&fbatch);
>>> +		start_idx = start;
>>> +		i = 0;
>>> +
>>> +		while (start_idx <= end_idx) {
>>> +			/*
>>> +			 * In most cases, we should be able to find the page
>>> +			 * in the page cache. If we cannot find it for some
>>> +			 * reason, we try to allocate one and add it to the
>>> +			 * page cache.
>>> +			 */
>>> +			nr_folios = filemap_get_folios_contig(memfd->f_mapping,
>>> +							      &start_idx,
>>> +							      end_idx,
>>> +							      &fbatch);
>>> +			if (page) {
>>> +				put_page(page);
>>> +				page = NULL;
>>> +			}
>>> +			for (j = 0; j < nr_folios; j++) {
>>> +				folio = fbatch.folios[j];
>>> +				ret = try_grab_page(&folio->page, FOLL_PIN);
>>> +				if (unlikely(ret)) {
>>> +					folio_batch_release(&fbatch);
>>> +					goto err;
>>> +				}
>>> +
>>> +				pages[i++] = &folio->page;
>>> +			}
>>
>> I might be wrong, but that interface is still inconsistent. I think your
>> intention is to always return folios (head pages), but why are we
>> returning pages from this interface then?
>>
>> It would be more consistent regarding the other GUP interfaces to return
>> the actual tail pages that fit the given "pgoff_t start". So if you
>> punch in "nr_pages" you expect to get "nr_pages" pages, and not some
>> other number of folios.
>>
>> Otherwise, this interface is highly confusing.
>>
>> If you always want to return folios, then better name it
>> "memfd_pin_user_folios" (or just "memfd_pin_folios") and pass in a range
>> (instead of a nr_pages parameter), and somehow indicate to the caller
>> how many folio were in that range, and if that range was fully covered.
> I think it makes sense to return folios from this interface; and considering my
> use-case, I'd like have this API return an error if it cannot pin (or allocate)
> the exact number of folios the caller requested.

Okay, then better use folios.

Assume a caller puts in "start = X" and gets some large folio back. How
is the caller supposed to know at which offset to look into that folio
(IOW, which subpage)? For "pages" it was obvious (you get the actual
subpages), but as soon as we return a large folio, some information is
missing for the caller.

How can the caller figure that out?

>
>>
>> Or am I missing something?
> I can make the udmabuf driver use folios instead of pages too but the function
> check_and_migrate_movable_pages() in GUP still takes a list of pages. Do you
> think it is ok to use a local variable to collect all the head pages for this?

I think you can simply pass in the head page, because only whole folios
can be converted. At some point we should convert that one to use folios
as well.

-- 
Cheers,

David / dhildenb
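
To make the offset question above concrete, here is a minimal userspace
sketch (not kernel code) of the arithmetic a caller would need once the
interface hands back a large folio instead of the exact subpage. The
struct and both helper names are hypothetical and exist only for this
illustration; the assumption is that the caller can learn the file index
of the folio's first page and the folio's size in pages, which in the
kernel would presumably correspond to folio->index and folio_nr_pages().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a large folio in the page cache.
 * Assumption: the caller knows where the folio starts in the file and
 * how many base pages it spans.
 */
struct model_folio {
	unsigned long index;    /* file offset, in pages, of the first page */
	unsigned long nr_pages; /* number of base pages in this folio */
};

/*
 * Which subpage of @folio backs file offset @pgoff (in pages)?
 * This is the piece of information that is lost when only the folio
 * (head page) is returned for a request starting at @pgoff.
 */
static unsigned long model_subpage_offset(const struct model_folio *folio,
					  unsigned long pgoff)
{
	assert(pgoff >= folio->index);
	assert(pgoff - folio->index < folio->nr_pages);
	return pgoff - folio->index;
}

/*
 * How many pages of the request [start, start + nr_pages) does @folio
 * cover? A caller needs this to tell whether the range was fully
 * covered by the folios it got back.
 */
static unsigned long model_pages_covered(const struct model_folio *folio,
					 unsigned long start,
					 unsigned long nr_pages)
{
	unsigned long off = model_subpage_offset(folio, start);
	unsigned long avail = folio->nr_pages - off;

	return nr_pages < avail ? nr_pages : avail;
}
```

With a 512-page folio that starts at file index 512, a request starting
at index 515 lands on subpage 3 of that folio. A folio-returning
interface therefore has to report that offset in some way, or the caller
has to recompute it as above.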