From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 271C0E92FF7 for ; Fri, 6 Oct 2023 08:03:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 792206B0291; Fri, 6 Oct 2023 04:03:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 741B86B0292; Fri, 6 Oct 2023 04:03:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 60D2A6B0294; Fri, 6 Oct 2023 04:03:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 518026B0291 for ; Fri, 6 Oct 2023 04:03:42 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 20189C03B5 for ; Fri, 6 Oct 2023 08:03:42 +0000 (UTC) X-FDA: 81314297484.29.D4C85D5 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf28.hostedemail.com (Postfix) with ESMTP id C7C53C0010 for ; Fri, 6 Oct 2023 08:03:38 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=eynmmSG5; spf=pass (imf28.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1696579419; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=XkP8fX/lwBAbWVwI0ovy1iMFjgsEo1LcLA94ICCi4H4=; b=peX4hZn3YKPTzIsLl/9rnJZHp3J/b/HyJkititFnzZj99Ux3/gR8H8auPLM5v4eoJrlNOT S2X67Z1GoeT+xsTzR3EFQczwKgL0CPcs4d5A2yCo/sng4GwucjGyEYtwh/upV1fW4H7pM+ 7KatYxIsLPsETPohPkSi7NYOcWLae3o= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1696579419; a=rsa-sha256; cv=none; b=KQjjM4nOACRP3Kzk6pAqaNTP1jv9WxZdtPuXkBCElUEK6i5ZBT4mYU7JKgegbce51csYOV vk0QqZSjWab7bgduLjsmTft4hmNA/VHXphjA/iP1nknlEj1ZjD6Ke+L7R0PnQyR7teGvW5 C6cF+d6bzQThwzTF+3B8GISE01KDAIk= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=eynmmSG5; spf=pass (imf28.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1696579418; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XkP8fX/lwBAbWVwI0ovy1iMFjgsEo1LcLA94ICCi4H4=; b=eynmmSG5KS0aaaNSn8i+rWuv32aMHkcZHNXPJ/ZrGW4oXD2vjzJaVx1fOCJjvosBh7GSN2 RRWvn/1AD+RY+f+gDty3/RS3pt5YhsGjoygQ9p70BPlyN6G6IIRE0SIKvDISVFE3qCD9za 0X/eUYVHVDZygFQWYEOVO8iMO+yid7M= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-284-Y6prqyw4MISvKkWCzdO3Fw-1; Fri, 06 Oct 2023 04:03:36 -0400 X-MC-Unique: Y6prqyw4MISvKkWCzdO3Fw-1 Received: by mail-wr1-f70.google.com with SMTP id ffacd0b85a97d-3251bc06680so1380441f8f.2 for ; Fri, 06 Oct 2023 01:03:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696579415; x=1697184215; h=content-transfer-encoding:in-reply-to:subject:organization:from :references:cc:to:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=XkP8fX/lwBAbWVwI0ovy1iMFjgsEo1LcLA94ICCi4H4=; b=X+EIuEUD8aUbRJwNaYrDS5t/wRX7VRSuuDsU46Ou2LcgZp2BU7pddrCL3c4BtxVk6S bqHxMAnhf5mmWd3VvXfgV3R5q9SEqycUIzjqEXxqSNiXJn1GNHBbaNNuR4s5N16jhc94 KwkJ9guXUHNAdQlWjF6HUOGfKRw/R8GvTNDitiO7jgJ2FgZNOwODA6ngLBTiTpRjEqa7 0hII9j8CxIRT5vN+3XqZ2HzXL+WhdcryugULDqsDMB1CxVaZqr94p5R+qRPjOGzwo1x2 gMDcbY/svDaYwTEcQaMIGEWBx8TAoksUwaQm/bTAQOwl4CXYY6zpgoVxBjNG2beIvJI+ N/Zw== X-Gm-Message-State: AOJu0Yzx8zb9oUyQCQ3BYLKwp6Re0CAbrAUn7yWmrbcUySfipHxH0ahd JgO6Q4dDEPoUYUiS+ToA6vyqYX5XjSkXLrRdhuhWnmeMnl3QjRbdNoRE8GYwcnZt5XW7rSUyj02 nIg4daxDIv7I= X-Received: by 2002:a5d:500b:0:b0:31f:f753:5897 with SMTP id e11-20020a5d500b000000b0031ff7535897mr6304244wrt.59.1696579415228; Fri, 06 Oct 2023 01:03:35 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGl7whorgWW11yV3Rlk/hS9JfFYJ9DRAM3E8dIDZfFdA9kZ3iHGe9Su0hKNaNQuWg7Aj9/xsA== X-Received: by 2002:a5d:500b:0:b0:31f:f753:5897 with SMTP id e11-20020a5d500b000000b0031ff7535897mr6304211wrt.59.1696579414711; Fri, 06 Oct 2023 01:03:34 -0700 (PDT) Received: from ?IPV6:2003:cb:c715:ee00:4e24:cf8e:3de0:8819? (p200300cbc715ee004e24cf8e3de08819.dip0.t-ipconnect.de. [2003:cb:c715:ee00:4e24:cf8e:3de0:8819]) by smtp.gmail.com with ESMTPSA id c4-20020a5d4cc4000000b003247d3e5d99sm1025352wrt.55.2023.10.06.01.03.33 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 06 Oct 2023 01:03:34 -0700 (PDT) Message-ID: <4c272313-d2cd-fa29-3126-496636e14115@redhat.com> Date: Fri, 6 Oct 2023 10:03:33 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1 To: Vivek Kasireddy , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, Mike Kravetz Cc: Daniel Vetter , Hugh Dickins , Peter Xu , Gerd Hoffmann , Dongwon Kim , Junxiao Chang , Jason Gunthorpe References: <20231003074447.3245729-1-vivek.kasireddy@intel.com> <20231003074447.3245729-2-vivek.kasireddy@intel.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v1 1/3] mm/gup: Introduce pin_user_pages_fd() for pinning shmem/hugetlbfs file pages In-Reply-To: <20231003074447.3245729-2-vivek.kasireddy@intel.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Stat-Signature: qri98su7sofpkdp5d85zxkqutueu4x4k X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: C7C53C0010 X-Rspam-User: X-HE-Tag: 1696579418-820775 X-HE-Meta: U2FsdGVkX1/+Wl8CPCVl6X0VCM2wJQ5SoFw21D/w1FqxNcG3SIqzhvUSaoXF4Eepjy0Yh7fNBn4CrqBlbUhs5OciV6MN0xmPpFII4Z464EU7JdjcVoO9FLA8KCOaTfX4G8En4hv7FegxJpeGd88BetVWWFLR55yfbvFczbgqVdvnJ2WC04756ovvY4fRRJGDSZ//a3rKXizRdiAhyaJZ8+zGmkrwF7WRlhjqPO/S1rsLbu2vhI6JvL1ZKhPCQXbu7BhILswcWAbzSVADgqvCTTKHy7JQ3REYRalQXxRGNUdjGGCfTRGI86tpObJeznJDi+bam7H2wJrCYbJJynbGsvnDRKwZ+uYnQMots5B7CkqkCbG3OqWqRUZnCZUQPEchzatLE4pdaqaGotr5IMxMNExBGuq55JAAq+Of94uYF2ocRbmP2MXZ5Zbw27IU62NrGtfxntZxw8TBC9jrIh+v156rClrAKneE5MKhg53mx/kD57qYJ2ZyvwZuSL8kE0cvScFUbAbnrGCrlhWX22kIQIa/BW+uywjhvtxJLpFkTzm7Yv+vDfbA9/ilkt+PrphxrRlE+m2mPRlDLKXaReogSFT/y/gPeBCHRYbntOhqSGLspYFNJiKvOQq9ApudN3j0za+ZxcUKHOI9RO80fhfOoMizvlBDWG4h7MHe4LDHDb3+TOEEnaohLF54G48QGSF5v/ZrhEhNRWHdhdpMpv512lUuuydRhVqji9C0cHCw87upiqivYALLvaW2/KqDG3wAofxzbz6P3nfmnq8XucFMN9iS7IFImv+Pn8NC0BGPIydkXitsisrWiqN/3m+2qTLZ1p6cyNWgqJhzGWlVu0/ics9OqzWuVcnmz4G1w+Fs8bTyn99J1lbQObOjeuG3jtXjuW4bC4PjVYQ1hCqCqXUhgwZ4wZDGJYkaakFW0WS8hb/yHZdA+SlIOuXjcJFxSiwOfpeRLJgT/4gAuW4us8l vAb3S9n4 gP1yIHwIUCz3E+tmkrGKfodo/R+B0hkT685bx9z3CwoMGd/nUmy08C2USZiFAS28icWMdwGUxI+lm0W8x93XzdPTa4gvJ6n2jNy5Thq5+f1bijXeJYWyb3M03Mu/deRJDx6Lg9ca2MXGR1uJ/DG/LGKjDg+rqr0j2rNsLDNROUrFvYWNP7doNQibHUp+p6ZizBrdrL5TPGVots792WnKOuYg1GlEyzlsQxtz1h5pLYUbJpna0feiBTxAQkmnTvGhYczs7RBXisDS8ACQlQSgS+UWaQEnLn+9fxiV2GjFk1q80KG0MvG0s0UR32Cz1D6qb4KHx2P8blqNZvJ5haGdfl0bHIOpqj2ddlr9si3xSrzBAag7BkYyohBYFFICn9sZetmuySHdDyn8h4TKmZB6CmdRjRLU234+8cA67b5OJZkQxu/YtvHqwF2U2PYPvgoMfc1qnb5aiIgyZtl7uAj5D2dCRZTzeJbw7X4koeSibEMNaFwyKrXrnd1lFtkBWjOnQc3hNZnmkFZq3Xl2enfaiDvAINSmWtw9zMB6jUNmmPthgG+8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 03.10.23 09:44, Vivek Kasireddy wrote: > For drivers that would like to longterm-pin the pages associated > with a file, the pin_user_pages_fd() API provides an option to > not only FOLL_PIN the pages but also to check and migrate them > if they reside in movable zone or CMA block. For now, this API > can only work with files belonging to shmem or hugetlbfs given > that the udmabuf driver is the only user. Maybe add "Other files are rejected.". Wasn't clear to me before I looked into the code. > > It must be noted that the pages associated with hugetlbfs files > are expected to be found in the page cache. An error is returned > if they are not found. However, shmem pages can be swapped in or > allocated if they are not present in the page cache. > > Cc: David Hildenbrand > Cc: Daniel Vetter > Cc: Mike Kravetz > Cc: Hugh Dickins > Cc: Peter Xu > Cc: Gerd Hoffmann > Cc: Dongwon Kim > Cc: Junxiao Chang > Suggested-by: Jason Gunthorpe > Signed-off-by: Vivek Kasireddy > --- > include/linux/mm.h | 2 ++ > mm/gup.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 89 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index bf5d0b1b16f4..af2121fb8101 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2457,6 +2457,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, > struct page **pages, unsigned int gup_flags); > long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, > struct page **pages, unsigned int gup_flags); > +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages, > + unsigned int gup_flags, struct page **pages); > > int get_user_pages_fast(unsigned long start, int nr_pages, > unsigned int gup_flags, struct page **pages); > diff --git a/mm/gup.c b/mm/gup.c > index 2f8a2d89fde1..e34b77a15fa8 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -3400,3 +3400,90 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, > &locked, gup_flags); > } > EXPORT_SYMBOL(pin_user_pages_unlocked); > + This does look quite neat, nice! Let's take a closer look ... > +/** > + * pin_user_pages_fd() - pin user pages associated with a file > + * @fd: the fd whose pages are to be pinned > + * @start: starting file offset > + * @nr_pages: number of pages from start to pin > + * @gup_flags: flags modifying pin behaviour ^ I assume we should drop that. At least for now the flags are completely unused. And most likely we would want a different set of flags later (GUPFD_ ...). > + * @pages: array that receives pointers to the pages pinned. > + * Should be at least nr_pages long. > + * > + * Attempt to pin (and migrate) pages associated with a file belonging to I'd drop the "and migrate" part, it's more of an implementation detail. > + * either shmem or hugetlbfs. An error is returned if pages associated with > + * hugetlbfs files are not present in the page cache. However, shmem pages > + * are swapped in or allocated if they are not present in the page cache. Why don't we do the same for hugetlbfs? Would make the interface more streamlined. Certainly add that pinned pages have to be released using unpin_user_pages(). > + * > + * Returns number of pages pinned. This would be equal to the number of > + * pages requested. > + * If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns > + * -errno. > + */ > +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages, > + unsigned int gup_flags, struct page **pages) > +{ > + struct page *page; > + struct file *filep; > + unsigned int flags, i; > + long ret; > + > + if (nr_pages <= 0) > + return 0; I think we should just forbid that and use a WARN_ON_ONCE() here / return -EINVAL. So we'll never end up returning 0. > + if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN)) > + return 0; > + > + if (start < 0) > + return -EINVAL; > + > + filep = fget(fd); > + if (!filep) > + return -EINVAL; > + > + if (!shmem_file(filep) && !is_file_hugepages(filep)) > + return -EINVAL; > + > + flags = memalloc_pin_save(); > + do { > + for (i = 0; i < nr_pages; i++) { > + if (shmem_mapping(filep->f_mapping)) { > + page = shmem_read_mapping_page(filep->f_mapping, > + start + i); > + if (IS_ERR(page)) { > + ret = PTR_ERR(page); > + goto err; > + } > + } else { > + page = find_get_page_flags(filep->f_mapping, > + start + i, > + FGP_ACCESSED); > + if (!page) { > + ret = -EINVAL; > + goto err; > + } > + } > + ret = try_grab_page(page, FOLL_PIN); > + if (unlikely(ret)) > + goto err; > + > + pages[i] = page; > + put_page(pages[i]); > + } > + > + ret = check_and_migrate_movable_pages(nr_pages, pages); > + } while (ret == -EAGAIN); > + > +err: > + memalloc_pin_restore(flags); > + fput(filep); > + if (!ret) > + return nr_pages; > + > + while (i > 0 && pages[--i]) { > + unpin_user_page(pages[i]); > + pages[i] = NULL; If migrate_longterm_unpinnable_pages() failed, say with -ENOMEM, the pages were already unpinned, but pages[i] has not been cleared, no? > + } > + return ret; > +} > +EXPORT_SYMBOL_GPL(pin_user_pages_fd); > + -- Cheers, David / dhildenb