* [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages [not found] <CGME20250507154119uscas1p2a4055d14ab111fdb94a6378789c38d9d@uscas1p2.samsung.com> @ 2025-05-07 15:41 ` Pantelis Antoniou [not found] ` <CGME20250507154119uscas1p17799fe7589e4f1bd53d2d3dc7f44cb8c@uscas1p1.samsung.com> ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Pantelis Antoniou @ 2025-05-07 15:41 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Artem Krupotkin, Charles Briere, Wade Farnsworth Updates to network filesystems enabled zero copy I/O by using the netfslib common accessors. One example of that is the 9p filesystem which is commonly used in qemu based setups for sharing files with the host. In our emulation environment we have noticed failing writes when performing I/O from a userspace mapped DRM GEM buffer object. The platform does not use VRAM, all graphics memory is regular DRAM memory, allocated via __get_free_pages The same write was successful from a heap allocated bounce buffer. The sequence of events is as follows. 1. A BO (Buffer Object) is created, and it's backing memory is allocated via __get_user_pages() 2. Userspace mmaps a BO (Buffer Object) via a mmap call on the opened file handle of a DRM driver. The mapping is done via the drm_gem_mmap_obj() call. 3. Userspace issues a write to a file copying the contents of the BO. 3a. If the file is located on regular filesystem (like ext4), the write completes successfully. 3b. If the file is located on a network filesystem, like 9p the write fails. The write fails because v9fs_file_write_iter() will call netfs_unbuffered_write_iter(), netfs_unbuffered_write_iter_locked() which will call netfs_extract_user_iter() netfs_extract_user_iter() will in turn call iov_iter_extract_pages() which for a user backed iterator will call iov_iter_extract_user_pages which will call pin_user_pages_fast() which finally will call __gup_longterm_locked(). __gup_longterm_locked() will call __get_user_pages_locked() which will fail because the VMA is marked with the VM_IO and VM_PFNMAP flags. Pantelis Antoniou (1): Fix zero copy I/O on __get_user_pages allocated pages mm/gup.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <CGME20250507154119uscas1p17799fe7589e4f1bd53d2d3dc7f44cb8c@uscas1p1.samsung.com>]
* [PATCH 1/1] Fix zero copy I/O on __get_user_pages allocated pages [not found] ` <CGME20250507154119uscas1p17799fe7589e4f1bd53d2d3dc7f44cb8c@uscas1p1.samsung.com> @ 2025-05-07 15:41 ` Pantelis Antoniou 2025-05-08 15:03 ` David Hildenbrand 0 siblings, 1 reply; 9+ messages in thread From: Pantelis Antoniou @ 2025-05-07 15:41 UTC (permalink / raw) To: Andrew Morton, linux-mm Cc: linux-kernel, Artem Krupotkin, Charles Briere, Wade Farnsworth Recent updates to net filesystems enabled zero copy operations, which require getting a user space page pinned. This does not work for pages that were allocated via __get_user_pages and then mapped to user-space via remap_pfn_rage. remap_pfn_range_internal() will turn on VM_IO | VM_PFNMAP vma bits. VM_PFNMAP in particular mark the pages as not having struct_page associated with them, which is not the case for __get_user_pages() This in turn makes any attempt to lock a page fail, and breaking I/O from that address range. This patch address it by special casing pages in those VMAs and not calling vm_normal_page() for them. Signed-off-by: Pantelis Antoniou <p.antoniou@partner.samsung.com> --- mm/gup.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 84461d384ae2..e185c18c0c81 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -833,6 +833,20 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page, return !userfaultfd_pte_wp(vma, pte); } +static struct page *gup_normal_page(struct vm_area_struct *vma, + unsigned long address, pte_t pte) +{ + unsigned long pfn; + + if (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)) { + pfn = pte_pfn(pte); + if (!pfn_valid(pfn) || is_zero_pfn(pfn) || pfn > highest_memmap_pfn) + return NULL; + return pfn_to_page(pfn); + } + return vm_normal_page(vma, address, pte); +} + static struct page *follow_page_pte(struct vm_area_struct *vma, unsigned long address, pmd_t *pmd, unsigned int flags, struct dev_pagemap **pgmap) @@ -858,7 +872,9 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, if (pte_protnone(pte) && !gup_can_follow_protnone(vma, flags)) goto no_page; - page = vm_normal_page(vma, address, pte); + page = gup_normal_page(vma, address, pte); + if (page && (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP))) + (void)follow_pfn_pte(vma, address, ptep, flags); /* * We only care about anon pages in can_follow_write_pte() and don't @@ -1130,7 +1146,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, *vma = get_gate_vma(mm); if (!page) goto out; - *page = vm_normal_page(*vma, address, entry); + *page = gup_normal_page(*vma, address, entry); if (!*page) { if ((gup_flags & FOLL_DUMP) || !is_zero_pfn(pte_pfn(entry))) goto unmap; @@ -1271,8 +1287,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) int foreign = (gup_flags & FOLL_REMOTE); bool vma_anon = vma_is_anonymous(vma); - if (vm_flags & (VM_IO | VM_PFNMAP)) - return -EFAULT; if ((gup_flags & FOLL_ANON) && !vma_anon) return -EFAULT; -- 2.25.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] Fix zero copy I/O on __get_user_pages allocated pages 2025-05-07 15:41 ` [PATCH 1/1] " Pantelis Antoniou @ 2025-05-08 15:03 ` David Hildenbrand 2025-05-08 15:23 ` Pantelis Antoniou 0 siblings, 1 reply; 9+ messages in thread From: David Hildenbrand @ 2025-05-08 15:03 UTC (permalink / raw) To: Pantelis Antoniou, Andrew Morton, linux-mm Cc: linux-kernel, Artem Krupotkin, Charles Briere, Wade Farnsworth, Peter Xu On 07.05.25 17:41, Pantelis Antoniou wrote: Hi, > Recent updates to net filesystems enabled zero copy operations, > which require getting a user space page pinned. > > This does not work for pages that were allocated via __get_user_pages > and then mapped to user-space via remap_pfn_rage. Right. Because the struct page of a VM_PFNMAP *must not be touched*. It has to be treated like it doesn't exist. > > remap_pfn_range_internal() will turn on VM_IO | VM_PFNMAP vma bits. > VM_PFNMAP in particular mark the pages as not having struct_page > associated with them, which is not the case for __get_user_pages() > > This in turn makes any attempt to lock a page fail, and breaking > I/O from that address range. > > This patch address it by special casing pages in those VMAs and not > calling vm_normal_page() for them. > > Signed-off-by: Pantelis Antoniou <p.antoniou@partner.samsung.com> > --- > mm/gup.c | 22 ++++++++++++++++++---- > 1 file changed, 18 insertions(+), 4 deletions(-) > > diff --git a/mm/gup.c b/mm/gup.c > index 84461d384ae2..e185c18c0c81 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -833,6 +833,20 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page, > return !userfaultfd_pte_wp(vma, pte); > } > > +static struct page *gup_normal_page(struct vm_area_struct *vma, > + unsigned long address, pte_t pte) > +{ > + unsigned long pfn; > + > + if (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)) { > + pfn = pte_pfn(pte); > + if (!pfn_valid(pfn) || is_zero_pfn(pfn) || pfn > highest_memmap_pfn) > + return NULL; > + return pfn_to_page(pfn); > + } > + return vm_normal_page(vma, address, pte); I enjoy seeing vm_normal_page() checks in GUP code. I don't enjoy seeing what you added before that :) If vm_normal_page() tells you "this is not a normal", then we should not touch it. There is one exception: the shared zeropage. So, unfortunately, this is wrong. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] Fix zero copy I/O on __get_user_pages allocated pages 2025-05-08 15:03 ` David Hildenbrand @ 2025-05-08 15:23 ` Pantelis Antoniou 2025-05-08 15:37 ` David Hildenbrand 0 siblings, 1 reply; 9+ messages in thread From: Pantelis Antoniou @ 2025-05-08 15:23 UTC (permalink / raw) To: David Hildenbrand Cc: Andrew Morton, linux-mm, linux-kernel, Artem Krupotkin, Charles Briere, Wade Farnsworth, Peter Xu On Thu, 8 May 2025 17:03:46 +0200 David Hildenbrand <david@redhat.com> wrote: Hi there, > On 07. 05. 25 17: 41, Pantelis Antoniou wrote: Hi, > Recent updates > to net filesystems enabled zero copy operations, > which require > getting a user space page pinned. > > This does not work for pages > that were allocated via __get_user_pages > On 07.05.25 17:41, Pantelis Antoniou wrote: > > Hi, > > > Recent updates to net filesystems enabled zero copy operations, > > which require getting a user space page pinned. > > > > This does not work for pages that were allocated via > > __get_user_pages and then mapped to user-space via remap_pfn_rage. > > Right. Because the struct page of a VM_PFNMAP *must not be touched*. > It has to be treated like it doesn't exist. > Well, that's not exactly the case. For pages mapped to user space via remap_pfn_range() the VM_PFNMAP bit is set even though the pages do have a struct page. The details of how it happens are at the cover page of this patch but let me paste the relevant bits here. "In our emulation environment we have noticed failing writes when performing I/O from a userspace mapped DRM GEM buffer object. The platform does not use VRAM, all graphics memory is regular DRAM memory, allocated via __get_free_pages The same write was successful from a heap allocated bounce buffer. The sequence of events is as follows. 1. A BO (Buffer Object) is created, and it's backing memory is allocated via __get_user_pages() 2. Userspace mmaps a BO (Buffer Object) via a mmap call on the opened file handle of a DRM driver. The mapping is done via the drm_gem_mmap_obj() call. 3. Userspace issues a write to a file copying the contents of the BO. 3a. If the file is located on regular filesystem (like ext4), the write completes successfully. 3b. If the file is located on a network filesystem, like 9p the write fails. The write fails because v9fs_file_write_iter() will call netfs_unbuffered_write_iter(), netfs_unbuffered_write_iter_locked() which will call netfs_extract_user_iter() netfs_extract_user_iter() will in turn call iov_iter_extract_pages() which for a user backed iterator will call iov_iter_extract_user_pages which will call pin_user_pages_fast() which finally will call __gup_longterm_locked(). __gup_longterm_locked() will call __get_user_pages_locked() which will fail because the VMA is marked with the VM_IO and VM_PFNMAP flags." > > > > remap_pfn_range_internal() will turn on VM_IO | VM_PFNMAP vma bits. > > VM_PFNMAP in particular mark the pages as not having struct_page > > associated with them, which is not the case for __get_user_pages() > > > > This in turn makes any attempt to lock a page fail, and breaking > > I/O from that address range. > > > > This patch address it by special casing pages in those VMAs and not > > calling vm_normal_page() for them. > > > > Signed-off-by: Pantelis Antoniou <p.antoniou@partner.samsung.com> > > --- > > mm/gup.c | 22 ++++++++++++++++++---- > > 1 file changed, 18 insertions(+), 4 deletions(-) > > > > diff --git a/mm/gup.c b/mm/gup.c > > index 84461d384ae2..e185c18c0c81 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -833,6 +833,20 @@ static inline bool can_follow_write_pte(pte_t > > pte, struct page *page, return !userfaultfd_pte_wp(vma, pte); > > } > > > > +static struct page *gup_normal_page(struct vm_area_struct *vma, > > + unsigned long address, pte_t pte) > > +{ > > + unsigned long pfn; > > + > > + if (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)) { > > + pfn = pte_pfn(pte); > > + if (!pfn_valid(pfn) || is_zero_pfn(pfn) || pfn > > > highest_memmap_pfn) > > + return NULL; > > + return pfn_to_page(pfn); > > + } > > + return vm_normal_page(vma, address, pte); > > I enjoy seeing vm_normal_page() checks in GUP code. > > I don't enjoy seeing what you added before that :) > > If vm_normal_page() tells you "this is not a normal", then we should > not touch it. There is one exception: the shared zeropage. > > > So, unfortunately, this is wrong. > Well, lets talk about a proper fix then for the previously mentioned user-space regression. Regards -- Pantelis ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] Fix zero copy I/O on __get_user_pages allocated pages 2025-05-08 15:23 ` Pantelis Antoniou @ 2025-05-08 15:37 ` David Hildenbrand 2025-05-08 15:47 ` Pantelis Antoniou 0 siblings, 1 reply; 9+ messages in thread From: David Hildenbrand @ 2025-05-08 15:37 UTC (permalink / raw) To: Pantelis Antoniou Cc: Andrew Morton, linux-mm, linux-kernel, Artem Krupotkin, Charles Briere, Wade Farnsworth, Peter Xu On 08.05.25 17:23, Pantelis Antoniou wrote: > On Thu, 8 May 2025 17:03:46 +0200 > David Hildenbrand <david@redhat.com> wrote: > > Hi there, > >> On 07. 05. 25 17: 41, Pantelis Antoniou wrote: Hi, > Recent updates >> to net filesystems enabled zero copy operations, > which require >> getting a user space page pinned. > > This does not work for pages >> that were allocated via __get_user_pages >> On 07.05.25 17:41, Pantelis Antoniou wrote: >> >> Hi, >> >>> Recent updates to net filesystems enabled zero copy operations, >>> which require getting a user space page pinned. >>> >>> This does not work for pages that were allocated via >>> __get_user_pages and then mapped to user-space via remap_pfn_rage. >> >> Right. Because the struct page of a VM_PFNMAP *must not be touched*. >> It has to be treated like it doesn't exist. >> > > Well, that's not exactly the case. For pages mapped to user space via > remap_pfn_range() the VM_PFNMAP bit is set even though the pages do > have a struct page. Yes. And VM_PFNMAP is the big flag that these pages shall not be touched. Even if it exists. Even if you say please. :) See the comment above vm_normal_page(): "Special" mappings do not wish to be associated with a "struct page" (either it doesn't exist, or it exists but they don't want to touch it) VM_MIXEDMAP could maybe be used for that purpose: possibly GUP also has to be updated to make use of that. (I was hoping we can get rid of VM_MIXEDMAP at some point) > > The details of how it happens are at the cover page of this patch but > let me paste the relevant bits here. > > "In our emulation environment we have noticed failing writes when > performing I/O from a userspace mapped DRM GEM buffer object. > The platform does not use VRAM, all graphics memory is regular DRAM > memory, allocated via __get_free_pages > > The same write was successful from a heap allocated bounce buffer. > > The sequence of events is as follows. > > 1. A BO (Buffer Object) is created, and it's backing memory is allocated via > __get_user_pages() __get_user_pages() only allocates memory via page faults. Are you sure you meant __get_user_pages() here? > > 2. Userspace mmaps a BO (Buffer Object) via a mmap call on the opened > file handle of a DRM driver. The mapping is done via the > drm_gem_mmap_obj() call. > > 3. Userspace issues a write to a file copying the contents of the BO. > > 3a. If the file is located on regular filesystem (like ext4), the write > completes successfully. > > 3b. If the file is located on a network filesystem, like 9p the write fails. > > The write fails because v9fs_file_write_iter() will call > netfs_unbuffered_write_iter(), netfs_unbuffered_write_iter_locked() which will > call netfs_extract_user_iter() > > netfs_extract_user_iter() will in turn call iov_iter_extract_pages() which for > a user backed iterator will call iov_iter_extract_user_pages which will call > pin_user_pages_fast() which finally will call __gup_longterm_locked(). > > __gup_longterm_locked() will call __get_user_pages_locked() which will fail > because the VMA is marked with the VM_IO and VM_PFNMAP flags." So, drm_gem_mmap_obj() ends up using remap_pfn_rage()? I can spot that drm_gem_mmap_obj() has a path where it explicitly sets vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP); Which is a clear sign to core-MM (incl. GUP) to never mess with the mapped pages. > >>> >>> remap_pfn_range_internal() will turn on VM_IO | VM_PFNMAP vma bits. >>> VM_PFNMAP in particular mark the pages as not having struct_page >>> associated with them, which is not the case for __get_user_pages() >>> >>> This in turn makes any attempt to lock a page fail, and breaking >>> I/O from that address range. >>> >>> This patch address it by special casing pages in those VMAs and not >>> calling vm_normal_page() for them. >>> >>> Signed-off-by: Pantelis Antoniou <p.antoniou@partner.samsung.com> >>> --- >>> mm/gup.c | 22 ++++++++++++++++++---- >>> 1 file changed, 18 insertions(+), 4 deletions(-) >>> >>> diff --git a/mm/gup.c b/mm/gup.c >>> index 84461d384ae2..e185c18c0c81 100644 >>> --- a/mm/gup.c >>> +++ b/mm/gup.c >>> @@ -833,6 +833,20 @@ static inline bool can_follow_write_pte(pte_t >>> pte, struct page *page, return !userfaultfd_pte_wp(vma, pte); >>> } >>> >>> +static struct page *gup_normal_page(struct vm_area_struct *vma, >>> + unsigned long address, pte_t pte) >>> +{ >>> + unsigned long pfn; >>> + >>> + if (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)) { >>> + pfn = pte_pfn(pte); >>> + if (!pfn_valid(pfn) || is_zero_pfn(pfn) || pfn > >>> highest_memmap_pfn) >>> + return NULL; >>> + return pfn_to_page(pfn); >>> + } >>> + return vm_normal_page(vma, address, pte); >> >> I enjoy seeing vm_normal_page() checks in GUP code. >> >> I don't enjoy seeing what you added before that :) >> >> If vm_normal_page() tells you "this is not a normal", then we should >> not touch it. There is one exception: the shared zeropage. >> >> >> So, unfortunately, this is wrong. >> > > Well, lets talk about a proper fix then for the previously mentioned > user-space regression. You really have to find out the responsible commit. GUP has been behaving like that forever I'm afraid. And even the VM_PFNMAP was in drm_gem_mmap_obj() already at least in 2012 if I am not wrong. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] Fix zero copy I/O on __get_user_pages allocated pages 2025-05-08 15:37 ` David Hildenbrand @ 2025-05-08 15:47 ` Pantelis Antoniou 0 siblings, 0 replies; 9+ messages in thread From: Pantelis Antoniou @ 2025-05-08 15:47 UTC (permalink / raw) To: David Hildenbrand Cc: Andrew Morton, linux-mm, linux-kernel, Artem Krupotkin, Charles Briere, Wade Farnsworth, Peter Xu On Thu, 8 May 2025 17:37:04 +0200 David Hildenbrand <david@redhat.com> wrote: > On 08. 05. 25 17: 23, Pantelis Antoniou wrote: > On Thu, 8 May 2025 > 17: 03: 46 +0200 > David Hildenbrand <david@ redhat. com> wrote: > > > Hi there, > >> On 07. 05. 25 17: 41, Pantelis Antoniou wrote: Hi, > > Recent updates > On 08.05.25 17:23, Pantelis Antoniou wrote: > > On Thu, 8 May 2025 17:03:46 +0200 > > David Hildenbrand <david@redhat.com> wrote: > > > > Hi there, > > > >> On 07. 05. 25 17: 41, Pantelis Antoniou wrote: Hi, > Recent updates > >> to net filesystems enabled zero copy operations, > which require > >> getting a user space page pinned. > > This does not work for pages > >> that were allocated via __get_user_pages > >> On 07.05.25 17:41, Pantelis Antoniou wrote: > >> > >> Hi, > >> > >>> Recent updates to net filesystems enabled zero copy operations, > >>> which require getting a user space page pinned. > >>> > >>> This does not work for pages that were allocated via > >>> __get_user_pages and then mapped to user-space via remap_pfn_rage. > >> > >> Right. Because the struct page of a VM_PFNMAP *must not be > >> touched*. It has to be treated like it doesn't exist. > >> > > > > Well, that's not exactly the case. For pages mapped to user space > > via remap_pfn_range() the VM_PFNMAP bit is set even though the > > pages do have a struct page. > > Yes. And VM_PFNMAP is the big flag that these pages shall not be > touched. Even if it exists. Even if you say please. :) > > See the comment above vm_normal_page(): > > "Special" mappings do not wish to be associated with a "struct > page" (either it doesn't exist, or it exists but they don't > want to touch it) > > VM_MIXEDMAP could maybe be used for that purpose: possibly GUP also > has to be updated to make use of that. (I was hoping we can get rid > of VM_MIXEDMAP at some point) > > > > > > The details of how it happens are at the cover page of this patch > > but let me paste the relevant bits here. > > > > "In our emulation environment we have noticed failing writes when > > performing I/O from a userspace mapped DRM GEM buffer object. > > The platform does not use VRAM, all graphics memory is regular DRAM > > memory, allocated via __get_free_pages > > > > The same write was successful from a heap allocated bounce buffer. > > > > The sequence of events is as follows. > > > > 1. A BO (Buffer Object) is created, and it's backing memory is > > allocated via __get_user_pages() > > __get_user_pages() only allocates memory via page faults. Are you > sure you meant __get_user_pages() here? > Oops, yeah, __get_free_pages(). Apologies for the confusion. > > > > 2. Userspace mmaps a BO (Buffer Object) via a mmap call on the > > opened file handle of a DRM driver. The mapping is done via the > > drm_gem_mmap_obj() call. > > > > 3. Userspace issues a write to a file copying the contents of the > > BO. > > > > 3a. If the file is located on regular filesystem (like ext4), the > > write completes successfully. > > > > 3b. If the file is located on a network filesystem, like 9p the > > write fails. > > > > The write fails because v9fs_file_write_iter() will call > > netfs_unbuffered_write_iter(), netfs_unbuffered_write_iter_locked() > > which will call netfs_extract_user_iter() > > > > netfs_extract_user_iter() will in turn call > > iov_iter_extract_pages() which for a user backed iterator will call > > iov_iter_extract_user_pages which will call pin_user_pages_fast() > > which finally will call __gup_longterm_locked(). > > > > __gup_longterm_locked() will call __get_user_pages_locked() which > > will fail because the VMA is marked with the VM_IO and VM_PFNMAP > > flags." > > So, drm_gem_mmap_obj() ends up using remap_pfn_rage()? > Yes. > I can spot that drm_gem_mmap_obj() has a path where it explicitly sets > > vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | > VM_DONTDUMP); > > Which is a clear sign to core-MM (incl. GUP) to never mess with the > mapped pages. > Well, let just say this not quite right for pages that are normally allocated via __get_free_pages(). DRM has to handle both VRAM and regular system memory maps so maybe it's playing it safe here. > > > >>> > >>> remap_pfn_range_internal() will turn on VM_IO | VM_PFNMAP vma > >>> bits. VM_PFNMAP in particular mark the pages as not having > >>> struct_page associated with them, which is not the case for > >>> __get_user_pages() > >>> > >>> This in turn makes any attempt to lock a page fail, and breaking > >>> I/O from that address range. > >>> > >>> This patch address it by special casing pages in those VMAs and > >>> not calling vm_normal_page() for them. > >>> > >>> Signed-off-by: Pantelis Antoniou <p.antoniou@partner.samsung.com> > >>> --- > >>> mm/gup.c | 22 ++++++++++++++++++---- > >>> 1 file changed, 18 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/mm/gup.c b/mm/gup.c > >>> index 84461d384ae2..e185c18c0c81 100644 > >>> --- a/mm/gup.c > >>> +++ b/mm/gup.c > >>> @@ -833,6 +833,20 @@ static inline bool can_follow_write_pte(pte_t > >>> pte, struct page *page, return !userfaultfd_pte_wp(vma, pte); > >>> } > >>> > >>> +static struct page *gup_normal_page(struct vm_area_struct *vma, > >>> + unsigned long address, pte_t pte) > >>> +{ > >>> + unsigned long pfn; > >>> + > >>> + if (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)) { > >>> + pfn = pte_pfn(pte); > >>> + if (!pfn_valid(pfn) || is_zero_pfn(pfn) || pfn > > >>> highest_memmap_pfn) > >>> + return NULL; > >>> + return pfn_to_page(pfn); > >>> + } > >>> + return vm_normal_page(vma, address, pte); > >> > >> I enjoy seeing vm_normal_page() checks in GUP code. > >> > >> I don't enjoy seeing what you added before that :) > >> > >> If vm_normal_page() tells you "this is not a normal", then we > >> should not touch it. There is one exception: the shared zeropage. > >> > >> > >> So, unfortunately, this is wrong. > >> > > > > Well, lets talk about a proper fix then for the previously mentioned > > user-space regression. > > You really have to find out the responsible commit. GUP has been > behaving like that forever I'm afraid. > > And even the VM_PFNMAP was in drm_gem_mmap_obj() already at least in > 2012 if I am not wrong. > There is no single responsible commit, it was broken forever. It is just that no-one has ever tried to pin the pages to perform I/O before now. Regards -- Pantelis ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages 2025-05-07 15:41 ` [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages Pantelis Antoniou [not found] ` <CGME20250507154119uscas1p17799fe7589e4f1bd53d2d3dc7f44cb8c@uscas1p1.samsung.com> @ 2025-05-07 21:50 ` Andrew Morton 2025-05-08 8:24 ` Pantelis Antoniou 2025-05-07 21:54 ` Andrew Morton 2 siblings, 1 reply; 9+ messages in thread From: Andrew Morton @ 2025-05-07 21:50 UTC (permalink / raw) To: Pantelis Antoniou Cc: linux-mm, linux-kernel, Artem Krupotkin, Charles Briere, Wade Farnsworth On Wed, 7 May 2025 10:41:04 -0500 Pantelis Antoniou <p.antoniou@partner.samsung.com> wrote: > Updates to network filesystems enabled zero copy I/O by using the > netfslib common accessors. Updates by whom? Are all the people who need to know about this being cc'ed here? > One example of that is the 9p filesystem which is commonly used in qemu > based setups for sharing files with the host. > > In our emulation environment we have noticed failing writes when performing > I/O from a userspace mapped DRM GEM buffer object. > The platform does not use VRAM, all graphics memory is regular DRAM memory, > allocated via __get_free_pages We should identify which kernel version(s) should be patched, please. 6.16-rc1? 6.15? -stable? I often make these decisions but in this case I have far too little information to be able to do that. Thanks. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages 2025-05-07 21:50 ` [PATCH 0/1] " Andrew Morton @ 2025-05-08 8:24 ` Pantelis Antoniou 0 siblings, 0 replies; 9+ messages in thread From: Pantelis Antoniou @ 2025-05-08 8:24 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, Artem Krupotkin, Charles Briere, Wade Farnsworth On Wed, 7 May 2025 14:50:18 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > On Wed, 7 May 2025 10: 41: 04 -0500 Pantelis Antoniou <p. antoniou@ > partner. samsung. com> wrote: > Updates to network filesystems > enabled zero copy I/O by using the > netfslib common accessors. > Updates by whom? Are all the people who > On Wed, 7 May 2025 10:41:04 -0500 Pantelis Antoniou > <p.antoniou@partner.samsung.com> wrote: > > > Updates to network filesystems enabled zero copy I/O by using the > > netfslib common accessors. > > Updates by whom? Are all the people who need to know about this being > cc'ed here? > I think the first cover letter contains that information. > > One example of that is the 9p filesystem which is commonly used in > > qemu based setups for sharing files with the host. > > > > In our emulation environment we have noticed failing writes when > > performing I/O from a userspace mapped DRM GEM buffer object. > > The platform does not use VRAM, all graphics memory is regular DRAM > > memory, allocated via __get_free_pages > > We should identify which kernel version(s) should be patched, please. > 6.16-rc1? 6.15? -stable? > The first occurance of the bug was on internal kernel tree that was based on 6.8. This patch is against 6.15-rc5. > I often make these decisions but in this case I have far too little > information to be able to do that. > No worries. I see that this is picked up for mm unstable as is? Do you want me to generate a single patch merging the info of the cover letter and the single patch? The reason for the split is that I was not sure if you needed to have all the sordid details included in the applied patch. FWIW, we also have a buildroot patch that exhibits the problem in a much simplified way that what the original bug report came about. I don't think its appropriate content for the list, but I can share if anyone is curious about it. > Thanks. Regards -- Pantelis ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages 2025-05-07 15:41 ` [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages Pantelis Antoniou [not found] ` <CGME20250507154119uscas1p17799fe7589e4f1bd53d2d3dc7f44cb8c@uscas1p1.samsung.com> 2025-05-07 21:50 ` [PATCH 0/1] " Andrew Morton @ 2025-05-07 21:54 ` Andrew Morton 2 siblings, 0 replies; 9+ messages in thread From: Andrew Morton @ 2025-05-07 21:54 UTC (permalink / raw) To: Pantelis Antoniou Cc: linux-mm, linux-kernel, Artem Krupotkin, Charles Briere, Wade Farnsworth On Wed, 7 May 2025 10:41:04 -0500 Pantelis Antoniou <p.antoniou@partner.samsung.com> wrote: > One example of that is the 9p filesystem which is commonly used in qemu > based setups for sharing files with the host. > > In our emulation environment we have noticed failing writes when performing > I/O from a userspace mapped DRM GEM buffer object. > The platform does not use VRAM, all graphics memory is regular DRAM memory, > allocated via __get_free_pages > > The same write was successful from a heap allocated bounce buffer. > > The sequence of events is as follows. > > .. > There's a lot of good stuff in this [0/N], but a single patch "series" is cumbersome. Can you please redo this as a standalone patch in which the changelog is the union of [0/1] and [1/1]? ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-05-08 15:47 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CGME20250507154119uscas1p2a4055d14ab111fdb94a6378789c38d9d@uscas1p2.samsung.com>
2025-05-07 15:41 ` [PATCH 0/1] Fix zero copy I/O on __get_user_pages allocated pages Pantelis Antoniou
[not found] ` <CGME20250507154119uscas1p17799fe7589e4f1bd53d2d3dc7f44cb8c@uscas1p1.samsung.com>
2025-05-07 15:41 ` [PATCH 1/1] " Pantelis Antoniou
2025-05-08 15:03 ` David Hildenbrand
2025-05-08 15:23 ` Pantelis Antoniou
2025-05-08 15:37 ` David Hildenbrand
2025-05-08 15:47 ` Pantelis Antoniou
2025-05-07 21:50 ` [PATCH 0/1] " Andrew Morton
2025-05-08 8:24 ` Pantelis Antoniou
2025-05-07 21:54 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox