linux-mm.kvack.org archive mirror
* Question about how to map io request sg pages to user space
@ 2022-02-18  7:15 Xiaoguang Wang
  0 siblings, 0 replies; only message in thread
From: Xiaoguang Wang @ 2022-02-18  7:15 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-block, target-devel


hi,

I have some questions about how to map block device io requests' pages to
user space, for which I may need your help. Thanks in advance.

Let me first give a brief introduction: one of our customers uses tcm_loop &
tcmu to export a virtual block device to user space; tcm_loop and tcmu
belong to the scsi/target subsystem.  This virtual block device has a
user-space backend, which accesses a remote distributed filesystem to
complete io requests.  The data flow looks like this:
   1) A client app issues an io request to this virtual block device.
   2) tcm_loop & tcmu are kernel modules; they handle the io requests.
   3) tcmu maintains an internal data area, which is really an xarray
       managing kernel pages. tcmu allocates kernel pages for the data area
       and copies the io requests' sg pages into those kernel pages.
   4) tcmu maps the data area's kernel pages to user space, so the tcmu
       user-space backend can read or fill the mmaped user-space area.
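The per-request copy in step 3 can be sketched roughly as follows (a
simplified, illustrative sketch only; the real code lives in
drivers/target/target_core_user.c and differs in detail, and the function
name here is made up):

```c
/*
 * Illustrative sketch of the tcmu data-area copy: for each sg page of
 * the io request, look up the corresponding kernel page in the
 * data-area xarray and copy the payload into it.
 */
static void sketch_copy_sg_to_data_area(struct scatterlist *sgl,
					unsigned int sg_count,
					struct xarray *data_pages,
					unsigned long dbi_start)
{
	struct scatterlist *sg;
	unsigned long dbi = dbi_start;
	unsigned int i;

	for_each_sg(sgl, sg, sg_count, i) {
		struct page *dst = xa_load(data_pages, dbi++);
		void *from = kmap_local_page(sg_page(sg));
		void *to = kmap_local_page(dst);

		/* This memcpy is the extra copy the text complains about. */
		memcpy(to, from + sg->offset, sg->length);
		kunmap_local(to);
		kunmap_local(from);
	}
}
```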

But this solution has obvious overhead: allocating tcmu data area pages and
one extra copy, which results in a tcmu throughput bottleneck. So I am
trying to map block device io requests' sg pages to user space directly,
which I believe can improve tcmu throughput. Currently I have implemented
two prototypes:

Solution 1:
use vm_insert_pages(), which is similar to what tcp
getsockopt(TCP_ZEROCOPY_RECEIVE) does.
But there are two restrictions:
   1, anonymous pages can not be mmaped to user space:
    ==> vm_insert_pages
    ====> insert_pages
    ======> insert_page_in_batch_locked
    ========> validate_page_before_insert
    validate_page_before_insert() shows that an anonymous page can not be
    mapped to user space, and we know that when issuing direct io to a block
    device, the io requests' sg pages may be anonymous pages:
        if (PageAnon(page) || PageSlab(page) || page_has_type(page))
            return -EINVAL;
    I wonder why there is such a restriction? For safety reasons?
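For reference, solution 1 boils down to a call like the following from the
tcmu mmap path (a minimal sketch of my prototype; the wrapper name is
illustrative, but vm_insert_pages() itself is the real kernel API):

```c
/*
 * Sketch of solution 1: batch-insert the io request's sg pages into
 * the backend's vma.  This fails with -EINVAL for anonymous pages,
 * because validate_page_before_insert() rejects them.
 */
static int sketch_map_sg_pages(struct vm_area_struct *vma,
			       unsigned long uaddr,
			       struct page **pages, unsigned long nr)
{
	unsigned long nr_left = nr;

	/* On return, nr_left holds the number of pages NOT inserted. */
	return vm_insert_pages(vma, uaddr, pages, &nr_left);
}
```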

   2, a WARN_ON triggered in __folio_mark_dirty
   When the tcmu user-space backend does zap_page_range() after an io
   completes, a WARN_ON in __folio_mark_dirty triggers:
       if (folio->mapping) {   /* Race with truncate? */
           WARN_ON_ONCE(warn && !folio_test_uptodate(folio));

   I'm not familiar with folios yet, but I think the reason is that when
   issuing a buffered read to the block device, it is page cache that gets
   mapped to user space, but initially the pages are newly allocated, hence
   the uptodate flag is not yet set.  In zap_pte_range there is this code:
       if (!PageAnon(page)) {
           if (pte_dirty(ptent)) {
               force_flush = 1;
               set_page_dirty(page);
           }
  So this WARN_ON is reasonable.
  Indeed, all I want is to map io request sg pages to the tcmu user-space
  backend, so the backend can read or write data in the mapped area; I don't
  want to care about the page or its mapping status, so I chose to use
  remap_pfn_range.

Then solution 2: use remap_pfn_range().
remap_pfn_range() works well, but it has fairly obvious overhead. A 512KB io
request has 128 pages, and usually these 128 pages' pfns are not
consecutive, so in the worst case I'd need to issue 128 remap_pfn_range()
calls for a single 512KB io request, which is horrible. Also, inside
remap_pfn_range(), if the x86 page attribute table (PAT) feature is enabled,
lookup_memtype() called by track_pfn_remap() introduces noticeable overhead
as well.
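The worst case looks like the loop below (a sketch of my prototype; the
wrapper name is illustrative, remap_pfn_range() is the real API):

```c
/*
 * Sketch of solution 2: the sg pages' pfns are usually not contiguous,
 * so each page needs its own remap_pfn_range() call -- up to 128 calls
 * for one 512KB request, each paying the PAT lookup_memtype() cost.
 */
static int sketch_remap_sg_pages(struct vm_area_struct *vma,
				 unsigned long uaddr,
				 struct page **pages, unsigned long nr)
{
	unsigned long i;
	int ret;

	for (i = 0; i < nr; i++) {
		ret = remap_pfn_range(vma, uaddr + i * PAGE_SIZE,
				      page_to_pfn(pages[i]), PAGE_SIZE,
				      vma->vm_page_prot);
		if (ret)
			return ret;
	}
	return 0;
}
```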

Finally, my question is: is there any simple and efficient helper to map
block device sg pages to user space? It could accept an array of pages as a
parameter; anonymous pages could be mapped to user space, and the pages
would be inserted as special ptes (pte_special() returns true), so
vm_normal_page() returns NULL and the above WARN_ON won't trigger. Does this
sound reasonable? I'm not a qualified mm developer, but if you think this
new helper is reasonable, I can try to add one, thanks.
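To make the idea concrete, the helper I have in mind would look roughly like
this (entirely hypothetical; the name and semantics are my proposal, not an
existing kernel API):

```c
/*
 * Hypothetical helper: insert an array of pages into a vma as special
 * ptes (pte_mkspecial), so that vm_normal_page() returns NULL for
 * them.  That way neither the PageAnon() restriction in
 * validate_page_before_insert() nor the __folio_mark_dirty WARN_ON in
 * the zap path would apply; the pages are treated as raw pfn mappings
 * without rmap tracking.
 */
int vm_insert_pages_mkspecial(struct vm_area_struct *vma,
			      unsigned long addr,
			      struct page **pages, unsigned long *num);
```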


Regards,
Xiaoguang Wang
