From: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
To: Bodo Stroesser <bostroesser@gmail.com>,
linux-mm@kvack.org, target-devel@vger.kernel.org,
linux-scsi@vger.kernel.org
Cc: linux-block@vger.kernel.org, xuyu@linux.alibaba.com
Subject: Re: [RFC 0/3] Add zero copy feature for tcmu
Date: Tue, 22 Mar 2022 21:17:07 +0800 [thread overview]
Message-ID: <36b5a8e5-c8e9-6a1f-834c-6bf9bf920f4c@linux.alibaba.com> (raw)
In-Reply-To: <abbe51c4-873f-e96e-d421-85906689a55a@gmail.com>
hi,
> On 18.03.22 10:55, Xiaoguang Wang wrote:
>> The core idea to implement tcmu zero copy feature is really straight,
>> which just maps block device io request's sgl pages to tcmu user space
>> backstore, then we can avoid extra copy overhead between sgl pages and
>> tcmu internal data area(which really impacts io throughput), please see
>> https://www.spinics.net/lists/target-devel/msg21121.html for detailed
>> info.
>>
>
> Can you please tell us, how big the performance improvement is and
> which configuration you are using for measurenments?
Sorry, I should have attached test results here. Initially I tried to use
tcmu user:fbo backstore to evaluate performance improvements, but
it only shows about 10%~15% io throughput improvement. Fio config
is numjobs=1, iodepth=8, bs=256k, which isn't very impressive. The
reason is that user:fbo backstore does buffered reads, it consumes most
of cpu.
Then I test this zero copy feature for our real workload, whose backstore
is a network program visiting distributed file system and it's
multi-threaded.
For 4 job, 8 depth, 256 kb io size, the write throughput improves from
3.6GB/s to 10GB/s.
Regards,
Xiaoguang Wang
>
>> Initially I use remap_pfn_range or vm_insert_pages to map sgl pages to
>> user space, but both of them have limits:
>> 1) Use vm_insert_pages
>> which is like tcp getsockopt(TCP_ZEROCOPY_RECEIVE), but there're two
>> restrictions:
>> 1. anonymous pages can not be mmaped to user spacea.
>> ==> vm_insert_pages
>> ====> insert_pages
>> ======> insert_page_in_batch_locked
>> ========> validate_page_before_insert
>> In validate_page_before_insert(), it shows that anonymous page
>> can not
>> be mapped to use space, we know that if issuing direct io to block
>> device, io request's sgl pages mostly comes from anonymous page.
>> if (PageAnon(page) || PageSlab(page) || page_has_type(page))
>> return -EINVAL;
>> I'm not sure why there is such restriction? for safety reasons ?
>>
>> 2. warn_on triggered in __folio_mark_dirty
>> When calling zap_page_range in tcmu user space backstore when io
>> completes, there is a warn_on triggered in __folio_mark_dirty:
>> if (folio->mapping) { /* Race with truncate? */
>> WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
>>
>> I'm not familiar with folio yet, but I think the reason is that
>> when
>> issuing a buffered read to tcmu block device, it's page cache
>> mapped
>> to user space, backstore write this page and pte will be
>> dirtied. but
>> initially it's newly allocated, hence page_update flag not set.
>> In zap_pte_range(), there is such codes:
>> if (!PageAnon(page)) {
>> if (pte_dirty(ptent)) {
>> force_flush = 1;
>> set_page_dirty(page);
>> }
>> So this warn_on is reasonable.
>> Indeed what I want is just to map io request sgl pages to tcmu user
>> space backstore, then backstore can read or write data to mapped
>> area,
>> I don't want to care about page or its mapping status, so I
>> choose to
>> use remap_pfn_range.
>>
>> 2) Use remap_pfn_range()
>> remap_pfn_range works well, but it has somewhat obvious overhead.
>> For a
>> 512kb io request, it has 128 pages, and usually this 128 page's
>> pfn are
>> not consecutive, so in worst cases, for a 512kb io request, I'd
>> need to
>> issue 128 calls to remap_pfn_range, it's horrible. And in
>> remap_pfn_range,
>> if x86 page attribute table feature is enabled, lookup_memtype
>> called by
>> track_pfn_remap() also introduces obvious overhead.
>>
>> Finally in order to solve these problems, Xu Yu helps to implment a new
>> helper, which accepts an array of pages as parameter, anonymous pages
>> can
>> be mapped to user space, pages would be treated as special
>> pte(pte_special
>> returns true), so vm_normal_page returns NULL, above folio warn_on won't
>> trigger.
>>
>> Thanks.
>>
>> Xiaoguang Wang (2):
>> mm: export zap_page_range()
>> scsi: target: tcmu: Support zero copy
>>
>> Xu Yu (1):
>> mm/memory.c: introduce vm_insert_page(s)_mkspecial
>>
>> drivers/target/target_core_user.c | 257
>> +++++++++++++++++++++++++++++++++-----
>> include/linux/mm.h | 2 +
>> mm/memory.c | 183 +++++++++++++++++++++++++++
>> 3 files changed, 414 insertions(+), 28 deletions(-)
>>
next prev parent reply other threads:[~2022-03-22 13:17 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-18 9:55 Xiaoguang Wang
2022-03-18 9:55 ` [RFC 1/3] mm/memory.c: introduce vm_insert_page(s)_mkspecial Xiaoguang Wang
2022-03-23 16:45 ` Christoph Hellwig
2022-03-24 7:27 ` Xiaoguang Wang
2022-03-18 9:55 ` [RFC 2/3] mm: export zap_page_range() Xiaoguang Wang
2022-03-21 12:01 ` David Hildenbrand
2022-03-22 13:02 ` Xiaoguang Wang
2022-03-22 13:08 ` David Hildenbrand
2022-03-23 13:59 ` Xiaoguang Wang
2022-03-23 16:48 ` Christoph Hellwig
2022-03-23 16:47 ` Christoph Hellwig
2022-03-24 9:16 ` Ming Lei
2022-03-18 9:55 ` [RFC 3/3] scsi: target: tcmu: Support zero copy Xiaoguang Wang
2022-03-22 14:01 ` Bodo Stroesser
2022-03-23 14:33 ` Xiaoguang Wang
2022-03-25 9:06 ` Bodo Stroesser
2022-03-22 12:40 ` [RFC 0/3] Add zero copy feature for tcmu Bodo Stroesser
2022-03-22 13:17 ` Xiaoguang Wang [this message]
2022-03-22 14:05 ` Bodo Stroesser
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=36b5a8e5-c8e9-6a1f-834c-6bf9bf920f4c@linux.alibaba.com \
--to=xiaoguang.wang@linux.alibaba.com \
--cc=bostroesser@gmail.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-scsi@vger.kernel.org \
--cc=target-devel@vger.kernel.org \
--cc=xuyu@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox