Subject: Re: [RFC 0/3] Add zero copy feature for tcmu
From: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
To: Bodo Stroesser, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-scsi@vger.kernel.org
Cc: linux-block@vger.kernel.org, xuyu@linux.alibaba.com
Date: Tue, 22 Mar 2022 21:17:07 +0800
Message-ID: <36b5a8e5-c8e9-6a1f-834c-6bf9bf920f4c@linux.alibaba.com>
References: <20220318095531.15479-1-xiaoguang.wang@linux.alibaba.com>

hi,

> On 18.03.22 10:55, Xiaoguang Wang wrote:
>> The core idea to implement the tcmu zero copy feature is really
>> straightforward: we just map a block device io request's sgl pages
>> into the tcmu user space backstore, so that we avoid the extra copy
>> between the sgl pages and tcmu's internal data area (which really
>> hurts io throughput). Please see
>> https://www.spinics.net/lists/target-devel/msg21121.html for detailed
>> info.
>>
>
> Can you please tell us how big the performance improvement is and
> which configuration you are using for measurements?
Sorry, I should have attached the test results here.
Initially I tried the tcmu user:fbo backstore to evaluate the
performance improvement, but it only showed about a 10%~15% io
throughput gain with fio (numjobs=1, iodepth=8, bs=256k), which isn't
very impressive. The reason is that the user:fbo backstore does
buffered reads, and those consume most of the cpu time. So I then
tested this zero copy feature with our real workload, whose backstore
is a multi-threaded network program that accesses a distributed file
system. With numjobs=4, iodepth=8 and bs=256k, the write throughput
improves from 3.6GB/s to 10GB/s.

Regards,
Xiaoguang Wang

>
>> Initially I used remap_pfn_range or vm_insert_pages to map sgl pages
>> to user space, but both of them have limits:
>>
>> 1) Use vm_insert_pages
>>    This is like tcp getsockopt(TCP_ZEROCOPY_RECEIVE), but there are
>>    two restrictions:
>>    1. anonymous pages can not be mmaped to user space.
>>      ==> vm_insert_pages
>>      ====> insert_pages
>>      ======> insert_page_in_batch_locked
>>      ========> validate_page_before_insert
>>      validate_page_before_insert() shows that an anonymous page can
>>      not be mapped to user space, and we know that when issuing
>>      direct io to a block device, the io request's sgl pages mostly
>>      come from anonymous pages:
>>          if (PageAnon(page) || PageSlab(page) || page_has_type(page))
>>              return -EINVAL;
>>      I'm not sure why there is such a restriction; for safety
>>      reasons?
>>
>>    2. warn_on triggered in __folio_mark_dirty
>>      When calling zap_page_range from the tcmu user space backstore
>>      when io completes, a warn_on is triggered in __folio_mark_dirty:
>>         if (folio->mapping) {   /* Race with truncate? */
>>             WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
>>
>>      I'm not familiar with folio yet, but I think the reason is that
>>      when issuing a buffered read to a tcmu block device, its page
>>      cache is mapped to user space; the backstore writes the page, so
>>      its pte gets dirtied, but the page is newly allocated and its
>>      uptodate flag is not set yet. In zap_pte_range() there is this
>>      code:
>>         if (!PageAnon(page)) {
>>             if (pte_dirty(ptent)) {
>>                 force_flush = 1;
>>                 set_page_dirty(page);
>>             }
>>      So this warn_on is reasonable.
>>      Indeed, all I want is to map the io request's sgl pages into the
>>      tcmu user space backstore so that the backstore can read or
>>      write data in the mapped area; I don't want to care about the
>>      pages or their mapping status, so I chose to use
>>      remap_pfn_range.
>>
>> 2) Use remap_pfn_range()
>>    remap_pfn_range works well, but it has fairly obvious overhead. A
>>    512kb io request has 128 pages, and usually these 128 pages' pfns
>>    are not consecutive, so in the worst case I'd need to issue 128
>>    calls to remap_pfn_range for a single 512kb io request, which is
>>    horrible. Also, inside remap_pfn_range, if the x86 page attribute
>>    table feature is enabled, lookup_memtype, called by
>>    track_pfn_remap(), introduces obvious overhead as well.
>>
>> Finally, in order to solve these problems, Xu Yu helped to implement
>> a new helper which accepts an array of pages as a parameter.
>> Anonymous pages can be mapped to user space, and the pages are
>> treated as special ptes (pte_special returns true), so vm_normal_page
>> returns NULL and the folio warn_on above won't trigger.
>>
>> Thanks.
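
To make the remap_pfn_range overhead above concrete, the per-page
mapping loop would look roughly like the sketch below. This is only an
illustration; the function name tcmu_map_sgl_pfnmap and its vma/uaddr
parameters are made up for the example, not code from this patch set:

    #include <linux/mm.h>
    #include <linux/scatterlist.h>

    static int tcmu_map_sgl_pfnmap(struct vm_area_struct *vma,
                                   unsigned long uaddr,
                                   struct scatterlist *sgl, int nents)
    {
            struct scatterlist *sg;
            int i, ret;

            for_each_sg(sgl, sg, nents, i) {
                    /*
                     * The pfns are usually not consecutive, so each
                     * page needs its own call; with x86 PAT enabled,
                     * every call also pays the track_pfn_remap() ->
                     * lookup_memtype() cost.
                     */
                    ret = remap_pfn_range(vma, uaddr,
                                          page_to_pfn(sg_page(sg)),
                                          PAGE_SIZE,
                                          vma->vm_page_prot);
                    if (ret)
                            return ret;
                    uaddr += PAGE_SIZE;
            }
            return 0;
    }

For a 512kb request that is 128 iterations, which is where the
overhead described above comes from.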
>>
>> Xiaoguang Wang (2):
>>    mm: export zap_page_range()
>>    scsi: target: tcmu: Support zero copy
>>
>> Xu Yu (1):
>>    mm/memory.c: introduce vm_insert_page(s)_mkspecial
>>
>>   drivers/target/target_core_user.c | 257 +++++++++++++++++++++++++++++++++-----
>>   include/linux/mm.h                |   2 +
>>   mm/memory.c                       | 183 +++++++++++++++++++++++++++
>>   3 files changed, 414 insertions(+), 28 deletions(-)
>>
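
For readers who want to see how a backstore-side caller would use the
new helper, here is a sketch. It assumes a signature analogous to the
existing vm_insert_pages(); the real interface is defined in patch
"mm/memory.c: introduce vm_insert_page(s)_mkspecial", and the wrapper
name tcmu_map_request_pages is hypothetical:

    #include <linux/mm.h>

    /*
     * Hypothetical wrapper: map an io request's pages into the tcmu
     * user space vma as special ptes, so that anonymous pages are
     * accepted and vm_normal_page() returns NULL on zap.
     */
    static int tcmu_map_request_pages(struct vm_area_struct *vma,
                                      unsigned long uaddr,
                                      struct page **pages,
                                      unsigned long nr_pages)
    {
            unsigned long num = nr_pages;
            int ret;

            ret = vm_insert_pages_mkspecial(vma, uaddr, pages, &num);
            if (ret)
                    return ret;

            /*
             * When the io completes, the backstore tears the mapping
             * down again, e.g. via zap_page_range(vma, uaddr,
             * nr_pages << PAGE_SHIFT), which patch 1 exports.
             */
            return 0;
    }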