Message-ID: <796d33c3-f97d-41ad-9ba7-99ade5dcfcee@linux.alibaba.com>
Date: Wed, 9 Oct 2024 16:52:48 +0800
Subject: Re: [PATCH v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
From: Baolin Wang
To: Kefeng Wang, Matthew Wilcox, "Pankaj Raghav (Samsung)"
Cc: Andrew Morton, Hugh Dickins, Alexander Viro, Christian Brauner, Jan Kara, Anna Schumaker, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <7d76fe98-4f7f-4f3d-9e8e-79d836f945cb@huawei.com>

On 2024/10/9 15:09, Kefeng Wang wrote:
>
> On 2024/9/30 14:48, Baolin Wang wrote:
>>
>> On 2024/9/30 11:15, Kefeng Wang wrote:
>>>
>>> On 2024/9/30 10:52, Baolin Wang wrote:
>>>>
>>>> On 2024/9/30 10:30, Kefeng Wang wrote:
>>>>>
>>>>> On 2024/9/30 10:02, Baolin Wang wrote:
>>>>>>
>>>>>> On 2024/9/26 21:52, Matthew Wilcox wrote:
>>>>>>> On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) wrote:
>>>>>>>>> So this is why I don't use mapping_set_folio_order_range() here, but
>>>>>>>>> correct me if I am wrong.
>>>>>>>>
>>>>>>>> Yeah, the inode is active here as the max folio size is decided based on
>>>>>>>> the write size, so probably mapping_set_folio_order_range() will not be
>>>>>>>> a safe option.
>>>>>>>
>>>>>>> You really are all making too much of this.  Here's the patch I think we
>>>>>>> need:
>>>>>>>
>>>>>>> +++ b/mm/shmem.c
>>>>>>> @@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>>>>>>>          cache_no_acl(inode);
>>>>>>>          if (sbinfo->noswap)
>>>>>>>                  mapping_set_unevictable(inode->i_mapping);
>>>>>>> -       mapping_set_large_folios(inode->i_mapping);
>>>>>>> +       if (sbinfo->huge)
>>>>>>> +               mapping_set_large_folios(inode->i_mapping);
>>>>>>>
>>>>>>>          switch (mode & S_IFMT) {
>>>>>>>          default:
>>>>>>
>>>>>> IMHO, we no longer need the the 'sbinfo->huge' validation after
>>>>>> adding support for large folios in the tmpfs write and fallocate
>>>>>> paths [1].
>>>
>>> Forget to mention, we still need to check sbinfo->huge, if mount with
>>> huge=never, but we fault in large chunk, write is slower than without
>>> 9aac777aaf94, the above changes or my patch could fix it.
>>
>> My patch will allow allocating large folios in the tmpfs write and
>> fallocate paths though the 'huge' option is 'never'.
>
> Yes, indeed after checking your patch,
>
> The Writing intelligently from 'Bonnie -d /mnt/tmpfs/ -s 1024' based on
> next-20241008,
>
> 1) huge=never
>    the base:                                    2016438 K/Sec
>    my v1/v2 or Matthew's patch :                2874504 K/Sec
>    your patch with filemap_get_order() fix:     6330604 K/Sec
>
> 2) huge=always
>    the write performance:                       7168917 K/Sec
>
> Since large folios supported in the tmpfs write, we do have better
> performance shown above, that's great.

Great. Thanks for testing.

>> My initial thought for supporting large folio is that, if the 'huge'
>> option is enabled, to maintain backward compatibility, we only allow
>> 2M PMD-sized order allocations. If the 'huge' option is
>> disabled(huge=never), we still allow large folio allocations based on
>> the write length.
>>
>> Another choice is to allow the different sized large folio allocation
>> based on the write length when the 'huge' option is enabled, rather
>> than just the 2M PMD sized. But will force the huge orders off if
>> 'huge' option is disabled.
>>
>
> "huge=never  Do not allocate huge pages. This is the default."
> From the document, it's better not to allocate large folio, but we need
> some special handle for huge=never or runtime deny/force.

Yes. I'm thinking of adding a new option (something like 'huge=mTHP') to
allocate large folios based on the write size.

I will resend the patchset, and we can discuss it there.
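
To make the policy under discussion concrete, here is a minimal userspace sketch, not kernel code. It assumes 4 KiB base pages and a 2 MiB PMD (order 9), and every name in it (`model_huge`, `model_order_for_len`, `model_max_order`) is hypothetical. It models only the variant where huge=never keeps order-0 folios, a PMD-only mode always uses the PMD order, and a write-size-based mode (something like the 'huge=mTHP' idea) picks the largest order that fits the write length. The real shmem code would also have to account for index alignment, existing page cache contents and allocation fallback, all of which are left out here.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */
#define MODEL_PMD_ORDER   9 /* assumption: 2 MiB PMD = 512 base pages */

/* Hypothetical stand-ins for the mount-time 'huge=' choices discussed. */
enum model_huge {
	MODEL_HUGE_NEVER,      /* huge=never: no large folios at all */
	MODEL_HUGE_ALWAYS,     /* PMD-sized allocations only */
	MODEL_HUGE_WRITE_SIZE, /* hypothetical: order chosen from write length */
};

/* Largest order whose folio size does not exceed len, capped at PMD order. */
static unsigned int model_order_for_len(size_t len)
{
	unsigned int order = 0;

	while (order < MODEL_PMD_ORDER &&
	       ((size_t)1 << (MODEL_PAGE_SHIFT + order + 1)) <= len)
		order++;

	assert(order <= MODEL_PMD_ORDER);
	return order;
}

/* Highest folio order this model would allow for a write of len bytes. */
static unsigned int model_max_order(enum model_huge huge, size_t len)
{
	switch (huge) {
	case MODEL_HUGE_ALWAYS:
		return MODEL_PMD_ORDER;
	case MODEL_HUGE_WRITE_SIZE:
		return model_order_for_len(len);
	case MODEL_HUGE_NEVER:
	default:
		return 0;
	}
}
```

Under these assumptions a 64 KiB write in the write-size mode maps to order 4 (16 pages), while the same write with huge=never stays at order 0, which matches the documented "Do not allocate huge pages" wording quoted above.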