From: Daniel Gomez
To: David Hildenbrand
CC: Baolin Wang, akpm@linux-foundation.org, hughd@google.com, willy@infradead.org, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ying.huang@intel.com, 21cnbao@gmail.com, ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/8] add mTHP support for anonymous shmem
Date: Thu, 9 May 2024 19:18:03 +0000

On Wed, May 08, 2024 at 07:03:57PM +0200, David Hildenbrand wrote:
> On 08.05.24 16:28, Daniel Gomez wrote:
> > On Wed, May 08, 2024 at 01:58:19PM +0200, David Hildenbrand wrote:
> > > On 08.05.24 13:39, Daniel Gomez wrote:
> > > > On Mon, May 06, 2024 at 04:46:24PM +0800, Baolin Wang wrote:
> > > > > Anonymous pages have already been supported for multi-size (mTHP) allocation
> > > > > through commit 19eaf44954df, that can allow THP to be configured through the
> > > > > sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
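The per-size sysfs knobs mentioned above print every accepted value and mark the active one in brackets (for example "always inherit madvise [never]"). As a hedged illustration only, the C sketch below extracts that bracketed value from a knob's contents. The helper name thp_active_mode() is hypothetical, and reading the actual file from one of the hugepage-XXkb directories (which ones exist depends on the architecture and kernel version) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the active (bracketed) value of a THP sysfs
 * knob, e.g. "always inherit madvise [never]\n" -> "never".
 * Returns 0 on success, -1 if no bracketed token exists or it does not
 * fit in out (including the terminating NUL).
 */
static int thp_active_mode(const char *knob, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(knob != NULL && out != NULL && outlen > 0);

	open = strchr(knob, '[');
	if (!open)
		return -1;
	close = strchr(open + 1, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len >= outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

A caller would read the knob file into a buffer first and then pass that buffer in; keeping the parsing separate from the file I/O makes it easy to check against sample strings.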
> > > > >
> > > > > However, the anonymous shared pages will ignore the anonymous mTHP rule
> > > > > configured through the sysfs interface, and can only use the PMD-mapped
> > > > > THP, that is not reasonable. Many implement anonymous page sharing through
> > > > > mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage scenarios,
> > > > > therefore, users expect to apply an unified mTHP strategy for anonymous pages,
> > > > > also including the anonymous shared pages, in order to enjoy the benefits of
> > > > > mTHP. For example, lower latency than PMD-mapped THP, smaller memory bloat
> > > > > than PMD-mapped THP, contiguous PTEs on ARM architecture to reduce TLB miss etc.
> > > > >
> > > > > The primary strategy is similar to supporting anonymous mTHP. Introduce
> > > > > a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> > > > > which can have all the same values as the top-level
> > > > > '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> > > > > additional "inherit" option. By default all sizes will be set to "never"
> > > > > except PMD size, which is set to "inherit". This ensures backward compatibility
> > > > > with the shmem enabled of the top level, meanwhile also allows independent
> > > > > control of shmem enabled for each mTHP.
> > > >
> > > > I'm trying to understand the adoption of mTHP and how it fits into the adoption
> > > > of (large) folios that the kernel is moving towards. Can you, or anyone involved
> > > > here, explain this? How much do they overlap, and can we benefit from having
> > > > both? Is there any argument against the adoption of large folios here that I
> > > > might have missed?
> > >
> > > mTHP are implemented using large folios, just like traditional PMD-sized THP
> > > are. (you really should explore the history of mTHP and how it all works
> > > internally)
> >
> > I'll check more in deep the code.
> > By any chance are any of you going to be at
> > LSFMM this year? I have this session [1] scheduled for Wednesday and it would
> > be nice to get your feedback on it and if you see this working together with
> > mTHP/THP.
> >
>
> I'll be around and will attend that session! But note that I am still
> scratching my head what to do with "ordinary" shmem, especially because of
> the weird way shmem behaves in contrast to real files (below). Some input
> from Hugh might be very helpful.

I'm looking forward to meeting you there and getting your feedback!

>
> Example: you write() to a shmem file and populate a 2M THP. Then, nobody
> touches that file for a long time. There are certainly other mmap() users
> that could better benefit from that THP ... and without swap that THP will
> be trapped there possibly a long time (unless I am missing an important
> piece of shmem THP design :) )? Sure, if we only have THP's it's nice,
> that's just not the reality unfortunately. IIRC, that's one of the reasons
> why THP for shmem can be enabled/disabled. But again, still scratching my
> head ...
>
>
> Note that this patch set only tackles anonymous shmem (MAP_SHARED|MAP_ANON),
> which is in 99.999% of all cases only accessed via page tables (memory
> allocated during page faults). I think there are ways to grab the fd
> (/proc/self/fd), but IIRC only corner cases read/write that.
>
> So in that sense, anonymous shmem (this patch set) behaves mostly like
> ordinary anonymous memory, and likely there is not much overlap with other
> "allocate large folios during read/write/fallocate" as in [1]. swap might
> have an overlap.
>
>
> The real confusion begins when we have ordinary shmem: some users never mmap
> it and only read/write, some users never read/write it and only mmap it and
> some (less common?) users do both.
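For contrast with ordinary shmem files, here is a minimal C sketch of the anonymous-shmem case the patch set targets: a MAP_SHARED | MAP_ANONYMOUS region that a child process writes and the parent reads purely through page tables, with no file descriptor, read() or write() involved. The helper name anon_shmem_roundtrip() is hypothetical. Whether the faulting write is served by a large folio depends on the shmem_enabled policy in effect and is not checked by this program.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: map anonymous shared memory, let a child store a
 * value through the mapping, and read it back in the parent.
 * Returns the value the parent observes, or -1 on error.
 */
static int anon_shmem_roundtrip(size_t len, int value)
{
	int *mem, seen, status;
	pid_t pid;

	assert(len >= sizeof(int));

	/* Anonymous shmem: backed by shmem, but no fd is handed to us. */
	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	pid = fork();
	if (pid < 0) {
		munmap(mem, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: the first store faults the page in. */
		mem[0] = value;
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0) {
		munmap(mem, len);
		return -1;
	}

	/* Parent: sees the child's store through the shared mapping. */
	seen = mem[0];
	munmap(mem, len);
	return seen;
}
```

Because MAP_SHARED keeps the child's store visible after fork(), the parent reads back the same value; with MAP_PRIVATE the child would get a copy-on-write page and the parent would still read 0.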
>
> And shmem really is special: it looks like "just another file", but
> memory-consumption and reclaim wise it behaves just like anonymous memory.
> It might be swappable ("usually very limited backing disk space available")
> or it might not.
>
> In a subthread here we are discussing what to do with that special
> "shmem_enabled = force" mode ... and it's all complicated I think.
>
> > [1] https://lore.kernel.org/all/4ktpayu66noklllpdpspa3vm5gbmb5boxskcj2q6qn7md3pwwt@kvlu64pqwjzl/
> >
> > >
> > > The biggest challenge with memory that cannot be evicted on memory pressure
> > > to be reclaimed (in contrast to your ordinary files in the pagecache) is
> > > memory waste, well, and placement of large chunks of memory in general,
> > > during page faults.
> > >
> > > In the worst case (no swap), you allocate a large chunk of memory once and
> > > it will stick around until freed: no reclaim of that memory.
> >
> > I can see that path being triggered by some fstests but only for THP (where we
> > can actually reclaim memory).
>
> Is that when we punch-hole a partial THP and split it? I'd be interested in
> what that test does.

The reclaim path I'm referring to is triggered when we reach maximum capacity
(-ENOSPC) in shmem_alloc_and_add_folio(). We then reclaim space by splitting
large folios (regardless of whether they are dirty or uptodate).

One of the tests that hits this path is generic/100 (with the tmpfs huge=
mount option enabled):

- First, it creates a directory structure in $TEMP_DIR (/tmp). The directory
  size is around 26M.
- Then, it tars it up into $TEMP_DIR/temp.tar.
- Finally, it untars the tarball into $TEST_DIR (/media/test, which is the
  huge tmpfs mount directory).

What happens in generic/100 with huge=always is that the dedicated space fills
up very quickly (it is 1G in xfstests for tmpfs), and then reclaim starts.

>
> --
> Cheers,
>
> David / dhildenb
>
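The -ENOSPC behaviour described above can be illustrated with a deliberately simplified C model. Everything in it is an assumption for illustration, not shmem's actual accounting: struct toy_tmpfs and toy_add_file() are hypothetical, 4 KiB base pages and 2 MiB huge folios are assumed, every file is charged in whole huge folios as under huge=always, and "splitting" is modelled as handing back the charged but unused pages past each file's end.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB base pages and 2 MiB PMD-sized folios (e.g. x86-64). */
#define TOY_HUGE_PAGES 512UL

/* Hypothetical toy model of a size-limited tmpfs mount; not kernel code. */
struct toy_tmpfs {
	unsigned long max_pages;   /* mount size limit, in base pages */
	unsigned long used_pages;  /* base pages charged to the mount */
	unsigned long slack_pages; /* charged pages past EOF inside huge folios */
	unsigned long reclaims;    /* times the modelled split path ran */
};

static bool toy_fits(const struct toy_tmpfs *fs, unsigned long pages)
{
	return fs->used_pages + pages <= fs->max_pages;
}

/*
 * Add a file of file_pages base pages, huge=always style: round the
 * charge up to whole huge folios. If that does not fit, "split" huge
 * folios (release all slack) and retry; if it still does not fit, fall
 * back to base pages. Returns 0, or -1 standing in for -ENOSPC.
 */
static int toy_add_file(struct toy_tmpfs *fs, unsigned long file_pages)
{
	unsigned long huge = (file_pages + TOY_HUGE_PAGES - 1) /
			     TOY_HUGE_PAGES * TOY_HUGE_PAGES;

	if (!toy_fits(fs, huge) && fs->slack_pages > 0) {
		/* Modelled split: give back the unused tails of huge folios. */
		fs->used_pages -= fs->slack_pages;
		fs->slack_pages = 0;
		fs->reclaims++;
	}
	if (toy_fits(fs, huge)) {
		fs->used_pages += huge;
		fs->slack_pages += huge - file_pages;
		return 0;
	}
	if (toy_fits(fs, file_pages)) {
		/* Fallback: base pages only, so no slack is charged. */
		fs->used_pages += file_pages;
		return 0;
	}
	return -1;
}
```

Under these assumptions, the 1G tmpfs that xfstests uses holds 262144 base pages, and each small file costs 512 of them, so about 512 small files are enough to exhaust the mount regardless of how little data they contain. That is why the split path is reached so quickly with huge=always.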