Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
From: Rik van Riel
To: Michal Hocko, Johannes Weiner
Cc: Yosry Ahmed, Balbir Singh, Roman Gushchin, Shakeel Butt, Muchun Song,
 Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, kernel-team@meta.com, Nhat Pham
Date: Tue, 14 Jan 2025 11:51:18 -0500
References: <20241212115754.38f798b3@fangorn>
 <20241212183012.GB1026@cmpxchg.org>
 <20250114160955.GA1115056@cmpxchg.org>

On Tue, 2025-01-14 at 17:46 +0100, Michal Hocko wrote:
> On Tue 14-01-25 11:09:55, Johannes Weiner wrote:
>
> >
> > We managed to extract a stack trace of the livelocked task:
> >
> > obj_cgroup_may_swap
> > zswap_store
> > swap_writepage
> > shrink_folio_list
> > shrink_lruvec
> > shrink_node
> > do_try_to_free_pages
> > try_to_free_mem_cgroup_pages
>
> OK, so this is the reclaim path and it fails due to reasons you mention
> below. This will retry several times until it hits mem_cgroup_oom which
> will bail in mem_cgroup_out_of_memory because of task_is_dying (returns
> true) and retry the charge + reclaim (as the oom killer hasn't done
> anything) with passed_oom = true this time and eventually gets to the
> nomem path and returns -ENOMEM. This should propagate -ENOMEM down the
> path
>
> > charge_memcg
> > mem_cgroup_swapin_charge_folio
> > __read_swap_cache_async
> > swapin_readahead
> > do_swap_page
> > handle_mm_fault
> > do_user_addr_fault
> > exc_page_fault
> > asm_exc_page_fault
> > __get_user
>
> All the way here and return the failure to futex_cleanup which doesn't
> retry __get_user on the failure AFAICS (exit_robust_list). But I might
> be missing something, it's been quite some time since I've looked into
> futex code.

Can you explain how -ENOMEM would get propagated down
past the page fault handler?

This isn't get_user_pages(), which can just pass
-ENOMEM on to the caller.

If there is code to pass -ENOMEM on past the page
fault exception handler, I have not been able to
find it. How does this work?

>
> > futex_cleanup
> > futex_exit_release
> > do_exit
> > do_group_exit
> > get_signal
> > arch_do_signal_or_restart
> > exit_to_user_mode_prepare
> > syscall_exit_to_user_mode
> > do_syscall
> > entry_SYSCALL_64
> > syscall
> >
> > Both memory.max and memory.zswap.max are hit. I don't see how this
> > could ever make forward progress - the futex fault will retry until
> > it succeeds.
>
> I must be missing something but I do not see the retry, could you
> point me where this is happening please?
>

-- 
All Rights Reversed.
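
For readers following the thread, the charge-retry sequence described in the quoted text can be sketched as a small user-space model. This is only an illustration of the control flow as stated above, not the kernel code: model_reclaim(), model_oom(), model_charge(), struct task_model and the retry budget of 16 are all hypothetical stand-ins chosen for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed retry budget for this model only. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for the state that task_is_dying() would inspect. */
struct task_model {
	bool dying;
};

/* Reclaim never frees anything: both limits are hit, as in the report. */
static bool model_reclaim(void)
{
	return false;
}

/*
 * Models the OOM step from the quoted description: for a dying task it
 * bails out early and reports "handled" although nothing was freed.
 */
static bool model_oom(const struct task_model *t)
{
	return t->dying;
}

/* Returns 0 on a successful charge, -ENOMEM once the nomem path is hit. */
static int model_charge(const struct task_model *t, int *attempts)
{
	bool passed_oom = false;
	int retries = MAX_RECLAIM_RETRIES;

	assert(t != NULL && attempts != NULL);
retry:
	(*attempts)++;
	if (model_reclaim())
		return 0;
	if (retries-- > 0)
		goto retry;

	/* Second pass for a dying task: give up instead of looping. */
	if (passed_oom && t->dying)
		return -ENOMEM;

	/* First pass: the OOM step "succeeds", so charge + reclaim is retried. */
	if (model_oom(t)) {
		passed_oom = true;
		retries = MAX_RECLAIM_RETRIES;
		goto retry;
	}
	return -ENOMEM;
}
```

Calling model_charge() with a task_model whose dying field is true runs two full rounds of reclaim attempts and then returns -ENOMEM. What the page fault handler does with that failure is exactly the open question in this message; the model does not attempt to answer it.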