Date: Mon, 30 Oct 2023 13:11:58 +0100
From: Jan Kara
To: Mikulas Patocka
Cc: Jan Kara, Vlastimil Babka, Marek Marczykowski-Górecki, Andrew Morton, Matthew Wilcox, Michal Hocko, stable@vger.kernel.org, regressions@lists.linux.dev, Alasdair Kergon, Mike Snitzer, dm-devel@lists.linux.dev, linux-mm@kvack.org
Subject: Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5
Message-ID: <20231030121158.c2lamlhskgoj7kgk@quack3>
References: <89320668-67a2-2a41-e577-a2f561e3dfdd@suse.cz> <818a23f2-c242-1c51-232d-d479c3bcbb6@redhat.com> <18a38935-3031-1f35-bc36-40406e2e6fd2@suse.cz> <20231030112844.g7b76cm2xxpovt6e@quack3> <7355fe90-5176-ea11-d6ed-a187c0146fdc@redhat.com>
In-Reply-To: <7355fe90-5176-ea11-d6ed-a187c0146fdc@redhat.com>

On Mon 30-10-23 12:49:01, Mikulas Patocka wrote:
> On Mon, 30 Oct 2023, Jan Kara wrote:
>
> > > >> What if we end up in "goto retry" more than once? I don't see a matching
> > > >
> > > > It is impossible. Before we jump to the retry label, we set
> > > > __GFP_DIRECT_RECLAIM. mempool_alloc can't ever fail if
> > > > __GFP_DIRECT_RECLAIM is present (it will just wait until some other task
> > > > frees some objects into the mempool).
> > >
> > > Ah, missed that. And the traces don't show that we would be waiting for
> > > that. I'm starting to think the allocation itself is really not the issue
> > > here. Also I don't think it deprives something else of large order pages, as
> > > per the sysrq listing they still existed.
> > >
> > > What I rather suspect is what happens next to the allocated bio such that it
> > > works well with order-0 or up to costly_order pages, but there's some
> > > problem causing a deadlock if the bio contains larger pages than that?
> >
> > Hum, so in all the backtraces presented we see that we are waiting for page
> > writeback to complete but I don't see anything that would be preventing the
> > bios from completing. Page writeback can submit quite large bios so it kind
> > of makes sense that it trips up some odd behavior. Looking at the code
> > I can see one possible problem in crypt_alloc_buffer() but it doesn't
> > explain why reducing initial page order would help. Anyway: Are we
> > guaranteed mempool has enough pages for arbitrarily large bio that can
> > enter crypt_alloc_buffer()? I can see crypt_page_alloc() does limit the
> > number of pages in the mempool to dm_crypt_pages_per_client plus I assume
> > the percpu counter bias in cc->n_allocated_pages can limit the really
> > available number of pages even further. So if a single bio is large enough
> > to trip percpu_counter_read_positive(&cc->n_allocated_pages) >=
> > dm_crypt_pages_per_client condition in crypt_page_alloc(), we can loop
> > forever? But maybe this cannot happen for some reason...
> >
> > If this is not it, I think we need to find out why the writeback bios are
> > not completing. Probably I'd start with checking what is kcryptd,
> > presumably responsible for processing these bios, doing.
> >
> >                                                         Honza
>
> cc->page_pool is initialized to hold BIO_MAX_VECS pages. crypt_map will
> restrict the bio size to BIO_MAX_VECS (see dm_accept_partial_bio being
> called from crypt_map).
>
> When we allocate a buffer in crypt_alloc_buffer, we try first allocation
> without waiting, then we grab the mutex and we try allocation with
> waiting.
>
> The mutex should prevent a deadlock when two processes allocate 128 pages
> concurrently and wait for each other to free some pages.
>
> The limit to dm_crypt_pages_per_client only applies to pages allocated
> from the kernel - when this limit is reached, we can still allocate from
> the mempool, so it shouldn't cause deadlocks.

Ah, ok, I missed the limitation of the bio size in crypt_map(). Thanks for
the explanation! So really the only advice I have now is to check what
kcryptd is doing when the system is stuck, because we didn't see it in any
of the stacktrace dumps.

                                                                Honza
--
Jan Kara
SUSE Labs, CR
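
For readers following the thread, the allocation scheme Mikulas describes can be sketched as a small single-threaded userspace model in C. This is not the dm-crypt code: every name below (model_pool, model_alloc_buffer, the MODEL_* results) is hypothetical, the rollback before the retry is an assumption of the model, and "waiting under the mutex" is modeled simply as all other holders having returned their pages. It only illustrates why a request no larger than the reserve can always finish on the second pass, while a larger one could wait forever.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stddef.h>

struct model_pool {
    size_t reserve_size; /* pages the pool guarantees (BIO_MAX_VECS in the thread) */
    size_t reserve_free; /* reserved pages not currently handed out */
    size_t limit;        /* cap on general-allocator pages (dm_crypt_pages_per_client) */
    size_t from_kernel;  /* pages currently taken from the general allocator */
};

enum model_result {
    MODEL_WOULD_HANG  = -1, /* request can never be satisfied from the reserve */
    MODEL_FIRST_PASS  = 1,  /* satisfied without waiting */
    MODEL_SECOND_PASS = 2   /* satisfied only after the serialized, waiting retry */
};

static int model_alloc_buffer(struct model_pool *p, size_t nr_pages)
{
    size_t took_kernel = 0, took_reserve = 0;

    /* Pass 1: never wait. Prefer the general allocator while under the
     * limit, then fall back to the reserve. */
    while (took_kernel + took_reserve < nr_pages) {
        if (p->from_kernel < p->limit) {
            p->from_kernel++;
            took_kernel++;
        } else if (p->reserve_free > 0) {
            p->reserve_free--;
            took_reserve++;
        } else {
            break; /* shortage: a non-waiting attempt gives up here */
        }
    }
    if (took_kernel + took_reserve == nr_pages)
        return MODEL_FIRST_PASS;

    /* Assumption of this model: give back the partial allocation so that
     * we do not hold pages while waiting for more. */
    p->from_kernel -= took_kernel;
    p->reserve_free += took_reserve;

    /* Pass 2: waiting is allowed and only one caller waits at a time.
     * Modeled as: everyone else eventually returns their pages, so the
     * whole reserve becomes available to this single caller. */
    if (nr_pages > p->reserve_size)
        return MODEL_WOULD_HANG; /* even the full reserve is too small */

    p->from_kernel = 0;
    p->reserve_free = p->reserve_size - nr_pages;
    return MODEL_SECOND_PASS;
}
```

With a reserve of 256 pages and a limit of 64, a 128-page request succeeds on the first pass, a following 256-page request needs the second pass, and a 257-page request is reported as one that would hang. That last case is the one that restricting the bio size in crypt_map() rules out.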