From: Mateusz Guzik
Date: Sat, 20 Jul 2024 09:57:56 +0200
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
To: Yu Zhao
Cc: Bharata B Rao, linux-mm@kvack.org, linux-kernel@vger.kernel.org, nikunj@amd.com, "Upadhyay, Neeraj", Andrew Morton, David Hildenbrand, willy@infradead.org, vbabka@suse.cz, kinseyho@google.com, Mel Gorman
References: <1998d479-eb1a-4bc8-a11e-59f8dd71aadb@amd.com>
 <7a06a14e-44d5-450a-bd56-1c348c2951b6@amd.com>
 <893a263a-0038-4b4b-9031-72567b966f73@amd.com>

On Fri, Jul 19, 2024 at 10:21 PM Yu Zhao wrote:
> I can't come up with any reasonable band-aid at this moment, i.e.,
> something not too ugly to work around a more fundamental scalability
> problem.
>
> Before I give up: what type of dirty data was written back to the nvme
> device? Was it page cache or swap?
>

With my corporate employee hat on, I would like to note a few things.

1. There are definitely bugs here and someone(tm) should sort them
   out(R).

However...

2. The real goal is presumably to beat the kernel into shape where
   production kernels no longer suffer lockups running this workload on
   this hardware.
3. The flamegraph (to be found in [1]) shows expensive debug options
   enabled, notably for preemption count (search for preempt_count_sub
   to see).
4. I'm told the lruvec problem is being worked on (but no ETA), and I
   don't think the above justifies considering any hacks or otherwise
   putting more pressure on it.

It is plausible that eliminating the aforementioned debug options will
be good enough.

Apart from that, I note that percpu_counter_add_batch (+ irq debug)
accounts for 5.8% of CPU time. This will of course go down if irq
tracing is disabled, but it so happens that I optimized this routine to
be faster single-threaded (in particular by dodging the interrupt
trip). The patch is hanging out in the mm tree [2] and is trivially
applicable for testing.

Even if none of the debug opts can get modified, this should drop
percpu_counter_add_batch to 1.5% or so, which may or may not have the
side effect of avoiding the lockup problem.

[1]: https://lore.kernel.org/lkml/584ecb5e-b1fc-4b43-ba36-ad396d379fad@amd.com/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything&id=51d821654be4286b005ad2b7dc8b973d5008a2ec

-- 
Mateusz Guzik