Date: Mon, 28 Oct 2019 16:18:17 -0700 (PDT)
Message-Id: <20191028.161817.126838643568293118.davem@davemloft.net>
To: tj@kernel.org
Cc: netdev@vger.kernel.org, kernel-team@fb.com, linux-kernel@vger.kernel.org,
    josef@toxicpanda.com, eric.dumazet@gmail.com, jakub.kicinski@netronome.com,
    hannes@cmpxchg.org, linux-mm@kvack.org, mgorman@suse.de,
    akpm@linux-foundation.org
Subject: Re: [PATCH v2] net: fix sk_page_frag() recursion from memory reclaim
From: David Miller
In-Reply-To: <20191024205027.GF3622521@devbig004.ftw2.facebook.com>
References: <20191019170141.GQ18794@devbig004.ftw2.facebook.com>
    <20191024205027.GF3622521@devbig004.ftw2.facebook.com>

From: Tejun Heo
Date: Thu, 24 Oct 2019 13:50:27 -0700

> sk_page_frag() optimizes skb_frag allocations by using per-task
> skb_frag cache when it knows it's the only user.  The condition is
> determined by seeing whether the socket allocation mask allows
> blocking - if the allocation may block, it obviously owns the task's
> context and ergo exclusively owns current->task_frag.
>
> Unfortunately, this misses recursion through memory reclaim path.
> Please take a look at the following backtrace.
 ...
> In [0], tcp_sendmsg_locked() was using current->task_frag when it
> called sk_wmem_schedule().
> It already calculated how many bytes can
> be fit into current->task_frag.  Due to memory pressure,
> sk_wmem_schedule() called into memory reclaim path which called into
> xfs and then IO issue path.  Because the filesystem in question is
> backed by nbd, the control goes back into the tcp layer - back into
> tcp_sendmsg_locked().
>
> nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
> sense - it's in the process of freeing memory and wants to be able to,
> e.g., drop clean pages to make forward progress.  However, this
> confused sk_page_frag() called from [2].  Because it only tests
> whether the allocation allows blocking which it does, it now thinks
> current->task_frag can be used again although it already was being
> used in [0].
>
> After [2] used current->task_frag, the offset would be increased by
> the used amount.  When the control returns to [0],
> current->task_frag's offset is increased and the previously calculated
> number of bytes now may overrun the end of allocated memory leading to
> silent memory corruptions.
>
> Fix it by adding gfpflags_normal_context() which tests sleepable &&
> !reclaim and use it to determine whether to use current->task_frag.
>
> v2: Eric didn't like gfp flags being tested twice.  Introduce a new
>     helper gfpflags_normal_context() and combine the two tests.
>
> Signed-off-by: Tejun Heo

Applied and queued up for -stable, thanks Tejun.
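
For readers following along, here is a rough user-space sketch of the two
checks discussed in the quoted commit message. It is not the kernel code:
the DEMO_* bit values, the demo_gfp_t type and all demo_* function names are
made up for illustration (the real flags live in include/linux/gfp.h and
their values differ). It only models the logic described above: the old test
asks "may this allocation block?", the new one asks "sleepable && !reclaim",
and a nested user of the same frag invalidates the outer caller's
precomputed room.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only (not the kernel's values). */
#define DEMO_GFP_IO             0x01u
#define DEMO_GFP_FS             0x02u
#define DEMO_GFP_DIRECT_RECLAIM 0x04u
#define DEMO_GFP_KSWAPD_RECLAIM 0x08u
#define DEMO_GFP_MEMALLOC       0x10u

/* Simplified composites mirroring how the text describes the masks. */
#define DEMO_GFP_NOIO   (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_KERNEL (DEMO_GFP_NOIO | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_ATOMIC DEMO_GFP_KSWAPD_RECLAIM

typedef unsigned int demo_gfp_t;

/* Old condition: the allocation is allowed to block. */
static inline bool demo_allow_blocking(demo_gfp_t gfp)
{
	return (gfp & DEMO_GFP_DIRECT_RECLAIM) != 0;
}

/* New condition: sleepable AND not running on behalf of memory reclaim. */
static inline bool demo_normal_context(demo_gfp_t gfp)
{
	return (gfp & (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_MEMALLOC)) ==
		DEMO_GFP_DIRECT_RECLAIM;
}

/*
 * Model of the corruption: the outer sender computes the remaining room,
 * a nested sender then advances the shared offset, and the outer sender
 * resumes with its stale room value. Returns true if that would write
 * past the end of the frag.
 */
static inline bool demo_outer_overruns(size_t size, size_t offset,
				       size_t nested_used)
{
	assert(offset <= size);

	size_t room = size - offset;	/* computed before reclaim recursion */

	offset += nested_used;		/* nested user shares the same frag */
	return offset + room > size;
}
```

With these definitions, the nbd-style mask (DEMO_GFP_NOIO | DEMO_GFP_MEMALLOC)
passes demo_allow_blocking() but fails demo_normal_context(), which is the
distinction the fix relies on; demo_outer_overruns() reports an overrun
whenever the nested user consumed any bytes.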