* [PATCH 02/20] net: vm deadlock avoidance core
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (2 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 18/20] netlink: add SOCK_VMIO support to AF_NETLINK Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 16/20] iscsi: add session context to ep_connect Peter Zijlstra
` (16 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Mike Christie, Trond Myklebust,
Pavel Machek
[-- Attachment #1: vm_deadlock_core.patch --]
[-- Type: text/plain, Size: 21414 bytes --]
In order to provide robust networked block devices there must be a guarantee
of progress. That is, the block device must never stall because of (physical)
OOM, because the device itself might be needed to get out of it (reclaim).
This means that the device queue must always be unpluggable; this in turn means
that it must always find enough memory to build/send packets over the network
_and_ receive (level 7) ACKs for those packets.
The network stack has a huge capacity for buffering packets while waiting for
user-space to read them. There is a practical limit imposed to avoid DoS
scenarios. These two things make for a deadlock: what if the receive limit is
reached and all packets are buffered in non-critical sockets (those not serving
the network block device waiting for an ACK to free a page)?
Memory pressure will add to that: what if there is simply no memory left to
receive packets in?
This patch provides a service to register sockets as critical; SOCK_VMIO
is a promise the socket will never block on receive. Along with a memory
reserve that will service a limited number of packets, this can guarantee a
limited service to these critical sockets.
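The registration bookkeeping can be modelled in user space. The following is
only a sketch of the logic in sk_adjust_memalloc() from the sock.c hunk of
this patch; the *_model names are made up for illustration, plain ints stand
in for the kernel atomics, and there is no locking.

```c
#include <assert.h>

/* 64 * MAX_PAGES_PER_PACKET, as defined in the sock.h hunk of this patch. */
#define RX_RESERVE_PAGES_MODEL 128

static int vmio_socks_model;     /* number of registered critical sockets */
static int reserve_pages_model;  /* pages currently added to the reserve */

/*
 * Model of sk_adjust_memalloc(): the RX reserve is a global limit that is
 * added when the first critical socket appears and removed when the last
 * one goes away. TX pages are passed through unchanged.
 */
static void adjust_memalloc_model(int socks, int tx_reserve_pages)
{
    int reserve = tx_reserve_pages;

    if (socks) {
        int before = vmio_socks_model;

        vmio_socks_model += socks;
        assert(vmio_socks_model >= 0);

        if (before == 0)                /* first critical socket */
            reserve += RX_RESERVE_PAGES_MODEL;
        if (vmio_socks_model == 0)      /* last critical socket gone */
            reserve -= RX_RESERVE_PAGES_MODEL;
    }

    reserve_pages_model += reserve;
    assert(reserve_pages_model >= 0);
}
```

Registering a second socket leaves the reserve unchanged; only the 0 -> 1 and
1 -> 0 transitions move it.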
When we make sure that packets allocated from the reserve will only service
critical sockets we will not lose the memory and can guarantee progress.
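The rule the converted protocols apply on receive can be written as a single
predicate. This is a sketch, not code from the patch: struct skb_model,
struct sock_model and must_drop_model() are hypothetical stand-ins for
sk_buff, sock and the checks added to the tcp/udp receive paths. It treats
"no socket found" the same as "socket is not critical", which is an
assumption made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sock_model { bool vmio; };       /* stands in for SOCK_VMIO */
struct skb_model  { bool emergency; };  /* stands in for skb->emergency */

/*
 * A packet that was allocated from the reserve may only be delivered to a
 * critical socket; anything else must be dropped so the page returns to
 * the reserve. Ordinary packets are not affected by this rule.
 */
static bool must_drop_model(const struct skb_model *skb,
                            const struct sock_model *sk)
{
    assert(skb != NULL);

    if (!skb->emergency)
        return false;

    /* Test for NULL before looking at the socket flag. */
    return sk == NULL || !sk->vmio;
}
```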
Since memory is tight and the reserve modest, we do not want to lose memory to
fragmentation effects. Hence a very simple allocator is used to guarantee that
the memory used for each packet is returned to the page allocator.
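A user-space sketch of such an allocator, modelled on sk_emergency_rx_alloc()
and sk_emergency_rx_free() from this patch. Assumptions: C11 atomics replace
the kernel's atomic_add_unless(), malloc() stands in for __get_free_page(),
and the *_sketch names and the 4096 byte page size are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_SKETCH     4096
#define RESERVE_PAGES_SKETCH 128   /* mirrors RX_RESERVE_PAGES */

static atomic_int pages_used_sketch;

/* Increment *v unless it already equals limit; returns 1 if incremented. */
static int add_unless_sketch(atomic_int *v, int limit)
{
    int old = atomic_load(v);

    while (old != limit) {
        /* On failure 'old' is reloaded and the limit is checked again. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return 1;
    }
    return 0;
}

/* One whole page per request, so freeing always returns a full page. */
static void *reserve_alloc_sketch(size_t size)
{
    void *page;

    if (size > PAGE_SIZE_SKETCH)
        return NULL;    /* no higher order allocations */

    if (!add_unless_sketch(&pages_used_sketch, RESERVE_PAGES_SKETCH))
        return NULL;    /* reserve exhausted */

    page = malloc(PAGE_SIZE_SKETCH);
    if (!page)
        atomic_fetch_sub(&pages_used_sketch, 1);
    return page;
}

static void reserve_free_sketch(void *page)
{
    assert(page != NULL);
    free(page);
    atomic_fetch_sub(&pages_used_sketch, 1);
}
```

Handing out whole pages wastes space per packet but leaves nothing behind in
a slab, which is the trade-off described above.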
(Note on the name SOCK_VMIO: the basic problem is a circular dependency between
the network and virtual memory subsystems which needs to be broken. This does
make VM network IO - and only VM network IO - special; it does not generalize.)
Converted protocols:
IPv4 & IPv6:
- icmp
- udp
- tcp
IPv4:
- igmp
Caveat: currently there is no support for higher order allocations, so
basically every jumbo frame will fail in these situations. To mitigate
this one could add a tiny pool of pre-allocated 2nd-order pages to the
emergency allocator.
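For a sense of scale, the reserve sizes follow from the constants added to
include/net/sock.h by this patch. The sketch below redoes that arithmetic and
the single-page limit behind the caveat. The *_SK names are illustrative
copies of the patch's macros, and the 4 KiB page size (page shift 12) used in
the checks is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    MAX_PAGES_PER_PACKET_SK = 2,
    /* Fragments of a maximum sized IP datagram at a 1500 byte MTU. */
    MAX_FRAGMENTS_SK    = (65536 + 1500 - 1) / 1500,
    RX_RESERVE_PAGES_SK = 64 * MAX_PAGES_PER_PACKET_SK,
    TX_RESERVE_PAGES_SK = 4 * MAX_FRAGMENTS_SK * MAX_PAGES_PER_PACKET_SK,
};

/* Convert a page count to kilobytes for a given page shift. */
static long reserve_kbytes_sk(long pages, int page_shift)
{
    assert(page_shift >= 10);
    return pages * (1L << (page_shift - 10));
}

/* The emergency allocator only hands out order-0 (single) pages. */
static bool fits_single_page_sk(size_t buf_size, size_t page_size)
{
    return buf_size <= page_size;
}
```

With 4 KiB pages that is 512 kB for RX plus 1408 kB per TX reservation, and a
9000 byte jumbo frame buffer does not fit a single page.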
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Daniel Phillips <phillips@google.com>
CC: David Miller <davem@davemloft.net>
CC: Mike Christie <michaelc@cs.wisc.edu>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
CC: Pavel Machek <pavel@ucw.cz>
---
include/linux/gfp.h | 3 +
include/linux/mmzone.h | 1
include/linux/skbuff.h | 13 +++++--
include/net/sock.h | 39 +++++++++++++++++++++
mm/page_alloc.c | 35 +++++++++++++++++--
net/core/skbuff.c | 85 +++++++++++++++++++++++++++++++++++++----------
net/core/sock.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++
net/ipv4/icmp.c | 3 +
net/ipv4/igmp.c | 3 +
net/ipv4/tcp_ipv4.c | 3 +
net/ipv4/udp.c | 8 +++-
net/ipv6/icmp.c | 3 +
net/ipv6/tcp_ipv6.c | 3 +
net/ipv6/udp.c | 3 +
14 files changed, 263 insertions(+), 27 deletions(-)
Index: linux-2.6/include/linux/gfp.h
===================================================================
--- linux-2.6.orig/include/linux/gfp.h
+++ linux-2.6/include/linux/gfp.h
@@ -46,6 +46,7 @@ struct vm_area_struct;
#define __GFP_ZERO ((__force gfp_t)0x8000u)/* Return zeroed page on success */
#define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
#define __GFP_HARDWALL ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
+#define __GFP_EMERGENCY ((__force gfp_t)0x40000u) /* Use emergency reserves */
#define __GFP_BITS_SHIFT 20 /* Room for 20 __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -54,7 +55,7 @@ struct vm_area_struct;
#define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
- __GFP_NOMEMALLOC|__GFP_HARDWALL)
+ __GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_EMERGENCY)
/* This equals 0, but use constants in case they ever change */
#define GFP_NOWAIT (GFP_ATOMIC & ~__GFP_HIGH)
Index: linux-2.6/include/linux/mmzone.h
===================================================================
--- linux-2.6.orig/include/linux/mmzone.h
+++ linux-2.6/include/linux/mmzone.h
@@ -421,6 +421,7 @@ int percpu_pagelist_fraction_sysctl_hand
void __user *, size_t *, loff_t *);
int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
struct file *, void __user *, size_t *, loff_t *);
+void adjust_memalloc_reserve(int pages);
#include <linux/topology.h>
/* Returns the number of the current Node. */
Index: linux-2.6/include/linux/skbuff.h
===================================================================
--- linux-2.6.orig/include/linux/skbuff.h
+++ linux-2.6/include/linux/skbuff.h
@@ -282,7 +282,8 @@ struct sk_buff {
nfctinfo:3;
__u8 pkt_type:3,
fclone:2,
- ipvs_property:1;
+ ipvs_property:1,
+ emergency:1;
__be16 protocol;
void (*destructor)(struct sk_buff *skb);
@@ -327,10 +328,13 @@ struct sk_buff {
#include <asm/system.h>
+#define SKB_ALLOC_FCLONE 0x01
+#define SKB_ALLOC_RX 0x02
+
extern void kfree_skb(struct sk_buff *skb);
extern void __kfree_skb(struct sk_buff *skb);
extern struct sk_buff *__alloc_skb(unsigned int size,
- gfp_t priority, int fclone);
+ gfp_t priority, int flags);
static inline struct sk_buff *alloc_skb(unsigned int size,
gfp_t priority)
{
@@ -340,7 +344,7 @@ static inline struct sk_buff *alloc_skb(
static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
gfp_t priority)
{
- return __alloc_skb(size, priority, 1);
+ return __alloc_skb(size, priority, SKB_ALLOC_FCLONE);
}
extern struct sk_buff *alloc_skb_from_cache(kmem_cache_t *cp,
@@ -1101,7 +1105,8 @@ static inline void __skb_queue_purge(str
static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
gfp_t gfp_mask)
{
- struct sk_buff *skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
+ struct sk_buff *skb =
+ __alloc_skb(length + NET_SKB_PAD, gfp_mask, SKB_ALLOC_RX);
if (likely(skb))
skb_reserve(skb, NET_SKB_PAD);
return skb;
Index: linux-2.6/include/net/sock.h
===================================================================
--- linux-2.6.orig/include/net/sock.h
+++ linux-2.6/include/net/sock.h
@@ -391,6 +391,7 @@ enum sock_flags {
SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+ SOCK_VMIO, /* the VM depends on us - make sure we're serviced */
};
static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -413,6 +414,44 @@ static inline int sock_flag(struct sock
return test_bit(flag, &sk->sk_flags);
}
+static inline int sk_has_vmio(struct sock *sk)
+{
+ return sock_flag(sk, SOCK_VMIO);
+}
+
+#define MAX_PAGES_PER_PACKET 2
+#define MAX_FRAGMENTS ((65536 + 1500 - 1) / 1500)
+/*
+ * Set an upper limit on the number of pages used for RX skbs.
+ */
+#define RX_RESERVE_PAGES (64 * MAX_PAGES_PER_PACKET)
+
+/*
+ * Guesstimate the per request queue TX upper bound.
+ */
+#define TX_RESERVE_PAGES \
+ (4 * MAX_FRAGMENTS * MAX_PAGES_PER_PACKET)
+
+extern atomic_t vmio_socks;
+extern atomic_t emergency_rx_pages_used;
+
+static inline int sk_vmio_socks(void)
+{
+ return atomic_read(&vmio_socks);
+}
+
+extern void * sk_emergency_rx_alloc(size_t size, gfp_t gfp_mask);
+
+static inline void sk_emergency_rx_free(void *page, size_t size)
+{
+ free_page((unsigned long)page);
+ atomic_dec(&emergency_rx_pages_used);
+}
+
+extern void sk_adjust_memalloc(int socks, int tx_reserve_pages);
+extern int sk_set_vmio(struct sock *sk);
+extern int sk_clear_vmio(struct sock *sk);
+
static inline void sk_acceptq_removed(struct sock *sk)
{
sk->sk_ack_backlog--;
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -83,6 +83,7 @@ EXPORT_SYMBOL(zone_table);
static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
static DEFINE_SPINLOCK(min_free_lock);
int min_free_kbytes = 1024;
+int var_free_kbytes;
unsigned long __meminitdata nr_kernel_pages;
unsigned long __meminitdata nr_all_pages;
@@ -971,8 +972,8 @@ restart:
/* This allocation should allow future memory freeing. */
- if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
- && !in_interrupt()) {
+ if ((((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
+ && !in_interrupt()) || (gfp_mask & __GFP_EMERGENCY)) {
if (!(gfp_mask & __GFP_NOMEMALLOC)) {
nofail_alloc:
/* go through the zonelist yet again, ignoring mins */
@@ -2197,7 +2198,8 @@ static void setup_per_zone_lowmem_reserv
*/
static void __setup_per_zone_pages_min(void)
{
- unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
+ unsigned pages_min = (min_free_kbytes + var_free_kbytes)
+ >> (PAGE_SHIFT - 10);
unsigned long lowmem_pages = 0;
struct zone *zone;
unsigned long flags;
@@ -2258,6 +2260,33 @@ void setup_per_zone_pages_min(void)
spin_unlock_irqrestore(&min_free_lock, flags);
}
+/**
+ * adjust_memalloc_reserve - adjust the memalloc reserve
+ * @pages: number of pages to add
+ *
+ * It adds a number of pages to the memalloc reserve; if
+ * the number was positive it kicks kswapd into action to
+ * satisfy the higher watermarks.
+ *
+ * NOTE: there is only a single caller, hence no locking.
+ */
+void adjust_memalloc_reserve(int pages)
+{
+ var_free_kbytes += pages << (PAGE_SHIFT - 10);
+ BUG_ON(var_free_kbytes < 0);
+ setup_per_zone_pages_min();
+ if (pages > 0) {
+ struct zone *zone;
+ for_each_zone(zone)
+ wakeup_kswapd(zone, 0);
+ }
+ if (pages)
+ printk(KERN_DEBUG "Emergency reserve: %d\n",
+ var_free_kbytes);
+}
+
+EXPORT_SYMBOL_GPL(adjust_memalloc_reserve);
+
/*
* Initialise min_free_kbytes.
*
Index: linux-2.6/net/core/skbuff.c
===================================================================
--- linux-2.6.orig/net/core/skbuff.c
+++ linux-2.6/net/core/skbuff.c
@@ -139,28 +139,30 @@ EXPORT_SYMBOL(skb_truesize_bug);
* Buffers may only be allocated from interrupts using a @gfp_mask of
* %GFP_ATOMIC.
*/
-struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
- int fclone)
+struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, int flags)
{
kmem_cache_t *cache;
struct skb_shared_info *shinfo;
struct sk_buff *skb;
u8 *data;
- cache = fclone ? skbuff_fclone_cache : skbuff_head_cache;
+ size = SKB_DATA_ALIGN(size);
+ cache = (flags & SKB_ALLOC_FCLONE)
+ ? skbuff_fclone_cache : skbuff_head_cache;
/* Get the HEAD */
skb = kmem_cache_alloc(cache, gfp_mask & ~__GFP_DMA);
if (!skb)
- goto out;
+ goto noskb;
/* Get the DATA. Size must match skb_add_mtu(). */
- size = SKB_DATA_ALIGN(size);
data = ____kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
if (!data)
goto nodata;
+allocated:
memset(skb, 0, offsetof(struct sk_buff, truesize));
+ skb->emergency = !cache;
skb->truesize = size + sizeof(struct sk_buff);
atomic_set(&skb->users, 1);
skb->head = data;
@@ -177,7 +179,7 @@ struct sk_buff *__alloc_skb(unsigned int
shinfo->ip6_frag_id = 0;
shinfo->frag_list = NULL;
- if (fclone) {
+ if (flags & SKB_ALLOC_FCLONE) {
struct sk_buff *child = skb + 1;
atomic_t *fclone_ref = (atomic_t *) (child + 1);
@@ -185,13 +187,34 @@ struct sk_buff *__alloc_skb(unsigned int
atomic_set(fclone_ref, 1);
child->fclone = SKB_FCLONE_UNAVAILABLE;
+ child->emergency = skb->emergency;
}
out:
return skb;
+
nodata:
kmem_cache_free(cache, skb);
skb = NULL;
- goto out;
+noskb:
+ /* Attempt emergency allocation when RX skb. */
+ if (!(flags & SKB_ALLOC_RX) || !sk_vmio_socks())
+ goto out;
+
+ skb = sk_emergency_rx_alloc(kmem_cache_size(cache),
+ gfp_mask | __GFP_EMERGENCY);
+ if (!skb)
+ goto out;
+
+ data = sk_emergency_rx_alloc(size + sizeof(struct skb_shared_info),
+ gfp_mask | __GFP_EMERGENCY);
+ if (!data) {
+ sk_emergency_rx_free(skb, kmem_cache_size(cache));
+ skb = NULL;
+ goto out;
+ }
+
+ cache = NULL;
+ goto allocated;
}
/**
@@ -267,7 +290,7 @@ struct sk_buff *__netdev_alloc_skb(struc
{
struct sk_buff *skb;
- skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
+ skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, SKB_ALLOC_RX);
if (likely(skb)) {
skb_reserve(skb, NET_SKB_PAD);
skb->dev = dev;
@@ -315,7 +338,12 @@ static void skb_release_data(struct sk_b
if (skb_shinfo(skb)->frag_list)
skb_drop_fraglist(skb);
- kfree(skb->head);
+ if (skb->emergency)
+ sk_emergency_rx_free(skb->head,
+ (skb->end - skb->head) +
+ sizeof(struct skb_shared_info));
+ else
+ kfree(skb->head);
}
}
@@ -324,24 +352,26 @@ static void skb_release_data(struct sk_b
*/
void kfree_skbmem(struct sk_buff *skb)
{
- struct sk_buff *other;
+ struct kmem_cache *cache = skbuff_head_cache;
+ struct sk_buff *free = skb;
atomic_t *fclone_ref;
skb_release_data(skb);
switch (skb->fclone) {
case SKB_FCLONE_UNAVAILABLE:
- kmem_cache_free(skbuff_head_cache, skb);
- break;
+ goto free;
case SKB_FCLONE_ORIG:
+ cache = skbuff_fclone_cache;
fclone_ref = (atomic_t *) (skb + 2);
if (atomic_dec_and_test(fclone_ref))
- kmem_cache_free(skbuff_fclone_cache, skb);
- break;
+ goto free;
+ return;
case SKB_FCLONE_CLONE:
+ cache = skbuff_fclone_cache;
fclone_ref = (atomic_t *) (skb + 1);
- other = skb - 1;
+ free = skb - 1;
/* The clone portion is available for
* fast-cloning again.
@@ -349,9 +379,15 @@ void kfree_skbmem(struct sk_buff *skb)
skb->fclone = SKB_FCLONE_UNAVAILABLE;
if (atomic_dec_and_test(fclone_ref))
- kmem_cache_free(skbuff_fclone_cache, other);
- break;
+ goto free;
+ return;
};
+
+free:
+ if (skb->emergency)
+ sk_emergency_rx_free(free, kmem_cache_size(cache));
+ else
+ kmem_cache_free(cache, free);
}
/**
@@ -435,6 +471,12 @@ struct sk_buff *skb_clone(struct sk_buff
atomic_t *fclone_ref = (atomic_t *) (n + 1);
n->fclone = SKB_FCLONE_CLONE;
atomic_inc(fclone_ref);
+ } else if (skb->emergency) {
+ n = sk_emergency_rx_alloc(kmem_cache_size(skbuff_head_cache),
+ gfp_mask | __GFP_EMERGENCY);
+ if (!n)
+ return NULL;
+ n->fclone = SKB_FCLONE_UNAVAILABLE;
} else {
n = kmem_cache_alloc(skbuff_head_cache, gfp_mask);
if (!n)
@@ -470,6 +512,7 @@ struct sk_buff *skb_clone(struct sk_buff
#if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
C(ipvs_property);
#endif
+ C(emergency);
C(protocol);
n->destructor = NULL;
#ifdef CONFIG_NETFILTER
@@ -690,7 +733,13 @@ int pskb_expand_head(struct sk_buff *skb
size = SKB_DATA_ALIGN(size);
- data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
+ if (skb->emergency) {
+ data = sk_emergency_rx_alloc(size + sizeof(struct skb_shared_info),
+ gfp_mask | __GFP_EMERGENCY);
+ if (!data)
+ goto nodata;
+ } else
+ data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
if (!data)
goto nodata;
Index: linux-2.6/net/ipv4/icmp.c
===================================================================
--- linux-2.6.orig/net/ipv4/icmp.c
+++ linux-2.6/net/ipv4/icmp.c
@@ -938,6 +938,9 @@ int icmp_rcv(struct sk_buff *skb)
goto error;
}
+ if (unlikely(skb->emergency))
+ goto drop;
+
if (!pskb_pull(skb, sizeof(struct icmphdr)))
goto error;
Index: linux-2.6/net/ipv4/tcp_ipv4.c
===================================================================
--- linux-2.6.orig/net/ipv4/tcp_ipv4.c
+++ linux-2.6/net/ipv4/tcp_ipv4.c
@@ -1093,6 +1093,9 @@ int tcp_v4_rcv(struct sk_buff *skb)
if (!sk)
goto no_tcp_socket;
+ if (unlikely(skb->emergency && !sk_has_vmio(sk)))
+ goto discard_and_relse;
+
process:
if (sk->sk_state == TCP_TIME_WAIT)
goto do_time_wait;
Index: linux-2.6/net/ipv4/udp.c
===================================================================
--- linux-2.6.orig/net/ipv4/udp.c
+++ linux-2.6/net/ipv4/udp.c
@@ -1136,7 +1136,12 @@ int udp_rcv(struct sk_buff *skb)
sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex);
if (sk != NULL) {
- int ret = udp_queue_rcv_skb(sk, skb);
+ int ret;
+
+ if (unlikely(skb->emergency && !sk_has_vmio(sk)))
+ goto drop_noncritical;
+
+ ret = udp_queue_rcv_skb(sk, skb);
sock_put(sk);
/* a return value > 0 means to resubmit the input, but
@@ -1147,6 +1152,7 @@ int udp_rcv(struct sk_buff *skb)
return 0;
}
+drop_noncritical:
if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
goto drop;
nf_reset(skb);
Index: linux-2.6/net/core/sock.c
===================================================================
--- linux-2.6.orig/net/core/sock.c
+++ linux-2.6/net/core/sock.c
@@ -195,6 +195,93 @@ __u32 sysctl_rmem_default = SK_RMEM_MAX;
/* Maximal space eaten by iovec or ancilliary data plus some space */
int sysctl_optmem_max = sizeof(unsigned long)*(2*UIO_MAXIOV + 512);
+static DEFINE_SPINLOCK(memalloc_lock);
+
+atomic_t vmio_socks;
+atomic_t emergency_rx_pages_used;
+
+/**
+ * sk_adjust_memalloc - adjust the global memalloc reserve for critical RX
+ * @socks: number of new %SOCK_VMIO sockets
+ * @tx_reserve_pages: number of pages to (un)reserve for TX
+ *
+ * This function adjusts the memalloc reserve based on system demand.
+ * The RX reserve is a limit, and only added once, not for each socket.
+ *
+ * NOTE:
+ * @tx_reserve_pages is an upper-bound of memory used for TX hence
+ * we need not account the pages like we do for %RX_RESERVE_PAGES.
+ */
+void sk_adjust_memalloc(int socks, int tx_reserve_pages)
+{
+ unsigned long flags;
+ int reserve = tx_reserve_pages;
+ int nr_socks;
+
+ spin_lock_irqsave(&memalloc_lock, flags);
+ if (socks) {
+ nr_socks = atomic_add_return(socks, &vmio_socks);
+ BUG_ON(nr_socks < 0);
+
+ if (nr_socks - socks == 0)
+ reserve += RX_RESERVE_PAGES;
+ if (nr_socks == 0)
+ reserve -= RX_RESERVE_PAGES;
+ }
+ adjust_memalloc_reserve(reserve);
+ spin_unlock_irqrestore(&memalloc_lock, flags);
+}
+EXPORT_SYMBOL_GPL(sk_adjust_memalloc);
+
+/**
+ * sk_set_vmio - sets %SOCK_VMIO
+ * @sk: socket to set it on
+ *
+ * Set %SOCK_VMIO on a socket and increase the memalloc reserve
+ * accordingly.
+ */
+int sk_set_vmio(struct sock *sk)
+{
+ int set = sock_flag(sk, SOCK_VMIO);
+ if (!set) {
+ sk_adjust_memalloc(1, 0);
+ sock_set_flag(sk, SOCK_VMIO);
+ sk->sk_allocation |= __GFP_EMERGENCY;
+ }
+ return !set;
+}
+EXPORT_SYMBOL_GPL(sk_set_vmio);
+
+int sk_clear_vmio(struct sock *sk)
+{
+ int set = sock_flag(sk, SOCK_VMIO);
+ if (set) {
+ sk_adjust_memalloc(-1, 0);
+ sock_reset_flag(sk, SOCK_VMIO);
+ sk->sk_allocation &= ~__GFP_EMERGENCY;
+ }
+ return set;
+}
+EXPORT_SYMBOL_GPL(sk_clear_vmio);
+
+void * sk_emergency_rx_alloc(size_t size, gfp_t gfp_mask)
+{
+ void * page = NULL;
+
+ if (size > PAGE_SIZE)
+ return page;
+
+ if (atomic_add_unless(&emergency_rx_pages_used, 1, RX_RESERVE_PAGES)) {
+ page = (void *)__get_free_page(gfp_mask);
+ if (!page) {
+ WARN_ON(1);
+ atomic_dec(&emergency_rx_pages_used);
+ }
+ }
+
+ return page;
+}
+
static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
{
struct timeval tv;
@@ -881,6 +968,7 @@ void sk_free(struct sock *sk)
struct sk_filter *filter;
struct module *owner = sk->sk_prot_creator->owner;
+ sk_clear_vmio(sk);
if (sk->sk_destruct)
sk->sk_destruct(sk);
Index: linux-2.6/net/ipv6/icmp.c
===================================================================
--- linux-2.6.orig/net/ipv6/icmp.c
+++ linux-2.6/net/ipv6/icmp.c
@@ -599,6 +599,9 @@ static int icmpv6_rcv(struct sk_buff **p
ICMP6_INC_STATS_BH(idev, ICMP6_MIB_INMSGS);
+ if (unlikely(skb->emergency))
+ goto discard_it;
+
saddr = &skb->nh.ipv6h->saddr;
daddr = &skb->nh.ipv6h->daddr;
Index: linux-2.6/net/ipv6/tcp_ipv6.c
===================================================================
--- linux-2.6.orig/net/ipv6/tcp_ipv6.c
+++ linux-2.6/net/ipv6/tcp_ipv6.c
@@ -1216,6 +1216,9 @@ static int tcp_v6_rcv(struct sk_buff **p
if (!sk)
goto no_tcp_socket;
+ if (unlikely(skb->emergency && !sk_has_vmio(sk)))
+ goto discard_and_relse;
+
process:
if (sk->sk_state == TCP_TIME_WAIT)
goto do_time_wait;
Index: linux-2.6/net/ipv6/udp.c
===================================================================
--- linux-2.6.orig/net/ipv6/udp.c
+++ linux-2.6/net/ipv6/udp.c
@@ -499,6 +499,9 @@ static int udpv6_rcv(struct sk_buff **ps
sk = udp_v6_lookup(saddr, uh->source, daddr, uh->dest, dev->ifindex);
if (sk == NULL) {
+ if (unlikely(skb->emergency))
+ goto discard;
+
if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb))
goto discard;
Index: linux-2.6/net/ipv4/igmp.c
===================================================================
--- linux-2.6.orig/net/ipv4/igmp.c
+++ linux-2.6/net/ipv4/igmp.c
@@ -927,6 +927,9 @@ int igmp_rcv(struct sk_buff *skb)
return 0;
}
+ if (unlikely(skb->emergency))
+ goto drop;
+
if (!pskb_may_pull(skb, sizeof(struct igmphdr)))
goto drop;
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 13/20] nbd: use swapdev hook to make swap deadlock free
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (6 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 19/20] mm: a process flags to avoid blocking allocations Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 01/20] mm: serialize access to min_free_kbytes Peter Zijlstra
` (12 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Pavel Machek
[-- Attachment #1: nbd_vmio.patch --]
[-- Type: text/plain, Size: 3273 bytes --]
Set SOCK_VMIO on the NBD socket and make sure the request_fn always
runs with PF_MEMALLOC when the device is used for swap; this ensures we can
always flush the requests.
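The patch saves current->flags on entry to do_nbd_request() and writes the
whole word back on exit. A possible variation, sketched here in user space
and not what the patch does, is to remember and restore only the one bit, so
that other flag changes made in between survive. struct task_model, the
helper names and the flag values are placeholders, not kernel definitions.

```c
#include <assert.h>

#define PF_MEMALLOC_SKETCH 0x0800ul   /* placeholder bit value */

struct task_model {
    unsigned long flags;
};

/* Set the bit and return its previous state (0 or the bit itself). */
static unsigned long memalloc_enter_sketch(struct task_model *t)
{
    unsigned long was_set = t->flags & PF_MEMALLOC_SKETCH;

    t->flags |= PF_MEMALLOC_SKETCH;
    return was_set;
}

/* Restore only the saved bit; every other flag keeps its current value. */
static void memalloc_leave_sketch(struct task_model *t, unsigned long was_set)
{
    assert((was_set & ~PF_MEMALLOC_SKETCH) == 0);
    t->flags = (t->flags & ~PF_MEMALLOC_SKETCH) | was_set;
}
```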
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Daniel Phillips <phillips@google.com>
CC: Pavel Machek <pavel@ucw.cz>
---
drivers/block/nbd.c | 36 +++++++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)
Index: linux-2.6/drivers/block/nbd.c
===================================================================
--- linux-2.6.orig/drivers/block/nbd.c 2006-09-07 18:44:12.000000000 +0200
+++ linux-2.6/drivers/block/nbd.c 2006-09-07 18:44:22.000000000 +0200
@@ -139,7 +139,6 @@ static int sock_xmit(struct socket *sock
spin_unlock_irqrestore(&current->sighand->siglock, flags);
do {
- sock->sk->sk_allocation = GFP_NOIO;
iov.iov_base = buf;
iov.iov_len = size;
msg.msg_name = NULL;
@@ -406,10 +405,13 @@ static void nbd_clear_que(struct nbd_dev
static void do_nbd_request(request_queue_t * q)
{
struct request *req;
+ unsigned long pflags = current->flags;
+ struct nbd_device *lo = q->queuedata;
+
+ if (lo->sock && sk_has_vmio(lo->sock->sk))
+ current->flags |= PF_MEMALLOC;
while ((req = elv_next_request(q)) != NULL) {
- struct nbd_device *lo;
-
blkdev_dequeue_request(req);
dprintk(DBG_BLKDEV, "%s: request %p: dequeued (flags=%lx)\n",
req->rq_disk->disk_name, req, req->flags);
@@ -417,8 +419,6 @@ static void do_nbd_request(request_queue
if (!(req->flags & REQ_CMD))
goto error_out;
- lo = req->rq_disk->private_data;
-
BUG_ON(lo->magic != LO_MAGIC);
nbd_cmd(req) = NBD_CMD_READ;
@@ -472,6 +472,7 @@ error_out:
* plug the device to close it.
*/
blk_plug_device(q);
+ current->flags = pflags;
return;
}
@@ -530,6 +531,7 @@ static int nbd_ioctl(struct inode *inode
if (S_ISSOCK(inode->i_mode)) {
lo->file = file;
lo->sock = SOCKET_I(inode);
+ lo->sock->sk->sk_allocation = GFP_NOIO;
error = 0;
} else {
fput(file);
@@ -599,10 +601,33 @@ static int nbd_ioctl(struct inode *inode
return -EINVAL;
}
+static int nbd_swapdev(struct gendisk *disk, int enable)
+{
+ struct nbd_device *lo = disk->private_data;
+
+ if (!lo->sock)
+ return -ENODEV;
+
+ if (enable) {
+ sk_adjust_memalloc(0, TX_RESERVE_PAGES);
+ if (!sk_set_vmio(lo->sock->sk))
+ printk(KERN_WARNING
+ "failed to set SOCK_VMIO on NBD socket\n");
+ } else {
+ if (!sk_clear_vmio(lo->sock->sk))
+ printk(KERN_WARNING
+ "failed to clear SOCK_VMIO on NBD socket\n");
+ sk_adjust_memalloc(0, -TX_RESERVE_PAGES);
+ }
+
+ return 0;
+}
+
static struct block_device_operations nbd_fops =
{
.owner = THIS_MODULE,
.ioctl = nbd_ioctl,
+ .swapdev = nbd_swapdev,
};
/*
@@ -638,6 +663,7 @@ static int __init nbd_init(void)
put_disk(disk);
goto out;
}
+ disk->queue->queuedata = &nbd_dev[i];
blk_queue_max_segment_size(disk->queue, PAGE_SIZE);
blk_queue_max_hw_segments(disk->queue, 1);
blk_queue_max_phys_segments(disk->queue, 1);
--
* [PATCH 07/20] nfs: add a comment explaining the use of PG_private in the NFS client
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (17 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 06/20] nfs: teach the NFS client how to treat PG_swapcache pages Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 17/20] scsi: propagate the swapdev hook into the scsi stack Peter Zijlstra
2006-09-12 16:37 ` [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Linus Torvalds
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Trond Myklebust
[-- Attachment #1: nfs_PG_private_comment.patch --]
[-- Type: text/plain, Size: 1058 bytes --]
Add a little comment explaining the use of PG_private in the NFS client.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
---
fs/nfs/write.c | 5 +++++
1 file changed, 5 insertions(+)
Index: linux-2.6/fs/nfs/write.c
===================================================================
--- linux-2.6.orig/fs/nfs/write.c
+++ linux-2.6/fs/nfs/write.c
@@ -417,6 +417,11 @@ static int nfs_inode_add_request(struct
if (nfs_have_delegation(inode, FMODE_WRITE))
nfsi->change_attr++;
}
+ /*
+ * The PG_private bit is unfortunately needed if we want to fix the
+ * hole in the mmap semantics. If we do not set it, then the VM will
+ * fail to call the "releasepage" address ops.
+ */
SetPagePrivate(req->wb_page);
nfsi->npages++;
atomic_inc(&req->wb_count);
--
* [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7)
@ 2006-09-12 15:25 Peter Zijlstra
2006-09-12 15:25 ` [PATCH 20/20] iscsi: support for swapping over iSCSI Peter Zijlstra
` (20 more replies)
0 siblings, 21 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra
--
Yet another instance of my networked swap patches.
The patch-set consists of four parts:
- patches 1-2; the basic 'framework' for deadlock avoidance
- patches 3-9; implement swap over NFS
- patches 10-13; implement swap over NBD
- patches 14-20; implement swap over iSCSI
The iSCSI work depends on their .19 tree and does need some more work,
but does work in its current state.
As stated in previous posts, NFS and iSCSI survive service failures and
reconnect properly during heavy swapping.
Linus, when I mentioned swap over network to you in Ottawa, you said it was
a valid use case, that people actually do and want this. Can you agree with
the approach taken in these patches?
Peter
* [PATCH 01/20] mm: serialize access to min_free_kbytes
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (7 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 13/20] nbd: use swapdev hook to make swap deadlock free Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 11/20] nbd: request_fn fixup Peter Zijlstra
` (11 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra
[-- Attachment #1: setup_per_zone_pages_min.patch --]
[-- Type: text/plain, Size: 2158 bytes --]
There is a small race between the procfs caller and the memory hotplug caller
of setup_per_zone_pages_min(). Not a big deal, but the next patch will add yet
another caller. Time to close the gap.
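The shape of the fix is a locked wrapper around an unlocked worker. A minimal
user-space sketch of that pattern follows; it assumes a C11 atomic_flag spin
loop in place of the kernel spinlock, and the *_sketch names are illustrative.
The worker only computes the page count, not the per-zone watermarks.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag min_free_lock_sketch = ATOMIC_FLAG_INIT;
static long min_free_kbytes_sketch = 1024;
static long pages_min_sketch;

/* Unlocked worker: caller must hold the lock or be the only caller (init). */
static void setup_pages_min_unlocked_sketch(int page_shift)
{
    assert(page_shift >= 10);
    pages_min_sketch = min_free_kbytes_sketch >> (page_shift - 10);
}

/* Locked wrapper for every caller that can race with another one. */
static void setup_pages_min_sketch(int page_shift)
{
    while (atomic_flag_test_and_set_explicit(&min_free_lock_sketch,
                                             memory_order_acquire))
        ;   /* spin until the lock is free */

    setup_pages_min_unlocked_sketch(page_shift);

    atomic_flag_clear_explicit(&min_free_lock_sketch, memory_order_release);
}
```

The init path can keep calling the unlocked worker directly, as the last hunk
of this patch does, since nothing else runs concurrently at that point.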
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
mm/page_alloc.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -81,6 +81,7 @@ struct zone *zone_table[1 << ZONETABLE_S
EXPORT_SYMBOL(zone_table);
static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
+static DEFINE_SPINLOCK(min_free_lock);
int min_free_kbytes = 1024;
unsigned long __meminitdata nr_kernel_pages;
@@ -2190,11 +2191,11 @@ static void setup_per_zone_lowmem_reserv
}
/*
- * setup_per_zone_pages_min - called when min_free_kbytes changes. Ensures
+ * __setup_per_zone_pages_min - called when min_free_kbytes changes. Ensures
* that the pages_{min,low,high} values for each zone are set correctly
* with respect to min_free_kbytes.
*/
-void setup_per_zone_pages_min(void)
+static void __setup_per_zone_pages_min(void)
{
unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
unsigned long lowmem_pages = 0;
@@ -2248,6 +2249,15 @@ void setup_per_zone_pages_min(void)
calculate_totalreserve_pages();
}
+void setup_per_zone_pages_min(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&min_free_lock, flags);
+ __setup_per_zone_pages_min();
+ spin_unlock_irqrestore(&min_free_lock, flags);
+}
+
/*
* Initialise min_free_kbytes.
*
@@ -2283,7 +2293,7 @@ static int __init init_per_zone_pages_mi
min_free_kbytes = 128;
if (min_free_kbytes > 65536)
min_free_kbytes = 65536;
- setup_per_zone_pages_min();
+ __setup_per_zone_pages_min();
setup_per_zone_lowmem_reserve();
return 0;
}
--
* [PATCH 06/20] nfs: teach the NFS client how to treat PG_swapcache pages
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (16 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 12/20] nbd: limit blk_queue Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 07/20] nfs: add a comment explaining the use of PG_private in the NFS client Peter Zijlstra
` (2 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Trond Myklebust
[-- Attachment #1: nfs_swapcache.patch --]
[-- Type: text/plain, Size: 8842 bytes --]
Replace all relevant occurrences of page->index and page->mapping in the NFS
client with the new page_file_index() and page_file_mapping() functions.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
---
fs/nfs/file.c | 6 +++---
fs/nfs/pagelist.c | 8 ++++----
fs/nfs/read.c | 10 +++++-----
fs/nfs/write.c | 28 ++++++++++++++--------------
4 files changed, 26 insertions(+), 26 deletions(-)
Index: linux-2.6/fs/nfs/file.c
===================================================================
--- linux-2.6.orig/fs/nfs/file.c
+++ linux-2.6/fs/nfs/file.c
@@ -303,17 +303,17 @@ static int nfs_commit_write(struct file
static void nfs_invalidate_page(struct page *page, unsigned long offset)
{
- struct inode *inode = page->mapping->host;
+ struct inode *inode = page_file_mapping(page)->host;
/* Cancel any unstarted writes on this page */
if (offset == 0)
- nfs_sync_inode_wait(inode, page->index, 1, FLUSH_INVALIDATE);
+ nfs_sync_inode_wait(inode, page_file_index(page), 1, FLUSH_INVALIDATE);
}
static int nfs_release_page(struct page *page, gfp_t gfp)
{
if (gfp & __GFP_FS)
- return !nfs_wb_page(page->mapping->host, page);
+ return !nfs_wb_page(page_file_mapping(page)->host, page);
else
/*
* Avoid deadlock on nfs_wait_on_request().
Index: linux-2.6/fs/nfs/pagelist.c
===================================================================
--- linux-2.6.orig/fs/nfs/pagelist.c
+++ linux-2.6/fs/nfs/pagelist.c
@@ -82,11 +82,11 @@ nfs_create_request(struct nfs_open_conte
* update_nfs_request below if the region is not locked. */
req->wb_page = page;
atomic_set(&req->wb_complete, 0);
- req->wb_index = page->index;
+ req->wb_index = page_file_index(page);
page_cache_get(page);
BUG_ON(PagePrivate(page));
BUG_ON(!PageLocked(page));
- BUG_ON(page->mapping->host != inode);
+ BUG_ON(page_file_mapping(page)->host != inode);
req->wb_offset = offset;
req->wb_pgbase = offset;
req->wb_bytes = count;
@@ -271,7 +271,7 @@ nfs_coalesce_requests(struct list_head *
* nfs_scan_lock_dirty - Scan the radix tree for dirty requests
* @nfsi: NFS inode
* @dst: Destination list
- * @idx_start: lower bound of page->index to scan
+ * @idx_start: lower bound of page_file_index(page) to scan
* @npages: idx_start + npages sets the upper bound to scan.
*
* Moves elements from one of the inode request lists.
@@ -328,7 +328,7 @@ out:
* @nfsi: NFS inode
* @head: One of the NFS inode request lists
* @dst: Destination list
- * @idx_start: lower bound of page->index to scan
+ * @idx_start: lower bound of page_file_index(page) to scan
* @npages: idx_start + npages sets the upper bound to scan.
*
* Moves elements from one of the inode request lists.
Index: linux-2.6/fs/nfs/read.c
===================================================================
--- linux-2.6.orig/fs/nfs/read.c
+++ linux-2.6/fs/nfs/read.c
@@ -86,9 +86,9 @@ unsigned int nfs_page_length(struct inod
if (i_size <= 0)
return 0;
idx = (i_size - 1) >> PAGE_CACHE_SHIFT;
- if (page->index > idx)
+ if (page_file_index(page) > idx)
return 0;
- if (page->index != idx)
+ if (page_file_index(page) != idx)
return PAGE_CACHE_SIZE;
return 1 + ((i_size - 1) & (PAGE_CACHE_SIZE - 1));
}
@@ -595,11 +595,11 @@ int nfs_readpage_result(struct rpc_task
int nfs_readpage(struct file *file, struct page *page)
{
struct nfs_open_context *ctx;
- struct inode *inode = page->mapping->host;
+ struct inode *inode = page_file_mapping(page)->host;
int error;
dprintk("NFS: nfs_readpage (%p %ld@%lu)\n",
- page, PAGE_CACHE_SIZE, page->index);
+ page, PAGE_CACHE_SIZE, page_file_index(page));
nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
nfs_add_stats(inode, NFSIOS_READPAGES, 1);
@@ -647,7 +647,7 @@ static int
readpage_async_filler(void *data, struct page *page)
{
struct nfs_readdesc *desc = (struct nfs_readdesc *)data;
- struct inode *inode = page->mapping->host;
+ struct inode *inode = page_file_mapping(page)->host;
struct nfs_page *new;
unsigned int len;
Index: linux-2.6/fs/nfs/write.c
===================================================================
--- linux-2.6.orig/fs/nfs/write.c
+++ linux-2.6/fs/nfs/write.c
@@ -145,13 +145,13 @@ void nfs_writedata_release(void *wdata)
/* Adjust the file length if we're writing beyond the end */
static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int count)
{
- struct inode *inode = page->mapping->host;
+ struct inode *inode = page_file_mapping(page)->host;
loff_t end, i_size = i_size_read(inode);
unsigned long end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
- if (i_size > 0 && page->index < end_index)
+ if (i_size > 0 && page_file_index(page) < end_index)
return;
- end = ((loff_t)page->index << PAGE_CACHE_SHIFT) + ((loff_t)offset+count);
+ end = page_offset(page) + ((loff_t)offset+count);
if (i_size >= end)
return;
nfs_inc_stats(inode, NFSIOS_EXTENDWRITE);
@@ -174,11 +174,11 @@ static void nfs_mark_uptodate(struct pag
return;
}
- end_offs = i_size_read(page->mapping->host) - 1;
+ end_offs = i_size_read(page_file_mapping(page)->host) - 1;
if (end_offs < 0)
return;
/* Is this the last page? */
- if (page->index != (unsigned long)(end_offs >> PAGE_CACHE_SHIFT))
+ if (page_file_index(page) != (unsigned long)(end_offs >> PAGE_CACHE_SHIFT))
return;
/* This is the last page: set PG_uptodate if we cover the entire
* extent of the data, then zero the rest of the page.
@@ -293,7 +293,7 @@ static int wb_priority(struct writeback_
int nfs_writepage(struct page *page, struct writeback_control *wbc)
{
struct nfs_open_context *ctx;
- struct inode *inode = page->mapping->host;
+ struct inode *inode = page_file_mapping(page)->host;
unsigned long end_index;
unsigned offset = PAGE_CACHE_SIZE;
loff_t i_size = i_size_read(inode);
@@ -320,14 +320,14 @@ int nfs_writepage(struct page *page, str
nfs_wb_page_priority(inode, page, priority);
/* easy case */
- if (page->index < end_index)
+ if (page_file_index(page) < end_index)
goto do_it;
/* things got complicated... */
offset = i_size & (PAGE_CACHE_SIZE-1);
/* OK, are we completely out? */
err = 0; /* potential race with truncate - ignore */
- if (page->index >= end_index+1 || !offset)
+ if (page_file_index(page) >= end_index+1 || !offset)
goto out;
do_it:
ctx = nfs_find_open_context(inode, NULL, FMODE_WRITE);
@@ -599,7 +599,7 @@ static void nfs_cancel_commit_list(struc
* nfs_scan_dirty - Scan an inode for dirty requests
* @inode: NFS inode to scan
* @dst: destination list
- * @idx_start: lower bound of page->index to scan.
+ * @idx_start: lower bound of page_file_index(page) to scan.
* @npages: idx_start + npages sets the upper bound to scan.
*
* Moves requests from the inode's dirty page list.
@@ -625,7 +625,7 @@ nfs_scan_dirty(struct inode *inode, stru
* nfs_scan_commit - Scan an inode for commit requests
* @inode: NFS inode to scan
* @dst: destination list
- * @idx_start: lower bound of page->index to scan.
+ * @idx_start: lower bound of page_file_index(page) to scan.
* @npages: idx_start + npages sets the upper bound to scan.
*
* Moves requests from the inode's 'commit' request list.
@@ -706,14 +706,14 @@ static struct nfs_page * nfs_update_requ
end = offset + bytes;
- if (nfs_wait_on_write_congestion(page->mapping, server->flags & NFS_MOUNT_INTR))
+ if (nfs_wait_on_write_congestion(page_file_mapping(page), server->flags & NFS_MOUNT_INTR))
return ERR_PTR(-ERESTARTSYS);
for (;;) {
/* Loop over all inode entries and see if we find
* A request for the page we wish to update
*/
spin_lock(&nfsi->req_lock);
- req = _nfs_find_request(inode, page->index);
+ req = _nfs_find_request(inode, page_file_index(page));
if (req) {
if (!nfs_lock_request_dontget(req)) {
int error;
@@ -784,7 +784,7 @@ static struct nfs_page * nfs_update_requ
int nfs_flush_incompatible(struct file *file, struct page *page)
{
struct nfs_open_context *ctx = (struct nfs_open_context *)file->private_data;
- struct inode *inode = page->mapping->host;
+ struct inode *inode = page_file_mapping(page)->host;
struct nfs_page *req;
int status = 0;
/*
@@ -795,7 +795,7 @@ int nfs_flush_incompatible(struct file *
* Also do the same if we find a request from an existing
* dropped page.
*/
- req = nfs_find_request(inode, page->index);
+ req = nfs_find_request(inode, page_file_index(page));
if (req) {
if (req->wb_page != page || ctx != req->wb_context)
status = nfs_wb_page(inode, page);
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 12/20] nbd: limit blk_queue
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (15 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 03/20] mm: add support for non block device backed swap files Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 22:47 ` Jens Axboe
2006-09-12 15:25 ` [PATCH 06/20] nfs: teach the NFS client how to treat PG_swapcache pages Peter Zijlstra
` (3 subsequent siblings)
20 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Pavel Machek
[-- Attachment #1: nbd_queue.patch --]
[-- Type: text/plain, Size: 1116 bytes --]
Limit each request to 1 page, so that the request throttling also limits the
number of in-flight pages.
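To make the reasoning above concrete, here is a rough userspace sketch of the
bound. It is only a model: struct queue_limits_model and max_inflight_pages()
are hypothetical names invented for this illustration, not block layer API,
and it assumes page-aligned segments.

```c
#include <stddef.h>

/* Hypothetical model of the relevant queue limits; not a kernel structure. */
struct queue_limits_model {
	unsigned int nr_requests;   /* requests the queue admits at once */
	unsigned int max_segments;  /* segments allowed per request */
	size_t max_segment_size;    /* bytes allowed per segment */
};

/*
 * Worst-case number of pages in flight, assuming page-aligned segments.
 * Each segment is rounded up to a whole number of pages.
 */
static size_t max_inflight_pages(const struct queue_limits_model *q,
				 size_t page_size)
{
	size_t pages_per_segment =
		(q->max_segment_size + page_size - 1) / page_size;

	return (size_t)q->nr_requests * q->max_segments * pages_per_segment;
}
```

With one segment of one page per request the bound collapses to nr_requests,
which is why throttling requests is then enough to throttle pages. The numbers
used to exercise the model (128 requests, 4096 byte pages) are illustrative
values only.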
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Daniel Phillips <phillips@google.com>
CC: Pavel Machek <pavel@ucw.cz>
---
drivers/block/nbd.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
Index: linux-2.6/drivers/block/nbd.c
===================================================================
--- linux-2.6.orig/drivers/block/nbd.c 2006-09-07 18:43:41.000000000 +0200
+++ linux-2.6/drivers/block/nbd.c 2006-09-07 18:44:12.000000000 +0200
@@ -638,6 +638,9 @@ static int __init nbd_init(void)
put_disk(disk);
goto out;
}
+ blk_queue_max_segment_size(disk->queue, PAGE_SIZE);
+ blk_queue_max_hw_segments(disk->queue, 1);
+ blk_queue_max_phys_segments(disk->queue, 1);
}
if (register_blkdev(NBD_MAJOR, "nbd")) {
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 18/20] netlink: add SOCK_VMIO support to AF_NETLINK
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
2006-09-12 15:25 ` [PATCH 20/20] iscsi: support for swapping over iSCSI Peter Zijlstra
2006-09-12 15:25 ` [PATCH 10/20] mm: block device swap notification Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 02/20] net: vm deadlock avoidance core Peter Zijlstra
` (17 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Mike Christie
[-- Attachment #1: netlink_vmio.patch --]
[-- Type: text/plain, Size: 4849 bytes --]
Modify the netlink code so that SOCK_VMIO has the desired effect on the
user-space side of the connection.
Modify sys_{send,recv}msg to use sk->sk_allocation instead of GFP_KERNEL.
This should not change existing behaviour, because sk->sk_allocation
defaults to GFP_KERNEL and no socket exposed to user-space currently sets
it to anything else.
This change allows the system calls to succeed for SOCK_VMIO sockets
(which have sk->sk_allocation |= GFP_EMERGENCY) even under extreme memory
pressure.
Since netlink_sendmsg is used to transfer msgs from user- to kernel-space
treat the skb allocation there as a receive allocation.
Also export netlink_lookup, this is needed to locate the kernel side struct
sock object associated with the user-space netlink socket.
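The "no change for existing sockets" argument can be sketched in userspace as
follows. Everything here is a stand-in: struct sock_model, the *_MODEL flag
values and the helper names are made up for illustration and do not match the
kernel's definitions or the real value of GFP_EMERGENCY.

```c
typedef unsigned int gfp_model_t;

/* Placeholder bit values, chosen only so the two flags are distinct. */
#define GFP_KERNEL_MODEL    0x01u
#define GFP_EMERGENCY_MODEL 0x80u

/* Hypothetical stand-in for the one struct sock field of interest. */
struct sock_model {
	gfp_model_t sk_allocation;
};

/* Every socket starts out with the default allocation mask. */
static void sock_model_init(struct sock_model *sk)
{
	sk->sk_allocation = GFP_KERNEL_MODEL;
}

/* Marking a socket as critical adds the emergency bit to its mask. */
static void sock_model_set_vmio(struct sock_model *sk)
{
	sk->sk_allocation |= GFP_EMERGENCY_MODEL;
}

/* What a syscall path passes to its allocator once it uses the socket mask. */
static gfp_model_t syscall_alloc_mask(const struct sock_model *sk)
{
	return sk->sk_allocation;
}
```

For a socket that was never marked, syscall_alloc_mask() returns exactly the
default mask, so the allocation behaves as it did with a hard-coded
GFP_KERNEL. Only marked sockets see the extra bit.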
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: David Miller <davem@davemloft.net>
CC: Mike Christie <michaelc@cs.wisc.edu>
---
include/linux/netlink.h | 1 +
net/compat.c | 2 +-
net/netlink/af_netlink.c | 8 +++++---
net/socket.c | 6 +++---
4 files changed, 10 insertions(+), 7 deletions(-)
Index: linux-2.6/net/netlink/af_netlink.c
===================================================================
--- linux-2.6.orig/net/netlink/af_netlink.c
+++ linux-2.6/net/netlink/af_netlink.c
@@ -199,7 +199,7 @@ netlink_unlock_table(void)
wake_up(&nl_table_wait);
}
-static __inline__ struct sock *netlink_lookup(int protocol, u32 pid)
+__inline__ struct sock *netlink_lookup(int protocol, u32 pid)
{
struct nl_pid_hash *hash = &nl_table[protocol].hash;
struct hlist_head *head;
@@ -1147,7 +1147,7 @@ static int netlink_sendmsg(struct kiocb
if (len > sk->sk_sndbuf - 32)
goto out;
err = -ENOBUFS;
- skb = alloc_skb(len, GFP_KERNEL);
+ skb = __alloc_skb(len, GFP_KERNEL, SKB_ALLOC_RX);
if (skb==NULL)
goto out;
@@ -1178,7 +1178,8 @@ static int netlink_sendmsg(struct kiocb
if (dst_group) {
atomic_inc(&skb->users);
- netlink_broadcast(sk, skb, dst_pid, dst_group, GFP_KERNEL);
+ netlink_broadcast(sk, skb, dst_pid, dst_group,
+ sk->sk_allocation);
}
err = netlink_unicast(sk, skb, dst_pid, msg->msg_flags&MSG_DONTWAIT);
@@ -1788,6 +1789,7 @@ panic:
core_initcall(netlink_proto_init);
+EXPORT_SYMBOL(netlink_lookup);
EXPORT_SYMBOL(netlink_ack);
EXPORT_SYMBOL(netlink_run_queue);
EXPORT_SYMBOL(netlink_queue_skip);
Index: linux-2.6/net/socket.c
===================================================================
--- linux-2.6.orig/net/socket.c
+++ linux-2.6/net/socket.c
@@ -1790,7 +1790,7 @@ asmlinkage long sys_sendmsg(int fd, stru
err = -ENOMEM;
iov_size = msg_sys.msg_iovlen * sizeof(struct iovec);
if (msg_sys.msg_iovlen > UIO_FASTIOV) {
- iov = sock_kmalloc(sock->sk, iov_size, GFP_KERNEL);
+ iov = sock_kmalloc(sock->sk, iov_size, sock->sk->sk_allocation);
if (!iov)
goto out_put;
}
@@ -1818,7 +1818,7 @@ asmlinkage long sys_sendmsg(int fd, stru
} else if (ctl_len) {
if (ctl_len > sizeof(ctl))
{
- ctl_buf = sock_kmalloc(sock->sk, ctl_len, GFP_KERNEL);
+ ctl_buf = sock_kmalloc(sock->sk, ctl_len, sock->sk->sk_allocation);
if (ctl_buf == NULL)
goto out_freeiov;
}
@@ -1891,7 +1891,7 @@ asmlinkage long sys_recvmsg(int fd, stru
err = -ENOMEM;
iov_size = msg_sys.msg_iovlen * sizeof(struct iovec);
if (msg_sys.msg_iovlen > UIO_FASTIOV) {
- iov = sock_kmalloc(sock->sk, iov_size, GFP_KERNEL);
+ iov = sock_kmalloc(sock->sk, iov_size, sock->sk->sk_allocation);
if (!iov)
goto out_put;
}
Index: linux-2.6/include/linux/netlink.h
===================================================================
--- linux-2.6.orig/include/linux/netlink.h
+++ linux-2.6/include/linux/netlink.h
@@ -150,6 +150,7 @@ struct netlink_skb_parms
#define NETLINK_CREDS(skb) (&NETLINK_CB((skb)).creds)
+extern struct sock *netlink_lookup(int protocol, __u32 pid);
extern struct sock *netlink_kernel_create(int unit, unsigned int groups, void (*input)(struct sock *sk, int len), struct module *module);
extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err);
extern int netlink_has_listeners(struct sock *sk, unsigned int group);
Index: linux-2.6/net/compat.c
===================================================================
--- linux-2.6.orig/net/compat.c
+++ linux-2.6/net/compat.c
@@ -170,7 +170,7 @@ int cmsghdr_from_user_compat_to_kern(str
* from the user.
*/
if (kcmlen > stackbuf_size)
- kcmsg_base = kcmsg = sock_kmalloc(sk, kcmlen, GFP_KERNEL);
+ kcmsg_base = kcmsg = sock_kmalloc(sk, kcmlen, sk->sk_allocation);
if (kcmsg == NULL)
return -ENOBUFS;
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 03/20] mm: add support for non block device backed swap files
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (14 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 14/20] uml: enable scsi and add iscsi config Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 12/20] nbd: limit blk_queue Peter Zijlstra
` (4 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Trond Myklebust
[-- Attachment #1: swapfile.patch --]
[-- Type: text/plain, Size: 8043 bytes --]
A new address_space_operations method is added:
int swapfile(struct address_space *, int)
If sys_swapon() finds this method and it returns no error,
swapper_space.a_ops will proxy to sis->swap_file->f_mapping->a_ops.
The swapfile method will be used to communicate to the address_space that the
VM relies on it, and the address_space should take adequate measures (like
reserving memory for mempools or the like).
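The enable/disable handshake described above can be modelled in userspace
roughly like this. All names (aops_model, swap_info_model, swapon_probe(),
swapoff_release(), the demo_* helpers) are hypothetical, and the return value
1 for "method absent" is a convention of this sketch only.

```c
#include <stddef.h>

#define SWP_FILE_MODEL (1u << 2)

/* Hypothetical stand-in for the one a_ops method of interest. */
struct aops_model {
	int (*swapfile)(void *mapping, int enable);
};

struct swap_info_model {
	unsigned int flags;
	const struct aops_model *a_ops;
	void *mapping;
};

/*
 * Returns 0 if the mapping accepted the swap role, a negative value if it
 * refused, and 1 if it has no swapfile method and the caller should fall
 * back to the block-device path.
 */
static int swapon_probe(struct swap_info_model *sis)
{
	int ret;

	if (sis->a_ops->swapfile == NULL)
		return 1;
	ret = sis->a_ops->swapfile(sis->mapping, 1);
	if (ret == 0)
		sis->flags |= SWP_FILE_MODEL;
	return ret;
}

/* On swapoff, tell the mapping that the VM no longer depends on it. */
static void swapoff_release(struct swap_info_model *sis)
{
	if (sis->flags & SWP_FILE_MODEL) {
		sis->flags &= ~SWP_FILE_MODEL;
		sis->a_ops->swapfile(sis->mapping, 0);
	}
}

/* Demo mapping that just records whether it holds a reserve. */
struct demo_mapping {
	int reserved;
};

static int demo_swapfile(void *mapping, int enable)
{
	struct demo_mapping *m = mapping;

	m->reserved = enable ? 1 : 0;
	return 0;
}
```

The point of the second argument is symmetry: the same method is called with 1
when the file becomes a swap area and with 0 when it stops being one, so the
mapping can set up and tear down whatever reserve it needs.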
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
---
fs/buffer.c | 2 -
include/linux/fs.h | 1
include/linux/swap.h | 4 +++
init/Kconfig | 5 ++++
mm/page_io.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++
mm/swap_state.c | 6 +++++
mm/swapfile.c | 27 ++++++++++++++++++++++
7 files changed, 103 insertions(+), 2 deletions(-)
Index: linux-2.6/include/linux/swap.h
===================================================================
--- linux-2.6.orig/include/linux/swap.h
+++ linux-2.6/include/linux/swap.h
@@ -115,6 +115,7 @@ enum {
SWP_USED = (1 << 0), /* is slot in swap_info[] used? */
SWP_WRITEOK = (1 << 1), /* ok to write to this swap? */
SWP_ACTIVE = (SWP_USED | SWP_WRITEOK),
+ SWP_FILE = (1 << 2), /* file swap area */
/* add others here before... */
SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
};
@@ -212,6 +213,9 @@ extern void swap_unplug_io_fn(struct bac
/* linux/mm/page_io.c */
extern int swap_readpage(struct file *, struct page *);
extern int swap_writepage(struct page *page, struct writeback_control *wbc);
+extern void swap_sync_page(struct page *page);
+extern int swap_set_page_dirty(struct page *page);
+extern int swap_releasepage(struct page *page, gfp_t gfp_mask);
extern int rw_swap_page_sync(int, swp_entry_t, struct page *);
/* linux/mm/swap_state.c */
Index: linux-2.6/init/Kconfig
===================================================================
--- linux-2.6.orig/init/Kconfig
+++ linux-2.6/init/Kconfig
@@ -100,6 +100,11 @@ config SWAP
used to provide more virtual memory than the actual RAM present
in your computer. If unsure say Y.
+config SWAP_FILE
+ bool "Support for paging to/from non block device files"
+ depends on SWAP
+ default n
+
config SYSVIPC
bool "System V IPC"
---help---
Index: linux-2.6/mm/page_io.c
===================================================================
--- linux-2.6.orig/mm/page_io.c
+++ linux-2.6/mm/page_io.c
@@ -17,6 +17,7 @@
#include <linux/bio.h>
#include <linux/swapops.h>
#include <linux/writeback.h>
+#include <linux/buffer_head.h>
#include <asm/pgtable.h>
static struct bio *get_swap_bio(gfp_t gfp_flags, pgoff_t index,
@@ -91,6 +92,14 @@ int swap_writepage(struct page *page, st
unlock_page(page);
goto out;
}
+#ifdef CONFIG_SWAP_FILE
+ {
+ struct swap_info_struct *sis = page_swap_info(page);
+ if (sis->flags & SWP_FILE)
+ return sis->swap_file->f_mapping->
+ a_ops->writepage(page, wbc);
+ }
+#endif
bio = get_swap_bio(GFP_NOIO, page_private(page), page,
end_swap_bio_write);
if (bio == NULL) {
@@ -116,6 +125,14 @@ int swap_readpage(struct file *file, str
BUG_ON(!PageLocked(page));
ClearPageUptodate(page);
+#ifdef CONFIG_SWAP_FILE
+ {
+ struct swap_info_struct *sis = page_swap_info(page);
+ if (sis->flags & SWP_FILE)
+ return sis->swap_file->f_mapping->
+ a_ops->readpage(sis->swap_file, page);
+ }
+#endif
bio = get_swap_bio(GFP_KERNEL, page_private(page), page,
end_swap_bio_read);
if (bio == NULL) {
@@ -129,6 +146,49 @@ out:
return ret;
}
+#ifdef CONFIG_SWAP_FILE
+void swap_sync_page(struct page *page)
+{
+ struct swap_info_struct *sis = page_swap_info(page);
+
+ if (sis->flags & SWP_FILE) {
+ const struct address_space_operations * a_ops =
+ sis->swap_file->f_mapping->a_ops;
+ if (a_ops->sync_page)
+ a_ops->sync_page(page);
+ } else
+ block_sync_page(page);
+}
+
+int swap_set_page_dirty(struct page *page)
+{
+ struct swap_info_struct *sis = page_swap_info(page);
+
+ if (sis->flags & SWP_FILE) {
+ const struct address_space_operations * a_ops =
+ sis->swap_file->f_mapping->a_ops;
+ if (a_ops->set_page_dirty)
+ return a_ops->set_page_dirty(page);
+ return __set_page_dirty_buffers(page);
+ }
+
+ return __set_page_dirty_nobuffers(page);
+}
+
+int swap_releasepage(struct page *page, gfp_t gfp_mask)
+{
+ struct swap_info_struct *sis = page_swap_info(page);
+ const struct address_space_operations * a_ops =
+ sis->swap_file->f_mapping->a_ops;
+
+ if ((sis->flags & SWP_FILE) && a_ops->releasepage)
+ return a_ops->releasepage(page, gfp_mask);
+
+ BUG();
+ return 0;
+}
+#endif
+
#ifdef CONFIG_SOFTWARE_SUSPEND
/*
* A scruffy utility function to read or write an arbitrary swap page
Index: linux-2.6/mm/swap_state.c
===================================================================
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -26,8 +26,14 @@
*/
static const struct address_space_operations swap_aops = {
.writepage = swap_writepage,
+#ifdef CONFIG_SWAP_FILE
+ .sync_page = swap_sync_page,
+ .set_page_dirty = swap_set_page_dirty,
+ .releasepage = swap_releasepage,
+#else
.sync_page = block_sync_page,
.set_page_dirty = __set_page_dirty_nobuffers,
+#endif
.migratepage = migrate_page,
};
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c
+++ linux-2.6/mm/swapfile.c
@@ -411,7 +411,12 @@ void free_swap_and_cache(swp_entry_t ent
if (page) {
int one_user;
+#ifdef CONFIG_SWAP_FILE
+ if (PagePrivate(page))
+ page_mapping(page)->a_ops->releasepage(page, 0);
+#else
BUG_ON(PagePrivate(page));
+#endif
one_user = (page_count(page) == 2);
/* Only cache user (+us), or swap space full? Free it! */
/* Also recheck PageSwapCache after page is locked (above) */
@@ -944,6 +949,13 @@ static void destroy_swap_extents(struct
list_del(&se->list);
kfree(se);
}
+#ifdef CONFIG_SWAP_FILE
+ if (sis->flags & SWP_FILE) {
+ sis->flags &= ~SWP_FILE;
+ sis->swap_file->f_mapping->a_ops->
+ swapfile(sis->swap_file->f_mapping, 0);
+ }
+#endif
}
/*
@@ -1036,6 +1048,19 @@ static int setup_swap_extents(struct swa
goto done;
}
+#ifdef CONFIG_SWAP_FILE
+ if (sis->swap_file->f_mapping->a_ops->swapfile) {
+ ret = sis->swap_file->f_mapping->a_ops->
+ swapfile(sis->swap_file->f_mapping, 1);
+ if (!ret) {
+ sis->flags |= SWP_FILE;
+ ret = add_swap_extent(sis, 0, sis->max, 0);
+ *span = sis->pages;
+ }
+ goto done;
+ }
+#endif
+
blkbits = inode->i_blkbits;
blocks_per_page = PAGE_SIZE >> blkbits;
@@ -1592,7 +1617,7 @@ asmlinkage long sys_swapon(const char __
mutex_lock(&swapon_mutex);
spin_lock(&swap_lock);
- p->flags = SWP_ACTIVE;
+ p->flags |= SWP_WRITEOK;
nr_swap_pages += nr_good_pages;
total_swap_pages += nr_good_pages;
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -382,6 +382,7 @@ struct address_space_operations {
/* migrate the contents of a page to the specified target */
int (*migratepage) (struct address_space *,
struct page *, struct page *);
+ int (*swapfile)(struct address_space *, int);
};
struct backing_dev_info;
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1567,7 +1567,7 @@ static void discard_buffer(struct buffer
*/
int try_to_release_page(struct page *page, gfp_t gfp_mask)
{
- struct address_space * const mapping = page->mapping;
+ struct address_space * const mapping = page_mapping(page);
BUG_ON(!PageLocked(page));
if (PageWriteback(page))
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 05/20] uml: rename arch/um remove_mapping()
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (11 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 15/20] iscsi: kernel side tcp connect Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 09/20] nfs: make swap on NFS robust Peter Zijlstra
` (7 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Jeff Dike
[-- Attachment #1: uml_remove_mapping.patch --]
[-- Type: text/plain, Size: 1488 bytes --]
Now that 'include/linux/mm.h' includes 'include/linux/swap.h', the global
remove_mapping() definition clashes with the arch/um one.
Rename the arch/um one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jeff Dike <jdike@addtoit.com>
---
arch/um/kernel/physmem.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
Index: linux-2.6/arch/um/kernel/physmem.c
===================================================================
--- linux-2.6.orig/arch/um/kernel/physmem.c
+++ linux-2.6/arch/um/kernel/physmem.c
@@ -160,7 +160,7 @@ int physmem_subst_mapping(void *virt, in
static int physmem_fd = -1;
-static void remove_mapping(struct phys_desc *desc)
+static void um_remove_mapping(struct phys_desc *desc)
{
void *virt = desc->virt;
int err;
@@ -184,7 +184,7 @@ int physmem_remove_mapping(void *virt)
if(desc == NULL)
return(0);
- remove_mapping(desc);
+ um_remove_mapping(desc);
return(1);
}
@@ -205,7 +205,7 @@ void physmem_forget_descriptor(int fd)
page = list_entry(ele, struct phys_desc, list);
offset = page->offset;
addr = page->virt;
- remove_mapping(page);
+ um_remove_mapping(page);
err = os_seek_file(fd, offset);
if(err)
panic("physmem_forget_descriptor - failed to seek "
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 04/20] mm: methods for teaching filesystems about PG_swapcache pages
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (9 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 11/20] nbd: request_fn fixup Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 15/20] iscsi: kernel side tcp connect Peter Zijlstra
` (9 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Trond Myklebust
[-- Attachment #1: page_file_methods.patch --]
[-- Type: text/plain, Size: 6892 bytes --]
In order to teach filesystems to handle swap cache pages, two new page
functions are introduced:
pgoff_t page_file_index(struct page *);
struct address_space *page_file_mapping(struct page *);
page_file_index - gives the offset of this page in the file, in units of
PAGE_CACHE_SIZE. It returns page->index for regular page cache pages and also
gives the correct index for PG_swapcache pages.
page_file_mapping - gives the mapping backing the actual page; that is, for
swap cache pages it will give swap_file->f_mapping.
page_offset() is modified to use page_file_index(), so that it will give the
expected result, even for PG_swapcache pages.
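As an illustration of the index lookup described above, here is a small
userspace model. It is a sketch under assumptions: struct page_model and the
model_* helpers are invented for this example, the type field is assumed to
occupy the five high-order bits of an unsigned long, and the real struct page
is considerably more involved.

```c
#include <stdbool.h>

/* Assumed width of the swap type field (five high-order bits). */
#define MODEL_MAX_SWAPFILES_SHIFT 5
#define MODEL_TYPE_SHIFT \
	(sizeof(unsigned long) * 8 - MODEL_MAX_SWAPFILES_SHIFT)
#define MODEL_OFFSET_MASK ((1UL << MODEL_TYPE_SHIFT) - 1)

/* Hypothetical, heavily reduced stand-in for struct page. */
struct page_model {
	bool swapcache;            /* models PageSwapCache(page) */
	unsigned long index;       /* models page->index */
	unsigned long private_val; /* models page_private(page) */
};

/* Pack a swap type and offset into one word, type in the high bits. */
static unsigned long model_swp_entry(unsigned long type, unsigned long offset)
{
	return (type << MODEL_TYPE_SHIFT) | (offset & MODEL_OFFSET_MASK);
}

static unsigned long model_swp_type(unsigned long val)
{
	return val >> MODEL_TYPE_SHIFT;
}

/* Swap cache pages take the index from the swap entry, others from ->index. */
static unsigned long model_page_file_index(const struct page_model *page)
{
	if (page->swapcache)
		return page->private_val & MODEL_OFFSET_MASK;
	return page->index;
}

/* Byte offset in the backing file, built on top of the file index. */
static long long model_page_offset(const struct page_model *page,
				   unsigned int page_shift)
{
	return (long long)model_page_file_index(page) << page_shift;
}
```

Because the offset helper goes through the file index, a filesystem that uses
it does not need to know whether it was handed a regular page cache page or a
swap cache page.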
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
---
include/linux/mm.h | 30 ++++++++++++++++++++++++++++++
include/linux/pagemap.h | 2 +-
include/linux/swap.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/swapops.h | 44 --------------------------------------------
4 files changed, 79 insertions(+), 45 deletions(-)
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -15,6 +15,7 @@
#include <linux/fs.h>
#include <linux/mutex.h>
#include <linux/debug_locks.h>
+#include <linux/swap.h>
struct mempolicy;
struct anon_vma;
@@ -579,6 +580,22 @@ static inline struct address_space *page
return mapping;
}
+static inline
+struct swap_info_struct * page_swap_info(struct page *page)
+{
+ swp_entry_t swap = { .val = page_private(page) };
+ BUG_ON(!PageSwapCache(page));
+ return get_swap_info_struct(swp_type(swap));
+}
+
+static inline
+struct address_space *page_file_mapping(struct page *page)
+{
+ if (unlikely(PageSwapCache(page)))
+ return page_swap_info(page)->swap_file->f_mapping;
+ return page->mapping;
+}
+
static inline int PageAnon(struct page *page)
{
return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
@@ -596,6 +613,19 @@ static inline pgoff_t page_index(struct
}
/*
+ * Return the file index of the page. Regular pagecache pages use ->index
+ * whereas swapcache pages use swp_offset(->private)
+ */
+static inline pgoff_t page_file_index(struct page *page)
+{
+ if (unlikely(PageSwapCache(page))) {
+ swp_entry_t swap = { .val = page_private(page) };
+ return swp_offset(swap);
+ }
+ return page->index;
+}
+
+/*
* The atomic page->_mapcount, like _count, starts from -1:
* so that transitions both from it and to it can be tracked,
* using atomic_inc_and_test and atomic_add_negative(-1).
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h
+++ linux-2.6/include/linux/pagemap.h
@@ -118,7 +118,7 @@ extern void __remove_from_page_cache(str
*/
static inline loff_t page_offset(struct page *page)
{
- return ((loff_t)page->index) << PAGE_CACHE_SHIFT;
+ return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT;
}
static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
Index: linux-2.6/include/linux/swap.h
===================================================================
--- linux-2.6.orig/include/linux/swap.h
+++ linux-2.6/include/linux/swap.h
@@ -75,6 +75,50 @@ typedef struct {
} swp_entry_t;
/*
+ * swapcache pages are stored in the swapper_space radix tree. We want to
+ * get good packing density in that tree, so the index should be dense in
+ * the low-order bits.
+ *
+ * We arrange the `type' and `offset' fields so that `type' is at the five
+ * high-order bits of the swp_entry_t and `offset' is right-aligned in the
+ * remaining bits.
+ *
+ * swp_entry_t's are *never* stored anywhere in their arch-dependent format.
+ */
+#define SWP_TYPE_SHIFT(e) (sizeof(e.val) * 8 - MAX_SWAPFILES_SHIFT)
+#define SWP_OFFSET_MASK(e) ((1UL << SWP_TYPE_SHIFT(e)) - 1)
+
+/*
+ * Store a type+offset into a swp_entry_t in an arch-independent format
+ */
+static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
+{
+ swp_entry_t ret;
+
+ ret.val = (type << SWP_TYPE_SHIFT(ret)) |
+ (offset & SWP_OFFSET_MASK(ret));
+ return ret;
+}
+
+/*
+ * Extract the `type' field from a swp_entry_t. The swp_entry_t is in
+ * arch-independent format
+ */
+static inline unsigned swp_type(swp_entry_t entry)
+{
+ return (entry.val >> SWP_TYPE_SHIFT(entry));
+}
+
+/*
+ * Extract the `offset' field from a swp_entry_t. The swp_entry_t is in
+ * arch-independent format
+ */
+static inline pgoff_t swp_offset(swp_entry_t entry)
+{
+ return entry.val & SWP_OFFSET_MASK(entry);
+}
+
+/*
* current->reclaim_state points to one of these when a task is running
* memory reclaim
*/
@@ -322,6 +366,10 @@ static inline int valid_swaphandles(swp_
return 0;
}
+static inline struct swap_info_struct *get_swap_info_struct(unsigned type)
+{
+ return NULL;
+}
#define can_share_swap_page(p) (page_mapcount(p) == 1)
static inline int move_to_swap_cache(struct page *page, swp_entry_t entry)
Index: linux-2.6/include/linux/swapops.h
===================================================================
--- linux-2.6.orig/include/linux/swapops.h
+++ linux-2.6/include/linux/swapops.h
@@ -1,48 +1,4 @@
/*
- * swapcache pages are stored in the swapper_space radix tree. We want to
- * get good packing density in that tree, so the index should be dense in
- * the low-order bits.
- *
- * We arrange the `type' and `offset' fields so that `type' is at the five
- * high-order bits of the swp_entry_t and `offset' is right-aligned in the
- * remaining bits.
- *
- * swp_entry_t's are *never* stored anywhere in their arch-dependent format.
- */
-#define SWP_TYPE_SHIFT(e) (sizeof(e.val) * 8 - MAX_SWAPFILES_SHIFT)
-#define SWP_OFFSET_MASK(e) ((1UL << SWP_TYPE_SHIFT(e)) - 1)
-
-/*
- * Store a type+offset into a swp_entry_t in an arch-independent format
- */
-static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
-{
- swp_entry_t ret;
-
- ret.val = (type << SWP_TYPE_SHIFT(ret)) |
- (offset & SWP_OFFSET_MASK(ret));
- return ret;
-}
-
-/*
- * Extract the `type' field from a swp_entry_t. The swp_entry_t is in
- * arch-independent format
- */
-static inline unsigned swp_type(swp_entry_t entry)
-{
- return (entry.val >> SWP_TYPE_SHIFT(entry));
-}
-
-/*
- * Extract the `offset' field from a swp_entry_t. The swp_entry_t is in
- * arch-independent format
- */
-static inline pgoff_t swp_offset(swp_entry_t entry)
-{
- return entry.val & SWP_OFFSET_MASK(entry);
-}
-
-/*
* Convert the arch-dependent pte representation of a swp_entry_t into an
* arch-independent swp_entry_t.
*/
--
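The swp_entry_t helpers moved into swap.h above, together with the new page_file_index(), can be tried outside the kernel. The following is only a user-space sketch: MAX_SWAPFILES_SHIFT is assumed to be 5 (from the "five high-order bits" comment), and struct model_page and model_page_file_index() are hypothetical stand-ins for struct page and page_file_index().

```c
#include <stddef.h>

/* Assumption: 5 type bits, matching the "five high-order bits" comment. */
#define MAX_SWAPFILES_SHIFT 5

typedef unsigned long pgoff_t;
typedef struct { unsigned long val; } swp_entry_t;

#define SWP_TYPE_SHIFT(e)  (sizeof((e).val) * 8 - MAX_SWAPFILES_SHIFT)
#define SWP_OFFSET_MASK(e) ((1UL << SWP_TYPE_SHIFT(e)) - 1)

/* Pack `type' into the high bits and `offset' into the low bits. */
static swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
{
	swp_entry_t ret;

	ret.val = (type << SWP_TYPE_SHIFT(ret)) |
		  (offset & SWP_OFFSET_MASK(ret));
	return ret;
}

/* Recover the `type' field from the high bits. */
static unsigned swp_type(swp_entry_t entry)
{
	return (unsigned)(entry.val >> SWP_TYPE_SHIFT(entry));
}

/* Recover the `offset' field from the low bits. */
static pgoff_t swp_offset(swp_entry_t entry)
{
	return entry.val & SWP_OFFSET_MASK(entry);
}

/* Hypothetical stand-in for struct page: only the fields used here. */
struct model_page {
	int in_swapcache;      /* models PageSwapCache(page) */
	unsigned long private; /* models page_private(page) */
	pgoff_t index;         /* models page->index */
};

/* Swapcache pages take their file index from the swap offset. */
static pgoff_t model_page_file_index(const struct model_page *page)
{
	if (page->in_swapcache) {
		swp_entry_t swap = { .val = page->private };
		return swp_offset(swap);
	}
	return page->index;
}
```

Because the offset occupies the low-order bits, a swapcache page whose private field holds swp_entry(3, 0x1234).val reports file index 0x1234 regardless of the swap type.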
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 08/20] nfs: enable swap on NFS
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (4 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 16/20] iscsi: add session context to ep_connect Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 19/20] mm: a process flags to avoid blocking allocations Peter Zijlstra
` (14 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Trond Myklebust
[-- Attachment #1: nfs_swapfile.patch --]
[-- Type: text/plain, Size: 1263 bytes --]
Now that NFS can handle swap cache pages, add a swapfile method to allow
swapping over NFS.
NOTE: this dummy method is obviously not enough to make it safe.
A more complete version of the nfs_swapfile() function will be presented
later in the series.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
---
fs/nfs/file.c | 6 ++++++
1 file changed, 6 insertions(+)
Index: linux-2.6/fs/nfs/file.c
===================================================================
--- linux-2.6.orig/fs/nfs/file.c
+++ linux-2.6/fs/nfs/file.c
@@ -321,6 +321,11 @@ static int nfs_release_page(struct page
return 0;
}
+static int nfs_swapfile(struct address_space *mapping, int enable)
+{
+ return 0;
+}
+
const struct address_space_operations nfs_file_aops = {
.readpage = nfs_readpage,
.readpages = nfs_readpages,
@@ -334,6 +339,7 @@ const struct address_space_operations nf
#ifdef CONFIG_NFS_DIRECTIO
.direct_IO = nfs_direct_IO,
#endif
+ .swapfile = nfs_swapfile,
};
/*
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 11/20] nbd: request_fn fixup
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (8 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 01/20] mm: serialize access to min_free_kbytes Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 22:47 ` Jens Axboe
2006-09-12 15:25 ` [PATCH 04/20] mm: methods for teaching filesystems about PG_swapcache pages Peter Zijlstra
` (10 subsequent siblings)
20 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Pavel Machek
[-- Attachment #1: nbd_fix.patch --]
[-- Type: text/plain, Size: 2382 bytes --]
Dropping the queue_lock opens up a nasty race; fix it by plugging the
device when we're done.
Also includes a small cleanup.
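The cleanup splits request completion into a variant that expects the queue lock to be held and a wrapper that takes the lock itself. A minimal user-space sketch of that convention follows; struct model_queue and all model_* names are hypothetical, the lock is modelled as a plain flag, and the _locked suffix stands in for the kernel's leading double underscore.

```c
/* Hypothetical model of a request queue guarded by one lock. */
struct model_queue {
	int lock_held; /* 1 while the (modelled) queue lock is held */
	int completed; /* number of requests completed so far */
};

static void model_lock(struct model_queue *q)   { q->lock_held = 1; }
static void model_unlock(struct model_queue *q) { q->lock_held = 0; }

/* Caller must already hold the lock; returns -1 if it does not. */
static int model_end_request_locked(struct model_queue *q)
{
	if (!q->lock_held)
		return -1;
	q->completed++;
	return 0;
}

/* Convenience wrapper for callers that do not hold the lock. */
static int model_end_request(struct model_queue *q)
{
	int rc;

	model_lock(q);
	rc = model_end_request_locked(q);
	model_unlock(q);
	return rc;
}
```

Error paths that already run under the lock can call the _locked variant directly instead of dropping and retaking the lock around the wrapper.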
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Pavel Machek <pavel@ucw.cz>
---
drivers/block/nbd.c | 67 ++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 49 insertions(+), 18 deletions(-)
Index: linux-2.6/drivers/block/nbd.c
===================================================================
--- linux-2.6.orig/drivers/block/nbd.c 2006-09-07 17:20:52.000000000 +0200
+++ linux-2.6/drivers/block/nbd.c 2006-09-07 17:35:05.000000000 +0200
@@ -97,20 +97,24 @@ static const char *nbdcmd_to_ascii(int c
}
#endif /* NDEBUG */
-static void nbd_end_request(struct request *req)
+static void __nbd_end_request(struct request *req)
{
int uptodate = (req->errors == 0) ? 1 : 0;
- request_queue_t *q = req->q;
- unsigned long flags;
dprintk(DBG_BLKDEV, "%s: request %p: %s\n", req->rq_disk->disk_name,
req, uptodate? "done": "failed");
- spin_lock_irqsave(q->queue_lock, flags);
- if (!end_that_request_first(req, uptodate, req->nr_sectors)) {
+ if (!end_that_request_first(req, uptodate, req->nr_sectors))
end_that_request_last(req, uptodate);
- }
- spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+static void nbd_end_request(struct request *req)
+{
+ request_queue_t *q = req->q;
+
+ spin_lock_irq(q->queue_lock);
+ __nbd_end_request(req);
+ spin_unlock_irq(q->queue_lock);
}
/*
@@ -435,10 +439,8 @@ static void do_nbd_request(request_queue
mutex_unlock(&lo->tx_lock);
printk(KERN_ERR "%s: Attempted send on closed socket\n",
lo->disk->disk_name);
- req->errors++;
- nbd_end_request(req);
spin_lock_irq(q->queue_lock);
- continue;
+ goto error_out;
}
lo->active_req = req;
@@ -463,10 +465,13 @@ static void do_nbd_request(request_queue
error_out:
req->errors++;
- spin_unlock(q->queue_lock);
- nbd_end_request(req);
- spin_lock(q->queue_lock);
+ __nbd_end_request(req);
}
+ /*
+ * q->queue_lock has been dropped, this opens up a race
+ * plug the device to close it.
+ */
+ blk_plug_device(q);
return;
}
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 10/20] mm: block device swap notification
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
2006-09-12 15:25 ` [PATCH 20/20] iscsi: support for swapping over iSCSI Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 18/20] netlink: add SOCK_VMIO support to AF_NETLINK Peter Zijlstra
` (18 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, James E.J. Bottomley,
Mike Christie, Pavel Machek
[-- Attachment #1: swapdev.patch --]
[-- Type: text/plain, Size: 1907 bytes --]
Some block devices need to do extra work when used as a swap device.
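As a rough illustration of how such an optional hook behaves, here is a user-space sketch. All model_* types and the two example hooks are hypothetical; only the shape of the callback (a disk pointer plus an enable flag, with a negative return aborting the enable path) follows the patch.

```c
#include <errno.h>
#include <stddef.h>

struct model_disk;

/* Hypothetical ops table with one optional callback. */
struct model_bdev_ops {
	int (*swapdev)(struct model_disk *disk, int enable); /* may be NULL */
};

struct model_disk {
	const struct model_bdev_ops *fops;
	int swap_active;
};

/* Enable path: a negative return from the hook aborts swapon. */
static int model_swapon(struct model_disk *disk)
{
	if (disk->fops->swapdev) {
		int error = disk->fops->swapdev(disk, 1);

		if (error < 0)
			return error;
	}
	return 0;
}

/* Disable path: the hook's return value is ignored. */
static void model_swapoff(struct model_disk *disk)
{
	if (disk->fops->swapdev)
		disk->fops->swapdev(disk, 0);
}

/* Example driver hook: just record the state. */
static int demo_swapdev(struct model_disk *disk, int enable)
{
	disk->swap_active = enable;
	return 0;
}

/* Example driver hook that refuses to become a swap device. */
static int refusing_swapdev(struct model_disk *disk, int enable)
{
	(void)disk;
	return enable ? -EBUSY : 0;
}
```

Drivers that leave the pointer NULL see no change in behaviour, which is what keeps the hook optional.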
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: James E.J. Bottomley <James.Bottomley@SteelEye.com>
CC: Mike Christie <michaelc@cs.wisc.edu>
CC: Pavel Machek <pavel@ucw.cz>
---
include/linux/fs.h | 1 +
mm/swapfile.c | 7 +++++++
2 files changed, 8 insertions(+)
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -1017,6 +1017,7 @@ struct block_device_operations {
int (*media_changed) (struct gendisk *);
int (*revalidate_disk) (struct gendisk *);
int (*getgeo)(struct block_device *, struct hd_geometry *);
+ int (*swapdev)(struct gendisk *, int enable);
struct module *owner;
};
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c
+++ linux-2.6/mm/swapfile.c
@@ -1273,6 +1273,8 @@ asmlinkage long sys_swapoff(const char _
inode = mapping->host;
if (S_ISBLK(inode->i_mode)) {
struct block_device *bdev = I_BDEV(inode);
+ if (bdev->bd_disk->fops->swapdev)
+ bdev->bd_disk->fops->swapdev(bdev->bd_disk, 0);
set_blocksize(bdev, p->old_block_size);
bd_release(bdev);
} else {
@@ -1481,6 +1483,11 @@ asmlinkage long sys_swapon(const char __
if (error < 0)
goto bad_swap;
p->bdev = bdev;
+ if (bdev->bd_disk->fops->swapdev) {
+ error = bdev->bd_disk->fops->swapdev(bdev->bd_disk, 1);
+ if (error < 0)
+ goto bad_swap;
+ }
} else if (S_ISREG(inode->i_mode)) {
p->bdev = inode->i_sb->s_bdev;
mutex_lock(&inode->i_mutex);
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 17/20] scsi: propagate the swapdev hook into the scsi stack
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (18 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 07/20] nfs: add a comment explaining the use of PG_private in the NFS client Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 16:37 ` [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Linus Torvalds
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, James E.J. Bottomley,
Mike Christie
[-- Attachment #1: scsi_swapdev.patch --]
[-- Type: text/plain, Size: 1874 bytes --]
Allow scsi devices to receive the swapdev notification.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: James E.J. Bottomley <James.Bottomley@SteelEye.com>
CC: Mike Christie <michaelc@cs.wisc.edu>
---
drivers/scsi/sd.c | 13 +++++++++++++
include/scsi/scsi_host.h | 7 +++++++
2 files changed, 20 insertions(+)
Index: linux-2.6/drivers/scsi/sd.c
===================================================================
--- linux-2.6.orig/drivers/scsi/sd.c
+++ linux-2.6/drivers/scsi/sd.c
@@ -892,6 +892,18 @@ static long sd_compat_ioctl(struct file
}
#endif
+static int sd_swapdev(struct gendisk *disk, int enable)
+{
+ int error = 0;
+ struct scsi_disk *sdkp = scsi_disk(disk);
+ struct scsi_device *sdp = sdkp->device;
+
+ if (sdp->host->hostt->swapdev)
+ error = sdp->host->hostt->swapdev(sdp, enable);
+
+ return error;
+}
+
static struct block_device_operations sd_fops = {
.owner = THIS_MODULE,
.open = sd_open,
@@ -903,6 +915,7 @@ static struct block_device_operations sd
#endif
.media_changed = sd_media_changed,
.revalidate_disk = sd_revalidate_disk,
+ .swapdev = sd_swapdev,
};
/**
Index: linux-2.6/include/scsi/scsi_host.h
===================================================================
--- linux-2.6.orig/include/scsi/scsi_host.h
+++ linux-2.6/include/scsi/scsi_host.h
@@ -288,6 +288,13 @@ struct scsi_host_template {
int (*suspend)(struct scsi_device *, pm_message_t state);
/*
+ * Notify that this device is used for swapping.
+ *
+ * Status: OPTIONAL
+ */
+ int (*swapdev)(struct scsi_device *, int enable);
+
+ /*
* Name of proc directory
*/
char *proc_name;
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 14/20] uml: enable scsi and add iscsi config
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (13 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 09/20] nfs: make swap on NFS robust Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 03/20] mm: add support for non block device backed swap files Peter Zijlstra
` (5 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Jeff Dike, Mike Christie
[-- Attachment #1: uml_iscsi.patch --]
[-- Type: text/plain, Size: 2658 bytes --]
Enable iSCSI on UML, dunno why SCSI was deemed broken, it works like a charm.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jeff Dike <jdike@addtoit.com>
CC: Mike Christie <michaelc@cs.wisc.edu>
---
arch/um/Kconfig | 16 --------------
arch/um/Kconfig.scsi | 58 ---------------------------------------------------
2 files changed, 1 insertion(+), 73 deletions(-)
Index: linux-2.6/arch/um/Kconfig
===================================================================
--- linux-2.6.orig/arch/um/Kconfig
+++ linux-2.6/arch/um/Kconfig
@@ -285,21 +285,7 @@ source "crypto/Kconfig"
source "lib/Kconfig"
-menu "SCSI support"
-depends on BROKEN
-
-config SCSI
- tristate "SCSI support"
-
-# This gives us free_dma, which scsi.c wants.
-config GENERIC_ISA_DMA
- bool
- depends on SCSI
- default y
-
-source "arch/um/Kconfig.scsi"
-
-endmenu
+source "drivers/scsi/Kconfig"
source "drivers/md/Kconfig"
Index: linux-2.6/arch/um/Kconfig.scsi
===================================================================
--- linux-2.6.orig/arch/um/Kconfig.scsi
+++ /dev/null
@@ -1,58 +0,0 @@
-comment "SCSI support type (disk, tape, CD-ROM)"
- depends on SCSI
-
-config BLK_DEV_SD
- tristate "SCSI disk support"
- depends on SCSI
-
-config SD_EXTRA_DEVS
- int "Maximum number of SCSI disks that can be loaded as modules"
- depends on BLK_DEV_SD
- default "40"
-
-config CHR_DEV_ST
- tristate "SCSI tape support"
- depends on SCSI
-
-config BLK_DEV_SR
- tristate "SCSI CD-ROM support"
- depends on SCSI
-
-config BLK_DEV_SR_VENDOR
- bool "Enable vendor-specific extensions (for SCSI CDROM)"
- depends on BLK_DEV_SR
-
-config SR_EXTRA_DEVS
- int "Maximum number of CDROM devices that can be loaded as modules"
- depends on BLK_DEV_SR
- default "2"
-
-config CHR_DEV_SG
- tristate "SCSI generic support"
- depends on SCSI
-
-comment "Some SCSI devices (e.g. CD jukebox) support multiple LUNs"
- depends on SCSI
-
-#if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
-config SCSI_DEBUG_QUEUES
- bool "Enable extra checks in new queueing code"
- depends on SCSI
-
-#fi
-config SCSI_MULTI_LUN
- bool "Probe all LUNs on each SCSI device"
- depends on SCSI
-
-config SCSI_CONSTANTS
- bool "Verbose SCSI error reporting (kernel size +=12K)"
- depends on SCSI
-
-config SCSI_LOGGING
- bool "SCSI logging facility"
- depends on SCSI
-
-config SCSI_DEBUG
- tristate "SCSI debugging host simulator (EXPERIMENTAL)"
- depends on SCSI
-
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 15/20] iscsi: kernel side tcp connect
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (10 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 04/20] mm: methods for teaching filesystems about PG_swapcache pages Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 05/20] uml: rename arch/um remove_mapping() Peter Zijlstra
` (8 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Mike Christie, Peter Zijlstra
[-- Attachment #1: iscsi_ep_connect.patch --]
[-- Type: text/plain, Size: 5271 bytes --]
Move tcp connection code from user- into kernel-space.
This makes it possible to do TCP reconnect deadlock free.
(This patch requires userspace changes too)
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
drivers/scsi/iscsi_tcp.c | 108 ++++++++++++++++++++++++++++++++++++-----------
1 file changed, 83 insertions(+), 25 deletions(-)
Index: linux-2.6/drivers/scsi/iscsi_tcp.c
===================================================================
--- linux-2.6.orig/drivers/scsi/iscsi_tcp.c 2006-09-07 16:00:16.000000000 +0200
+++ linux-2.6/drivers/scsi/iscsi_tcp.c 2006-09-07 19:32:56.000000000 +0200
@@ -35,6 +35,8 @@
#include <linux/kfifo.h>
#include <linux/scatterlist.h>
#include <linux/mutex.h>
+#include <linux/syscalls.h>
+#include <linux/file.h>
#include <net/tcp.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>
@@ -1062,21 +1064,6 @@ iscsi_conn_set_callbacks(struct iscsi_co
write_unlock_bh(&sk->sk_callback_lock);
}
-static void
-iscsi_conn_restore_callbacks(struct iscsi_tcp_conn *tcp_conn)
-{
- struct sock *sk = tcp_conn->sock->sk;
-
- /* restore socket callbacks, see also: iscsi_conn_set_callbacks() */
- write_lock_bh(&sk->sk_callback_lock);
- sk->sk_user_data = NULL;
- sk->sk_data_ready = tcp_conn->old_data_ready;
- sk->sk_state_change = tcp_conn->old_state_change;
- sk->sk_write_space = tcp_conn->old_write_space;
- sk->sk_no_check = 0;
- write_unlock_bh(&sk->sk_callback_lock);
-}
-
/**
* iscsi_send - generic send routine
* @sk: kernel's socket
@@ -1741,6 +1728,77 @@ iscsi_tcp_ctask_xmit(struct iscsi_conn *
return rc;
}
+static int
+iscsi_tcp_ep_connect(struct sockaddr *dst_addr, int non_blocking,
+ uint64_t *ep_handle)
+{
+ struct socket *sock;
+ int rc, size, arg = 1, window = 524288;
+
+ rc = sock_create_kern(dst_addr->sa_family, SOCK_STREAM, IPPROTO_TCP,
+ &sock);
+ if (rc < 0) {
+ printk(KERN_ERR "Could not create socket %d.\n", rc);
+ return rc;
+ }
+ sock->sk->sk_allocation = GFP_ATOMIC;
+/*
+ rc = sock->ops->setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
+ (char __user *)&arg, sizeof(arg));
+ if (rc) {
+ printk(KERN_ERR "Could not set TCP_NODELAY %d\n", rc);
+ goto release_sock;
+ }
+*/
+ /* should set like nfs */
+ sock_setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
+ (char __user *)&window, sizeof(window));
+ sock_setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
+ (char __user *)&window, sizeof(window));
+
+ if (dst_addr->sa_family == PF_INET)
+ size = sizeof(struct sockaddr_in);
+ else if (dst_addr->sa_family == PF_INET6)
+ size = sizeof(struct sockaddr_in6);
+ else {
+ rc = -EINVAL;
+ goto release_sock;
+ }
+
+ /* TODO we cannot block here */
+ rc = sock->ops->connect(sock, (struct sockaddr *)dst_addr, size,
+ 0 /*O_NONBLOCK*/);
+ if (rc == -EINPROGRESS)
+ rc = 0;
+ else if (rc) {
+ printk(KERN_ERR "Could not connect %d\n", rc);
+ goto release_sock;
+ }
+
+ rc = sock_map_fd(sock);
+ if (rc < 0)
+ goto release_sock;
+ *ep_handle = (uint64_t)rc;
+ return 0;
+
+release_sock:
+ sock_release(sock);
+ return rc;
+}
+
+static int
+iscsi_tcp_ep_poll(uint64_t ep_handle, int timeout_ms)
+{
+ /* we cheated and blocked on the connect (TODO must fix) */
+ return 1;
+}
+
+static void
+iscsi_tcp_ep_disconnect(uint64_t ep_handle)
+{
+ sys_close(ep_handle);
+}
+
static struct iscsi_cls_conn *
iscsi_tcp_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx)
{
@@ -1795,11 +1853,7 @@ iscsi_tcp_release_conn(struct iscsi_conn
if (!tcp_conn->sock)
return;
- sock_hold(tcp_conn->sock->sk);
- iscsi_conn_restore_callbacks(tcp_conn);
- sock_put(tcp_conn->sock->sk);
-
- sock_release(tcp_conn->sock);
+ fput(tcp_conn->sock->file);
tcp_conn->sock = NULL;
conn->recv_lock = NULL;
}
@@ -1856,10 +1910,13 @@ iscsi_tcp_conn_bind(struct iscsi_cls_ses
printk(KERN_ERR "iscsi_tcp: sockfd_lookup failed %d\n", err);
return -EEXIST;
}
+ get_file(sock->file);
err = iscsi_conn_bind(cls_session, cls_conn, is_leading);
- if (err)
+ if (err) {
+ fput(sock->file);
return err;
+ }
/* bind iSCSI connection and socket */
tcp_conn->sock = sock;
@@ -2041,13 +2098,11 @@ iscsi_tcp_conn_get_param(struct iscsi_cl
sk = tcp_conn->sock->sk;
if (sk->sk_family == PF_INET) {
inet = inet_sk(sk);
- len = sprintf(buf, "%u.%u.%u.%u\n",
+ len = sprintf(buf, NIPQUAD_FMT "\n",
NIPQUAD(inet->daddr));
} else {
np = inet6_sk(sk);
- len = sprintf(buf,
- "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n",
- NIP6(np->daddr));
+ len = sprintf(buf, NIP6_FMT "\n", NIP6(np->daddr));
}
mutex_unlock(&conn->xmitmutex);
break;
@@ -2185,6 +2240,9 @@ static struct iscsi_transport iscsi_tcp_
.get_session_param = iscsi_session_get_param,
.start_conn = iscsi_conn_start,
.stop_conn = iscsi_tcp_conn_stop,
+ .ep_connect = iscsi_tcp_ep_connect,
+ .ep_poll = iscsi_tcp_ep_poll,
+ .ep_disconnect = iscsi_tcp_ep_disconnect,
/* IO */
.send_pdu = iscsi_conn_send_pdu,
.get_stats = iscsi_conn_get_stats,
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 09/20] nfs: make swap on NFS robust
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (12 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 05/20] uml: rename arch/um remove_mapping() Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 14/20] uml: enable scsi and add iscsi config Peter Zijlstra
` (6 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Trond Myklebust
[-- Attachment #1: nfs_vmio.patch --]
[-- Type: text/plain, Size: 5641 bytes --]
Provide a proper a_ops->swapfile() implementation for NFS. This sets
SOCK_VMIO on the NFS socket, runs socket reconnect under PF_MEMALLOC, and
sets SOCK_VMIO again on the new socket before engaging the protocol
->connect() method.
PF_MEMALLOC should allow the allocation of struct socket and related objects
and the early (re)setting of SOCK_VMIO should allow us to receive the packets
required for the TCP connection buildup.
(swapping continues over a server reset during a large (4k) ping flood)
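The interplay described above (tag the transport, reconnect with PF_MEMALLOC set, re-tag the fresh socket, restore the task flags) can be sketched in user space. Everything prefixed model_ or MODEL_ is hypothetical, including the bit value and the reserve size; the sketch only mirrors the order of operations and is not the sunrpc code.

```c
#define MODEL_PF_MEMALLOC   0x00000800UL /* illustrative bit value */
#define MODEL_RESERVE_PAGES 4            /* hypothetical reserve size */

struct model_task { unsigned long flags; };

struct model_xprt {
	unsigned int swapper : 1; /* transport carries swap traffic */
	int sock_vmio;            /* models SOCK_VMIO on the current socket */
	int saw_memalloc;         /* set if reconnect ran with the flag set */
};

static long model_reserve; /* pages currently held in reserve */

/* Tag or untag a transport, adjusting the reserve symmetrically. */
static int model_xs_swapper(struct model_xprt *xprt, int enable)
{
	if (enable) {
		model_reserve += MODEL_RESERVE_PAGES;
		xprt->sock_vmio = 1;
		xprt->swapper = 1;
	} else if (xprt->swapper) {
		xprt->swapper = 0;
		xprt->sock_vmio = 0;
		model_reserve -= MODEL_RESERVE_PAGES;
	}
	return 0;
}

/* Reconnect: the fresh socket starts untagged and must be re-tagged. */
static void model_connect_worker(struct model_task *cur,
				 struct model_xprt *xprt)
{
	unsigned long pflags = cur->flags; /* save the caller's flags */

	if (xprt->swapper)
		cur->flags |= MODEL_PF_MEMALLOC;

	xprt->sock_vmio = 0; /* old socket closed, new one created */
	xprt->saw_memalloc = (cur->flags & MODEL_PF_MEMALLOC) != 0;
	if (xprt->swapper)
		xprt->sock_vmio = 1; /* tag before connecting */

	cur->flags = pflags; /* restore the saved flags on exit */
}
```

In this model the reserve stays charged across the reconnect, so only the per-socket tag has to be reapplied.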
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
---
fs/nfs/file.c | 2 -
include/linux/sunrpc/xprt.h | 5 +++-
net/sunrpc/sched.c | 4 +--
net/sunrpc/xprtsock.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 54 insertions(+), 4 deletions(-)
Index: linux-2.6/fs/nfs/file.c
===================================================================
--- linux-2.6.orig/fs/nfs/file.c
+++ linux-2.6/fs/nfs/file.c
@@ -323,7 +323,7 @@ static int nfs_release_page(struct page
static int nfs_swapfile(struct address_space *mapping, int enable)
{
- return 0;
+ return xs_swapper(NFS_CLIENT(mapping->host)->cl_xprt, enable);
}
const struct address_space_operations nfs_file_aops = {
Index: linux-2.6/net/sunrpc/xprtsock.c
===================================================================
--- linux-2.6.orig/net/sunrpc/xprtsock.c
+++ linux-2.6/net/sunrpc/xprtsock.c
@@ -1014,6 +1014,7 @@ static void xs_udp_connect_worker(void *
{
struct rpc_xprt *xprt = (struct rpc_xprt *) args;
struct socket *sock = xprt->sock;
+ unsigned long pflags = current->flags;
int err, status = -EIO;
if (xprt->shutdown || xprt->addr.sin_port == 0)
@@ -1021,6 +1022,9 @@ static void xs_udp_connect_worker(void *
dprintk("RPC: xs_udp_connect_worker for xprt %p\n", xprt);
+ if (xprt->swapper)
+ current->flags |= PF_MEMALLOC;
+
/* Start by resetting any existing state */
xs_close(xprt);
@@ -1054,6 +1058,9 @@ static void xs_udp_connect_worker(void *
xprt->sock = sock;
xprt->inet = sk;
+ if (xprt->swapper)
+ sk_set_vmio(sk);
+
write_unlock_bh(&sk->sk_callback_lock);
}
xs_udp_do_set_buffer_size(xprt);
@@ -1061,6 +1068,7 @@ static void xs_udp_connect_worker(void *
out:
xprt_wake_pending_tasks(xprt, status);
xprt_clear_connecting(xprt);
+ current->flags = pflags;
}
/*
@@ -1097,11 +1105,15 @@ static void xs_tcp_connect_worker(void *
{
struct rpc_xprt *xprt = (struct rpc_xprt *)args;
struct socket *sock = xprt->sock;
+ unsigned long pflags = current->flags;
int err, status = -EIO;
if (xprt->shutdown || xprt->addr.sin_port == 0)
goto out;
+ if (xprt->swapper)
+ current->flags |= PF_MEMALLOC;
+
dprintk("RPC: xs_tcp_connect_worker for xprt %p\n", xprt);
if (!xprt->sock) {
@@ -1148,6 +1160,10 @@ static void xs_tcp_connect_worker(void *
write_unlock_bh(&sk->sk_callback_lock);
}
+
+ if (xprt->swapper)
+ sk_set_vmio(xprt->inet);
+
/* Tell the socket layer to start connecting... */
xprt->stat.connect_count++;
xprt->stat.connect_start = jiffies;
@@ -1174,6 +1190,7 @@ out:
xprt_wake_pending_tasks(xprt, status);
out_clear:
xprt_clear_connecting(xprt);
+ current->flags = pflags;
}
/**
@@ -1369,3 +1386,33 @@ int xs_setup_tcp(struct rpc_xprt *xprt,
return 0;
}
+
+#define RPC_BUF_RESERVE_PAGES (1) /* XXX: how many concurrent rpc buffers? */
+#define RPC_RESERVE_PAGES (RPC_BUF_RESERVE_PAGES + TX_RESERVE_PAGES)
+
+/**
+ * xs_swapper - Tag this transport as being used for swap.
+ * @xprt: transport to tag
+ * @enable: enable/disable
+ *
+ */
+int xs_swapper(struct rpc_xprt *xprt, int enable)
+{
+ int err = 0;
+
+ if (enable) {
+ /*
+ * keep one extra sock reference so the reserve won't dip
+ * when the socket gets reconnected.
+ */
+ sk_adjust_memalloc(1, RPC_RESERVE_PAGES);
+ sk_set_vmio(xprt->inet);
+ xprt->swapper = 1;
+ } else if (xprt->swapper) {
+ xprt->swapper = 0;
+ sk_clear_vmio(xprt->inet);
+ sk_adjust_memalloc(-1, -RPC_RESERVE_PAGES);
+ }
+
+ return err;
+}
Index: linux-2.6/include/linux/sunrpc/xprt.h
===================================================================
--- linux-2.6.orig/include/linux/sunrpc/xprt.h
+++ linux-2.6/include/linux/sunrpc/xprt.h
@@ -147,7 +147,9 @@ struct rpc_xprt {
unsigned int max_reqs; /* total slots */
unsigned long state; /* transport state */
unsigned char shutdown : 1, /* being shut down */
- resvport : 1; /* use a reserved port */
+ resvport : 1, /* use a reserved port */
+ swapper : 1; /* we're swapping over this
+ transport */
/*
* XID
@@ -261,6 +263,7 @@ void xprt_disconnect(struct rpc_xprt *
*/
int xs_setup_udp(struct rpc_xprt *xprt, struct rpc_timeout *to);
int xs_setup_tcp(struct rpc_xprt *xprt, struct rpc_timeout *to);
+int xs_swapper(struct rpc_xprt *xprt, int enable);
/*
* Reserved bit positions in xprt->state
Index: linux-2.6/net/sunrpc/sched.c
===================================================================
--- linux-2.6.orig/net/sunrpc/sched.c
+++ linux-2.6/net/sunrpc/sched.c
@@ -736,8 +736,8 @@ void * rpc_malloc(struct rpc_task *task,
struct rpc_rqst *req = task->tk_rqstp;
gfp_t gfp;
- if (task->tk_flags & RPC_TASK_SWAPPER)
- gfp = GFP_ATOMIC;
+ if (RPC_IS_SWAPPER(task))
+ gfp = GFP_ATOMIC | __GFP_EMERGENCY;
else
gfp = GFP_NOFS;
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 19/20] mm: a process flags to avoid blocking allocations
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (5 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 08/20] nfs: enable swap on NFS Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 13/20] nbd: use swapdev hook to make swap deadlock free Peter Zijlstra
` (13 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Mike Christie
[-- Attachment #1: pf_mem_nowait.patch --]
[-- Type: text/plain, Size: 1816 bytes --]
PF_MEM_NOWAIT makes allocations fail rather than block. This is useful
for converting process behaviour to non-blocking.
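The effect on the allocator's decision can be sketched as follows. This is a user-space model only: MODEL_GFP_WAIT stands in for __GFP_WAIT with an illustrative value, model_may_wait() and model_alloc() are hypothetical names, and the model assumes reclaim always succeeds once the caller is allowed to sleep.

```c
#define MODEL_GFP_WAIT 0x10u        /* illustrative stand-in for __GFP_WAIT */
#define PF_MEM_NOWAIT  0x40000000UL /* value taken from the patch */

/* An allocation may sleep only if the caller allows it AND the task does. */
static int model_may_wait(unsigned int gfp_mask, unsigned long task_flags)
{
	return (gfp_mask & MODEL_GFP_WAIT) && !(task_flags & PF_MEM_NOWAIT);
}

/*
 * Returns 1 on success and 0 on failure; *blocked reports whether the
 * model would have slept in reclaim to satisfy the request.
 */
static int model_alloc(unsigned int gfp_mask, unsigned long task_flags,
		       int free_pages, int *blocked)
{
	*blocked = 0;
	if (free_pages > 0)
		return 1; /* fast path, nothing to wait for */
	if (!model_may_wait(gfp_mask, task_flags))
		return 0; /* fail instead of blocking */
	*blocked = 1;     /* would enter reclaim and sleep */
	return 1;         /* assumption: reclaim eventually succeeds */
}
```

A task with the flag set therefore sees waiting requests behave like non-waiting ones, without any change at the call sites.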
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Mike Christie <michaelc@cs.wisc.edu>
---
include/linux/sched.h | 1 +
mm/page_alloc.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1056,6 +1056,7 @@ static inline void put_task_struct(struc
#define PF_SPREAD_SLAB 0x02000000 /* Spread some slab caches over cpuset */
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
#define PF_MUTEX_TESTER 0x20000000 /* Thread belongs to the rt mutex tester */
+#define PF_MEM_NOWAIT 0x40000000 /* Make allocations fail instead of block */
/*
* Only the _current_ task can read/write to tsk->flags, but other
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -912,11 +912,11 @@ struct page * fastcall
__alloc_pages(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist)
{
- const gfp_t wait = gfp_mask & __GFP_WAIT;
+ struct task_struct *p = current;
+ const int wait = (gfp_mask & __GFP_WAIT) && !(p->flags & PF_MEM_NOWAIT);
struct zone **z;
struct page *page;
struct reclaim_state reclaim_state;
- struct task_struct *p = current;
int do_retry;
int alloc_flags;
int did_some_progress;
--
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 16/20] iscsi: add session context to ep_connect
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (3 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 02/20] net: vm deadlock avoidance core Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 08/20] nfs: enable swap on NFS Peter Zijlstra
` (15 subsequent siblings)
20 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Mike Christie
[-- Attachment #1: iscsi_ep_connect_session.patch --]
[-- Type: text/plain, Size: 4208 bytes --]
In order to do a proper reconnect we need to know if we're a swapper.
Only the session context can tell us that.
(This patch breaks the NETLINK_ISCSI ABI; userspace needs a matching change.)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Mike Christie <michaelc@cs.wisc.edu>
---
drivers/infiniband/ulp/iser/iscsi_iser.c | 3 ++-
drivers/scsi/iscsi_tcp.c | 3 ++-
drivers/scsi/scsi_transport_iscsi.c | 4 +++-
include/scsi/iscsi_if.h | 1 +
include/scsi/scsi_transport_iscsi.h | 3 ++-
5 files changed, 10 insertions(+), 4 deletions(-)
Index: linux-2.6/drivers/scsi/iscsi_tcp.c
===================================================================
--- linux-2.6.orig/drivers/scsi/iscsi_tcp.c 2006-09-07 19:32:56.000000000 +0200
+++ linux-2.6/drivers/scsi/iscsi_tcp.c 2006-09-07 19:34:07.000000000 +0200
@@ -1729,7 +1729,8 @@ iscsi_tcp_ctask_xmit(struct iscsi_conn *
}
static int
-iscsi_tcp_ep_connect(struct sockaddr *dst_addr, int non_blocking,
+iscsi_tcp_ep_connect(struct iscsi_cls_session *cls_session,
+ struct sockaddr *dst_addr, int non_blocking,
uint64_t *ep_handle)
{
struct socket *sock;
Index: linux-2.6/drivers/scsi/scsi_transport_iscsi.c
===================================================================
--- linux-2.6.orig/drivers/scsi/scsi_transport_iscsi.c 2006-09-07 19:32:37.000000000 +0200
+++ linux-2.6/drivers/scsi/scsi_transport_iscsi.c 2006-09-07 19:34:07.000000000 +0200
@@ -914,6 +914,7 @@ iscsi_if_transport_ep(struct iscsi_trans
struct iscsi_uevent *ev, int msg_type)
{
struct sockaddr *dst_addr;
+ struct iscsi_cls_session *session;
int rc = 0;
switch (msg_type) {
@@ -922,7 +923,8 @@ iscsi_if_transport_ep(struct iscsi_trans
return -EINVAL;
dst_addr = (struct sockaddr *)((char*)ev + sizeof(*ev));
- rc = transport->ep_connect(dst_addr,
+ session = iscsi_session_lookup(ev->u.ep_connect.sid);
+ rc = transport->ep_connect(session, dst_addr,
ev->u.ep_connect.non_blocking,
&ev->r.ep_connect_ret.handle);
break;
Index: linux-2.6/include/scsi/iscsi_if.h
===================================================================
--- linux-2.6.orig/include/scsi/iscsi_if.h 2006-09-07 19:32:37.000000000 +0200
+++ linux-2.6/include/scsi/iscsi_if.h 2006-09-07 19:34:07.000000000 +0200
@@ -117,6 +117,7 @@ struct iscsi_uevent {
} get_stats;
struct msg_transport_connect {
uint32_t non_blocking;
+ uint32_t sid;
} ep_connect;
struct msg_transport_poll {
uint64_t ep_handle;
Index: linux-2.6/include/scsi/scsi_transport_iscsi.h
===================================================================
--- linux-2.6.orig/include/scsi/scsi_transport_iscsi.h 2006-09-07 19:32:37.000000000 +0200
+++ linux-2.6/include/scsi/scsi_transport_iscsi.h 2006-09-07 19:34:07.000000000 +0200
@@ -120,7 +120,8 @@ struct iscsi_transport {
int (*xmit_mgmt_task) (struct iscsi_conn *conn,
struct iscsi_mgmt_task *mtask);
void (*session_recovery_timedout) (struct iscsi_cls_session *session);
- int (*ep_connect) (struct sockaddr *dst_addr, int non_blocking,
+ int (*ep_connect) (struct iscsi_cls_session *session,
+ struct sockaddr *dst_addr, int non_blocking,
uint64_t *ep_handle);
int (*ep_poll) (uint64_t ep_handle, int timeout_ms);
void (*ep_disconnect) (uint64_t ep_handle);
Index: linux-2.6/drivers/infiniband/ulp/iser/iscsi_iser.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/iser/iscsi_iser.c 2006-09-07 19:32:37.000000000 +0200
+++ linux-2.6/drivers/infiniband/ulp/iser/iscsi_iser.c 2006-09-07 19:34:07.000000000 +0200
@@ -490,7 +490,8 @@ iscsi_iser_conn_get_stats(struct iscsi_c
}
static int
-iscsi_iser_ep_connect(struct sockaddr *dst_addr, int non_blocking,
+iscsi_iser_ep_connect(struct iscsi_cls_session *cls_session,
+ struct sockaddr *dst_addr, int non_blocking,
__u64 *ep_handle)
{
int err;
--
* [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
@ 2006-09-12 15:25 ` Peter Zijlstra
2006-09-13 20:50 ` Mike Christie
2006-09-12 15:25 ` [PATCH 10/20] mm: block device swap notification Peter Zijlstra
` (19 subsequent siblings)
20 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-12 15:25 UTC (permalink / raw)
To: linux-mm, linux-kernel, netdev
Cc: Linus Torvalds, Andrew Morton, David Miller, Rik van Riel,
Daniel Phillips, Peter Zijlstra, Mike Christie
[-- Attachment #1: iscsi_vmio.patch --]
[-- Type: text/plain, Size: 10286 bytes --]
Implement sht->swapdev() for iSCSI. This method takes care of reserving
the extra memory needed and marking all relevant sockets with SOCK_VMIO.
When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
the TCP connect is done with SOCK_VMIO to ensure their success. The netlink
userspace interface is also marked SOCK_VMIO; this ensures that even under
pressure we can still communicate with the daemon (which runs under
mlockall() and needs no additional memory to operate).
Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
present. This ensures that the netlink socket will not block. User-space will
need to retry failed requests.
The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
This makes sure we do not block the critical socket, and that we do not
fail to process incoming data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Mike Christie <michaelc@cs.wisc.edu>
---
drivers/scsi/iscsi_tcp.c | 103 +++++++++++++++++++++++++++++++-----
drivers/scsi/scsi_transport_iscsi.c | 23 +++++++-
include/scsi/libiscsi.h | 1
include/scsi/scsi_transport_iscsi.h | 2
4 files changed, 113 insertions(+), 16 deletions(-)
Index: linux-2.6/drivers/scsi/iscsi_tcp.c
===================================================================
--- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
+++ linux-2.6/drivers/scsi/iscsi_tcp.c
@@ -42,6 +42,7 @@
#include <scsi/scsi_host.h>
#include <scsi/scsi.h>
#include <scsi/scsi_transport_iscsi.h>
+#include <scsi/scsi_device.h>
#include "iscsi_tcp.h"
@@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
int rc;
struct iscsi_conn *conn = rd_desc->arg.data;
struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
- int processed;
+ int processed = 0;
char pad[ISCSI_PAD_LEN];
struct scatterlist sg;
+ unsigned long pflags = current->flags;
+
+ if (sk_has_vmio(tcp_conn->sock->sk))
+ current->flags |= PF_MEMALLOC;
/*
* Save current SKB and its offset in the corresponding
@@ -866,7 +871,7 @@ more:
if (unlikely(conn->suspend_rx)) {
debug_tcp("conn %d Rx suspended!\n", conn->id);
- return 0;
+ goto out;
}
if (tcp_conn->in_progress == IN_PROGRESS_WAIT_HEADER ||
@@ -877,7 +882,7 @@ more:
goto nomore;
else {
iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
- return 0;
+ goto out;
}
}
@@ -891,7 +896,7 @@ more:
tcp_conn->in_progress = IN_PROGRESS_DATA_RECV;
} else if (rc) {
iscsi_conn_failure(conn, rc);
- return 0;
+ goto out;
}
}
@@ -905,7 +910,7 @@ more:
if (rc == -EAGAIN)
goto again;
iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
- return 0;
+ goto out;
}
memcpy(&recv_digest, conn->data, sizeof(uint32_t));
@@ -914,7 +919,7 @@ more:
"0x%x != 0x%x\n", recv_digest,
tcp_conn->in.datadgst);
iscsi_conn_failure(conn, ISCSI_ERR_DATA_DGST);
- return 0;
+ goto out;
} else {
debug_tcp("iscsi_tcp: data digest match!"
"0x%x == 0x%x\n", recv_digest,
@@ -934,7 +939,7 @@ more:
if (rc == -EAGAIN)
goto again;
iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
- return 0;
+ goto out;
}
tcp_conn->in.copy -= tcp_conn->in.padding;
tcp_conn->in.offset += tcp_conn->in.padding;
@@ -969,6 +974,8 @@ more:
nomore:
processed = tcp_conn->in.offset - offset;
BUG_ON(processed == 0);
+out:
+ current->flags = pflags;
return processed;
again:
@@ -979,7 +986,7 @@ again:
BUG_ON(processed > len);
conn->rxdata_octets += processed;
- return processed;
+ goto out;
}
static void
@@ -1735,14 +1742,26 @@ iscsi_tcp_ep_connect(struct iscsi_cls_se
{
struct socket *sock;
int rc, size, arg = 1, window = 524288;
+ int swapper = 0;
+ unsigned long pflags = current->flags;
+
+ if (cls_session) {
+ struct iscsi_session *session;
+ session = class_to_transport_session(cls_session);
+ swapper = session->swapper;
+ }
+
+ if (swapper)
+ current->flags |= PF_MEMALLOC;
rc = sock_create_kern(dst_addr->sa_family, SOCK_STREAM, IPPROTO_TCP,
&sock);
if (rc < 0) {
printk(KERN_ERR "Could not create socket %d.\n", rc);
- return rc;
+ goto out;
}
- sock->sk->sk_allocation = GFP_ATOMIC;
+ sock->sk->sk_allocation = GFP_NOIO; // XXX GFP_ATOMIC;
+
/*
rc = sock->ops->setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
(char __user *)&arg, sizeof(arg));
@@ -1766,6 +1785,9 @@ iscsi_tcp_ep_connect(struct iscsi_cls_se
goto release_sock;
}
+ if (swapper)
+ sk_set_vmio(sock->sk);
+
/* TODO we cannot block here */
rc = sock->ops->connect(sock, (struct sockaddr *)dst_addr, size,
0 /*O_NONBLOCK*/);
@@ -1780,11 +1802,14 @@ iscsi_tcp_ep_connect(struct iscsi_cls_se
if (rc < 0)
goto release_sock;
*ep_handle = (uint64_t)rc;
- return 0;
+ rc = 0;
+out:
+ current->flags = pflags;
+ return rc;
release_sock:
sock_release(sock);
- return rc;
+ goto out;
}
static int
@@ -1926,10 +1951,11 @@ iscsi_tcp_conn_bind(struct iscsi_cls_ses
sk = sock->sk;
sk->sk_reuse = 1;
sk->sk_sndtimeo = 15 * HZ; /* FIXME: make it configurable */
- sk->sk_allocation = GFP_ATOMIC;
/* FIXME: disable Nagle's algorithm */
+ BUG_ON(!sk_has_vmio(sk) && conn->session->swapper);
+
/*
* Intercept TCP callbacks for sendfile like receive
* processing.
@@ -2187,6 +2213,56 @@ static void iscsi_tcp_session_destroy(st
iscsi_session_teardown(cls_session);
}
+#define NETLINK_RESERVE_PAGES (5 + 2 * (5 + 31))
+#define ISCSI_RESERVE_PAGES (NETLINK_RESERVE_PAGES + TX_RESERVE_PAGES)
+
+static int iscsi_swapdev(struct scsi_device *sdev, int enable)
+{
+ int error = 0;
+ struct Scsi_Host *host;
+ struct iscsi_session *session;
+ struct iscsi_conn *conn;
+ struct sock *sk;
+ int daemon_pid;
+
+ host = sdev->host;
+ session = iscsi_hostdata(host->hostdata);
+ session->swapper = !!enable;
+ daemon_pid = iscsi_if_daemon_pid(session->tt);
+
+ if (enable) {
+ sk_adjust_memalloc(1, ISCSI_RESERVE_PAGES);
+ sk = netlink_lookup(NETLINK_ISCSI, 0);
+ if (sk)
+ sk_set_vmio(sk);
+ sk = netlink_lookup(NETLINK_ISCSI, daemon_pid);
+ if (sk)
+ sk_set_vmio(sk);
+ }
+
+ spin_lock(&session->lock);
+ list_for_each_entry(conn, &session->connections, item) {
+ struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+ if (enable)
+ sk_set_vmio(tcp_conn->sock->sk);
+ else
+ sk_clear_vmio(tcp_conn->sock->sk);
+ }
+ spin_unlock(&session->lock);
+
+ if (!enable) {
+ sk = netlink_lookup(NETLINK_ISCSI, daemon_pid);
+ if (sk)
+ sk_clear_vmio(sk);
+ sk = netlink_lookup(NETLINK_ISCSI, 0);
+ if (sk)
+ sk_clear_vmio(sk);
+ sk_adjust_memalloc(-1, -ISCSI_RESERVE_PAGES);
+ }
+
+ return error;
+}
+
static struct scsi_host_template iscsi_sht = {
.name = "iSCSI Initiator over TCP/IP",
.queuecommand = iscsi_queuecommand,
@@ -2199,6 +2275,7 @@ static struct scsi_host_template iscsi_s
.use_clustering = DISABLE_CLUSTERING,
.proc_name = "iscsi_tcp",
.this_id = -1,
+ .swapdev = iscsi_swapdev,
};
static struct iscsi_transport iscsi_tcp_transport = {
Index: linux-2.6/include/scsi/libiscsi.h
===================================================================
--- linux-2.6.orig/include/scsi/libiscsi.h
+++ linux-2.6/include/scsi/libiscsi.h
@@ -245,6 +245,7 @@ struct iscsi_session {
int mgmtpool_max; /* size of mgmt array */
struct iscsi_mgmt_task **mgmt_cmds; /* Original mgmt arr */
struct iscsi_queue mgmtpool; /* Mgmt PDU's pool */
+ int swapper; /* we are used to swap on */
};
/*
Index: linux-2.6/drivers/scsi/scsi_transport_iscsi.c
===================================================================
--- linux-2.6.orig/drivers/scsi/scsi_transport_iscsi.c
+++ linux-2.6/drivers/scsi/scsi_transport_iscsi.c
@@ -496,6 +496,13 @@ iscsi_if_transport_lookup(struct iscsi_t
return NULL;
}
+int iscsi_if_daemon_pid(struct iscsi_transport *tt)
+{
+ return iscsi_if_transport_lookup(tt)->daemon_pid;
+}
+
+EXPORT_SYMBOL_GPL(iscsi_if_daemon_pid);
+
static int
iscsi_broadcast_skb(struct sk_buff *skb, gfp_t gfp)
{
@@ -608,7 +615,7 @@ iscsi_if_send_reply(int pid, int seq, in
int flags = multi ? NLM_F_MULTI : 0;
int t = done ? NLMSG_DONE : type;
- skb = alloc_skb(len, GFP_KERNEL);
+ skb = alloc_skb(len, nls->sk_allocation);
/*
* FIXME:
* user is supposed to react on iferror == -ENOMEM;
@@ -970,6 +977,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, s
struct iscsi_cls_session *session;
struct iscsi_cls_conn *conn;
unsigned long flags;
+ int pid;
priv = iscsi_if_transport_lookup(iscsi_ptr(ev->transport_handle));
if (!priv)
@@ -979,7 +987,15 @@ iscsi_if_recv_msg(struct sk_buff *skb, s
if (!try_module_get(transport->owner))
return -EINVAL;
- priv->daemon_pid = NETLINK_CREDS(skb)->pid;
+ pid = NETLINK_CREDS(skb)->pid;
+ if (priv->daemon_pid > 0 && priv->daemon_pid != pid) {
+ if (sk_has_vmio(nls)) {
+ struct sock * sk = netlink_lookup(NETLINK_ISCSI, pid);
+ BUG_ON(!sk);
+ WARN_ON(!sk_set_vmio(sk));
+ }
+ }
+ priv->daemon_pid = pid;
switch (nlh->nlmsg_type) {
case ISCSI_UEVENT_CREATE_SESSION:
@@ -1094,7 +1110,10 @@ iscsi_if_rx(struct sock *sk, int len)
if (rlen > skb->len)
rlen = skb->len;
+ if (sk_has_vmio(sk))
+ current->flags |= PF_MEM_NOWAIT;
err = iscsi_if_recv_msg(skb, nlh);
+ current->flags &= ~PF_MEM_NOWAIT;
if (err) {
ev->type = ISCSI_KEVENT_IF_ERROR;
ev->iferror = err;
Index: linux-2.6/include/scsi/scsi_transport_iscsi.h
===================================================================
--- linux-2.6.orig/include/scsi/scsi_transport_iscsi.h
+++ linux-2.6/include/scsi/scsi_transport_iscsi.h
@@ -218,8 +218,8 @@ extern int iscsi_destroy_session(struct
extern struct iscsi_cls_conn *iscsi_create_conn(struct iscsi_cls_session *sess,
uint32_t cid);
extern int iscsi_destroy_conn(struct iscsi_cls_conn *conn);
+extern int iscsi_if_daemon_pid(struct iscsi_transport *tt);
extern void iscsi_unblock_session(struct iscsi_cls_session *session);
extern void iscsi_block_session(struct iscsi_cls_session *session);
-
#endif
--
* Re: [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7)
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
` (19 preceding siblings ...)
2006-09-12 15:25 ` [PATCH 17/20] scsi: propagate the swapdev hook into the scsi stack Peter Zijlstra
@ 2006-09-12 16:37 ` Linus Torvalds
2006-09-12 23:58 ` Nate Diller
20 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2006-09-12 16:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mm, linux-kernel, netdev, Andrew Morton, David Miller,
Rik van Riel, Daniel Phillips
On Tue, 12 Sep 2006, Peter Zijlstra wrote:
>
> Linus, when I mentioned swap over network to you in Ottawa, you said it was
> a valid use case, that people actually do and want this. Can you agree with
> the approach taken in these patches?
Well, in all honesty, I don't think I really said "valid", but that I said
that some crazy people want to do it, and that we should try to allow them
their foibles.
So I'd be nervous to do any _guarantees_. I think that good VM policies
should make it be something that works in general (the dirty mapping
limits in particular), but I'd be a bit nervous about anybody taking it
_too_ seriously. Crazy people are still crazy, they just might be right
under certain reasonably-well-controlled circumstances.
Linus
* Re: [PATCH 12/20] nbd: limit blk_queue
2006-09-12 15:25 ` [PATCH 12/20] nbd: limit blk_queue Peter Zijlstra
@ 2006-09-12 22:47 ` Jens Axboe
0 siblings, 0 replies; 37+ messages in thread
From: Jens Axboe @ 2006-09-12 22:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips, Pavel Machek
On Tue, Sep 12 2006, Peter Zijlstra wrote:
> Limit each request to 1 page, so that the request throttling also limits the
> number of in-flight pages.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Daniel Phillips <phillips@google.com>
> CC: Pavel Machek <pavel@ucw.cz>
> ---
> drivers/block/nbd.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/drivers/block/nbd.c
> ===================================================================
> --- linux-2.6.orig/drivers/block/nbd.c 2006-09-07 18:43:41.000000000 +0200
> +++ linux-2.6/drivers/block/nbd.c 2006-09-07 18:44:12.000000000 +0200
> @@ -638,6 +638,9 @@ static int __init nbd_init(void)
> put_disk(disk);
> goto out;
> }
> + blk_queue_max_segment_size(disk->queue, PAGE_SIZE);
> + blk_queue_max_hw_segments(disk->queue, 1);
> + blk_queue_max_phys_segments(disk->queue, 1);
Another bandaid. What happens if nr_requests number of pages is still
too many for a system? You just moved whatever problem you had, you
didn't solve anything.
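The objection can be put in numbers. The following is a rough sketch, assuming 128 queued requests and 4096-byte pages (both are illustrative assumptions, not figures from this thread); `max_inflight_bytes()` and `fits_in_reserve()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case bytes in flight when every request carries one segment. */
size_t max_inflight_bytes(size_t nr_requests, size_t max_segment_size)
{
	assert(max_segment_size > 0);
	return nr_requests * max_segment_size;
}

/* Does that worst case fit in a reserve of `reserve_pages` pages? */
int fits_in_reserve(size_t nr_requests, size_t page_size, size_t reserve_pages)
{
	return max_inflight_bytes(nr_requests, page_size)
		<= reserve_pages * page_size;
}
```

Under those assumptions the bound is 128 * 4096 = 524288 bytes (512 KiB). Capping each request at one page makes the bound proportional to nr_requests, but whether that bound fits still depends on how much memory the system can set aside.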
--
Jens Axboe
* Re: [PATCH 11/20] nbd: request_fn fixup
2006-09-12 15:25 ` [PATCH 11/20] nbd: request_fn fixup Peter Zijlstra
@ 2006-09-12 22:47 ` Jens Axboe
2006-09-13 0:21 ` Jeff Garzik
0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2006-09-12 22:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips, Pavel Machek
On Tue, Sep 12 2006, Peter Zijlstra wrote:
> @@ -463,10 +465,13 @@ static void do_nbd_request(request_queue
>
> error_out:
> req->errors++;
> - spin_unlock(q->queue_lock);
> - nbd_end_request(req);
> - spin_lock(q->queue_lock);
> + __nbd_end_request(req);
> }
> + /*
> + * q->queue_lock has been dropped, this opens up a race
> + * plug the device to close it.
> + */
> + blk_plug_device(q);
> return;
> }
This looks wrong, I wonder if this only fixes things for you because it
happens to reinvoke the request handler after the timeout occurs? Your
comment doesn't really describe what you think is going on, please
describe in detail what you think is happening here that the plugging
supposedly solves.
Generally the block device rule is that once you are invoked due to an
unplug (or whatever) event, it is the responsibility of the block device
to run the queue until it's done. So if you bail out of queue handling
for whatever reason (might be resource starvation in hard- or software),
you must make sure to reenter queue handling since the device will not
get replugged while it has requests pending. Unless you run into some
software resource shortage, running of the queue is done
deterministically when you know resources are available (ie an io
completes). The device plugging itself is only ever done when you
encounter a shortage outside of your control (memory shortage, for
instance) _and_ you don't already have pending work where you can invoke
queueing from again.
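The rule described above can be sketched as a toy user-space model. None of this is block-layer code: `struct toy_queue`, `toy_request_fn()` and `toy_complete_one()` are hypothetical names, and the single `free_slots` counter is an assumed stand-in for whatever resource the driver can run out of.

```c
#include <assert.h>

struct toy_queue {
	int pending;    /* requests waiting to be issued */
	int in_flight;  /* requests issued, not yet completed */
	int free_slots; /* driver resource, e.g. transmit slots */
	int plugged;    /* set only when nothing else will restart us */
};

/* request_fn: issue as much as the available resources allow. */
void toy_request_fn(struct toy_queue *q)
{
	q->plugged = 0;
	while (q->pending > 0 && q->free_slots > 0) {
		q->pending--;
		q->free_slots--;
		q->in_flight++;
	}
	/*
	 * Bailing out with work left: if something is in flight, its
	 * completion re-runs the queue; only otherwise fall back to
	 * plugging.
	 */
	if (q->pending > 0 && q->in_flight == 0)
		q->plugged = 1;
}

/* Completion: frees a resource and re-enters queue handling. */
void toy_complete_one(struct toy_queue *q)
{
	assert(q->in_flight > 0);
	q->in_flight--;
	q->free_slots++;
	toy_request_fn(q);
}
```

With three pending requests and one free slot, the model never plugs: each completion deterministically restarts the queue. It plugs only when it has pending requests, no free slot, and nothing in flight that could restart it.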
--
Jens Axboe
* Re: [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7)
2006-09-12 16:37 ` [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Linus Torvalds
@ 2006-09-12 23:58 ` Nate Diller
0 siblings, 0 replies; 37+ messages in thread
From: Nate Diller @ 2006-09-12 23:58 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, linux-mm, linux-kernel, netdev, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
On 9/12/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Tue, 12 Sep 2006, Peter Zijlstra wrote:
> >
> > Linus, when I mentioned swap over network to you in Ottawa, you said it was
> > a valid use case, that people actually do and want this. Can you agree with
> > the approach taken in these patches?
>
> Well, in all honesty, I don't think I really said "valid", but that I said
> that some crazy people want to do it, and that we should try to allow them
> their foibles.
>
> So I'd be nervous to do any _guarantees_. I think that good VM policies
> should make it be something that works in general (the dirty mapping
> limits in particular), but I'd be a bit nervous about anybody taking it
> _too_ seriously. Crazy people are still crazy, they just might be right
> under certain reasonably-well-controlled circumstances.
(oops, forgot to cc: the list)
Personally, I'm a little unhappy with the added complexity here; I'm
not convinced that this extra feature is worth it. In particular,
adding to the address_space_operations, the block_device_operations,
and creating a new swap index/offset interface just for this seems
questionable. I feel like interface bloat should be reserved for
features that have widespread use and benefit.
Not that I'm opposed to this feature, just that I think this patch is
too invasive interface-wise.
NATE
* Re: [PATCH 11/20] nbd: request_fn fixup
2006-09-12 22:47 ` Jens Axboe
@ 2006-09-13 0:21 ` Jeff Garzik
2006-09-13 6:14 ` Jens Axboe
0 siblings, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2006-09-13 0:21 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, linux-mm, linux-kernel, netdev, Linus Torvalds,
Andrew Morton, David Miller, Rik van Riel, Daniel Phillips,
Pavel Machek
Jens Axboe wrote:
> Generally the block device rule is that once you are invoked due to an
> unplug (or whatever) event, it is the responsibility of the block device
> to run the queue until it's done. So if you bail out of queue handling
> for whatever reason (might be resource starvation in hard- or software),
> you must make sure to reenter queue handling since the device will not
> get replugged while it has requests pending. Unless you run into some
> software resource shortage, running of the queue is done
> deterministically when you know resources are available (ie an io
> completes). The device plugging itself is only ever done when you
> encounter a shortage outside of your control (memory shortage, for
> instance) _and_ you don't already have pending work where you can invoke
> queueing from again.
Or he could employ the blk_{start,stop}_queue() functions, if that model
is easier for the driver (and brain).
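The stop/start model can be illustrated with a small user-space sketch. It only imitates the idea behind blk_stop_queue() and blk_start_queue(); `struct toy_dev`, `toy_run_queue()` and `toy_resources_available()` are hypothetical names, and the resource counter is an assumption made for illustration.

```c
#include <assert.h>

struct toy_dev {
	int stopped;   /* models the queue's "stopped" state */
	int pending;   /* queued requests */
	int resources; /* whatever the driver can run out of */
	int issued;    /* requests handed to the transport so far */
};

/* Run the queue unless it is stopped; stop it on resource shortage. */
void toy_run_queue(struct toy_dev *d)
{
	if (d->stopped)
		return;
	while (d->pending > 0) {
		if (d->resources == 0) {
			d->stopped = 1; /* analogous to blk_stop_queue() */
			return;
		}
		d->resources--;
		d->pending--;
		d->issued++;
	}
}

/* Called when resources return; analogous to blk_start_queue(). */
void toy_resources_available(struct toy_dev *d, int n)
{
	assert(n > 0);
	d->resources += n;
	d->stopped = 0;
	toy_run_queue(d);
}
```

The driver records the shortage explicitly, and the event that returns the resource restarts the queue, so progress does not depend on a plug timer.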
Jeff
* Re: [PATCH 11/20] nbd: request_fn fixup
2006-09-13 0:21 ` Jeff Garzik
@ 2006-09-13 6:14 ` Jens Axboe
0 siblings, 0 replies; 37+ messages in thread
From: Jens Axboe @ 2006-09-13 6:14 UTC (permalink / raw)
To: Jeff Garzik
Cc: Peter Zijlstra, linux-mm, linux-kernel, netdev, Linus Torvalds,
Andrew Morton, David Miller, Rik van Riel, Daniel Phillips,
Pavel Machek
On Tue, Sep 12 2006, Jeff Garzik wrote:
> Jens Axboe wrote:
> >Generally the block device rule is that once you are invoked due to an
> >unplug (or whatever) event, it is the responsibility of the block device
> >to run the queue until it's done. So if you bail out of queue handling
> >for whatever reason (might be resource starvation in hard- or software),
> >you must make sure to reenter queue handling since the device will not
> >get replugged while it has requests pending. Unless you run into some
> >software resource shortage, running of the queue is done
> >deterministically when you know resources are available (ie an io
> >completes). The device plugging itself is only ever done when you
> >encounter a shortage outside of your control (memory shortage, for
> >instance) _and_ you don't already have pending work where you can invoke
> >queueing from again.
>
> Or he could employ the blk_{start,stop}_queue() functions, if that model
> is easier for the driver (and brain).
Definitely, yes.
--
Jens Axboe
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-12 15:25 ` [PATCH 20/20] iscsi: support for swapping over iSCSI Peter Zijlstra
@ 2006-09-13 20:50 ` Mike Christie
2006-09-14 6:17 ` Peter Zijlstra
0 siblings, 1 reply; 37+ messages in thread
From: Mike Christie @ 2006-09-13 20:50 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
Peter Zijlstra wrote:
> Implement sht->swapdev() for iSCSI. This method takes care of reserving
> the extra memory needed and marking all relevant sockets with SOCK_VMIO.
>
> When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
> the TCP connect is done with SOCK_VMIO to ensure their success. Also the
> netlink userspace interface is marked SOCK_VMIO, this will ensure that even
> under pressure we can still communicate with the daemon (which runs as
> mlockall() and needs no additional memory to operate).
>
> Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
> present. This ensures that the netlink socket will not block. User-space will
> need to retry failed requests.
>
> The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
> This makes sure we do not block the critical socket, and that we do not
> fail to process incoming data.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> CC: Mike Christie <michaelc@cs.wisc.edu>
> ---
> drivers/scsi/iscsi_tcp.c | 103 +++++++++++++++++++++++++++++++-----
> drivers/scsi/scsi_transport_iscsi.c | 23 +++++++-
> include/scsi/libiscsi.h | 1
> include/scsi/scsi_transport_iscsi.h | 2
> 4 files changed, 113 insertions(+), 16 deletions(-)
>
> Index: linux-2.6/drivers/scsi/iscsi_tcp.c
> ===================================================================
> --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
> +++ linux-2.6/drivers/scsi/iscsi_tcp.c
> @@ -42,6 +42,7 @@
> #include <scsi/scsi_host.h>
> #include <scsi/scsi.h>
> #include <scsi/scsi_transport_iscsi.h>
> +#include <scsi/scsi_device.h>
>
> #include "iscsi_tcp.h"
>
> @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
> int rc;
> struct iscsi_conn *conn = rd_desc->arg.data;
> struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
> - int processed;
> + int processed = 0;
> char pad[ISCSI_PAD_LEN];
> struct scatterlist sg;
> + unsigned long pflags = current->flags;
> +
> + if (sk_has_vmio(tcp_conn->sock->sk))
> + current->flags |= PF_MEMALLOC;
>
Is this too late or not needed or what is it for? This function gets run
from the network layer's softirq and at this point we have a skbuff with
data that we want to process. The iscsi layer also does not allocate
memory for read or write IO in this path.
I think we would want to set this flag at a lower level. Something
closer to where the skbuff is allocated?
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-13 20:50 ` Mike Christie
@ 2006-09-14 6:17 ` Peter Zijlstra
2006-09-14 19:22 ` Mike Christie
0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-14 6:17 UTC (permalink / raw)
To: Mike Christie
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
On Wed, 2006-09-13 at 15:50 -0500, Mike Christie wrote:
> Peter Zijlstra wrote:
> > Implement sht->swapdev() for iSCSI. This method takes care of reserving
> > the extra memory needed and marking all relevant sockets with SOCK_VMIO.
> >
> > When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
> > the TCP connect is done with SOCK_VMIO to ensure their success. Also the
> > netlink userspace interface is marked SOCK_VMIO, this will ensure that even
> > under pressure we can still communicate with the daemon (which runs as
> > mlockall() and needs no additional memory to operate).
> >
> > Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
> > present. This ensures that the netlink socket will not block. User-space will
> > need to retry failed requests.
> >
> > The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
> > This makes sure we do not block the critical socket, and that we do not
> > fail to process incoming data.
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > CC: Mike Christie <michaelc@cs.wisc.edu>
> > ---
> > drivers/scsi/iscsi_tcp.c | 103 +++++++++++++++++++++++++++++++-----
> > drivers/scsi/scsi_transport_iscsi.c | 23 +++++++-
> > include/scsi/libiscsi.h | 1
> > include/scsi/scsi_transport_iscsi.h | 2
> > 4 files changed, 113 insertions(+), 16 deletions(-)
> >
> > Index: linux-2.6/drivers/scsi/iscsi_tcp.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
> > +++ linux-2.6/drivers/scsi/iscsi_tcp.c
> > @@ -42,6 +42,7 @@
> > #include <scsi/scsi_host.h>
> > #include <scsi/scsi.h>
> > #include <scsi/scsi_transport_iscsi.h>
> > +#include <scsi/scsi_device.h>
> >
> > #include "iscsi_tcp.h"
> >
> > @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
> > int rc;
> > struct iscsi_conn *conn = rd_desc->arg.data;
> > struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
> > - int processed;
> > + int processed = 0;
> > char pad[ISCSI_PAD_LEN];
> > struct scatterlist sg;
> > + unsigned long pflags = current->flags;
> > +
> > + if (sk_has_vmio(tcp_conn->sock->sk))
> > + current->flags |= PF_MEMALLOC;
> >
>
> Is this too late or not needed or what is it for? This function gets run
> from the network layer's softirq and at this point we have a skbuff with
> data that we want to process. The iscsi layer also does not allocate
> memory for read or write IO in this path.
I thought I found allocations in that path, lemme search...
found this:
iscsi_tcp_data_recv()
  iscsi_data_recv()
    iscsi_complete_pdu()
      __iscsi_complete_pdu()
        iscsi_recv_pdu()
          alloc_skb( GFP_ATOMIC);
> I think we would want to set this flag at a lower level. Something
> closer to where the skbuf is allocated?
Is that the skbuff you were talking about? If so, I'd need to carve a
path to pass the swapper information. I had that in a previous patch,
but that was large and ugly. I had to go carrying gfp_t flags all
through that call chain.
I could try again if you prefer that.
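For reference, the save/set/restore pattern that the quoted hunk applies around the receive path can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct task`, `struct sock_model`, `recv_with_reserve()` and the flag value are stand-ins invented for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real definitions live in the kernel. */
#define MODEL_PF_MEMALLOC 0x0800UL

struct task { unsigned long flags; };
struct sock_model { bool vmio; };

/* Written by process_pdu() so a caller can observe the flag state it ran under. */
static bool saw_memalloc;

static void process_pdu(struct task *t)
{
	saw_memalloc = (t->flags & MODEL_PF_MEMALLOC) != 0;
}

/*
 * Only critical (VMIO) sockets raise the flag on entry. On exit only that
 * one bit is put back from the saved copy, so a caller that already had it
 * set keeps it and all other flag bits are left alone.
 */
static void recv_with_reserve(struct task *t, const struct sock_model *sk)
{
	unsigned long pflags = t->flags;

	if (sk->vmio)
		t->flags |= MODEL_PF_MEMALLOC;

	process_pdu(t);

	t->flags = (t->flags & ~MODEL_PF_MEMALLOC) | (pflags & MODEL_PF_MEMALLOC);
}
```

Restoring a single bit rather than the whole word avoids clobbering any other flag that might change while the path runs.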
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-14 6:17 ` Peter Zijlstra
@ 2006-09-14 19:22 ` Mike Christie
2006-09-14 20:35 ` Peter Zijlstra
0 siblings, 1 reply; 37+ messages in thread
From: Mike Christie @ 2006-09-14 19:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
Peter Zijlstra wrote:
> On Wed, 2006-09-13 at 15:50 -0500, Mike Christie wrote:
>> Peter Zijlstra wrote:
>>> Implement sht->swapdev() for iSCSI. This method takes care of reserving
>>> the extra memory needed and marking all relevant sockets with SOCK_VMIO.
>>>
>>> When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
>>> the TCP connect is done with SOCK_VMIO to ensure their success. Also the
>>> netlink userspace interface is marked SOCK_VMIO, this will ensure that even
>>> under pressure we can still communicate with the daemon (which runs as
>>> mlockall() and needs no additional memory to operate).
>>>
>>> Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
>>> present. This ensures that the netlink socket will not block. User-space will
>>> need to retry failed requests.
>>>
>>> The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
>>> This makes sure we do not block the critical socket, and that we do not
>>> fail to process incoming data.
>>>
>>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>>> CC: Mike Christie <michaelc@cs.wisc.edu>
>>> ---
>>> drivers/scsi/iscsi_tcp.c | 103 +++++++++++++++++++++++++++++++-----
>>> drivers/scsi/scsi_transport_iscsi.c | 23 +++++++-
>>> include/scsi/libiscsi.h | 1
>>> include/scsi/scsi_transport_iscsi.h | 2
>>> 4 files changed, 113 insertions(+), 16 deletions(-)
>>>
>>> Index: linux-2.6/drivers/scsi/iscsi_tcp.c
>>> ===================================================================
>>> --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
>>> +++ linux-2.6/drivers/scsi/iscsi_tcp.c
>>> @@ -42,6 +42,7 @@
>>> #include <scsi/scsi_host.h>
>>> #include <scsi/scsi.h>
>>> #include <scsi/scsi_transport_iscsi.h>
>>> +#include <scsi/scsi_device.h>
>>>
>>> #include "iscsi_tcp.h"
>>>
>>> @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
>>> int rc;
>>> struct iscsi_conn *conn = rd_desc->arg.data;
>>> struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
>>> - int processed;
>>> + int processed = 0;
>>> char pad[ISCSI_PAD_LEN];
>>> struct scatterlist sg;
>>> + unsigned long pflags = current->flags;
>>> +
>>> + if (sk_has_vmio(tcp_conn->sock->sk))
>>> + current->flags |= PF_MEMALLOC;
>>>
>> Is this too late or not needed or what is it for? This function gets run
>> from the network layer's softirq and at this point we have a skbuff with
>> data that we want to process. The iscsi layer also does not allocate
>> memory for read or write IO in this path.
>
> I thought I found allocations in that path, lemme search...
> found this:
>
> iscsi_tcp_data_recv()
> iscsi_data_recv()
> iscsi_complete_pdu()
> __iscsi_complete_pdu()
> iscsi_recv_pdu()
> alloc_skb( GFP_ATOMIC);
>
You are right, that is for the netlink interface. Could we move the
PF_MEMALLOC setting and clearing to iscsi_recv_pdu, and add it to
iscsi_conn_error in scsi_transport_iscsi.c, so that iscsi_iser and
qla4xxx will have it set when they need it? I will send a patch for this
along with a way to have the netlink sock vmio set for all iscsi drivers
that need it.
>> I think we would want to set this flag at a lower level. Something
>> closer to where the skbuf is allocated?
>
> Is that the skbuff you were talking about? If so, I'd need to carve a
> path to pass the swapper information. I had that in a previous patch,
> but that was large and ugly. I had to go carrying gfp_t flags all
> through that call chain.
>
In my original post I was just concerned about the sk_buff that gets
passed to the iscsi layer in iscsi_tcp_data_recv. I was wondering if the
chunk of code in the network layer or network driver that allocated that
skbuff needed to set PF_MEMALLOC.
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-14 19:22 ` Mike Christie
@ 2006-09-14 20:35 ` Peter Zijlstra
2006-09-14 20:46 ` Peter Zijlstra
2006-09-14 21:00 ` Mike Christie
0 siblings, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-14 20:35 UTC (permalink / raw)
To: Mike Christie
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
> Peter Zijlstra wrote:
> > On Wed, 2006-09-13 at 15:50 -0500, Mike Christie wrote:
> >> Peter Zijlstra wrote:
> >>> Implement sht->swapdev() for iSCSI. This method takes care of reserving
> >>> the extra memory needed and marking all relevant sockets with SOCK_VMIO.
> >>>
> >>> When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
> >>> the TCP connect is done with SOCK_VMIO to ensure their success. Also the
> >>> netlink userspace interface is marked SOCK_VMIO, this will ensure that even
> >>> under pressure we can still communicate with the daemon (which runs as
> >>> mlockall() and needs no additional memory to operate).
> >>>
> >>> Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
> >>> present. This ensures that the netlink socket will not block. User-space will
> >>> need to retry failed requests.
> >>>
> >>> The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
> >>> This makes sure we do not block the critical socket, and that we do not
> >>> fail to process incoming data.
> >>>
> >>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >>> CC: Mike Christie <michaelc@cs.wisc.edu>
> >>> ---
> >>> drivers/scsi/iscsi_tcp.c | 103 +++++++++++++++++++++++++++++++-----
> >>> drivers/scsi/scsi_transport_iscsi.c | 23 +++++++-
> >>> include/scsi/libiscsi.h | 1
> >>> include/scsi/scsi_transport_iscsi.h | 2
> >>> 4 files changed, 113 insertions(+), 16 deletions(-)
> >>>
> >>> Index: linux-2.6/drivers/scsi/iscsi_tcp.c
> >>> ===================================================================
> >>> --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
> >>> +++ linux-2.6/drivers/scsi/iscsi_tcp.c
> >>> @@ -42,6 +42,7 @@
> >>> #include <scsi/scsi_host.h>
> >>> #include <scsi/scsi.h>
> >>> #include <scsi/scsi_transport_iscsi.h>
> >>> +#include <scsi/scsi_device.h>
> >>>
> >>> #include "iscsi_tcp.h"
> >>>
> >>> @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
> >>> int rc;
> >>> struct iscsi_conn *conn = rd_desc->arg.data;
> >>> struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
> >>> - int processed;
> >>> + int processed = 0;
> >>> char pad[ISCSI_PAD_LEN];
> >>> struct scatterlist sg;
> >>> + unsigned long pflags = current->flags;
> >>> +
> >>> + if (sk_has_vmio(tcp_conn->sock->sk))
> >>> + current->flags |= PF_MEMALLOC;
> >>>
> >> Is this too late or not needed or what is it for? This function gets run
> >> from the network layer's softirq and at this point we have a skbuff with
> >> data that we want to process. The iscsi layer also does not allocate
> >> memory for read or write IO in this path.
> >
> > I thought I found allocations in that path, lemme search...
> > found this:
> >
> > iscsi_tcp_data_recv()
> > iscsi_data_recv()
> > iscsi_complete_pdu()
> > __iscsi_complete_pdu()
> > iscsi_recv_pdu()
> > alloc_skb( GFP_ATOMIC);
> >
>
> You are right that is for the netlink interface. Could we move the
> PF_MEMALLOC setting and clearing to iscsi_recv_pdu and add it to
> iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
> qla4xxx will have it set when they need it. I will send a patch for this
> along with a way to have the netlink sock vmio set for all iscsi drivers
> that need it.
I already have such a patch, look at:
http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch
but what conditional do you want to use for PF_MEMALLOC? An
unconditional setting will be highly unpopular.
Hmm, perhaps you could key it off sk_has_vmio(nls)...
> >> I think we would want to set this flag at a lower level. Something
> >> closer to where the skbuf is allocated?
> >
> > Is that the skbuff you were talking about? If so, I'd need to carve a
> > path to pass the swapper information. I had that in a previous patch,
> > but that was large and ugly. I had to go carrying gfp_t flags all
> > through that call chain.
> >
>
> In my original post I was just concerned about the sk_buff that gets
> passed to the iscsi layer in iscsi_tcp_data_recv. I was wondering if the
> chunk of code in the network layer or network driver that allocated that
> skbuff needed to set PF_MEMALLOC.
(yeah I got that)
No, that got allocated because it's a receive skb and !sk_vmio_socks(),
and got passed up because sk_has_vmio(iscsi_tcp_conn->sock->sk).
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-14 20:35 ` Peter Zijlstra
@ 2006-09-14 20:46 ` Peter Zijlstra
2006-09-14 21:09 ` Mike Christie
2006-09-14 21:00 ` Mike Christie
1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-14 20:46 UTC (permalink / raw)
To: Mike Christie
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
On Thu, 2006-09-14 at 22:35 +0200, Peter Zijlstra wrote:
> On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
> > > I thought I found allocations in that path, lemme search...
> > > found this:
> > >
> > > iscsi_tcp_data_recv()
> > > iscsi_data_recv()
> > > iscsi_complete_pdu()
> > > __iscsi_complete_pdu()
> > > iscsi_recv_pdu()
> > > alloc_skb( GFP_ATOMIC);
> > >
> >
> > You are right that is for the netlink interface. Could we move the
> > PF_MEMALLOC setting and clearing to iscsi_recv_pdu and add it to
> > iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
> > qla4xxx will have it set when they need it. I will send a patch for this
> > along with a way to have the netlink sock vmio set for all iscsi drivers
> > that need it.
>
> I already have such a patch, look at:
> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch
>
> but what conditional do you want to use for PF_MEMALLOC, an
> unconditional setting will be highly unpopular.
>
> Hmm, perhaps you could key it off sk_has_vmio(nls)...
On second thought, not such a good idea; that will still be too coarse.
You only want to force feed stuff originating from
sk_has_vmio(iscsi_tcp_conn->sock->sk) connections, not all
connections as soon as there is a swapper in the system.
In order to preserve that information you need extra state. Abusing the
process flags is as good as propagating __GFP_EMERGENCY down the call
chain with extra gfp_t arguments, perhaps even better, since it will
make sure we catch all allocations.
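The trade-off between the two ways of carrying that state can be sketched in userspace C. Everything below is a hypothetical model: the `MODEL_*` flag values, `current_flags` and the `*_explicit` / `*_implicit` functions are invented for illustration and only borrow names from the call chain discussed in this thread. Each function returns whether the final allocation would be allowed to use the reserve.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this model. */
#define MODEL_PF_MEMALLOC   0x1UL
#define MODEL_GFP_EMERGENCY 0x1U

/* Models current->flags for the single "task" in this sketch. */
static unsigned long current_flags;

/* Variant A: the hint is an argument, so every caller must forward it. */
static bool alloc_explicit(unsigned int gfp)
{
	return (gfp & MODEL_GFP_EMERGENCY) != 0;
}

static bool recv_pdu_explicit(unsigned int gfp)
{
	return alloc_explicit(gfp);
}

static bool complete_pdu_explicit(unsigned int gfp)
{
	return recv_pdu_explicit(gfp);
}

/* Variant B: the allocator reads task state; signatures stay unchanged. */
static bool alloc_implicit(void)
{
	return (current_flags & MODEL_PF_MEMALLOC) != 0;
}

static bool recv_pdu_implicit(void)
{
	return alloc_implicit();
}

static bool complete_pdu_implicit(void)
{
	return recv_pdu_implicit();
}
```

In variant A a single intermediate function that forgets to forward `gfp` silently loses the hint; in variant B every allocation made while the flag is set sees it, which is the "catch all allocations" property mentioned above.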
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-14 20:35 ` Peter Zijlstra
2006-09-14 20:46 ` Peter Zijlstra
@ 2006-09-14 21:00 ` Mike Christie
2006-09-14 21:03 ` Mike Christie
1 sibling, 1 reply; 37+ messages in thread
From: Mike Christie @ 2006-09-14 21:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
Peter Zijlstra wrote:
> On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
>> Peter Zijlstra wrote:
>>> On Wed, 2006-09-13 at 15:50 -0500, Mike Christie wrote:
>>>> Peter Zijlstra wrote:
>>>>> Implement sht->swapdev() for iSCSI. This method takes care of reserving
>>>>> the extra memory needed and marking all relevant sockets with SOCK_VMIO.
>>>>>
>>>>> When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
>>>>> the TCP connect is done with SOCK_VMIO to ensure their success. Also the
>>>>> netlink userspace interface is marked SOCK_VMIO, this will ensure that even
>>>>> under pressure we can still communicate with the daemon (which runs as
>>>>> mlockall() and needs no additional memory to operate).
>>>>>
>>>>> Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
>>>>> present. This ensures that the netlink socket will not block. User-space will
>>>>> need to retry failed requests.
>>>>>
>>>>> The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
>>>>> This makes sure we do not block the critical socket, and that we do not
>>>>> fail to process incoming data.
>>>>>
>>>>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>>>>> CC: Mike Christie <michaelc@cs.wisc.edu>
>>>>> ---
>>>>> drivers/scsi/iscsi_tcp.c | 103 +++++++++++++++++++++++++++++++-----
>>>>> drivers/scsi/scsi_transport_iscsi.c | 23 +++++++-
>>>>> include/scsi/libiscsi.h | 1
>>>>> include/scsi/scsi_transport_iscsi.h | 2
>>>>> 4 files changed, 113 insertions(+), 16 deletions(-)
>>>>>
>>>>> Index: linux-2.6/drivers/scsi/iscsi_tcp.c
>>>>> ===================================================================
>>>>> --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
>>>>> +++ linux-2.6/drivers/scsi/iscsi_tcp.c
>>>>> @@ -42,6 +42,7 @@
>>>>> #include <scsi/scsi_host.h>
>>>>> #include <scsi/scsi.h>
>>>>> #include <scsi/scsi_transport_iscsi.h>
>>>>> +#include <scsi/scsi_device.h>
>>>>>
>>>>> #include "iscsi_tcp.h"
>>>>>
>>>>> @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
>>>>> int rc;
>>>>> struct iscsi_conn *conn = rd_desc->arg.data;
>>>>> struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
>>>>> - int processed;
>>>>> + int processed = 0;
>>>>> char pad[ISCSI_PAD_LEN];
>>>>> struct scatterlist sg;
>>>>> + unsigned long pflags = current->flags;
>>>>> +
>>>>> + if (sk_has_vmio(tcp_conn->sock->sk))
>>>>> + current->flags |= PF_MEMALLOC;
>>>>>
>>>> Is this too late or not needed or what is it for? This function gets run
>>>> from the network layer's softirq and at this point we have a skbuff with
>>>> data that we want to process. The iscsi layer also does not allocate
>>>> memory for read or write IO in this path.
>>> I thought I found allocations in that path, lemme search...
>>> found this:
>>>
>>> iscsi_tcp_data_recv()
>>> iscsi_data_recv()
>>> iscsi_complete_pdu()
>>> __iscsi_complete_pdu()
>>> iscsi_recv_pdu()
>>> alloc_skb( GFP_ATOMIC);
>>>
>> You are right that is for the netlink interface. Could we move the
>> PF_MEMALLOC setting and clearing to iscsi_recv_pdu and add it to
>> iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
>> qla4xxx will have it set when they need it. I will send a patch for this
>> along with a way to have the netlink sock vmio set for all iscsi drivers
>> that need it.
>
> I already have such a patch, look at:
> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch
>
You are drowning me in patches :) I did not see that one. I was still
commenting on this patch :)
The new patch looks ok.
> but what conditional do you want to use for PF_MEMALLOC, an
> unconditional setting will be highly unpopular.
>
> Hmm, perhaps you could key it off sk_has_vmio(nls)...
Yes.
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-14 21:00 ` Mike Christie
@ 2006-09-14 21:03 ` Mike Christie
2006-09-14 21:18 ` Peter Zijlstra
0 siblings, 1 reply; 37+ messages in thread
From: Mike Christie @ 2006-09-14 21:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
Mike Christie wrote:
> Peter Zijlstra wrote:
>> On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
>>> Peter Zijlstra wrote:
>>>> On Wed, 2006-09-13 at 15:50 -0500, Mike Christie wrote:
>>>>> Peter Zijlstra wrote:
>>>>>> Implement sht->swapdev() for iSCSI. This method takes care of reserving
>>>>>> the extra memory needed and marking all relevant sockets with SOCK_VMIO.
>>>>>>
>>>>>> When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
>>>>>> the TCP connect is done with SOCK_VMIO to ensure their success. Also the
>>>>>> netlink userspace interface is marked SOCK_VMIO, this will ensure that even
>>>>>> under pressure we can still communicate with the daemon (which runs as
>>>>>> mlockall() and needs no additional memory to operate).
>>>>>>
>>>>>> Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
>>>>>> present. This ensures that the netlink socket will not block. User-space will
>>>>>> need to retry failed requests.
>>>>>>
>>>>>> The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
>>>>>> This makes sure we do not block the critical socket, and that we do not
>>>>>> fail to process incoming data.
>>>>>>
>>>>>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>>>>>> CC: Mike Christie <michaelc@cs.wisc.edu>
>>>>>> ---
>>>>>> drivers/scsi/iscsi_tcp.c | 103 +++++++++++++++++++++++++++++++-----
>>>>>> drivers/scsi/scsi_transport_iscsi.c | 23 +++++++-
>>>>>> include/scsi/libiscsi.h | 1
>>>>>> include/scsi/scsi_transport_iscsi.h | 2
>>>>>> 4 files changed, 113 insertions(+), 16 deletions(-)
>>>>>>
>>>>>> Index: linux-2.6/drivers/scsi/iscsi_tcp.c
>>>>>> ===================================================================
>>>>>> --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
>>>>>> +++ linux-2.6/drivers/scsi/iscsi_tcp.c
>>>>>> @@ -42,6 +42,7 @@
>>>>>> #include <scsi/scsi_host.h>
>>>>>> #include <scsi/scsi.h>
>>>>>> #include <scsi/scsi_transport_iscsi.h>
>>>>>> +#include <scsi/scsi_device.h>
>>>>>>
>>>>>> #include "iscsi_tcp.h"
>>>>>>
>>>>>> @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
>>>>>> int rc;
>>>>>> struct iscsi_conn *conn = rd_desc->arg.data;
>>>>>> struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
>>>>>> - int processed;
>>>>>> + int processed = 0;
>>>>>> char pad[ISCSI_PAD_LEN];
>>>>>> struct scatterlist sg;
>>>>>> + unsigned long pflags = current->flags;
>>>>>> +
>>>>>> + if (sk_has_vmio(tcp_conn->sock->sk))
>>>>>> + current->flags |= PF_MEMALLOC;
>>>>>>
>>>>> Is this too late or not needed or what is it for? This function gets run
>>>>> from the network layer's softirq and at this point we have a skbuff with
>>>>> data that we want to process. The iscsi layer also does not allocate
>>>>> memory for read or write IO in this path.
>>>> I thought I found allocations in that path, lemme search...
>>>> found this:
>>>>
>>>> iscsi_tcp_data_recv()
>>>> iscsi_data_recv()
>>>> iscsi_complete_pdu()
>>>> __iscsi_complete_pdu()
>>>> iscsi_recv_pdu()
>>>> alloc_skb( GFP_ATOMIC);
>>>>
>>> You are right that is for the netlink interface. Could we move the
>>> PF_MEMALLOC setting and clearing to iscsi_recv_pdu and add it to
>>> iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
>>> qla4xxx will have it set when they need it. I will send a patch for this
>>> along with a way to have the netlink sock vmio set for all iscsi drivers
>>> that need it.
>> I already have such a patch, look at:
>> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch
>>
>
> You are drowning me in patches :) I did not see that one. I was still
> commenting on this patch :)
>
> The new patch looks ok.
>
Oh, I think you need a sock_put to go with netlink lookup (lookup does a
hold).
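The hold/put pairing can be illustrated with a small userspace model. This is a sketch, not the netlink code: `struct sock_model`, `model_lookup()`, `model_put()` and `netlink_is_vmio()` are hypothetical names, and the only behaviour taken from the thread is that the lookup returns with a reference held.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a reference-counted socket. */
struct sock_model {
	int refcnt;
	bool vmio;
};

/* Like the lookup discussed above: returns the socket with a reference held. */
static struct sock_model *model_lookup(struct sock_model *entry)
{
	if (entry)
		entry->refcnt++;
	return entry;
}

/* Drops the reference taken by model_lookup(). */
static void model_put(struct sock_model *sk)
{
	sk->refcnt--;
}

/* Query the flag without leaking a reference. */
static bool netlink_is_vmio(struct sock_model *entry)
{
	struct sock_model *sk = model_lookup(entry);
	bool vmio;

	if (!sk)
		return false;

	vmio = sk->vmio;
	model_put(sk);	/* pairs with the hold in model_lookup() */
	return vmio;
}
```

Without the `model_put()` call the count would grow by one on every query and the socket could never be freed.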
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-14 20:46 ` Peter Zijlstra
@ 2006-09-14 21:09 ` Mike Christie
2006-09-14 21:28 ` Mike Christie
0 siblings, 1 reply; 37+ messages in thread
From: Mike Christie @ 2006-09-14 21:09 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
Peter Zijlstra wrote:
> On Thu, 2006-09-14 at 22:35 +0200, Peter Zijlstra wrote:
>> On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
>
>>>> I thought I found allocations in that path, lemme search...
>>>> found this:
>>>>
>>>> iscsi_tcp_data_recv()
>>>> iscsi_data_recv()
>>>> iscsi_complete_pdu()
>>>> __iscsi_complete_pdu()
>>>> iscsi_recv_pdu()
>>>> alloc_skb( GFP_ATOMIC);
>>>>
>>> You are right that is for the netlink interface. Could we move the
>>> PF_MEMALLOC setting and clearing to iscsi_recv_pdu and add it to
>>> iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
>>> qla4xxx will have it set when they need it. I will send a patch for this
>>> along with a way to have the netlink sock vmio set for all iscsi drivers
>>> that need it.
>> I already have such a patch, look at:
>> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch
>>
>> but what conditional do you want to use for PF_MEMALLOC, an
>> unconditional setting will be highly unpopular.
>>
>> Hmm, perhaps you could key it off sk_has_vmio(nls)...
>
> On second thought, not such a good idea, that will still be too coarse.
> You only want to force feed stuff originating from
> sk_has_vmio(iscsi_tcp_conn->sock->sk) connections, not all
> connections as soon as there is a swapper in the system.
>
You can move the iscsi_session->swapper field to the iscsi_cls_session
and have iscsi_swapdev take an iscsi_cls_session and set that flag.
iscsi_recv_pdu and iscsi_conn_error and all the LLDs can then access
this bit.
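A rough userspace sketch of that per-session bit follows. The structure and helpers (`struct cls_session_model`, `model_swapdev()`, `model_needs_reserve()`) are hypothetical and stand in for whatever the real transport-class code would look like; the point is only that the decision is made per session rather than system-wide.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the transport-class session. */
struct cls_session_model {
	bool swapper;	/* set when this session backs a swap device */
};

/* Models the swapdev hook marking (or unmarking) one session as critical. */
static void model_swapdev(struct cls_session_model *s, bool enable)
{
	s->swapper = enable;
}

/*
 * Decision point shared by the receive and error paths: use the reserve
 * only for the session that swaps, not for every session in the system.
 */
static bool model_needs_reserve(const struct cls_session_model *s)
{
	return s->swapper;
}
```

Because the flag lives in the session object, any driver holding a session pointer can make the same check without knowing about TCP sockets.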
> In order to preserve that information you need extra state, abusing the
> process flags is as good as propagating __GFP_EMERGENCY down the call
> chain with extra gfp_t arguments, perhaps even better, since it will
> make sure we catch all allocations.
>
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-14 21:03 ` Mike Christie
@ 2006-09-14 21:18 ` Peter Zijlstra
0 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2006-09-14 21:18 UTC (permalink / raw)
To: Mike Christie
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
On Thu, 2006-09-14 at 16:03 -0500, Mike Christie wrote:
> Mike Christie wrote:
> > Peter Zijlstra wrote:
> >> On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
> >>> Peter Zijlstra wrote:
> >>>> On Wed, 2006-09-13 at 15:50 -0500, Mike Christie wrote:
> >>>>> Peter Zijlstra wrote:
> >>>>>> Implement sht->swapdev() for iSCSI. This method takes care of reserving
> >>>>>> the extra memory needed and marking all relevant sockets with SOCK_VMIO.
> >>>>>>
> >>>>>> When used for swapping, TCP socket creation is done under GFP_MEMALLOC and
> >>>>>> the TCP connect is done with SOCK_VMIO to ensure their success. Also the
> >>>>>> netlink userspace interface is marked SOCK_VMIO, this will ensure that even
> >>>>>> under pressure we can still communicate with the daemon (which runs as
> >>>>>> mlockall() and needs no additional memory to operate).
> >>>>>>
> >>>>>> Netlink requests are handled under the new PF_MEM_NOWAIT when a swapper is
> >>>>>> present. This ensures that the netlink socket will not block. User-space will
> >>>>>> need to retry failed requests.
> >>>>>>
> >>>>>> The TCP receive path is handled under PF_MEMALLOC for SOCK_VMIO sockets.
> >>>>>> This makes sure we do not block the critical socket, and that we do not
> >>>>>> fail to process incoming data.
> >>>>>>
> >>>>>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >>>>>> CC: Mike Christie <michaelc@cs.wisc.edu>
> >>>>>> ---
> >>>>>> drivers/scsi/iscsi_tcp.c | 103 +++++++++++++++++++++++++++++++-----
> >>>>>> drivers/scsi/scsi_transport_iscsi.c | 23 +++++++-
> >>>>>> include/scsi/libiscsi.h | 1
> >>>>>> include/scsi/scsi_transport_iscsi.h | 2
> >>>>>> 4 files changed, 113 insertions(+), 16 deletions(-)
> >>>>>>
> >>>>>> Index: linux-2.6/drivers/scsi/iscsi_tcp.c
> >>>>>> ===================================================================
> >>>>>> --- linux-2.6.orig/drivers/scsi/iscsi_tcp.c
> >>>>>> +++ linux-2.6/drivers/scsi/iscsi_tcp.c
> >>>>>> @@ -42,6 +42,7 @@
> >>>>>> #include <scsi/scsi_host.h>
> >>>>>> #include <scsi/scsi.h>
> >>>>>> #include <scsi/scsi_transport_iscsi.h>
> >>>>>> +#include <scsi/scsi_device.h>
> >>>>>>
> >>>>>> #include "iscsi_tcp.h"
> >>>>>>
> >>>>>> @@ -845,9 +846,13 @@ iscsi_tcp_data_recv(read_descriptor_t *r
> >>>>>> int rc;
> >>>>>> struct iscsi_conn *conn = rd_desc->arg.data;
> >>>>>> struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
> >>>>>> - int processed;
> >>>>>> + int processed = 0;
> >>>>>> char pad[ISCSI_PAD_LEN];
> >>>>>> struct scatterlist sg;
> >>>>>> + unsigned long pflags = current->flags;
> >>>>>> +
> >>>>>> + if (sk_has_vmio(tcp_conn->sock->sk))
> >>>>>> + current->flags |= PF_MEMALLOC;
> >>>>>>
> >>>>> Is this too late or not needed or what is it for? This function gets run
> >>>>> from the network layer's softirq and at this point we have a skbuff with
> >>>>> data that we want to process. The iscsi layer also does not allocate
> >>>>> memory for read or write IO in this path.
> >>>> I thought I found allocations in that path, lemme search...
> >>>> found this:
> >>>>
> >>>> iscsi_tcp_data_recv()
> >>>> iscsi_data_recv()
> >>>> iscsi_complete_pdu()
> >>>> __iscsi_complete_pdu()
> >>>> iscsi_recv_pdu()
> >>>> alloc_skb( GFP_ATOMIC);
> >>>>
> >>> You are right that is for the netlink interface. Could we move the
> >>> PF_MEMALLOC setting and clearing to iscsi_recv_pdu and add it to
> >>> iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
> >>> qla4xxx will have it set when they need it. I will send a patch for this
> >>> along with a way to have the netlink sock vmio set for all iscsi drivers
> >>> that need it.
> >> I already have such a patch, look at:
> >> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch
> >>
> >
> > You are drowning me in patches :) I did not see that one. I was still
> > commenting on this patch :)
> >
> > The new patch looks ok.
> >
>
> Oh, I think you need a sock_put to go with netlink lookup (lookup does a
> hold).
D'0h again, I'd forgotten I'd used it there too.
hit refresh :-)
* Re: [PATCH 20/20] iscsi: support for swapping over iSCSI.
2006-09-14 21:09 ` Mike Christie
@ 2006-09-14 21:28 ` Mike Christie
0 siblings, 0 replies; 37+ messages in thread
From: Mike Christie @ 2006-09-14 21:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-mm, linux-kernel, netdev, Linus Torvalds, Andrew Morton,
David Miller, Rik van Riel, Daniel Phillips
Mike Christie wrote:
> Peter Zijlstra wrote:
>> On Thu, 2006-09-14 at 22:35 +0200, Peter Zijlstra wrote:
>>> On Thu, 2006-09-14 at 14:22 -0500, Mike Christie wrote:
>>>>> I thought I found allocations in that path, lemme search...
>>>>> found this:
>>>>>
>>>>> iscsi_tcp_data_recv()
>>>>> iscsi_data_recv()
>>>>> iscsi_complete_pdu()
>>>>> __iscsi_complete_pdu()
>>>>> iscsi_recv_pdu()
>>>>> alloc_skb( GFP_ATOMIC);
>>>>>
>>>> You are right that is for the netlink interface. Could we move the
>>>> PF_MEMALLOC setting and clearing to iscsi_recv_pdu and add it to
>>>> iscsi_conn_error in scsi_transport_iscsi.c so that iscsi_iser and
>>>> qla4xxx will have it set when they need it. I will send a patch for this
>>>> along with a way to have the netlink sock vmio set for all iscsi drivers
>>>> that need it.
>>> I already have such a patch, look at:
>>> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/iscsi_vmio.patch
>>>
>>> but what conditional do you want to use for PF_MEMALLOC, an
>>> unconditional setting will be highly unpopular.
>>>
>>> Hmm, perhaps you could key it off sk_has_vmio(nls)...
>> On second thought, not such a good idea, that will still be too coarse.
>> You only want to force feed stuff originating from
>> sk_has_vmio(iscsi_tcp_conn->sock->sk) connections, not all
>> connections as soon as there is a swapper in the system.
>>
>
> You can move the iscsi_session->swapper field to the iscsi_cls_session
> and have iscsi_swapdev take a iscsi_cls_session and set that flag.
> iscsi_recv_pdu and iscsi_conn_error and all the llds can then access
> this bit.
>
>> In order to preserve that information you need extra state, abusing the
>> process flags is as good as propagating __GFP_EMERGENCY down the call
>> chain with extra gfp_t arguments, perhaps even better, since it will
>> make sure we catch all allocations.
>>
Oh yeah, on the send side we also allocate some memory for the netlink
interface if there is a connection error (iscsi_conn_failure ->
iscsi_conn_error). And when that is called from the transmit side we can
change the GFP_ATOMICs to GFP_NOIOs since we have process context.
So I am just saying we need to set that flag in a couple more places (if
you set it in iscsi_conn_error whenever iscsi_cls_session->swapper is set,
then don't worry about it), and that I need to change iscsi_conn_failure and
iscsi_conn_error to take a gfp_t as an argument (or do an in_interrupt()
check or something) so we can use GFP_NOIO in the transmit code.
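The "take a gfp_t as an argument" option might look roughly like the following userspace model. It is a sketch only: `enum model_gfp` and the `model_*` functions are hypothetical stand-ins, and `last_gfp` exists purely so the chosen mode can be observed.

```c
/* Hypothetical allocation modes standing in for GFP_ATOMIC and GFP_NOIO. */
enum model_gfp { MODEL_GFP_ATOMIC, MODEL_GFP_NOIO };

/* Records the mode the last "allocation" used, so the sketch is observable. */
static enum model_gfp last_gfp;

/* Error reporting allocates with whatever mode the caller says is safe. */
static void model_conn_error(enum model_gfp gfp)
{
	last_gfp = gfp;
}

static void model_conn_failure(enum model_gfp gfp)
{
	model_conn_error(gfp);
}

/* Receive side runs in softirq context, so it must not sleep. */
static void model_rx_error(void)
{
	model_conn_failure(MODEL_GFP_ATOMIC);
}

/* Transmit side has process context, so a sleeping allocation is allowed. */
static void model_tx_error(void)
{
	model_conn_failure(MODEL_GFP_NOIO);
}
```

Passing the mode explicitly keeps the context decision with the caller, which knows where it runs, instead of having the error path guess.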
Thread overview: 37+ messages
2006-09-12 15:25 [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Peter Zijlstra
2006-09-12 15:25 ` [PATCH 20/20] iscsi: support for swapping over iSCSI Peter Zijlstra
2006-09-13 20:50 ` Mike Christie
2006-09-14 6:17 ` Peter Zijlstra
2006-09-14 19:22 ` Mike Christie
2006-09-14 20:35 ` Peter Zijlstra
2006-09-14 20:46 ` Peter Zijlstra
2006-09-14 21:09 ` Mike Christie
2006-09-14 21:28 ` Mike Christie
2006-09-14 21:00 ` Mike Christie
2006-09-14 21:03 ` Mike Christie
2006-09-14 21:18 ` Peter Zijlstra
2006-09-12 15:25 ` [PATCH 10/20] mm: block device swap notification Peter Zijlstra
2006-09-12 15:25 ` [PATCH 18/20] netlink: add SOCK_VMIO support to AF_NETLINK Peter Zijlstra
2006-09-12 15:25 ` [PATCH 02/20] net: vm deadlock avoidance core Peter Zijlstra
2006-09-12 15:25 ` [PATCH 16/20] iscsi: add session context to ep_connect Peter Zijlstra
2006-09-12 15:25 ` [PATCH 08/20] nfs: enable swap on NFS Peter Zijlstra
2006-09-12 15:25 ` [PATCH 19/20] mm: a process flags to avoid blocking allocations Peter Zijlstra
2006-09-12 15:25 ` [PATCH 13/20] nbd: use swapdev hook to make swap deadlock free Peter Zijlstra
2006-09-12 15:25 ` [PATCH 01/20] mm: serialize access to min_free_kbytes Peter Zijlstra
2006-09-12 15:25 ` [PATCH 11/20] nbd: request_fn fixup Peter Zijlstra
2006-09-12 22:47 ` Jens Axboe
2006-09-13 0:21 ` Jeff Garzik
2006-09-13 6:14 ` Jens Axboe
2006-09-12 15:25 ` [PATCH 04/20] mm: methods for teaching filesystems about PG_swapcache pages Peter Zijlstra
2006-09-12 15:25 ` [PATCH 15/20] iscsi: kernel side tcp connect Peter Zijlstra
2006-09-12 15:25 ` [PATCH 05/20] uml: rename arch/um remove_mapping() Peter Zijlstra
2006-09-12 15:25 ` [PATCH 09/20] nfs: make swap on NFS robust Peter Zijlstra
2006-09-12 15:25 ` [PATCH 14/20] uml: enable scsi and add iscsi config Peter Zijlstra
2006-09-12 15:25 ` [PATCH 03/20] mm: add support for non block device backed swap files Peter Zijlstra
2006-09-12 15:25 ` [PATCH 12/20] nbd: limit blk_queue Peter Zijlstra
2006-09-12 22:47 ` Jens Axboe
2006-09-12 15:25 ` [PATCH 06/20] nfs: teach the NFS client how to treat PG_swapcache pages Peter Zijlstra
2006-09-12 15:25 ` [PATCH 07/20] nfs: add a comment explaining the use of PG_private in the NFS client Peter Zijlstra
2006-09-12 15:25 ` [PATCH 17/20] scsi: propagate the swapdev hook into the scsi stack Peter Zijlstra
2006-09-12 16:37 ` [PATCH 00/20] vm deadlock avoidance for NFS, NBD and iSCSI (take 7) Linus Torvalds
2006-09-12 23:58 ` Nate Diller