[RFC v1 0/2] mm: SLUB Freelist randomization

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[RFC v1 0/2] mm: SLUB Freelist randomization

Thomas Garnier
This is RFC v1 for the SLUB Freelist randomization.

***Background:
This proposal follows the previous SLAB Freelist patch submitted to next.
It resuses parts of previous implementation and keep a similar approach.

The kernel heap allocators are using a sequential freelist making their
allocation predictable. This predictability makes kernel heap overflow
easier to exploit. An attacker can careful prepare the kernel heap to
control the following chunk overflowed.

For example these attacks exploit the predictability of the heap:
 - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
 - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

***Problems that needed solving:
 - Randomize the Freelist used in the SLUB allocator.
 - Ensure good performance to encourage usage.
 - Get best entropy in early boot stage.

***Parts:
 - 01/02 Reorganize the SLAB Freelist randomization to share elements
   with the SLUB implementation.
 - 02/02 The SLUB Freelist randomization implementation. Similar approach
   than the SLAB but tailored to the singled freelist used in SLUB.

***Performance data (no major changes):

slab_test, before:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 67 cycles kfree -> 101 cycles
10000 times kmalloc(16) -> 68 cycles kfree -> 109 cycles
10000 times kmalloc(32) -> 76 cycles kfree -> 119 cycles
10000 times kmalloc(64) -> 88 cycles kfree -> 114 cycles
10000 times kmalloc(128) -> 100 cycles kfree -> 122 cycles
10000 times kmalloc(256) -> 128 cycles kfree -> 149 cycles
10000 times kmalloc(512) -> 108 cycles kfree -> 152 cycles
10000 times kmalloc(1024) -> 112 cycles kfree -> 158 cycles
10000 times kmalloc(2048) -> 161 cycles kfree -> 208 cycles
10000 times kmalloc(4096) -> 231 cycles kfree -> 239 cycles
10000 times kmalloc(8192) -> 341 cycles kfree -> 270 cycles
10000 times kmalloc(16384) -> 481 cycles kfree -> 323 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 90 cycles
10000 times kmalloc(16)/kfree -> 89 cycles
10000 times kmalloc(32)/kfree -> 88 cycles
10000 times kmalloc(64)/kfree -> 88 cycles
10000 times kmalloc(128)/kfree -> 94 cycles
10000 times kmalloc(256)/kfree -> 87 cycles
10000 times kmalloc(512)/kfree -> 91 cycles
10000 times kmalloc(1024)/kfree -> 90 cycles
10000 times kmalloc(2048)/kfree -> 90 cycles
10000 times kmalloc(4096)/kfree -> 90 cycles
10000 times kmalloc(8192)/kfree -> 90 cycles
10000 times kmalloc(16384)/kfree -> 642 cycles

After:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 60 cycles kfree -> 74 cycles
10000 times kmalloc(16) -> 63 cycles kfree -> 78 cycles
10000 times kmalloc(32) -> 72 cycles kfree -> 85 cycles
10000 times kmalloc(64) -> 91 cycles kfree -> 99 cycles
10000 times kmalloc(128) -> 112 cycles kfree -> 109 cycles
10000 times kmalloc(256) -> 127 cycles kfree -> 120 cycles
10000 times kmalloc(512) -> 125 cycles kfree -> 121 cycles
10000 times kmalloc(1024) -> 128 cycles kfree -> 125 cycles
10000 times kmalloc(2048) -> 167 cycles kfree -> 141 cycles
10000 times kmalloc(4096) -> 249 cycles kfree -> 174 cycles
10000 times kmalloc(8192) -> 377 cycles kfree -> 225 cycles
10000 times kmalloc(16384) -> 459 cycles kfree -> 247 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 72 cycles
10000 times kmalloc(16)/kfree -> 74 cycles
10000 times kmalloc(32)/kfree -> 71 cycles
10000 times kmalloc(64)/kfree -> 75 cycles
10000 times kmalloc(128)/kfree -> 71 cycles
10000 times kmalloc(256)/kfree -> 72 cycles
10000 times kmalloc(512)/kfree -> 72 cycles
10000 times kmalloc(1024)/kfree -> 73 cycles
10000 times kmalloc(2048)/kfree -> 73 cycles
10000 times kmalloc(4096)/kfree -> 72 cycles
10000 times kmalloc(8192)/kfree -> 72 cycles
10000 times kmalloc(16384)/kfree -> 546 cycles

Kernbench, before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 101.873 (1.16069)
User Time 1045.22 (1.60447)
System Time 88.969 (0.559195)
Percent CPU 1112.9 (13.8279)
Context Switches 189140 (2282.15)
Sleeps 99008.6 (768.091)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.47 (0.562732)
User Time 1045.3 (1.34263)
System Time 88.311 (0.342554)
Percent CPU 1105.8 (6.49444)
Context Switches 189081 (2355.78)
Sleeps 99231.5 (800.358)

Reply | Threaded
Open this post in threaded view
|

[RFC v1 1/2] mm: Reorganize SLAB freelist randomization

Thomas Garnier
This commit reorganizes the previous SLAB freelist randomization to
prepare for the SLUB implementation. It moves functions that will be
shared to slab_common. It also move the definition of freelist_idx_t in
the slab_def header so a similar type can be used for all common
functions.

Signed-off-by: Thomas Garnier <[hidden email]>
---
Based on next-20160517
---
 include/linux/slab_def.h | 11 +++++++-
 mm/slab.c                | 66 +-----------------------------------------------
 mm/slab.h                | 16 ++++++++++++
 mm/slab_common.c         | 50 ++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 66 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 8694f7a..e05a871 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -3,6 +3,15 @@
 
 #include <linux/reciprocal_div.h>
 
+#define FREELIST_BYTE_INDEX (((PAGE_SIZE >> BITS_PER_BYTE) \
+ <= SLAB_OBJ_MIN_SIZE) ? 1 : 0)
+
+#if FREELIST_BYTE_INDEX
+typedef unsigned char freelist_idx_t;
+#else
+typedef unsigned short freelist_idx_t;
+#endif
+
 /*
  * Definitions unique to the original Linux SLAB allocator.
  */
@@ -81,7 +90,7 @@ struct kmem_cache {
 #endif
 
 #ifdef CONFIG_SLAB_FREELIST_RANDOM
- void *random_seq;
+ freelist_idx_t *random_seq;
 #endif
 
  struct kmem_cache_node *node[MAX_NUMNODES];
diff --git a/mm/slab.c b/mm/slab.c
index cc8bbc1..a25c7bb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -157,15 +157,6 @@
 #define ARCH_KMALLOC_FLAGS SLAB_HWCACHE_ALIGN
 #endif
 
-#define FREELIST_BYTE_INDEX (((PAGE_SIZE >> BITS_PER_BYTE) \
- <= SLAB_OBJ_MIN_SIZE) ? 1 : 0)
-
-#if FREELIST_BYTE_INDEX
-typedef unsigned char freelist_idx_t;
-#else
-typedef unsigned short freelist_idx_t;
-#endif
-
 #define SLAB_OBJ_MAX_NUM ((1 << sizeof(freelist_idx_t) * BITS_PER_BYTE) - 1)
 
 /*
@@ -1236,61 +1227,6 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
  }
 }
 
-#ifdef CONFIG_SLAB_FREELIST_RANDOM
-static void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
- size_t count)
-{
- size_t i;
- unsigned int rand;
-
- for (i = 0; i < count; i++)
- list[i] = i;
-
- /* Fisher-Yates shuffle */
- for (i = count - 1; i > 0; i--) {
- rand = prandom_u32_state(state);
- rand %= (i + 1);
- swap(list[i], list[rand]);
- }
-}
-
-/* Create a random sequence per cache */
-static int cache_random_seq_create(struct kmem_cache *cachep, gfp_t gfp)
-{
- unsigned int seed, count = cachep->num;
- struct rnd_state state;
-
- if (count < 2)
- return 0;
-
- /* If it fails, we will just use the global lists */
- cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), gfp);
- if (!cachep->random_seq)
- return -ENOMEM;
-
- /* Get best entropy at this stage */
- get_random_bytes_arch(&seed, sizeof(seed));
- prandom_seed_state(&state, seed);
-
- freelist_randomize(&state, cachep->random_seq, count);
- return 0;
-}
-
-/* Destroy the per-cache random freelist sequence */
-static void cache_random_seq_destroy(struct kmem_cache *cachep)
-{
- kfree(cachep->random_seq);
- cachep->random_seq = NULL;
-}
-#else
-static inline int cache_random_seq_create(struct kmem_cache *cachep, gfp_t gfp)
-{
- return 0;
-}
-static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
-#endif /* CONFIG_SLAB_FREELIST_RANDOM */
-
-
 /*
  * Initialisation.  Called after the page allocator have been initialised and
  * before smp_init().
@@ -3979,7 +3915,7 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
  int shared = 0;
  int batchcount = 0;
 
- err = cache_random_seq_create(cachep, gfp);
+ err = cache_random_seq_create(cachep, cachep->num, gfp);
  if (err)
  goto end;
 
diff --git a/mm/slab.h b/mm/slab.h
index dedb1a9..2c33bf3 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -42,6 +42,7 @@ struct kmem_cache {
 #include <linux/kmemcheck.h>
 #include <linux/kasan.h>
 #include <linux/kmemleak.h>
+#include <linux/random.h>
 
 /*
  * State of the slab allocator.
@@ -464,4 +465,19 @@ int memcg_slab_show(struct seq_file *m, void *p);
 
 void ___cache_free(struct kmem_cache *cache, void *x, unsigned long addr);
 
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
+ size_t count);
+int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count,
+ gfp_t gfp);
+void cache_random_seq_destroy(struct kmem_cache *cachep);
+#else
+static inline int cache_random_seq_create(struct kmem_cache *cachep,
+ unsigned int count, gfp_t gfp)
+{
+ return 0;
+}
+static inline void cache_random_seq_destroy(struct kmem_cache *cachep) { }
+#endif /* CONFIG_SLAB_FREELIST_RANDOM */
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index a65dad7..c2cc48c 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1142,6 +1142,56 @@ int memcg_slab_show(struct seq_file *m, void *p)
 }
 #endif
 
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+/* Randomize a generic freelist */
+void freelist_randomize(struct rnd_state *state, freelist_idx_t *list,
+ size_t count)
+{
+ size_t i;
+ unsigned int rand;
+
+ for (i = 0; i < count; i++)
+ list[i] = i;
+
+ /* Fisher-Yates shuffle */
+ for (i = count - 1; i > 0; i--) {
+ rand = prandom_u32_state(state);
+ rand %= (i + 1);
+ swap(list[i], list[rand]);
+ }
+}
+
+/* Create a random sequence per cache */
+int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count,
+    gfp_t gfp)
+{
+ unsigned int seed;
+ struct rnd_state state;
+
+ if (count < 2)
+ return 0;
+
+ /* If it fails, we will just use the global lists */
+ cachep->random_seq = kcalloc(count, sizeof(freelist_idx_t), gfp);
+ if (!cachep->random_seq)
+ return -ENOMEM;
+
+ /* Get best entropy at this stage */
+ get_random_bytes_arch(&seed, sizeof(seed));
+ prandom_seed_state(&state, seed);
+
+ freelist_randomize(&state, cachep->random_seq, count);
+ return 0;
+}
+
+/* Destroy the per-cache random freelist sequence */
+void cache_random_seq_destroy(struct kmem_cache *cachep)
+{
+ kfree(cachep->random_seq);
+ cachep->random_seq = NULL;
+}
+#endif /* CONFIG_SLAB_FREELIST_RANDOM */
+
 /*
  * slabinfo_op - iterator that generates /proc/slabinfo
  *
--
2.8.0.rc3.226.g39d4020

Reply | Threaded
Open this post in threaded view
|

[RFC v1 2/2] mm: SLUB Freelist randomization

Thomas Garnier
In reply to this post by Thomas Garnier
Implements Freelist randomization for the SLUB allocator. It was
previous implemented for the SLAB allocator. Both use the same
configuration option (CONFIG_SLAB_FREELIST_RANDOM).

The list is randomized during initialization of a new set of pages. The
order on different freelist sizes is pre-computed at boot for
performance. Each kmem_cache has its own randomized freelist. This
security feature reduces the predictability of the kernel SLUB allocator
against heap overflows rendering attacks much less stable.

For example these attacks exploit the predictability of the heap:
 - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU)
 - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95)

To generate entropy, we use get_random_bytes_arch because 0 bits of
entropy is available in the boot stage.  In the worse case this function
will fallback to the get_random_bytes sub API.

Performance results highlighted no major changes:

slab_test, before:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 67 cycles kfree -> 101 cycles
10000 times kmalloc(16) -> 68 cycles kfree -> 109 cycles
10000 times kmalloc(32) -> 76 cycles kfree -> 119 cycles
10000 times kmalloc(64) -> 88 cycles kfree -> 114 cycles
10000 times kmalloc(128) -> 100 cycles kfree -> 122 cycles
10000 times kmalloc(256) -> 128 cycles kfree -> 149 cycles
10000 times kmalloc(512) -> 108 cycles kfree -> 152 cycles
10000 times kmalloc(1024) -> 112 cycles kfree -> 158 cycles
10000 times kmalloc(2048) -> 161 cycles kfree -> 208 cycles
10000 times kmalloc(4096) -> 231 cycles kfree -> 239 cycles
10000 times kmalloc(8192) -> 341 cycles kfree -> 270 cycles
10000 times kmalloc(16384) -> 481 cycles kfree -> 323 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 90 cycles
10000 times kmalloc(16)/kfree -> 89 cycles
10000 times kmalloc(32)/kfree -> 88 cycles
10000 times kmalloc(64)/kfree -> 88 cycles
10000 times kmalloc(128)/kfree -> 94 cycles
10000 times kmalloc(256)/kfree -> 87 cycles
10000 times kmalloc(512)/kfree -> 91 cycles
10000 times kmalloc(1024)/kfree -> 90 cycles
10000 times kmalloc(2048)/kfree -> 90 cycles
10000 times kmalloc(4096)/kfree -> 90 cycles
10000 times kmalloc(8192)/kfree -> 90 cycles
10000 times kmalloc(16384)/kfree -> 642 cycles

After:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 60 cycles kfree -> 74 cycles
10000 times kmalloc(16) -> 63 cycles kfree -> 78 cycles
10000 times kmalloc(32) -> 72 cycles kfree -> 85 cycles
10000 times kmalloc(64) -> 91 cycles kfree -> 99 cycles
10000 times kmalloc(128) -> 112 cycles kfree -> 109 cycles
10000 times kmalloc(256) -> 127 cycles kfree -> 120 cycles
10000 times kmalloc(512) -> 125 cycles kfree -> 121 cycles
10000 times kmalloc(1024) -> 128 cycles kfree -> 125 cycles
10000 times kmalloc(2048) -> 167 cycles kfree -> 141 cycles
10000 times kmalloc(4096) -> 249 cycles kfree -> 174 cycles
10000 times kmalloc(8192) -> 377 cycles kfree -> 225 cycles
10000 times kmalloc(16384) -> 459 cycles kfree -> 247 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 72 cycles
10000 times kmalloc(16)/kfree -> 74 cycles
10000 times kmalloc(32)/kfree -> 71 cycles
10000 times kmalloc(64)/kfree -> 75 cycles
10000 times kmalloc(128)/kfree -> 71 cycles
10000 times kmalloc(256)/kfree -> 72 cycles
10000 times kmalloc(512)/kfree -> 72 cycles
10000 times kmalloc(1024)/kfree -> 73 cycles
10000 times kmalloc(2048)/kfree -> 73 cycles
10000 times kmalloc(4096)/kfree -> 72 cycles
10000 times kmalloc(8192)/kfree -> 72 cycles
10000 times kmalloc(16384)/kfree -> 546 cycles

Kernbench, before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 101.873 (1.16069)
User Time 1045.22 (1.60447)
System Time 88.969 (0.559195)
Percent CPU 1112.9 (13.8279)
Context Switches 189140 (2282.15)
Sleeps 99008.6 (768.091)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.47 (0.562732)
User Time 1045.3 (1.34263)
System Time 88.311 (0.342554)
Percent CPU 1105.8 (6.49444)
Context Switches 189081 (2355.78)
Sleeps 99231.5 (800.358)

Signed-off-by: Thomas Garnier <[hidden email]>
---
Based on next-20160517
---
 include/linux/slub_def.h |   8 +++
 init/Kconfig             |   4 +-
 mm/slub.c                | 135 ++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 138 insertions(+), 9 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 665cd0c..22d487e 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -56,6 +56,9 @@ struct kmem_cache_order_objects {
  unsigned long x;
 };
 
+/* Index used for freelist randomization */
+typedef unsigned int freelist_idx_t;
+
 /*
  * Slab cache management.
  */
@@ -99,6 +102,11 @@ struct kmem_cache {
  */
  int remote_node_defrag_ratio;
 #endif
+
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+ freelist_idx_t *random_seq;
+#endif
+
  struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/init/Kconfig b/init/Kconfig
index 7b82f3f..4590629 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1784,10 +1784,10 @@ endchoice
 
 config SLAB_FREELIST_RANDOM
  default n
- depends on SLAB
+ depends on SLAB || SLUB
  bool "SLAB freelist randomization"
  help
-  Randomizes the freelist order used on creating new SLABs. This
+  Randomizes the freelist order used on creating new pages. This
   security feature reduces the predictability of the kernel slab
   allocator against heap overflows.
 
diff --git a/mm/slub.c b/mm/slub.c
index 538c858..d89780a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1405,6 +1405,111 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
  return page;
 }
 
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+/* Pre-initialize the random sequence cache */
+static int init_cache_random_seq(struct kmem_cache *s)
+{
+ int err;
+ unsigned long i, count = oo_objects(s->oo);
+
+ err = cache_random_seq_create(s, count, GFP_KERNEL);
+ if (err) {
+ pr_err("SLUB: Unable to initialize free list for %s\n",
+ s->name);
+ return err;
+ }
+
+ /* Transform to an offset on the set of pages */
+ if (s->random_seq) {
+ for (i = 0; i < count; i++)
+ s->random_seq[i] *= s->size;
+ }
+ return 0;
+}
+
+/* Initialize each random sequence freelist per cache */
+static void __init init_freelist_randomization(void)
+{
+ struct kmem_cache *s;
+
+ mutex_lock(&slab_mutex);
+
+ list_for_each_entry(s, &slab_caches, list)
+ init_cache_random_seq(s);
+
+ mutex_unlock(&slab_mutex);
+}
+
+/* Get the next entry on the pre-computed freelist randomized */
+static void *next_freelist_entry(struct kmem_cache *s, struct page *page,
+ unsigned long *pos, void *start,
+ unsigned long page_limit,
+ unsigned long freelist_count)
+{
+ freelist_idx_t idx;
+
+ /*
+ * If the target page allocation failed, the number of objects on the
+ * page might be smaller than the usual size defined by the cache.
+ */
+ do {
+ idx = s->random_seq[*pos];
+ *pos += 1;
+ if (*pos >= freelist_count)
+ *pos = 0;
+ } while (unlikely(idx >= page_limit));
+
+ return (char *)start + idx;
+}
+
+/* Shuffle the single linked freelist based on a random pre-computed sequence */
+static bool shuffle_freelist(struct kmem_cache *s, struct page *page)
+{
+ void *start;
+ void *cur;
+ void *next;
+ unsigned long idx, pos, page_limit, freelist_count;
+
+ if (page->objects < 2 || !s->random_seq)
+ return false;
+
+ /* Use best entropy available to define a random start */
+ freelist_count = oo_objects(s->oo);
+ get_random_bytes_arch(&pos, sizeof(pos));
+ pos %= freelist_count;
+
+ page_limit = page->objects * s->size;
+ start = fixup_red_left(s, page_address(page));
+
+ /* First entry is used as the base of the freelist */
+ cur = next_freelist_entry(s, page, &pos, start, page_limit,
+ freelist_count);
+ page->freelist = cur;
+
+ for (idx = 1; idx < page->objects; idx++) {
+ setup_object(s, page, cur);
+ next = next_freelist_entry(s, page, &pos, start, page_limit,
+ freelist_count);
+ set_freepointer(s, cur, next);
+ cur = next;
+ }
+ setup_object(s, page, cur);
+ set_freepointer(s, cur, NULL);
+
+ return true;
+}
+#else
+static inline int init_cache_random_seq(struct kmem_cache *s)
+{
+ return 0;
+}
+static inline void init_freelist_randomization(void) { }
+static inline bool shuffle_freelist(struct kmem_cache *s, struct page *page)
+{
+ return false;
+}
+#endif /* CONFIG_SLAB_FREELIST_RANDOM */
+
 static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
  struct page *page;
@@ -1412,6 +1517,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
  gfp_t alloc_gfp;
  void *start, *p;
  int idx, order;
+ bool shuffle;
 
  flags &= gfp_allowed_mask;
 
@@ -1473,15 +1579,19 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
  kasan_poison_slab(page);
 
- for_each_object_idx(p, idx, s, start, page->objects) {
- setup_object(s, page, p);
- if (likely(idx < page->objects))
- set_freepointer(s, p, p + s->size);
- else
- set_freepointer(s, p, NULL);
+ shuffle = shuffle_freelist(s, page);
+
+ if (!shuffle) {
+ for_each_object_idx(p, idx, s, start, page->objects) {
+ setup_object(s, page, p);
+ if (likely(idx < page->objects))
+ set_freepointer(s, p, p + s->size);
+ else
+ set_freepointer(s, p, NULL);
+ }
+ page->freelist = fixup_red_left(s, start);
  }
 
- page->freelist = fixup_red_left(s, start);
  page->inuse = page->objects;
  page->frozen = 1;
 
@@ -3207,6 +3317,7 @@ static void free_kmem_cache_nodes(struct kmem_cache *s)
 
 void __kmem_cache_release(struct kmem_cache *s)
 {
+ cache_random_seq_destroy(s);
  free_percpu(s->cpu_slab);
  free_kmem_cache_nodes(s);
 }
@@ -3431,6 +3542,13 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags)
 #ifdef CONFIG_NUMA
  s->remote_node_defrag_ratio = 1000;
 #endif
+
+ /* Initialize the pre-computed randomized freelist if slab is up */
+ if (slab_state >= UP) {
+ if (init_cache_random_seq(s))
+ goto error;
+ }
+
  if (!init_kmem_cache_nodes(s))
  goto error;
 
@@ -3947,6 +4065,9 @@ void __init kmem_cache_init(void)
  setup_kmalloc_cache_index_table();
  create_kmalloc_caches(0);
 
+ /* Setup random freelists for each cache */
+ init_freelist_randomization();
+
 #ifdef CONFIG_SMP
  register_cpu_notifier(&slab_notifier);
 #endif
--
2.8.0.rc3.226.g39d4020

Reply | Threaded
Open this post in threaded view
|

Re: [RFC v1 2/2] mm: SLUB Freelist randomization

Christoph Lameter-5
0.On Wed, 18 May 2016, Thomas Garnier wrote:

> slab_test, before:
> 10000 times kmalloc(8) -> 67 cycles kfree -> 101 cycles
> 10000 times kmalloc(16) -> 68 cycles kfree -> 109 cycles
> 10000 times kmalloc(32) -> 76 cycles kfree -> 119 cycles
> 10000 times kmalloc(64) -> 88 cycles kfree -> 114 cycles

> After:
> 10000 times kmalloc(8) -> 60 cycles kfree -> 74 cycles
> 10000 times kmalloc(16) -> 63 cycles kfree -> 78 cycles
> 10000 times kmalloc(32) -> 72 cycles kfree -> 85 cycles
> 10000 times kmalloc(64) -> 91 cycles kfree -> 99 cycles

Erm... The fastpath was not touched and the tests primarily exercise the
fastpath. This is likely some artifact of code placement by the compiler?

Reply | Threaded
Open this post in threaded view
|

Re: [RFC v1 2/2] mm: SLUB Freelist randomization

Thomas Garnier
Yes, I agree that it is not related to the changes.

On Wed, May 18, 2016 at 11:24 AM, Christoph Lameter <[hidden email]> wrote:

> 0.On Wed, 18 May 2016, Thomas Garnier wrote:
>
>> slab_test, before:
>> 10000 times kmalloc(8) -> 67 cycles kfree -> 101 cycles
>> 10000 times kmalloc(16) -> 68 cycles kfree -> 109 cycles
>> 10000 times kmalloc(32) -> 76 cycles kfree -> 119 cycles
>> 10000 times kmalloc(64) -> 88 cycles kfree -> 114 cycles
>
>> After:
>> 10000 times kmalloc(8) -> 60 cycles kfree -> 74 cycles
>> 10000 times kmalloc(16) -> 63 cycles kfree -> 78 cycles
>> 10000 times kmalloc(32) -> 72 cycles kfree -> 85 cycles
>> 10000 times kmalloc(64) -> 91 cycles kfree -> 99 cycles
>
> Erm... The fastpath was not touched and the tests primarily exercise the
> fastpath. This is likely some artifact of code placement by the compiler?
>
Reply | Threaded
Open this post in threaded view
|

Re: [RFC v1 2/2] mm: SLUB Freelist randomization

Christoph Lameter-5
On Wed, 18 May 2016, Thomas Garnier wrote:

> Yes, I agree that it is not related to the changes.

Could you please provide meaningful test data?
Reply | Threaded
Open this post in threaded view
|

Re: [RFC v1 2/2] mm: SLUB Freelist randomization

Thomas Garnier
I thought the mix of slab_test & kernbench would show a diverse
picture on perf data. Is there another test that you think would be
useful?

Thanks,
Thomas

On Wed, May 18, 2016 at 12:02 PM, Christoph Lameter <[hidden email]> wrote:
> On Wed, 18 May 2016, Thomas Garnier wrote:
>
>> Yes, I agree that it is not related to the changes.
>
> Could you please provide meaningful test data?
Reply | Threaded
Open this post in threaded view
|

Re: [RFC v1 2/2] mm: SLUB Freelist randomization

Joonsoo Kim-2
On Wed, May 18, 2016 at 12:12:13PM -0700, Thomas Garnier wrote:
> I thought the mix of slab_test & kernbench would show a diverse
> picture on perf data. Is there another test that you think would be
> useful?

Single thread testing on slab_test would be meaningful because it also
touch the slowpath. Problem is just unstable result of slab_test.

You can get more stable result of slab_test if you repeat same test
sometimes and get average result.

Please use following slab_test. It will do each operations 100000
times and repeat it 50 times.

https://github.com/JoonsooKim/linux/blob/slab_test_robust-next-20160509/mm/slab_test.c

I did a quick test for this patchset and get following result.

- Before (With patch and randomization is disabled by config)

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 42 cycles kfree -> 67 cycles
100000 times kmalloc(16) -> 43 cycles kfree -> 68 cycles
100000 times kmalloc(32) -> 47 cycles kfree -> 72 cycles
100000 times kmalloc(64) -> 54 cycles kfree -> 78 cycles
100000 times kmalloc(128) -> 75 cycles kfree -> 87 cycles
100000 times kmalloc(256) -> 84 cycles kfree -> 111 cycles
100000 times kmalloc(512) -> 82 cycles kfree -> 112 cycles
100000 times kmalloc(1024) -> 86 cycles kfree -> 113 cycles
100000 times kmalloc(2048) -> 113 cycles kfree -> 127 cycles
100000 times kmalloc(4096) -> 151 cycles kfree -> 154 cycles

- After (With patch and randomization is enabled by config)

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
100000 times kmalloc(8) -> 51 cycles kfree -> 68 cycles
100000 times kmalloc(16) -> 57 cycles kfree -> 70 cycles
100000 times kmalloc(32) -> 70 cycles kfree -> 75 cycles
100000 times kmalloc(64) -> 95 cycles kfree -> 84 cycles
100000 times kmalloc(128) -> 142 cycles kfree -> 97 cycles
100000 times kmalloc(256) -> 150 cycles kfree -> 107 cycles
100000 times kmalloc(512) -> 151 cycles kfree -> 107 cycles
100000 times kmalloc(1024) -> 154 cycles kfree -> 110 cycles
100000 times kmalloc(2048) -> 230 cycles kfree -> 124 cycles
100000 times kmalloc(4096) -> 423 cycles kfree -> 165 cycles

It seems that performance decreases a lot but I don't care about it
because it is a security feature and I don't have a better idea.

Thanks.
Reply | Threaded
Open this post in threaded view
|

Re: [RFC v1 2/2] mm: SLUB Freelist randomization

Thomas Garnier
I ran the test given by Joonsoo and it gave me these minimum cycles
per size across 20 usage:

size,before,after
8,63.00,64.50 (102.38%)
16,64.50,65.00 (100.78%)
32,65.00,65.00 (100.00%)
64,66.00,65.00 (98.48%)
128,66.00,65.00 (98.48%)
256,64.00,64.00 (100.00%)
512,65.00,66.00 (101.54%)
1024,68.00,64.00 (94.12%)
2048,66.00,65.00 (98.48%)
4096,66.00,66.00 (100.00%)

I assume the difference is bigger if you don't have RDRAND support.

Christoph, Joonsoo: Do you think it would be valuable to add a CONFIG
to disable additional randomization per new page? It will remove
additional entropy but increase performance for machines without arch
specific randomization instructions.

Thanks,
Thomas


On Wed, May 18, 2016 at 7:07 PM, Joonsoo Kim <[hidden email]> wrote:

> On Wed, May 18, 2016 at 12:12:13PM -0700, Thomas Garnier wrote:
>> I thought the mix of slab_test & kernbench would show a diverse
>> picture on perf data. Is there another test that you think would be
>> useful?
>
> Single thread testing on slab_test would be meaningful because it also
> touch the slowpath. Problem is just unstable result of slab_test.
>
> You can get more stable result of slab_test if you repeat same test
> sometimes and get average result.
>
> Please use following slab_test. It will do each operations 100000
> times and repeat it 50 times.
>
> https://github.com/JoonsooKim/linux/blob/slab_test_robust-next-20160509/mm/slab_test.c
>
> I did a quick test for this patchset and get following result.
>
> - Before (With patch and randomization is disabled by config)
>
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 100000 times kmalloc(8) -> 42 cycles kfree -> 67 cycles
> 100000 times kmalloc(16) -> 43 cycles kfree -> 68 cycles
> 100000 times kmalloc(32) -> 47 cycles kfree -> 72 cycles
> 100000 times kmalloc(64) -> 54 cycles kfree -> 78 cycles
> 100000 times kmalloc(128) -> 75 cycles kfree -> 87 cycles
> 100000 times kmalloc(256) -> 84 cycles kfree -> 111 cycles
> 100000 times kmalloc(512) -> 82 cycles kfree -> 112 cycles
> 100000 times kmalloc(1024) -> 86 cycles kfree -> 113 cycles
> 100000 times kmalloc(2048) -> 113 cycles kfree -> 127 cycles
> 100000 times kmalloc(4096) -> 151 cycles kfree -> 154 cycles
>
> - After (With patch and randomization is enabled by config)
>
> Single thread testing
> =====================
> 1. Kmalloc: Repeatedly allocate then free test
> 100000 times kmalloc(8) -> 51 cycles kfree -> 68 cycles
> 100000 times kmalloc(16) -> 57 cycles kfree -> 70 cycles
> 100000 times kmalloc(32) -> 70 cycles kfree -> 75 cycles
> 100000 times kmalloc(64) -> 95 cycles kfree -> 84 cycles
> 100000 times kmalloc(128) -> 142 cycles kfree -> 97 cycles
> 100000 times kmalloc(256) -> 150 cycles kfree -> 107 cycles
> 100000 times kmalloc(512) -> 151 cycles kfree -> 107 cycles
> 100000 times kmalloc(1024) -> 154 cycles kfree -> 110 cycles
> 100000 times kmalloc(2048) -> 230 cycles kfree -> 124 cycles
> 100000 times kmalloc(4096) -> 423 cycles kfree -> 165 cycles
>
> It seems that performance decreases a lot but I don't care about it
> because it is a security feature and I don't have a better idea.
>
> Thanks.
Reply | Threaded
Open this post in threaded view
|

Re: [RFC v1 2/2] mm: SLUB Freelist randomization

js1304
2016-05-20 5:20 GMT+09:00 Thomas Garnier <[hidden email]>:
> I ran the test given by Joonsoo and it gave me these minimum cycles
> per size across 20 usage:

I can't understand what you did here. Maybe, it's due to my poor Engling.
Please explain more. You did single thread test? Why minimum cycles
rather than average?

> size,before,after
> 8,63.00,64.50 (102.38%)
> 16,64.50,65.00 (100.78%)
> 32,65.00,65.00 (100.00%)
> 64,66.00,65.00 (98.48%)
> 128,66.00,65.00 (98.48%)
> 256,64.00,64.00 (100.00%)
> 512,65.00,66.00 (101.54%)
> 1024,68.00,64.00 (94.12%)
> 2048,66.00,65.00 (98.48%)
> 4096,66.00,66.00 (100.00%)

It looks like performance of all size classes are the same?

> I assume the difference is bigger if you don't have RDRAND support.

What does RDRAND means? Kconfig? How can I check if I have RDRAND?

> Christoph, Joonsoo: Do you think it would be valuable to add a CONFIG
> to disable additional randomization per new page? It will remove
> additional entropy but increase performance for machines without arch
> specific randomization instructions.

I don't think that it deserve another CONFIG. If performance is a matter,
I think that removing additional entropy is better until it is proved that
entropy is a problem.

Thanks.
Reply | Threaded
Open this post in threaded view
|

Re: [RFC v1 2/2] mm: SLUB Freelist randomization

Thomas Garnier
On Thu, May 19, 2016 at 7:15 PM, Joonsoo Kim <[hidden email]> wrote:
> 2016-05-20 5:20 GMT+09:00 Thomas Garnier <[hidden email]>:
>> I ran the test given by Joonsoo and it gave me these minimum cycles
>> per size across 20 usage:
>
> I can't understand what you did here. Maybe, it's due to my poor Engling.
> Please explain more. You did single thread test? Why minimum cycles
> rather than average?
>

I used your version of slab_test and ran it 20 times for each
versions. I compared
the minimum number of cycles as an optimal case for comparison. As you said
slab_test results can be unreliable. Comparing the average across multiple runs
always gave odd results.

>> size,before,after
>> 8,63.00,64.50 (102.38%)
>> 16,64.50,65.00 (100.78%)
>> 32,65.00,65.00 (100.00%)
>> 64,66.00,65.00 (98.48%)
>> 128,66.00,65.00 (98.48%)
>> 256,64.00,64.00 (100.00%)
>> 512,65.00,66.00 (101.54%)
>> 1024,68.00,64.00 (94.12%)
>> 2048,66.00,65.00 (98.48%)
>> 4096,66.00,66.00 (100.00%)
>
> It looks like performance of all size classes are the same?
>
>> I assume the difference is bigger if you don't have RDRAND support.
>
> What does RDRAND means? Kconfig? How can I check if I have RDRAND?
>

Sorry, I was referring to the usage of get_random_bytes_arch which
will be faster
if the test machine support specific instructions (like RDRAND).

>> Christoph, Joonsoo: Do you think it would be valuable to add a CONFIG
>> to disable additional randomization per new page? It will remove
>> additional entropy but increase performance for machines without arch
>> specific randomization instructions.
>
> I don't think that it deserve another CONFIG. If performance is a matter,
> I think that removing additional entropy is better until it is proved that
> entropy is a problem.
>

I will do more testing before the next RFC to decide the best approach.

> Thanks.

Thanks for the comments,
Thomas
Reply | Threaded
Open this post in threaded view
|

Re: [RFC v1 2/2] mm: SLUB Freelist randomization

Joonsoo Kim-2
On Fri, May 20, 2016 at 09:24:35AM -0700, Thomas Garnier wrote:

> On Thu, May 19, 2016 at 7:15 PM, Joonsoo Kim <[hidden email]> wrote:
> > 2016-05-20 5:20 GMT+09:00 Thomas Garnier <[hidden email]>:
> >> I ran the test given by Joonsoo and it gave me these minimum cycles
> >> per size across 20 usage:
> >
> > I can't understand what you did here. Maybe, it's due to my poor Engling.
> > Please explain more. You did single thread test? Why minimum cycles
> > rather than average?
> >
>
> I used your version of slab_test and ran it 20 times for each
> versions. I compared
> the minimum number of cycles as an optimal case for comparison. As you said
> slab_test results can be unreliable. Comparing the average across multiple runs
> always gave odd results.

Hmm... With my version, slab_test results seems to be reliable for me. You
can use average result in this case. Anyway, your minimum result looks
odd even if my version is used. Large sized test would go slowpath
more frequently so it should be worse.

>
> >> size,before,after
> >> 8,63.00,64.50 (102.38%)
> >> 16,64.50,65.00 (100.78%)
> >> 32,65.00,65.00 (100.00%)
> >> 64,66.00,65.00 (98.48%)
> >> 128,66.00,65.00 (98.48%)
> >> 256,64.00,64.00 (100.00%)
> >> 512,65.00,66.00 (101.54%)
> >> 1024,68.00,64.00 (94.12%)
> >> 2048,66.00,65.00 (98.48%)
> >> 4096,66.00,66.00 (100.00%)
> >
> > It looks like performance of all size classes are the same?
> >
> >> I assume the difference is bigger if you don't have RDRAND support.
> >
> > What does RDRAND means? Kconfig? How can I check if I have RDRAND?
> >
>
> Sorry, I was referring to the usage of get_random_bytes_arch which
> will be faster
> if the test machine support specific instructions (like RDRAND).

Thanks! I checked that my test bed (QEMU) doesn't support rdrand.
(/proc/cpuinfo)

> >> Christoph, Joonsoo: Do you think it would be valuable to add a CONFIG
> >> to disable additional randomization per new page? It will remove
> >> additional entropy but increase performance for machines without arch
> >> specific randomization instructions.
> >
> > I don't think that it deserve another CONFIG. If performance is a matter,
> > I think that removing additional entropy is better until it is proved that
> > entropy is a problem.
> >
>
> I will do more testing before the next RFC to decide the best approach.

Okay.

Thanks.