
Commit e2e9b65

borkmann authored and davem330 committed
cls_bpf: add initial eBPF support for programmable classifiers
This work extends the "classic" BPF programmable tc classifier by extending its scope also to native eBPF code. This allows user space to implement its own custom, 'safe' C-like classifiers (or whatever other frontend language LLVM et al. may provide in the future) that can then be compiled with the LLVM eBPF backend into an eBPF ELF file. The result can be loaded into the kernel via iproute2's tc. In the kernel, such programs can be JITed on major architectures and thus run at native performance.

A simple, minimal toy example to demonstrate the workflow:

  #include <linux/ip.h>
  #include <linux/if_ether.h>
  #include <linux/bpf.h>

  #include "tc_bpf_api.h"

  __section("classify")
  int cls_main(struct sk_buff *skb)
  {
          return (0x800 << 16) | load_byte(skb, ETH_HLEN +
                                 __builtin_offsetof(struct iphdr, tos));
  }

  char __license[] __section("license") = "GPL";

The classifier can then be compiled into eBPF opcodes and loaded via tc, for example:

  clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o
  tc filter add dev em1 parent 1: bpf cls.o [...]

As has been demonstrated, the scope can even reach up to a fully fledged flow dissector (similarly as in samples/bpf/sockex2_kern.c). For tc, maps are allowed to be used, but from kernel context only; in other words, eBPF code can keep state across filter invocations. In the future, we perhaps may reattach from a different application to those maps, e.g., to read out collected statistics/state.

Similarly as in socket filters, we may extend functionality for eBPF classifiers over time depending on the use cases. For that purpose, cls_bpf programs use the BPF_PROG_TYPE_SCHED_CLS program type, so we can allow additional functions/accessors (e.g. an ABI-compatible offset translation to skb fields/metadata). For the initial cls_bpf support, we allow the same set of helper functions as eBPF socket filters, but we could diverge at some point in time without problem.
I was wondering whether cls_bpf and act_bpf could share C programs. I can imagine that at some point we introduce i) further common handlers for both (or even beyond their scope), and/or, if truly needed, ii) some restricted function space for each of them. Both can be abstracted easily through struct bpf_verifier_ops in the future. The context of cls_bpf versus act_bpf is slightly different, though: a cls_bpf program will return a specific classid, whereas act_bpf returns a drop/non-drop code, and the latter may in the future also mangle skbs. That said, we can surely have a "classify" and an "action" section in a single object file, or, given the mentioned constraint, add the possibility of a shared section.

The workflow for getting native eBPF running from tc [1] is as follows: for f_bpf, I've added slightly modified ELF parser code from Alexei's kernel sample, which reads out the LLVM-compiled object, sets up maps (and dynamically fixes up map fds) if any, and loads the eBPF instructions all centrally through the bpf syscall. The resulting fd of the loaded program is passed down to cls_bpf, which looks up struct bpf_prog from the fd store and holds a reference, so that it stays available also beyond the tc program's lifetime. On tc filter destruction, it will then drop its reference.

Moreover, I've added the optional possibility to annotate an eBPF filter with a name (e.g. the path to the object file, or something else if preferred), so that when tc dumps currently installed filters, some more context can be given to an admin for a given instance (as opposed to just the file descriptor number). Last but not least, bpf_prog_get() and bpf_prog_put() needed to be exported, so that eBPF can be used from cls_bpf built as a module. Thanks to 60a3b22 ("net: bpf: make eBPF interpreter images read-only"), I think this is of no concern, since anything wanting to alter eBPF opcodes after the verification stage would crash the kernel.
[1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf

Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Jiri Pirko <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
1 parent 24701ec · commit e2e9b65

File tree

3 files changed, +158 −52 lines changed


include/uapi/linux/pkt_cls.h (+2)

@@ -397,6 +397,8 @@ enum {
 	TCA_BPF_CLASSID,
 	TCA_BPF_OPS_LEN,
 	TCA_BPF_OPS,
+	TCA_BPF_FD,
+	TCA_BPF_NAME,
 	__TCA_BPF_MAX,
 };

kernel/bpf/syscall.c (+2)

@@ -419,6 +419,7 @@ void bpf_prog_put(struct bpf_prog *prog)
 		bpf_prog_free(prog);
 	}
 }
+EXPORT_SYMBOL_GPL(bpf_prog_put);
 
 static int bpf_prog_release(struct inode *inode, struct file *filp)
 {
@@ -466,6 +467,7 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
 	fdput(f);
 	return prog;
 }
+EXPORT_SYMBOL_GPL(bpf_prog_get);
 
 /* last field in 'union bpf_attr' used by this command */
 #define BPF_PROG_LOAD_LAST_FIELD log_buf

net/sched/cls_bpf.c (+154 −52)

@@ -16,6 +16,8 @@
 #include <linux/types.h>
 #include <linux/skbuff.h>
 #include <linux/filter.h>
+#include <linux/bpf.h>
+
 #include <net/rtnetlink.h>
 #include <net/pkt_cls.h>
 #include <net/sock.h>
@@ -24,6 +26,8 @@ MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Daniel Borkmann <[email protected]>");
 MODULE_DESCRIPTION("TC BPF based classifier");
 
+#define CLS_BPF_NAME_LEN	256
+
 struct cls_bpf_head {
 	struct list_head plist;
 	u32 hgen;
@@ -32,18 +36,24 @@ struct cls_bpf_head {
 
 struct cls_bpf_prog {
 	struct bpf_prog *filter;
-	struct sock_filter *bpf_ops;
-	struct tcf_exts exts;
-	struct tcf_result res;
 	struct list_head link;
+	struct tcf_result res;
+	struct tcf_exts exts;
 	u32 handle;
-	u16 bpf_num_ops;
+	union {
+		u32 bpf_fd;
+		u16 bpf_num_ops;
+	};
+	struct sock_filter *bpf_ops;
+	const char *bpf_name;
 	struct tcf_proto *tp;
 	struct rcu_head rcu;
 };
 
 static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
 	[TCA_BPF_CLASSID]	= { .type = NLA_U32 },
+	[TCA_BPF_FD]		= { .type = NLA_U32 },
+	[TCA_BPF_NAME]		= { .type = NLA_NUL_STRING, .len = CLS_BPF_NAME_LEN },
 	[TCA_BPF_OPS_LEN]	= { .type = NLA_U16 },
 	[TCA_BPF_OPS]		= { .type = NLA_BINARY,
 				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
@@ -76,6 +86,11 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	return -1;
 }
 
+static bool cls_bpf_is_ebpf(const struct cls_bpf_prog *prog)
+{
+	return !prog->bpf_ops;
+}
+
 static int cls_bpf_init(struct tcf_proto *tp)
 {
 	struct cls_bpf_head *head;
@@ -94,8 +109,12 @@ static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
 {
 	tcf_exts_destroy(&prog->exts);
 
-	bpf_prog_destroy(prog->filter);
+	if (cls_bpf_is_ebpf(prog))
+		bpf_prog_put(prog->filter);
+	else
+		bpf_prog_destroy(prog->filter);
 
+	kfree(prog->bpf_name);
 	kfree(prog->bpf_ops);
 	kfree(prog);
 }
@@ -114,6 +133,7 @@ static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
 	list_del_rcu(&prog->link);
 	tcf_unbind_filter(tp, &prog->res);
 	call_rcu(&prog->rcu, __cls_bpf_delete_prog);
+
 	return 0;
 }
 
@@ -151,69 +171,121 @@ static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
 	return ret;
 }
 
-static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
-				   struct cls_bpf_prog *prog,
-				   unsigned long base, struct nlattr **tb,
-				   struct nlattr *est, bool ovr)
+static int cls_bpf_prog_from_ops(struct nlattr **tb,
+				 struct cls_bpf_prog *prog, u32 classid)
 {
 	struct sock_filter *bpf_ops;
-	struct tcf_exts exts;
-	struct sock_fprog_kern tmp;
+	struct sock_fprog_kern fprog_tmp;
 	struct bpf_prog *fp;
 	u16 bpf_size, bpf_num_ops;
-	u32 classid;
 	int ret;
 
-	if (!tb[TCA_BPF_OPS_LEN] || !tb[TCA_BPF_OPS] || !tb[TCA_BPF_CLASSID])
-		return -EINVAL;
-
-	tcf_exts_init(&exts, TCA_BPF_ACT, TCA_BPF_POLICE);
-	ret = tcf_exts_validate(net, tp, tb, est, &exts, ovr);
-	if (ret < 0)
-		return ret;
-
-	classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
 	bpf_num_ops = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
-	if (bpf_num_ops > BPF_MAXINSNS || bpf_num_ops == 0) {
-		ret = -EINVAL;
-		goto errout;
-	}
+	if (bpf_num_ops > BPF_MAXINSNS || bpf_num_ops == 0)
+		return -EINVAL;
 
 	bpf_size = bpf_num_ops * sizeof(*bpf_ops);
-	if (bpf_size != nla_len(tb[TCA_BPF_OPS])) {
-		ret = -EINVAL;
-		goto errout;
-	}
+	if (bpf_size != nla_len(tb[TCA_BPF_OPS]))
+		return -EINVAL;
 
 	bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
-	if (bpf_ops == NULL) {
-		ret = -ENOMEM;
-		goto errout;
-	}
+	if (bpf_ops == NULL)
+		return -ENOMEM;
 
 	memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size);
 
-	tmp.len = bpf_num_ops;
-	tmp.filter = bpf_ops;
+	fprog_tmp.len = bpf_num_ops;
+	fprog_tmp.filter = bpf_ops;
 
-	ret = bpf_prog_create(&fp, &tmp);
-	if (ret)
-		goto errout_free;
+	ret = bpf_prog_create(&fp, &fprog_tmp);
+	if (ret < 0) {
+		kfree(bpf_ops);
+		return ret;
+	}
 
-	prog->bpf_num_ops = bpf_num_ops;
 	prog->bpf_ops = bpf_ops;
+	prog->bpf_num_ops = bpf_num_ops;
+	prog->bpf_name = NULL;
+
 	prog->filter = fp;
 	prog->res.classid = classid;
 
+	return 0;
+}
+
+static int cls_bpf_prog_from_efd(struct nlattr **tb,
+				 struct cls_bpf_prog *prog, u32 classid)
+{
+	struct bpf_prog *fp;
+	char *name = NULL;
+	u32 bpf_fd;
+
+	bpf_fd = nla_get_u32(tb[TCA_BPF_FD]);
+
+	fp = bpf_prog_get(bpf_fd);
+	if (IS_ERR(fp))
+		return PTR_ERR(fp);
+
+	if (fp->type != BPF_PROG_TYPE_SCHED_CLS) {
+		bpf_prog_put(fp);
+		return -EINVAL;
+	}
+
+	if (tb[TCA_BPF_NAME]) {
+		name = kmemdup(nla_data(tb[TCA_BPF_NAME]),
+			       nla_len(tb[TCA_BPF_NAME]),
+			       GFP_KERNEL);
+		if (!name) {
+			bpf_prog_put(fp);
+			return -ENOMEM;
+		}
+	}
+
+	prog->bpf_ops = NULL;
+	prog->bpf_fd = bpf_fd;
+	prog->bpf_name = name;
+
+	prog->filter = fp;
+	prog->res.classid = classid;
+
+	return 0;
+}
+
+static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
+				   struct cls_bpf_prog *prog,
+				   unsigned long base, struct nlattr **tb,
+				   struct nlattr *est, bool ovr)
+{
+	struct tcf_exts exts;
+	bool is_bpf, is_ebpf;
+	u32 classid;
+	int ret;
+
+	is_bpf = tb[TCA_BPF_OPS_LEN] && tb[TCA_BPF_OPS];
+	is_ebpf = tb[TCA_BPF_FD];
+
+	if ((!is_bpf && !is_ebpf) || (is_bpf && is_ebpf) ||
+	    !tb[TCA_BPF_CLASSID])
+		return -EINVAL;
+
+	tcf_exts_init(&exts, TCA_BPF_ACT, TCA_BPF_POLICE);
+	ret = tcf_exts_validate(net, tp, tb, est, &exts, ovr);
+	if (ret < 0)
+		return ret;
+
+	classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
+
+	ret = is_bpf ? cls_bpf_prog_from_ops(tb, prog, classid) :
+		       cls_bpf_prog_from_efd(tb, prog, classid);
+	if (ret < 0) {
+		tcf_exts_destroy(&exts);
+		return ret;
+	}
+
 	tcf_bind_filter(tp, &prog->res, base);
 	tcf_exts_change(tp, &prog->exts, &exts);
 
 	return 0;
-errout_free:
-	kfree(bpf_ops);
-errout:
-	tcf_exts_destroy(&exts);
-	return ret;
 }
 
 static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp,
@@ -297,11 +369,43 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	return ret;
 }
 
+static int cls_bpf_dump_bpf_info(const struct cls_bpf_prog *prog,
+				 struct sk_buff *skb)
+{
+	struct nlattr *nla;
+
+	if (nla_put_u16(skb, TCA_BPF_OPS_LEN, prog->bpf_num_ops))
+		return -EMSGSIZE;
+
+	nla = nla_reserve(skb, TCA_BPF_OPS, prog->bpf_num_ops *
+			  sizeof(struct sock_filter));
+	if (nla == NULL)
+		return -EMSGSIZE;
+
+	memcpy(nla_data(nla), prog->bpf_ops, nla_len(nla));
+
+	return 0;
+}
+
+static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
+				  struct sk_buff *skb)
+{
+	if (nla_put_u32(skb, TCA_BPF_FD, prog->bpf_fd))
+		return -EMSGSIZE;
+
+	if (prog->bpf_name &&
+	    nla_put_string(skb, TCA_BPF_NAME, prog->bpf_name))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
 static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 			struct sk_buff *skb, struct tcmsg *tm)
 {
 	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) fh;
-	struct nlattr *nest, *nla;
+	struct nlattr *nest;
+	int ret;
 
 	if (prog == NULL)
 		return skb->len;
@@ -314,16 +418,14 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 
 	if (nla_put_u32(skb, TCA_BPF_CLASSID, prog->res.classid))
 		goto nla_put_failure;
-	if (nla_put_u16(skb, TCA_BPF_OPS_LEN, prog->bpf_num_ops))
-		goto nla_put_failure;
 
-	nla = nla_reserve(skb, TCA_BPF_OPS, prog->bpf_num_ops *
-			  sizeof(struct sock_filter));
-	if (nla == NULL)
+	if (cls_bpf_is_ebpf(prog))
+		ret = cls_bpf_dump_ebpf_info(prog, skb);
+	else
+		ret = cls_bpf_dump_bpf_info(prog, skb);
+	if (ret)
 		goto nla_put_failure;
 
-	memcpy(nla_data(nla), prog->bpf_ops, nla_len(nla));
-
 	if (tcf_exts_dump(skb, &prog->exts) < 0)
 		goto nla_put_failure;
 
