Tuesday, June 23, 2015

Configuring the BIRD routing daemon to export the contents of the local routing table via BGP

All tests will be done on CentOS 6 :)

First, install BIRD:
yum install -y bird

Open its config:
vim /etc/bird.conf 

Comment out everything in it that is not already commented out, to avoid conflicts.

Then put in the following config:
protocol kernel {
    persist;                # Don't remove routes on bird shutdown
    learn;                  # Learn all alien routes from the kernel
    export none;            # Default is export none
    import all;
    scan time 20;           # Scan kernel routing table every 20 seconds
    device routes yes;
}

protocol device {
    scan time 10;           # Scan interfaces every 10 seconds
}

filter test_filter {
    # filter out private subnets
    if net ~ 10.0.0.0/8 then reject;
    if net ~ 192.168.0.0/16 then reject;
    if net ~ 172.16.0.0/12 then reject;

    # filter link local for IPv4
    if net ~ 169.254.0.0/16  then reject;

    accept;  
}


protocol bgp {
    local as 65000;

    source address 10.0.131.2;
    # do not connect to remote peers, work as route server now!
    # passive;

    import all;
    export filter test_filter;
    neighbor 10.0.129.2 as 65000;
}
This config establishes a BGP session with the specified peer and announces into it all routes from the local Linux routing table (in my case, routes to OpenVZ VPS containers).

Then apply the changes (the blunt way, since these are just tests; in production you should do it via birdc):
/etc/init.d/bird restart

Verify that the BGP session has come up (the command below is run in the birdc shell):
show protocols all bgp1
name     proto    table    state  since       info
bgp1     BGP      master   up     15:07:32    Established  
  Preference:     100
  Input filter:   ACCEPT
  Output filter:  test_filter
  Routes:         0 imported, 27 exported, 0 preferred
  Route change stats:     received   rejected   filtered    ignored   accepted
    Import updates:              0          0          0          0          0
    Import withdraws:            0          0        ---          0          0
    Export updates:             29          0          2        ---         27
    Export withdraws:            0        ---        ---        ---          0
  BGP state:          Established
    Neighbor address: 10.0.129.2
    Neighbor AS:      65000
    Neighbor ID:      10.0.129.2
    Neighbor caps:    AS4
    Session:          internal multihop AS4
    Source address:   10.0.131.2
    Hold timer:       109/180
    Keepalive timer:  2/60

Done! :)

Aggregating huge network lists

Quite often you have to work with really huge lists of networks. As an example, take any IX; I will take the one geographically closest to me, DATA-IX.

So, pull all of their networks with BGPq3:
/opt/bgpq3/bin/bgpq3 AS-DATAIX|awk '{print $5}' > /root/dataix_networks_list.dat
The list is huge, about 251 thousand networks:
wc -l /root/dataix_networks_list.dat
251009 /root/dataix_networks_list.dat
Let's count how many hosts that covers:
cat  /root/dataix_networks_list.dat | sed 's#/# #' |awk '{print $2}'|perl -e 'my$total=0; do{ $_ = int($_); next unless $_; $total += 2**(32-$_) } for <>; print "Total networks size: $total\nTotal Internet size: " . 2**32 . "\n"'
Total networks size: 410 650 370
Total Internet size: 4 294 967 296
You must agree, that looks quite impressive even against the size of the entire IPv4 address space.
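The same count can be computed with a few lines of Python instead of the Perl one-liner; each /N prefix covers 2^(32-N) addresses. A small sketch (demonstrated on a tiny list so it is self-contained):

```python
import ipaddress

def total_hosts(prefixes):
    """Sum 2**(32 - prefix_length), i.e. num_addresses, over all prefixes."""
    return sum(ipaddress.ip_network(p.strip()).num_addresses
               for p in prefixes if p.strip())

# For the real data: total_hosts(open("/root/dataix_networks_list.dat"))
print(total_hosts(["5.9.0.0/16", "91.220.49.0/24"]))  # 65536 + 256 = 65792
```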

But what if there are duplicate networks, or networks nested inside each other? That is genuinely hard to check, and even harder to fix head-on.

But there is a wonderful tool for this: aggregate.

Install it:
apt-get install -y aggregate
And run the aggregation:

cat /root/dataix_networks_list.dat| aggregate > /root/dataix_networks_list_aggregated_new.dat
It consumes a fair amount of CPU and runs for several minutes:
real    2m29.608s
user    2m29.564s
sys    0m0.012s
But the result is just amazing! The number of networks shrinks tenfold:
wc -l  /root/dataix_networks_list_aggregated_new.dat
24628 /root/dataix_networks_list_aggregated_new.dat
And the number of hosts is nearly halved:
Total networks size: 232866112
Total Internet size: 4294967296
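The effect of aggregate can be reproduced in miniature with Python's standard ipaddress.collapse_addresses, which likewise merges adjacent prefixes and drops duplicates and nested ones (a sketch only; for 251 thousand prefixes the aggregate tool is the right choice):

```python
import ipaddress

nets = [ipaddress.ip_network(p) for p in (
    "10.0.0.0/24", "10.0.1.0/24",  # adjacent: merge into a /23
    "10.0.0.0/25",                 # nested inside the first /24
    "10.0.0.0/24",                 # exact duplicate
)]
collapsed = list(ipaddress.collapse_addresses(nets))
print([str(n) for n in collapsed])  # ['10.0.0.0/23']
```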
This is especially relevant when you are severely constrained in hardware resources (the number of ACLs on a switch, the number of route prefixes on an L3 switch).

The same optimization can be applied to FastNetMon, so memory is not wasted on networks nested inside each other :)



Generate BGP filters with BGPQ3

Build it:
cd /tmp
wget http://snar.spb.ru/prog/bgpq3/bgpq3-0.1.31.tgz
tar -xvzf bgpq3-0.1.31.tgz
cd bgpq3-0.1.31/
./configure --prefix=/opt/bgpq3
sudo mkdir -p /opt/bgpq3/bin
sudo make install
Generate a filter list by ASN (you could actually use an AS-SET here too):
 /opt/bgpq3/bin/bgpq3 AS24940
no ip prefix-list NN
ip prefix-list NN permit 5.9.0.0/16
ip prefix-list NN permit 46.4.0.0/16
ip prefix-list NN permit 78.46.0.0/15
ip prefix-list NN permit 85.10.192.0/18
ip prefix-list NN permit 88.198.0.0/16
ip prefix-list NN permit 91.220.49.0/24
ip prefix-list NN permit 91.233.8.0/22
ip prefix-list NN permit 136.243.0.0/16
ip prefix-list NN permit 138.201.0.0/16
ip prefix-list NN permit 144.76.0.0/16
ip prefix-list NN permit 148.251.0.0/16
ip prefix-list NN permit 176.9.0.0/16
ip prefix-list NN permit 176.102.168.0/21
ip prefix-list NN permit 178.63.0.0/16
ip prefix-list NN permit 185.12.64.0/22
ip prefix-list NN permit 185.50.120.0/23
ip prefix-list NN permit 188.40.0.0/16
ip prefix-list NN permit 193.25.170.0/23
ip prefix-list NN permit 193.28.90.0/24
ip prefix-list NN permit 193.110.6.0/23
ip prefix-list NN permit 193.223.77.0/24
ip prefix-list NN permit 194.42.180.0/22
ip prefix-list NN permit 194.42.184.0/22
ip prefix-list NN permit 194.145.226.0/24
ip prefix-list NN permit 195.248.224.0/24
ip prefix-list NN permit 197.242.84.0/22
ip prefix-list NN permit 213.133.96.0/19
ip prefix-list NN permit 213.169.144.0/22
ip prefix-list NN permit 213.239.192.0/18
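For illustration, the output format above is simple enough to render from a plain list of prefixes. A minimal Python sketch (the prefix_list helper is hypothetical, not part of bgpq3):

```python
def prefix_list(name, prefixes):
    """Render prefixes in the Cisco-style format bgpq3 emits above."""
    lines = ["no ip prefix-list %s" % name]
    lines += ["ip prefix-list %s permit %s" % (name, p) for p in prefixes]
    return "\n".join(lines)

print(prefix_list("NN", ["5.9.0.0/16", "46.4.0.0/16"]))
```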
This toolkit supports many options for different vendors (and even JSON!).

Many thanks to the author, Alexander Snarski.

Official site: here.

If you hit the error:
FATAL ERROR:Partial write to radb, only 7 bytes written: Connection reset by peer
On Linux, do the following:
sysctl -w net.ipv4.tcp_window_scaling=1
sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_max=33554432
sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"

Saturday, June 20, 2015

How to run netmap on a single queue and use the other queues for the Linux stack

Hello, folks!

I will do some ixgbe magic here! Please stay tuned =)

Here is a short reference on compiling netmap with a patched ixgbe driver.

First of all, get my patched driver. In it, the first (0) queue is assigned to netmap and the other queues to the Linux network stack.

Get the driver sources and put them into a "fake" Linux kernel source tree (the netmap build system expects this):
cd /usr/src
mkdir -p /usr/src/fake_linux_kernel_sources/drivers/net/ethernet/intel
cd /usr/src/fake_linux_kernel_sources/drivers/net/ethernet/intel
git clone https://github.com/pavel-odintsov/ixgbe-linux-netmap-single-queue.git ixgbe_temp
mv ixgbe_temp/ixgbe-3.23.2.1/src/ ixgbe

Let's get netmap:
cd /usr/src
git clone https://github.com/luigirizzo/netmap.git -b next
cd netmap/LINUX/
Do some netmap patching:
sed -i 's/#define ixgbe_driver_name netmap_ixgbe_driver_name/\/\/\0/'  ixgbe_netmap_linux.h
sed -i 's/^char ixgbe_driver_name\[\]/\/\/\0/'  ixgbe_netmap_linux.h
sed -i '/$t\s\{1,\}if \[ \-f patches/d' configure
Let's compile it:
./configure --kernel-sources=/usr/src/fake_linux_kernel_sources --drivers=ixgbe
make
Unload the old, unpatched ixgbe driver:
rmmod ixgbe
Load netmap:

insmod /usr/src/netmap/LINUX/netmap.ko
insmod /usr/src/netmap/LINUX/ixgbe/ixgbe.ko
Now we have a netmap that can process only the first NIC hardware queue.

Let's verify it. For testing I will use Flow Director and steer all UDP packets to port 53 into the first queue:
ethtool -K eth5 ntuple on
ethtool --config-ntuple eth5 flow-type udp4 dst-port 53 action 0
Then build the test environment.

Compile the netmap test receiver:
cd /usr/src/netmap/examples/
make

Yes, we are ready for tests!

Run the Linux network stack receiver in one console session:
tcpdump -n -i eth5

And the netmap receiver app in another console session:
/usr/src/netmap/examples/pkt-gen -f rx -X -i netmap:eth5
We also need an external machine: start pinging the target host from it, and send a UDP packet to the host from another session.

To generate UDP packets you can use nc:
echo asdasda| nc -u 10.10.10.221  53
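If nc is not at hand, an equivalent UDP datagram can be sent with a few lines of Python (a sketch; the executed call below targets localhost so the example is self-contained, while the commented line mirrors the nc example above):

```python
import socket

def send_udp(host, port, payload):
    """Send a single UDP datagram; returns the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.sendto(payload, (host, port))

# Equivalent of: echo asdasda | nc -u 10.10.10.221 53
# send_udp("10.10.10.221", 53, b"asdasda\n")
print(send_udp("127.0.0.1", 53, b"asdasda\n"))  # 8 bytes sent
```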

And you will see:
 ./pkt-gen -f rx -i netmap:eth5 -X
689.290532 main [1651] interface is netmap:eth5
689.290977 extract_ip_range [288] range is 10.0.0.1:0 to 10.0.0.1:0
689.291015 extract_ip_range [288] range is 10.1.0.1:0 to 10.1.0.1:0
689.517212 main [1848] mapped 334980KB at 0x7fcf508ea000
Receiving from netmap:eth5: 1 queues, 1 threads and 1 cpus.
689.517331 main [1910] --- SPECIAL OPTIONS:

689.517345 main [1934] Wait 2 secs for phy reset
691.517508 main [1936] Ready...
691.517870 nm_open [456] overriding ifname eth5 ringid 0x0 flags 0x1
691.522007 receiver_body [1184] reading from netmap:eth5 fd 4 main_fd 3
692.523020 main_thread [1448] 0 pps (0 pkts in 1001104 usec)
692.525560 receiver_body [1191] waiting for initial packets, poll returns 0 0
693.524487 main_thread [1448] 0 pps (0 pkts in 1001468 usec)
693.528806 receiver_body [1191] waiting for initial packets, poll returns 0 0
694.525850 main_thread [1448] 0 pps (0 pkts in 1001363 usec)
694.532073 receiver_body [1191] waiting for initial packets, poll returns 0 0
695.526988 main_thread [1448] 0 pps (0 pkts in 1001137 usec)
695.535358 receiver_body [1191] waiting for initial packets, poll returns 0 0
696.528438 main_thread [1448] 0 pps (0 pkts in 1001450 usec)
696.538669 receiver_body [1191] waiting for initial packets, poll returns 0 0
697.529608 main_thread [1448] 0 pps (0 pkts in 1001170 usec)
697.542189 receiver_body [1191] waiting for initial packets, poll returns 0 0
698.530749 main_thread [1448] 0 pps (0 pkts in 1001141 usec)
698.545628 receiver_body [1191] waiting for initial packets, poll returns 0 0
699.531875 main_thread [1448] 0 pps (0 pkts in 1001126 usec)
699.549208 receiver_body [1191] waiting for initial packets, poll returns 0 0
700.532999 main_thread [1448] 0 pps (0 pkts in 1001124 usec)
700.552431 receiver_body [1191] waiting for initial packets, poll returns 0 0
ring 0x7fcf50954000 cur     0 [buf   4611 flags 0x0000 len    60]
    0: 90 e2 ba 83 3f 25 90 e2 ba 78 26 8d 08 00 45 00 ....?%...x&...E.
   16: 00 24 ce 85 40 00 40 11 43 49 0a 0a 0a 0a 0a 0a .$..@.@.CI......
   32: 0a dd ed 13 00 35 00 10 4f 47 61 73 64 61 73 64 .....5..OGasdasd
   48: 61 0a 00 00 00 00 00 00 00 00 00 00
701.534128 main_thread [1448] 1 pps (1 pkts in 1001129 usec)
702.535260 main_thread [1448] 0 pps (0 pkts in 1001132 usec)
703.536380 main_thread [1448] 0 pps (0 pkts in 1001120 usec)
And in the tcpdump window:
tcpdump -n -i eth5
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth5, link-type EN10MB (Ethernet), capture size 262144 bytes
08:01:36.520074 IP 10.10.10.10 > 10.10.10.221: ICMP echo request, id 6581, seq 21, length 64
08:01:36.520114 IP 10.10.10.221 > 10.10.10.10: ICMP echo reply, id 6581, seq 21, length 64
08:01:37.519971 IP 10.10.10.10 > 10.10.10.221: ICMP echo request, id 6581, seq 22, length 64
08:01:37.520009 IP 10.10.10.221 > 10.10.10.10: ICMP echo reply, id 6581, seq 22, length 64
08:01:38.520028 IP 10.10.10.10 > 10.10.10.221: ICMP echo request, id 6581, seq 23, length 64
08:01:38.520060 IP 10.10.10.221 > 10.10.10.10: ICMP echo reply, id 6581, seq 23, length 64
08:01:39.520091 IP 10.10.10.10 > 10.10.10.221: ICMP echo request, id 6581, seq 24, length 64
08:01:39.520130 IP 10.10.10.221 > 10.10.10.10: ICMP echo reply, id 6581, seq 24, length 64
08:01:40.520096 IP 10.10.10.10 > 10.10.10.221: ICMP echo request, id 6581, seq 25, length 64
08:01:40.520134 IP 10.10.10.221 > 10.10.10.10: ICMP echo reply, id 6581, seq 25, length 64
08:01:41.520030 IP 10.10.10.10 > 10.10.10.221: ICMP echo request, id 6581, seq 26, length 64
08:01:41.520064 IP 10.10.10.221 > 10.10.10.10: ICMP echo reply, id 6581, seq 26, length 64
08:01:42.520016 IP 10.10.10.10 > 10.10.10.221: ICMP echo request, id 6581, seq 27, length 64
08:01:42.520053 IP 10.10.10.221 > 10.10.10.10: ICMP echo reply, id 6581, seq 27, length 64
08:01:43.520086 IP 10.10.10.10 > 10.10.10.221: ICMP echo request, id 6581, seq 28, length 64
08:01:43.520125 IP 10.10.10.221 > 10.10.10.10: ICMP echo reply, id 6581, seq 28, length 64
^C
16 packets captured
16 packets received by filter
0 packets dropped by kernel
As you can see, Linux did not see the UDP packets to port 53 but still processed the ICMP packets. Everything works well! Hurrah!

Folks, be aware: this patch is very crude and not well tested. We also need some way to detach this queue on the Linux side, because in the ICMP case Linux tries to send reply packets over the detached queue. Actually, we could solve that very simply: disable TX queue detaching and leave TX to Linux.

And we need some custom ring-count tuning for the ixgbe driver.

Finally, this approach works but needs some enhancements :)

Friday, June 19, 2015

Building libbgpdump on Debian 8 Jessie

Official site: http://www.ris.ripe.net/source/bgpdump/
apt-get install -y libbz2-dev zlib1g-dev
cd /usr/src
wget http://www.ris.ripe.net/source/bgpdump/libbgpdump-1.4.99.15.tgz
tar -xf libbgpdump-1.4.99.15.tgz
cd libbgpdump-1.4.99.15
./configure --prefix=/opt/libbgpdump
make install
Add the library path:
echo "/opt/libbgpdump/lib" > /etc/ld.so.conf.d/libbgpdump.conf
ldconfig
Run it:
 /opt/libbgpdump/bin/bgpdump

The easiest way to use a new kernel on Debian Jessie is to take one from Sid

Friends! This option is for hardened experimenters who know how to repair the system after such vandalism! If you do not, then honestly, you do not need a new kernel!

Download from here:  https://packages.debian.org/sid/amd64/linux-image-4.0.0-2-amd64/download

Like this:
wget http://ftp.us.debian.org/debian/pool/main/l/linux/linux-headers-4.0.0-2-amd64_4.0.5-1_amd64.deb
wget http://ftp.us.debian.org/debian/pool/main/l/linux/linux-image-4.0.0-2-amd64_4.0.5-1_amd64.deb
wget http://ftp.us.debian.org/debian/pool/main/l/linux/linux-compiler-gcc-4.9-x86_4.0.5-1_amd64.deb
wget http://ftp.us.debian.org/debian/pool/main/l/linux/linux-headers-4.0.0-2-common_4.0.5-1_amd64.deb
wget http://ftp.us.debian.org/debian/pool/main/l/linux-tools/linux-kbuild-4.0_4.0.2-1_amd64.deb

And install with:
dpkg -i linux-compiler-gcc-4.9-x86_4.0.5-1_amd64.deb  linux-kbuild-4.0_4.0.2-1_amd64.deb
dpkg -i linux-image-4.0.0-2-amd64_4.0.5-1_amd64.deb
dpkg -i linux-headers-4.0.0-2-common_4.0.5-1_amd64.deb
dpkg -i linux-headers-4.0.0-2-amd64_4.0.5-1_amd64.deb

Tuesday, June 16, 2015

How to run netmap on ixgbe Virtual Function on Debian 8 Jessie

UPDATE: folks, please be AWARE! In this mode netmap works only in single-copy mode and will not be fast enough (1-2 mpps is the limit). For full support we need another patch for the ixgbevf driver.

If you want zero copy support, please poke netmap guys here: https://github.com/luigirizzo/netmap/issues/63

First of all, install Netmap and ixgbe module: http://www.stableit.ru/2014/10/netmap-debian-7-wheezy-intel-82599.html

Please add this code to /etc/rc.local (remove the -e flag from /bin/sh in the first line of /etc/rc.local):
insmod /usr/src/netmap/LINUX/netmap.ko
modprobe mdio
modprobe ptp
modprobe dca
insmod /usr/src/netmap/LINUX/ixgbe/ixgbe.ko max_vfs=2,2
insmod /lib/modules/3.16.0-4-amd64/kernel/drivers/net/ethernet/intel/ixgbevf/ixgbevf.ko
And reboot the server!

Now I have a physical NIC eth6, represented as 2 virtual NICs (ip link show):
8: eth6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 90:e2:ba:4a:d8:e8 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 90:e2:ba:55:aa:bb, spoof checking on, link-state auto
    vf 1 MAC 90:e2:ba:55:bb:cc, spoof checking on, link-state auto
15: eth10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 90:e2:ba:55:aa:bb brd ff:ff:ff:ff:ff:ff
16: eth11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 90:e2:ba:55:bb:cc brd ff:ff:ff:ff:ff:ff
Then I configure MACs for both virtual functions:
ip link set eth6 vf 0 mac 90:e2:ba:55:aa:bb
ip link set eth6 vf 1 mac 90:e2:ba:55:bb:cc
And reload the ixgbevf driver (required for the configuration to take effect):
rmmod ixgbevf
modprobe ixgbevf
After these operations all interfaces become ready:
ethtool eth6
Settings for eth6:
    Supported ports: [ FIBRE ]
    Supported link modes:   1000baseT/Full
                            10000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  1000baseT/Full
                            10000baseT/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Speed: 10000Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: on
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes
ethtool eth10
Settings for eth10:
    Supported ports: [ ]
    Supported link modes:   10000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: No
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Speed: 10000Mb/s
    Duplex: Full
    Port: Other
    PHYAD: 0
    Transceiver: Unknown!
    Auto-negotiation: off
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes
ethtool eth11
Settings for eth11:
    Supported ports: [ ]
    Supported link modes:   10000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: No
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Speed: 10000Mb/s
    Duplex: Full
    Port: Other
    PHYAD: 0
    Transceiver: Unknown!
    Auto-negotiation: off
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes
And run the pkt-gen example app on the ixgbe VF:
./pkt-gen -f rx -i netmap:eth10
361.732713 main [1651] interface is netmap:eth10
361.733491 extract_ip_range [288] range is 10.0.0.1:0 to 10.0.0.1:0
361.733506 extract_ip_range [288] range is 10.1.0.1:0 to 10.1.0.1:0
361.734155 main [1848] mapped 334980KB at 0x7f5f9804f000
Receiving from netmap:eth10: 1 queues, 1 threads and 1 cpus.
361.734182 main [1934] Wait 2 secs for phy reset
363.734288 main [1936] Ready...
363.734328 nm_open [456] overriding ifname eth10 ringid 0x0 flags 0x1
363.734384 receiver_body [1184] reading from netmap:eth10 fd 4 main_fd 3
364.735415 main_thread [1448] 3 pps (3 pkts in 1001042 usec)
365.736481 main_thread [1448] 1 pps (1 pkts in 1001065 usec)
366.737542 main_thread [1448] 1 pps (1 pkts in 1001061 usec)
^C367.134692 main_thread [1448] 0 pps (0 pkts in 397151 usec)


Monday, June 8, 2015