insmod error in kernel module programming

February 24, 2020, 2:43 am

I am just starting out with kernel module programming.

Below are my two files:

hello.c

#include <linux/init.h>
#include <linux/module.h>

static int __init hello_init(void)
{
    printk(KERN_ALERT "TEST: Hello world\n");
    return 0;
}

static void __exit hello_exit(void)
{
    printk(KERN_ALERT "TEST: Good Bye\n");
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_LICENSE("GPL");

Makefile

obj-m += hello.o

KDIR = /usr/src/linux-headers-3.13.0-46-generic

all:
    $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules

clean:
    rm -rf *.o *.ko *.mod.* *.symvers *.order

And here's my terminal output, showing the error from the insmod command. Kindly help.

anubhav@anubhav-Inspiron-3421:~/Desktop/os$ make
make -C /usr/src/linux-headers-3.13.0-46-generic  SUBDIRS=/home/anubhav/Desktop/os modules
make[1]: Entering directory `/usr/src/linux-headers-3.13.0-46-generic'
Building modules, stage 2.
MODPOST 1 modules
make[1]: Leaving directory `/usr/src/linux-headers-3.13.0-46-generic'
anubhav@anubhav-Inspiron-3421:~/Desktop/os$ insmod hello.ko
insmod: ERROR: could not insert module hello.ko: Operation not permitted

Perf on MIPS debug kernel, unable to enable frame_pointer

February 24, 2020, 4:00 am

I am trying to use the perf tool on MIPS and am having trouble getting call stacks (backtraces).

How can I enable FRAME_POINTER for MIPS? I have DEBUG_KERNEL enabled, but it looks like the kernel does not apply -fno-omit-frame-pointer for the MIPS architecture.

Does this mean that frame-pointer-based stack unwinding with perf can't be achieved on MIPS?

The MIPS toolchain itself does not complain about the -fno-omit-frame-pointer flag.

EDIT1

I am able to record perf events, but the perf report output doesn't help much without stack unwinding.

# ./perf --version
perf version 5.6.rc2.gd04712cd3bd7
# uname -a
Linux localhost 3.14.28-1.19 #1 SMP Mon Feb 17 16:48:44 IST 2020 mips GNU/Linux

EDIT2

Perf features detected

Auto-detecting system features:
...                         dwarf: [ on  ]
...            dwarf_getlocations: [ on  ]
...                         glibc: [ on  ]
...                          gtk2: [ OFF ]
...                      libaudit: [ on  ]
...                        libbfd: [ OFF ]
...                        libcap: [ OFF ]
...                        libelf: [ on  ]
...                       libnuma: [ OFF ]
...        numa_num_possible_cpus: [ OFF ]
...                       libperl: [ OFF ]
...                     libpython: [ OFF ]
...                     libcrypto: [ OFF ]
...                     libunwind: [ OFF ]
...            libdw-dwarf-unwind: [ on  ]
...                          zlib: [ on  ]
...                          lzma: [ on  ]
...                     get_cpuid: [ OFF ]
...                           bpf: [ OFF ]
...                        libaio: [ on  ]
...                       libzstd: [ OFF ]
...        disassembler-four-args: [ OFF ]

EDIT3

I see that the feature test for libunwind failed:

cat linux-5.6-rc2/tools/build/feature/test-libunwind.make.output
/tmp/ccQnV5jZ.o: In function `main':
test-libunwind.c:(.text+0x1c): undefined reference to `_Umips_create_addr_space'
test-libunwind.c:(.text+0x4c): undefined reference to `_Umips_init_remote'
test-libunwind.c:(.text+0x70): undefined reference to `_Umips_dwarf_search_unwind_table'
collect2: error: ld returned 1 exit status

Looking at the Makefile for the feature tests, libunwind linking is not done for MIPS.

Finding which process was killed by Linux OOM killer

February 24, 2020, 4:36 am

When Linux runs out of memory (OOM), the OOM killer chooses a process to kill based on some heuristics (it's an interesting read: http://lwn.net/Articles/317814/).

How can one programmatically determine which processes have recently been killed by the OOM killer?

How to modify kernel DTB file

February 24, 2020, 5:28 am

Summary

I am currently compiling the Linux kernel (kernel, modules and DTB) with some custom drivers for a custom board. Occasionally I'll compile the kernel and realize that the compatible string in the DTB file is not what the custom driver is looking for. Right now the only way I can remedy this is to modify the DTS or the kernel driver so the strings match, and then recompile the kernel. Is there a way I can just edit the DTB file to update the compatible string?

Failed Attempts

I have been able to decompile the DTB file back to a DTS file using the command:

dtc -I dtb -O dts -o <filename>.dts <filename>.dtb

However, if I modify the DTS file and recompile it using the command:

dtc -I dts -O dtb -o <filename>.dtb <filename>.dts

The kernel will not load the recompiled DTB file.

how to solve Kernel configuration is invalid issues

February 24, 2020, 6:31 am

I'm trying to build a module, but I get the following errors:

ERROR: Kernel configuration is invalid. include/generated/autoconf.h or include/config/auto.conf are missing. Run 'make oldconfig && make prepare' on kernel src to fix it.
WARNING: Symbol version dump ./Module.symvers is missing; modules will have no dependencies and modversions.

And here's my Makefile:

ifeq ($(KERNELRELEASE),)


KERNELDIR ?= /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

modules:
    $(MAKE) -C $(KERNELDIR) M=$(PWD) modules

modules_install:
    $(MAKE) -C $(KERNELDIR) M=$(PWD) modules_install

clean:
    rm -rf *.o *~ core .depend .*.cmd *.ko *.mod.c .tmp_versions

.PHONY: modules modules_install clean

else
    # called from kernel build system: just declare what our modules are
    obj-m := hello.o hellop.o seq.o jit.o jiq.o sleepy.o complete.o \
             silly.o faulty.o kdatasize.o kdataalign.o
endif

I tried building like this:

export KERNELDIR=/path/to/extern/linux/source
make

How can I solve this problem?

Need help in separating Kernel Space from User Space using syscalls [closed]

February 24, 2020, 6:34 am

This is my first question here, so please tell me if it is too long or too vague. Thanks!!

I am working on a super-simple OS for a university project, and I want to separate Kernel Space from User Space. The only way I want User Space to interact with the hardware (for example, to write things on the screen) is through system calls. So far I have written the WRITE syscall, modelled on the Linux one (4 parameters: syscall ID, file descriptor, what to write, and the length of what I want to write).

I want to make a super-simple terminal program that lives in User Space (of course, because it is just a program). So I thought: OK, my terminal program will just say "there you go Kernel, here is the thing I want to put on the screen, you take care of it". If I do that, then the screen driver in Kernel Space has to be coded like this: "OK, I have been told to write something on the screen, so I will write it at the top". This way, everything the terminal writes ends up at the top of the screen. Wonderful.

But what if I now want to make a second terminal program that writes things starting from the bottom? If I follow the Kernel/User separation, the new terminal will "give the thing to write to the screen driver via the WRITE syscall", but the screen driver will write it at the top!

So I thought: "OK, let's make the screen driver just write things where it is told", but if I do this, the WRITE syscall I made has no way to say where on the screen to write. I would need a new syscall called "write_things_in_specific_spot". But then anybody could write a simple program that uses this new syscall to paint the whole screen black. That is just too much power for a program to have.

What should I do to get both terminals working while keeping Kernel and User Space separate?

Note: I am actually working on top of the x64BareBones OS, so I am not aware of many things that happen deep below. In this project we just have to make 2 terminals that work (of course not at the same time).

Link to the x64BareBones OS: https://bitbucket.org/RowDaBoat/x64barebones/wiki/Home

CAP_NET_ADMIN equivalent for *BSD

February 24, 2020, 4:01 pm

I'm contributing to a routing daemon, and investigating security measures. The daemon, when running, talks to the kernel and installs routes. On Linux, as a good practice, if the daemon is launched as root (and properly configured) it will quickly drop privileges and switch to an unprivileged user/group, but retain the CAP_NET_ADMIN capability.

I'm looking for a similar mechanism to use on popular BSDs (FreeBSD, OpenBSD, macOS).

It seems that Mandatory Access Control at least on FreeBSD could be the way to go; but I'm not sure. I'd appreciate pointers to code or documentation.

Thanks!

How to capture packet that dropped by kernel or OS? [closed]

February 24, 2020, 7:35 pm

I have a UDP server running a P2P program. Each UDP connection uses a random port. While troubleshooting a bug, I saw a large number of dropped UDP packets. netstat -su shows a lot of "packets to unknown port received" and "packet receive errors", but I can't see the contents of those packets. Is there any tool on Linux that can capture all the dropped packets?

How to extract the TCP header of Incoming and Outgoing Packets using sk_buff? How can we create an LKM for the same?

February 24, 2020, 10:16 pm

I want to extract the TCP header and print the header details of both incoming and outgoing packets, but this has to be hooked into the Linux kernel, i.e. I need to create an LKM (loadable kernel module) for it.

Working of Raw Sockets in the Linux kernel

February 24, 2020, 11:16 pm

I'm working on integrating the traffic control layer of the Linux kernel with a custom user-level network stack, and I'm using raw sockets to do so. My question is: if we use raw sockets with AF_PACKET, SOCK_RAW and IPPROTO_RAW, will dev_queue_xmit() (the function that, as far as I've read, is the entry point of the queueing layer) be called? Or does the socket interface call the network card driver directly?

Mount device after bootup in isolinux or grub (Fedora CoreOS)

February 25, 2020, 1:29 am

I have an isolinux.cfg and a grub.cfg file in which I invoke a vmlinuz kernel. I already append some kernel parameters, but I would like to mount my device /dev/sda1 at boot, before the vmlinuz parameters start my installation.

In my case the kernel parameter coreos.inst=yes makes the installation process start automatically, but the device should be mounted beforehand.

grub.cfg

menuentry 'Fedora CoreOS (Fast install)' --class fedora --class gnu-linux --class gnu --class os {
    linux /images/vmlinuz mitigations=auto,nosmt systemd.unified_cgroup_hierarchy=0 coreos.liveiso=fedora-coreos-31.20200113.3.1 rd.neednet=1 ip=dhcp ignition.firstboot ignition.platform.id=metal coreos.inst.ignition_url=/mnt/ignition.ign coreos.inst.install_dev=/dev/nvme0n1 coreos.inst.image_url=/mnt/fedora_coreos.raw.xz coreos.inst=yes coreos.inst.insecure
  initrd /images/initramfs.img
}

isolinux.cfg

label linux
  menu label ^Fedora CoreOS (Fast install)
  menu default
  kernel /images/vmlinuz
  append initrd=/images/initramfs.img mitigations=auto,nosmt systemd.unified_cgroup_hierarchy=0 coreos.liveiso=fedora-coreos-31.20200113.3.1 rd.neednet=1 ip=dhcp ignition.firstboot ignition.platform.id=metal coreos.inst.ignition_url=/mnt/ignition.ign coreos.inst.install_dev=/dev/nvme0n1 coreos.inst.image_url=/mnt/fedora_coreos.raw.xz coreos.inst=yes coreos.inst.insecure

So in the end I would like /mnt/ to be the mounted device /dev/sda1.

This would most likely mean executing something like: mount -t auto -v /dev/sda1 /mnt/

Is this somehow possible to do as an appended option on the initrd or kernel? (Within those configuration files.)

If not, what would be the approach to do something similar without changing initramfs.img or vmlinuz file?

PS: The approach doesn't have to work for both legacy and EFI boot; one would be enough, although of course it would be nice if it works in both cases.

Edit: It might be possible with a kernel parameter such as systemd.*, for example systemd.run=mount or something like that.

Device Tree for PHY-less connection to a DSA switch

February 25, 2020, 4:22 am

We have a little problem with creating a device tree for our configuration of a Marvell DSA switch and a Xilinx Zynq processor. They are connected like this:

|——————————————|               |——————————————————————————————|
|     e000b000—|———— SGMII ————|—port6 (0x16)      port3 —— PHY3
| Zynq         |               |         mv88e6321            |
|     e000c000—|—x           x—|—port5             port4 —— PHY4
|——————————————|               |——————————————————————————————|
        |___________ MDIO _______________|

And we have a device tree for the Linux kernel, which looks like this:

ps7_ethernet_0: ps7-ethernet@e000b000 {
            #address-cells = <1>;
            #size-cells = <0>;
            clock-names = "ref_clk", "aper_clk";
            clocks = <&clkc 13>, <&clkc 30>;
            compatible = "xlnx,ps7-ethernet-1.00.a";
            interrupt-parent = <&ps7_scugic_0>;
            interrupts = <0 22 4>;
            local-mac-address = [00 0a 35 00 00 00];
            phy-handle = <&phy0>;
            phy-mode = "gmii";
            reg = <0xe000b000 0x1000>;
            xlnx,ptp-enet-clock = <0x69f6bcb>;
            xlnx,enet-reset = "";
            xlnx,eth-mode = <0x0>;
            xlnx,has-mdio = <0x1>;
            mdio_0: mdio {
                #address-cells = <1>;
                #size-cells = <0>;
                phy0: phy@16 {
                    compatible = "marvell,dsa";
                    reg = <0x16>;
                } ;
            } ;

} ;

    dsa@0 {
            compatible = "marvell,dsa";

            #address-cells = <2>;
            #size-cells = <0>;

            interrupts = <10>;

            dsa,ethernet = <&ps7_ethernet_0>;
            dsa,mii-bus = <&mdio_0>;

            switch@0 {
                #address-cells = <1>;
                #size-cells = <0>;
                reg = <0 0>;
             port@3 {
                    reg = <3>;
                    label = "lan0";                
             };
             port@4 {
                    reg = <4>;
                    label = "lan1";
             };
             port@5 {
                    reg = <5>;
                    label = "lan2";
             };
             port@6 {
                    reg = <6>;
                    label = "cpu";       
             };
        };
        };     
} ;

The problem is that, as the diagram shows, no PHY is attached to port 6: the connection between the Zynq and the switch is PHY-less. But I had to specify <phy0> in the device tree to make the DSA driver see the switch, and then it tries to talk to a non-existent PHY and, obviously, fails.

So the question is: how do I create a proper device tree for a DSA switch connected to a processor like this?

Thank you for any help!

(There is a somewhat similar question P1010 MAC to Switch port direct connection without PHY but I cannot comment on it and there is no answer, unfortunately)

Difference between skb_header_pointer and skb_transport_header?

February 25, 2020, 4:28 am

I'm trying to implement a netfilter module, and while processing the sk_buff I found two possible ways to retrieve the TCP header:

struct iphdr *ip_header = (struct iphdr *)skb_network_header(skb);
struct tcphdr *tcp_header = (struct tcphdr *)skb_transport_header(skb);

And

struct iphdr _iph;     /* local copies used if the headers are not linear */
struct tcphdr _tcph;
struct iphdr *ip_header = skb_header_pointer(skb, 0, sizeof(struct iphdr), &_iph);
struct tcphdr *tcp_header = skb_header_pointer(skb, ip_header->ihl * 4, sizeof(struct tcphdr), &_tcph);

Which one should I use?

What does the ! character do in ARM assembly?

February 25, 2020, 5:32 am

#include <stdio.h>
void fun();
int main()
{
        int a = 10;
        fun();
        return 0;

}
void fun()
{
    int a =  5;
}

Assembly code.

000103e4 <main>:
   103e4:       e52db008        str     fp, [sp, #-8]!
   103e8:       e58de004        str     lr, [sp, #4]
   103ec:       e28db004        add     fp, sp, #4
   103f0:       e24dd008        sub     sp, sp, #8
   103f4:       e3a0300a        mov     r3, #10
   103f8:       e50b3008        str     r3, [fp, #-8]
   103fc:       eb000005        bl      10418 <fun>
   10400:       e3a03000        mov     r3, #0
   10404:       e1a00003        mov     r0, r3
   10408:       e24bd004        sub     sp, fp, #4
   1040c:       e59db000        ldr     fp, [sp]
   10410:       e28dd004        add     sp, sp, #4
   10414:       e49df004        pop     {pc}            ; (ldr pc, [sp], #4)

00010418 <fun>:
   10418:       e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   1041c:       e28db000        add     fp, sp, #0
   10420:       e24dd00c        sub     sp, sp, #12
   10424:       e3a03005        mov     r3, #5
   10428:       e50b3008        str     r3, [fp, #-8]
   1042c:       e1a00000        nop                     ; (mov r0, r0)
   10430:       e24bd000        sub     sp, fp, #0
   10434:       e49db004        pop     {fp}            ; (ldr fp, [sp], #4)
   10438:       e12fff1e        bx      lr

In the assembly code above, the first instruction of main is: 103e4: e52db008 str fp, [sp, #-8]!

I am new to assembly language. Why is the '!' used here, and what is its purpose?

Qradar directory access

February 25, 2020, 6:31 am

I want to access the folder /store/ariel/events/payloads/ in the QRadar directories from the App Editor. I am using os.path.exists, but it returns False, even though the folder exists and the path is found when I run the script directly on the QRadar host's Linux system. I would really appreciate it if anyone could guide me on how to access these directories from the QRadar App Editor.

Can rcu_assign_pointer() be used between rcu_read_lock() and rcu_read_unlock()?

February 25, 2020, 7:00 am

At the beginning, I have one CPU core acting as a writer that writes the shared data and one core acting as a reader that reads it.
I need the reader to write some data back to the shared data.
I know that rcu_read_lock()/rcu_read_unlock() are used by readers to access shared data, but I'm not sure whether the reader writing back to the shared data will cause any problem.

In reader:

rcu_read_lock();
// get shared data
// modify the data
rcu_assign_pointer(ptr1, ptr2);
rcu_read_unlock();

Is this code valid?

How to use PERF_SAMPLE_READ with mmap

February 25, 2020, 7:55 am

This question is related to the perf_event_open syscall, but there is no tag for it.

I'm currently trying to use the PERF_SAMPLE_READ member of enum perf_event_sample_format to retrieve some data from a memory map, but for an unknown reason the syscall itself returns "invalid argument" (EINVAL, errno 22).

I have the following configuration:

this->eventConfiguration.sample_freq = 11;
this->eventConfiguration.freq = true;
this->eventConfiguration.inherit = true;
this->eventConfiguration.sample_type = PERF_SAMPLE_CPU | PERF_SAMPLE_TIME | PERF_SAMPLE_PERIOD /*| PERF_SAMPLE_READ*/;

The event I'm tracking is PERF_COUNT_HW_CPU_CYCLES.

Here is my syscall; I monitor each core of my computer:

int fileDescriptor = syscall(__NR_perf_event_open, this->configuration.getEventConfiguration() , -1, i, -1, 0);

The handling of the error is shown below, but I don't think it's useful...

if(fileDescriptor < 0) {
  switch(errno) {
    // here is some cases
  };
}

Thanks in advance ! :-)

I have a cpu cache coherency-looking problem that I can't figure out how to fix. Two cpus see different contents of the same memory

February 25, 2020, 6:17 pm

I have a really weird problem I can't figure out, I haven't seen anything this unexplainable in my 30+ years of programming. Clearly I'm doing something wrong, but can't figure out what, and I can't even figure out a way around it.

I have a linux kernel module I've written that implements a block device. It calls out to userspace to supply the data for the block device via ioctl (as in the userspace program calls the kernel module via an ioctl to get block device requests)

Some technical information on the machines I'm testing on in case it matters:

It runs flawlessly on an Intel Core i7 (an i7-3770K):

> cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
stepping        : 9
microcode       : 0x21
cpu MHz         : 1798.762
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 7139.44
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

(processors 1-7 are the same)

It runs flawlessly on a Raspberry Pi Zero:

> cat /proc/cpuinfo 
processor       : 0
model name      : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 997.08
Features        : half thumb fastmult vfp edsp java tls 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2835
Revision        : 920093
Serial          : 000000002d5dfda3

It runs flawlessly on a Raspberry Pi 3:

> cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 4 (v7l)
BogoMIPS        : 38.40
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

(processors 1-3 are the same)

Hardware        : BCM2835
Revision        : a02082
Serial          : 00000000e8f06b5e
Model           : Raspberry Pi 3 Model B Rev 1.2

But on my Raspberry Pi 4 it does something really weird that I can't explain; I'm really stumped and don't know how to fix it.

> cat /proc/cpuinfo 
processor       : 0
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 270.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

Hardware        : BCM2835
Revision        : c03111
Serial          : 10000000b970c9df
Model           : Raspberry Pi 4 Model B Rev 1.1

(processors 1-3 are the same)

So I'm asking for help from somebody who knows more about CPUs, multithreading, cache coherency and memory barriers than I do. Maybe I'm barking up the wrong tree, and if that's the case you can tell me. I'm pretty sure the program is okay; I've written lots of complicated multithreaded programs in my life, I've checked it over many times, and other people have reviewed it as well. This is the first multithreaded kernel module I've written, though, so that's where I'm in new territory.

Here's what's going on:

I register a callback with blk_queue_make_request() that handles the read and write requests. I drop all other operations, returning an error (but in practice I never get anything but reads and writes):

    log_kern_debug("bio operation is not read or write: %d", operation);
    bio->bi_status = BLK_STS_MEDIUM;
    bio_endio(bio); // complete the failed bio so its submitter is not left waiting
    return BLK_QC_T_NONE;

When the kernel invokes the callback, I iterate through the segments in the bio. For each segment, I send a request to the userspace application (in another thread; I'll explain how that works in a minute) to service the read or write, and the original requesting thread goes to sleep. When userspace returns with the data (for a read) or success/failure (for a write), it hands over the data and wakes up the original requesting thread. Once all the segments have been serviced, the original requesting thread returns the bio to the kernel:

    bio_endio(bio); // complete the bio, the kernel does the followup callback to the next guy in the chain who wants this bio
    return BLK_QC_T_NONE;

The way the call to userspace works is this: first, the userspace program makes an ioctl call to the kernel module and the kernel module blocks. That thread stays blocked until a request comes in for the block device. The information about the request (read/write, start position, length, etc) gets copied to a userspace-provided buffer with copy_to_user and then the ioctl call is unblocked and returns. Userspace gets the request from the return of the ioctl, does the read or write then makes another ioctl call to the kernel module with the results of the request, and then wakes up the original requesting thread, so it can return the result in the make_request callback, and then the userspace ioctl blocks again waiting for the next request to come in.

So here's the problem. On the Raspberry Pi 4 only, every once in a while (not every time), the contents of the memory passed between the two threads don't look the same from both threads' points of view. When the data is passed from the userspace-side thread to the original requesting thread (for a read request in this example), the hash of the data (at THE SAME LOCATION IN MEMORY!) is different. I assume this is a CPU cache-coherency problem, except that I put in calls to mb(), smp_mb(), READ_ONCE() and WRITE_ONCE(), and I even tried plain old sleeping to give the original calling thread's CPU time to notice. It fails reproducibly, though not every time. I don't have another Raspberry Pi 4 to test with, but I'm pretty sure the machine is fine because everything else works great. It's something I'm not doing right, but I don't know what.

What follows is a grep of kern.log and an explanation of what's going on. Every request going to userspace gets a transaction id. The start pos is the location in the block device to read from or write to. The length is the length of the bio segment to read/write (always 4k here). The crc32 column is the crc32 of the data in the bio segment buffer over that length. The address column is the address of the bio segment buffer that the data read from userspace is copied into (the buffer the crc32 is computed over); it is always the same for a given transaction. The last column is current->tid.

oper    trans id start pos        length           crc32            address  thread
write:  00000a2d 000000000001d000 0000000000001000 0000000010e5cad0          27240

read0:  00000b40 000000000001d000 0000000000001000 000000009b5eeca2 88314387 31415
read1:  00000b40 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31392
read2:  00000b40 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31415
readx:  00000b40 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31392
read3:  00000b40 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31415

read0:  00000c49 000000000001d000 0000000000001000 000000009b5eeca2 88314387 31417
read1:  00000c49 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31392
read2:  00000c49 000000000001d000 0000000000001000 000000009b5eeca2 88314387 31417
readx:  00000c49 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31392
read3:  00000c49 000000000001d000 0000000000001000 000000009b5eeca2 88314387 31417

read0:  00000d4f 000000000001d000 0000000000001000 000000009b5eeca2 88314387 31419
read1:  00000d4f 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31392
read2:  00000d4f 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31419
readx:  00000d4f 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31392
read3:  00000d4f 000000000001d000 0000000000001000 0000000010e5cad0 88314387 31419

read0:  00000e53 000000000001d000 0000000000001000 000000009b5eeca2 1c6fcd65 31422
read1:  00000e53 000000000001d000 0000000000001000 0000000010e5cad0 1c6fcd65 31392
read2:  00000e53 000000000001d000 0000000000001000 0000000010e5cad0 1c6fcd65 31422
readx:  00000e53 000000000001d000 0000000000001000 0000000010e5cad0 1c6fcd65 31392
read3:  00000e53 000000000001d000 0000000000001000 0000000010e5cad0 1c6fcd65 31422

So the steps in the process are are follows, let's look at the first transaction, id b40 because that one worked correctly. Then we'll look at the second one c49 that didn't work. Transaction ids always increase, the logs above are in chronological order.

1) First the write comes in (trans id a2d) The crc32 of the data written is 10e5cad0. That's the crc32 we expect to see on all reads afterwards until the next write.

2) a read request comes in to the blk_queue_make_request callback handler on thread 31415. At this point I log ("read0") the crc32 of the contents of the bio segment buffer before it is written to, so I can see the before-it-changes value of the bio segment buffer at 88314387.

3) I call copy_to_user the information about the read request. return from the ioctl, userspace processes it, does an ioctl back into the kernel module with the resulting data and that data is copy_from_user()ed to the bio segment buffer (at 88314387). It logs ("read1") the crc32 of the bio segment buffer from userspace thread 31392's point of view. It is the expected 10e5cad0.

4) Userspace wakes up the original requesting thread, id 31415, now that the data is in the bio segment buffer at 88314387. Thread 31415 calculates the crc32 again and logs ("read2") the value it sees from its own point of view. Again, as expected, it is 10e5cad0.

5) For extra sanity checking (the reason for which will become clear in the next transaction), userspace thread 31392 computes the crc of the bio buffer at 88314387 again, gets the expected value 10e5cad0, and logs it ("readx"). There is no reason for it to change, since nobody is updating it, and it still says 10e5cad0.

6) As a final sanity check, the original requesting thread 31415 sleeps for 2 ms, then calculates the crc32 again and logs it ("read3"). Everything worked; all is well.
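The handoff in steps 2 to 4 can be modelled in userspace to make the expected ordering explicit. This is a minimal sketch, not the module's code: `ReadRequest`, `userspace_handler` and `handle_read` are hypothetical names, a `bytearray` stands in for the bio segment buffer, and `threading.Event` stands in for the sleep/wake-up (in the kernel the analogous primitive would be a `struct completion` with `complete()`/`wait_for_completion()`, though the question doesn't say which mechanism the module uses). The model cannot reproduce the stale read; it only shows the invariant the logs are checking: read2 must equal read1.

```python
import threading
import zlib


def crc(buf):
    # Mask to 32 bits so the value is the unsigned crc32 shown in the logs.
    return zlib.crc32(bytes(buf)) & 0xffffffff


class ReadRequest(object):
    """Hypothetical stand-in for one in-flight read request."""

    def __init__(self, size):
        self.buffer = bytearray(size)   # plays the role of the bio segment buffer
        self.done = threading.Event()   # plays the role of the step-4 wake-up


def userspace_handler(req, data, log):
    # Step 3: fill the buffer with the resulting data, log read1, then wake the requester.
    req.buffer[:] = data
    log.append(("read1", crc(req.buffer)))
    req.done.set()


def handle_read(data):
    req = ReadRequest(len(data))
    log = [("read0", crc(req.buffer))]        # step 2: crc before anyone writes
    worker = threading.Thread(target=userspace_handler, args=(req, data, log))
    worker.start()
    req.done.wait()                           # step 4: sleep until woken
    log.append(("read2", crc(req.buffer)))    # must now equal read1
    worker.join()
    return log
```

Because `read1` is appended before `set()` and `read2` after `wait()` returns, the log order is always read0, read1, read2, mirroring the chronological order of the real logs.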

Now let's look at the next transaction, id c49. This is a case where the filesystem requested to read the same block twice; I forced this in my test with echo 3 > /proc/sys/vm/drop_caches. I'll start counting steps at 2 so they line up with the first example.

2) A read request comes in to the blk_queue_make_request callback handler on thread 31417. At this point I log ("read0") the crc32 of the contents of the bio segment buffer before we write to it. This is the same bio segment buffer as in the first transaction, b40 (memory location 88314387), but apparently it has been overwritten since we last set it, and that's fine. It also seems to have been set to the same value it had at the start of transaction b47: the crc32 value is 9b5eeca2. That's fine too. Now we know the initial crc32 value of that bio segment buffer, from thread 31417's point of view, before anybody writes to it.

4) Userspace wakes up the original requesting thread, id 31417, now that the data SHOULD BE in the bio segment buffer at 88314387. Thread 31417 calculates the crc32 again and logs ("read2") the value it sees from its own point of view. But this time the value is not the expected 10e5cad0. Instead it is the same value (9b5eeca2) it had before the request was sent to userspace to update the buffer. It is as if userspace didn't write to the buffer. But it did, because we read it, calculated the crc32 value and logged it from the userspace-side thread 31392. Same memory location, different thread, different perception of the contents of the bio segment buffer at 88314387. Different thread, presumably a different CPU, and therefore a different CPU cache. Even if I were mishandling the thread blocking and wake-up, the logs show the order of events: one thread read the correct value after the other thread misread it.

5) Again as an extra sanity check, userspace thread 31392 computes the crc of the same bio buffer at 88314387 and gets the correct value, 10e5cad0, which it logs ("readx"). The logs are chronological, so thread 31392 sees the correct value after thread 31417 saw the wrong one.

6) As a final sanity check, the original requesting thread 31417 sleeps for 2 ms, calculates the crc32 again and logs it ("read3"), and it still sees the incorrect value 9b5eeca2.
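Since the diagnosis rests on comparing crc32 values across log lines, a small script can flag stale reads automatically. This is a minimal sketch, assuming the column layout implied by the walk-through: column 2 is the transaction id, column 5 the crc32 and column 7 the thread id (the other columns are ignored). For each transaction it treats read1 (the userspace side, right after copy_from_user) as the expected value and reports any read2, readx or read3 that differs; read0 is skipped because it is logged before the buffer is written. `find_stale_reads` is a hypothetical helper, not part of the original module.

```python
from collections import defaultdict


def find_stale_reads(log_lines):
    """Return {trans_id: [(tag, thread_id, crc), ...]} for reads that disagree with read1."""
    by_trans = defaultdict(dict)
    for line in log_lines:
        fields = line.split()
        if len(fields) != 7 or not fields[0].startswith("read"):
            continue  # skip blank lines and anything that isn't a read log entry
        tag = fields[0].rstrip(":")
        trans_id, crc, thread_id = fields[1], fields[4], fields[6]
        by_trans[trans_id][tag] = (crc, thread_id)

    stale = {}
    for trans_id, entries in by_trans.items():
        if "read1" not in entries:
            continue  # no userspace-side crc to compare against
        expected = entries["read1"][0]
        bad = [(tag, tid, crc) for tag, (crc, tid) in sorted(entries.items())
               if tag in ("read2", "readx", "read3") and crc != expected]
        if bad:
            stale[trans_id] = bad
    return stale
```

Fed the d4f and e53 lines above, it returns an empty dict, since every read after read1 shows 10e5cad0. A transaction like c49, where read2 and read3 on thread 31417 still show 9b5eeca2, would be reported with those two entries.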

Of the four read transactions I logged above, 1, 3 and 4 worked, and 2 did not. So I concluded it must be a cache coherency problem. But I added mb() and smp_mb() calls after read1 and before read2, and nothing changed.

I am stumped. I've read the Linux kernel memory barriers document

https://www.kernel.org/doc/Documentation/memory-barriers.txt

a number of times, and I figured smp_mb() would fix everything, but it doesn't.

I have no idea how to fix this. I can't even think of a lousy workaround. I set the contents of a memory location, and the other thread just doesn't see it. What do I do?

Help? Thanks.

↧

How to determine number of processor sockets on Linux ppc64le

February 25, 2020, 8:43 pm

There seems to be a bug in lscpu where it cannot determine the correct number of sockets. An issue has been opened for this (https://github.com/karelzak/util-linux/issues/698), but I haven't received any response. This is my output:

Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                256
On-line CPU(s) list:   0-255
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             32
NUMA node(s):          5
Model:                 IBM,9119-MHE
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-255
NUMA node4 CPU(s):
NUMA node5 CPU(s):
NUMA node6 CPU(s):
NUMA node7 CPU(s):

Is there another way to go about getting the number of sockets?
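One way to cross-check lscpu is to read the kernel's CPU topology files directly. This is a minimal sketch, assuming the standard sysfs layout `/sys/devices/system/cpu/cpuN/topology/physical_package_id`; it counts the distinct package ids, and CPUs whose topology file is missing are simply skipped. Caveat: on an IBM Power LPAR the kernel can only report what the firmware and hypervisor expose, so the ids may describe a virtual topology, or be -1 when no chip id is provided. If every CPU reports -1, the result is a single "package", which says nothing about physical sockets.

```python
import glob
import os


def count_packages(cpu_dir="/sys/devices/system/cpu"):
    """Return (number of distinct package ids, sorted ids) from sysfs CPU topology."""
    ids = set()
    # cpu[0-9]* matches cpu0, cpu17, ... but not cpufreq, cpuidle, etc.
    pattern = os.path.join(cpu_dir, "cpu[0-9]*", "topology", "physical_package_id")
    for path in glob.glob(pattern):
        with open(path) as f:
            ids.add(int(f.read().strip()))
    return len(ids), sorted(ids)


if __name__ == "__main__":
    n, ids = count_packages()
    print("distinct physical_package_id values: %d %s" % (n, ids))
```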

↧

I2C Connection BPi and Arduino

February 25, 2020, 10:10 pm

Recently I started an IoT project involving communication between Arduino devices and a single-board computer called the Banana Pi. I noticed there are lots of similar topics out there, but none of them discuss specifically how to set up the I2C driver and pins on the Banana Pi, since it is not as straightforward as installing the library and starting to code.

Here is my setup:

Banana Pi
1. I installed Linux on the Pi itself (Linux bananapi 3.4.104-bananian #1 SMP PREEMPT Mon Apr 6 18:25:40 UTC 2015 armv7l GNU/Linux).
2. I installed the required libraries and packages: apt-get install python-smbus python-dev i2c-tools
3. I followed the other setup requirements from https://radiostud.io/howto-i2c-communication-rpi/ (adding i2c-dev to /etc/modules and some dtparams to /boot/config, although, unlike on the Raspberry Pi, that file doesn't exist initially on the Banana Pi image).
4. Unlike Raspbian, the Banana Pi image has no option to enable the I2C device through the interfacing options of sudo raspi-config, so I skipped this step.
5. I wrote a simple script to display the messages I receive from the Arduino:

import smbus
from time import sleep

def main():
    SLAVE_ADDRESS = 0x04
    DATA_LENGTH = 4

    I2CBus = smbus.SMBus(1)
    print('initiated i2c connection as master')

    while True:
        try:
            message = I2CBus.read_byte_data(SLAVE_ADDRESS, 4)
            print(message)
            sleep(1)
        except KeyboardInterrupt:
            break

if __name__ == '__main__':
    main()

**Question:** Every time I run the above code, it returns:

Traceback (most recent call last):
  File "index.py", line 20, in <module>
    main()
  File "index.py", line 13, in main
    message = I2CBus.read_byte_data(SLAVE_ADDRESS, 4)
IOError: [Errno 70] Communication error on send

which makes me think something is wrong in steps 2 and 3 above. Some people suggest modifying the kernel, as described here: http://forum.banana-pi.org/t/has-anyone-added-an-rtc/5004/5. However, the steps and how reliable they are aren't clear, and other people report that the GPIO pins became unusable after modifying the kernel. I don't want to risk wasting time reflashing the original OS to get the GPIO functionality back.
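Before touching the kernel, it may help to check whether the Arduino is visible on the bus at all. This is a minimal sketch using the same python-smbus module: it lists the `/dev/i2c-N` nodes (which exist only once i2c-dev is loaded) and probes each address with a one-byte read, much like `i2cdetect -r -y N` from i2c-tools. Assumption: the Arduino may not be on bus 1, because the Banana Pi's bus numbering can differ from the Raspberry Pi guide's; I haven't verified which bus its header pins map to. `list_i2c_buses` and `scan_bus` are hypothetical helpers. Probe reads can upset some devices, so only scan buses whose devices you know.

```python
import glob


def list_i2c_buses():
    """Return the bus numbers of the /dev/i2c-N device nodes."""
    buses = []
    for path in glob.glob("/dev/i2c-*"):
        suffix = path.rsplit("-", 1)[1]
        if suffix.isdigit():
            buses.append(int(suffix))
    return sorted(buses)


def scan_bus(bus, first=0x03, last=0x77):
    """Return the addresses in [first, last] that answer a one-byte read."""
    found = []
    for addr in range(first, last + 1):
        try:
            bus.read_byte(addr)  # raises IOError when the transfer fails
        except IOError:
            continue
        found.append(addr)
    return found


if __name__ == "__main__":
    import smbus  # python-smbus, installed in step 2

    for n in list_i2c_buses():
        b = smbus.SMBus(n)
        print("bus %d: %s" % (n, [hex(a) for a in scan_bus(b)]))
        b.close()
```

If 0x04 never appears on any bus, that points at the wiring, the pin setup or the slave sketch rather than at the read_byte_data call.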

Arduino
I use an Arduino Due as the slave device to the Banana Pi in this I2C communication. The wiring simply connects the Arduino's SDA to the Pi's SDA and the Arduino's SCL to the Pi's SCL (both devices operate at 3.3 V), with the grounds connected to each other as well. I wrote the Arduino-side code, again following https://radiostud.io/howto-i2c-communication-rpi/, with a simple modification to send everything it receives from the serial monitor to the BPi. But given the error shown above, I don't think that matters for this question.

Any help is much appreciated, given the lack of proper resources.

↧