When is `struct file *` created for an open file in kernel space? [duplicate]

April 12, 2020, 3:25 pm

I am learning the Linux kernel, mostly by reading LDD3, and I am a bit confused. The book says (ch. 3, p. 53):

The file structure represents an open file. (It is not specific to device drivers; every open file in the system has an associated struct file in kernel space.) It is created by the kernel on open and is passed to any function that operates on the file, until the last close. After all instances of the file are closed, the kernel releases the data structure.

On the other hand, an earlier passage about the open file operation says (ch. 3, p. 51):

Though this is always the first operation performed on the device file, the driver is not required to declare a corresponding method. If this entry is NULL , opening the device always succeeds, but your driver isn’t notified.

So if I don't provide this file operation, when is the struct file corresponding to my open file created? For example, I have written a small module which creates a file in the /proc filesystem but does not provide the open file operation, only the read operation:

    static struct file_operations this_proc_ops = {
        .owner = THIS_MODULE,
        .read = this_proc_read
    };

I can open it and read it successfully. I imagine that this wouldn't be possible without a corresponding struct file in kernel space, would it? So my question is: when is the struct file created in this case?
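To make the order of events in the two LDD3 quotes concrete, here is a minimal userspace sketch (not kernel code). It assumes, as the quotes suggest, that the core open path allocates the file object first and only then consults the driver's optional open hook, skipping it when it is NULL. All names (toy_file, toy_fops, toy_open, toy_driver_open) are hypothetical stand-ins, not kernel APIs.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct file and struct file_operations. */
struct toy_file;

struct toy_fops {
    int (*open)(struct toy_file *f);   /* optional, may be NULL */
};

struct toy_file {
    const struct toy_fops *f_op;
    int opened_by_driver;              /* set only if the hook ran */
};

/* Models the core open path: allocate first, then call the optional hook. */
struct toy_file *toy_open(const struct toy_fops *fops)
{
    struct toy_file *f = calloc(1, sizeof(*f));
    if (!f)
        return NULL;
    f->f_op = fops;
    if (fops->open && fops->open(f) != 0) {   /* hook present and failed */
        free(f);
        return NULL;
    }
    return f;                                  /* succeeds even with no hook */
}

/* Example hook that only records that it was called. */
int toy_driver_open(struct toy_file *f)
{
    f->opened_by_driver = 1;
    return 0;
}
```

In this model the object exists whether or not the driver supplies an open method; the hook only gives the driver a chance to be notified or to fail the open.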

↧

gcc: error: unrecognized argument in option ‘-mabi=aapcs-linux’

April 12, 2020, 4:02 pm

First I installed cross-tool and built the toolchain. Then I tried to build the kernel, but while compiling kernel 2.6.34 for ARM I repeatedly got this error:

    root@kali:~/felabs/sysdev/tinysystem/linux-2.6.34# make ARCH=arm CROSS-COMPILE=arm-linux-
    scripts/kconfig/conf -s arch/arm/Kconfig
      CHK     include/linux/version.h
      UPD     include/linux/version.h
      CHK     include/generated/utsrelease.h
      UPD     include/generated/utsrelease.h
      Generating include/generated/mach-types.h
      CC      kernel/bounds.s
    gcc: error: unrecognized argument in option ‘-mabi=aapcs-linux’
    gcc: note: valid arguments to ‘-mabi=’ are: ms sysv
    gcc: error: unrecognized command line option ‘-mlittle-endian’
    gcc: error: unrecognized command line option ‘-mapcs’
    gcc: error: unrecognized command line option ‘-mno-sched-prolog’
    gcc: error: unrecognized command line option ‘-mno-thumb-interwork’
    /root/felabs/sysdev/tinysystem/linux-2.6.34/./Kbuild:35: recipe for target 'kernel/bounds.s' failed
    make[1]: *** [kernel/bounds.s] Error 1
    Makefile:986: recipe for target 'prepare0' failed
    make: *** [prepare0] Error 2

↧

How to block mouse input events on Linux using C++

April 13, 2020, 12:30 am

Is there any way to block some mouse events system-wide on Linux using C++?

I want to implement something like this: when one switch is turned on, right-click is disabled for the whole system; when another switch is turned on, mouse wheel events are disabled.

After some searching I found the documentation for the Linux input subsystem. Using its user-space API I can read mouse input events, but the events are still delivered to applications, and I cannot find any API that cancels a specific mouse event.

Is there an API that can do this, or do I need to write a kernel module?

Thanks!

↧

Round and curly bracket block of code in C

April 13, 2020, 3:45 am

Can anyone explain what this macro evaluates to:

    #define memcpy(dest,src,n) ({ \
    void * _res = dest; \
    __asm__ ("cld;rep;movsb" \
        ::"D" ((long)(_res)),"S" ((long)(src)),"c" ((long) (n)) \
        :"di","si","cx"); \
    _res; \
    })

This is taken from the first version of the Linux kernel. What does a block of code surrounded by ({ }) represent, and where would it be used?
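For reference, the ({ ... }) form is a GNU C statement expression: a compound statement used as an expression, whose value is the last expression in the block (here, _res). It is a GCC extension (Clang also accepts it), not ISO C. The sketch below shows the same construct without inline assembly; MAX_ONCE and max_with_side_effect are hypothetical names chosen for illustration.

```c
/* GNU C statement expression: the value of the block is its last expression.
 * Requires GCC or Clang (it is not ISO C). */
#define MAX_ONCE(a, b) ({            \
    __typeof__(a) _a = (a);          \
    __typeof__(b) _b = (b);          \
    _a > _b ? _a : _b;               \
})

/* Each macro argument is evaluated exactly once, unlike a naive
 * ((a) > (b) ? (a) : (b)), so the increment below happens a single time. */
int max_with_side_effect(int *counter, int other)
{
    return MAX_ONCE((*counter)++, other);
}
```

The typical use is exactly what the memcpy macro does: a macro that needs local variables or several statements but must still yield a value.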

↧

How to fix "initialization from incompatible pointer type" in a /proc kernel module

April 13, 2020, 6:08 am

I'm trying to create a kernel module which creates a subdirectory in the /proc directory containing a file that can be written and read from user space.

But whenever I compile the module I get the same errors:

    error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
     .read = read_proc,
    error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
     .write = write_proc,
    error: dereferencing pointer to incomplete type ‘struct proc_dir_entry’
         Our_Proc_File->read_proc = read_proc;

Code:

    #define __KERNEL__
    #define MODULE
    #include<linux/seq_file.h>
    #include <linux/module.h>
    #include <linux/proc_fs.h>
    #include <linux/string.h>
    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <asm/uaccess.h>

    MODULE_LICENSE("GPL");
    MODULE_AUTHOR("1");

    #define MAXBUFFSIZE 256

    static char mm_buff[MAXBUFFSIZE];
    static struct proc_dir_entry* Our_Proc_File;
    static struct proc_dir_entry* Our_Proc_Dir;

    static ssize_t read_proc(char* buffer, char** buffer_location, off_t offset,
            int buffer_length, int* eof, void* data)
    {
        int len = 0;
        static int count = 1;
        if (offset > 0) {
            *eof = 1;
            return len;
        }
        len = sprintf(buffer, "[%d] %s\n", count++, mm_buff);
        return len;
    }

    static ssize_t write_proc(struct file* file, const char* buffer, unsigned long count,
            void *data)
    {
        if (count < (MAXBUFFSIZE-1)) {
            strncpy(mm_buff, buffer, count);
            mm_buff[count] = '\0';
            printk("Buffer: %s\n", mm_buff);
        }
        return count;
    }

    static struct file_operations myops =
    {
        .owner = THIS_MODULE,
        .read = read_proc,
        .write = write_proc,
    };

    static int __init simple_init(void)
    {
        printk("MyModule Loaded Successfully\n");
        Our_Proc_Dir = proc_mkdir("Colours", NULL);
        if (IS_ERR(Our_Proc_Dir)) {
            printk("Failed to create directory\n");
            return -1;
        }
        Our_Proc_File = proc_create("Orange", 0644, Our_Proc_Dir, &myops);
        if (IS_ERR(Our_Proc_File)) {
            proc_remove(Our_Proc_Dir);
            return -1;
        }
        Our_Proc_File->read_proc = read_proc;
        Our_Proc_File->write_proc = write_proc;
        return 0;
    }

    static void __exit fun(void)
    {
        if (Our_Proc_File)
            proc_remove(Our_Proc_File);
        if (Our_Proc_Dir)
            proc_remove(Our_Proc_Dir);
        printk("MyModule Exit!\n");
    }

    module_init(simple_init);
    module_exit(fun);

↧

Add-on creation in SUSE Linux

April 13, 2020, 6:59 am

I have downloaded a new kernel version (4.14) and built it on SUSE Linux, which generated kernel .rpm files. Now I want to repack this newly built kernel into the existing SUSE Linux ISO image. For this purpose I am using the mksusecd command:

sudo mksusecd --create new_ker.iso  --addon kernel-4.14.175_94.41_default.x86_64.rpm -- SLE-12-SP4-Desktop-DVD-x86_64-GM-DVD1.iso

This command creates an image file that has the new kernel as an add-on. During installation the add-on is shown as added, but the new kernel is not installed; the installation proceeds with the old kernel already present in the ISO image.

Can anyone guide me on how to add a newly built kernel to the existing ISO image so that it gets installed during the SUSE Linux installation process?

↧

How to use QEMU's deterministic record and replay feature for a Linux kernel boot?

April 13, 2020, 10:00 am

QEMU supports deterministic record and replay as documented at: https://github.com/qemu/qemu/blob/v2.9.0/docs/replay.txt

However, I could not get replay working for a full Linux kernel boot: it always hangs at some point.

These are the commands I'm running:

    #!/usr/bin/env bash
    cmd="\
    time \
    ./buildroot/output.x86_64~/host/usr/bin/qemu-system-x86_64 \
    -M pc \
    -append 'root=/dev/sda console=ttyS0 nokaslr printk.time=y - lkmc_eval=\"/rand_check.out;wget -S google.com;/poweroff.out;\"' \
    -kernel './buildroot/output.x86_64~/images/bzImage' \
    -nographic \
    \
    -drive file=./buildroot/output.x86_64~/images/rootfs.ext2,if=none,id=img-direct,format=raw \
    -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
    -device ide-hd,drive=img-blkreplay \
    \
    -netdev user,id=net1 \
    -device rtl8139,netdev=net1 \
    -object filter-replay,id=replay,netdev=net1 \
    "
    echo "$cmd"
    eval "$cmd -icount 'shift=7,rr=record,rrfile=replay.bin'"
    # Different than previous.
    eval "$cmd -icount 'shift=7,rr=record,rrfile=replay.bin'"
    # Same as previous.
    eval "$cmd -icount 'shift=7,rr=replay,rrfile=replay.bin'"

and my kernel and root filesystem were generated with this Buildroot setup: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/0a1a600d49d1292be82a47cfde6f0355996478f0 which uses QEMU v2.9.0.

lkmc_eval gets evaled by my init scripts. Here we print userspace stuff that is usually random to check that we are actually deterministic, and then power off the machine.

How I came up with those commands:

start from the working command I used in my repo without record replay
copy paste the hard disk and networking parts from the wiki: https://wiki.qemu.org/Features/record-replay

The in-tree docs say there is no networking support, but the wiki and the git log say it was added as of v2.9.0, so I think the docs are just outdated compared to the wiki.

Using that setup, the boot replay progresses quite far, but hangs at the message:

[   31.692427] NET: Registered protocol family 17

In the initial record, the next message would have been:

[   31.777326] sd 1:0:0:0: [sda] Attached SCSI disk

so I suspect that it is a block device issue.

The timestamps are however identical, so I'm confident that the record and replay has worked so far.

If for the networking I use just:

-net none

then the record itself hangs at:

    [   19.669685] ALSA device list:
    [   19.670756]   No soundcards found.

If anyone wants to try a QEMU patch against it, just check out your patch inside /qemu/ and run:

./build -t host-qemu-reconfigure

to rebuild.

↧

Kernel compilation warning: objtool: ioctl.isra.11()+0x268: unsupported intra-function call

April 13, 2020, 12:52 pm

When I compile my kernel project, I get this warning:

warning: objtool: ioctl.isra.11()+0x268: unsupported intra-function call

What does it mean?

↧

C unused pointer parameter

April 13, 2020, 12:56 pm

I have found the following function definition in Linux source:

    static int __ref kernel_init(void *unused)
    {
        int ret;

        kernel_init_freeable();
        /* need to finish all async __init code before freeing the memory */
        async_synchronize_full();
        ftrace_free_init_mem();
        free_initmem();
        mark_readonly();

        /*
         * Kernel mappings are now finalized - update the userspace page-table
         * to finalize PTI.
         */
        pti_finalize();

        system_state = SYSTEM_RUNNING;
        numa_default_policy();

        rcu_end_inkernel_boot();

        if (ramdisk_execute_command) {
            ret = run_init_process(ramdisk_execute_command);
            if (!ret)
                return 0;
            pr_err("Failed to execute %s (error %d)\n",
                   ramdisk_execute_command, ret);
        }

        /*
         * We try each of these until one succeeds.
         *
         * The Bourne shell can be used instead of init if we are
         * trying to recover a really broken machine.
         */
        if (execute_command) {
            ret = run_init_process(execute_command);
            if (!ret)
                return 0;
            panic("Requested init %s failed (error %d).",
                  execute_command, ret);
        }
        if (!try_to_run_init_process("/sbin/init") ||
            !try_to_run_init_process("/etc/init") ||
            !try_to_run_init_process("/bin/init") ||
            !try_to_run_init_process("/bin/sh"))
            return 0;

        panic("No working init found.  Try passing init= option to kernel. "
              "See Linux Documentation/admin-guide/init.rst for guidance.");
    }

My question concerns the unused argument to the function. I have seen some other questions where one can use the GCC attribute specifier to declare the argument as unused and some other techniques, but this one looks like it would generate some weird compiler warnings since I see no usage of any suppression techniques here. Does anyone know what is the use of this argument here?
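One plausible reading is that the parameter exists only so the function matches the signature expected of a thread entry point, int (*)(void *). GCC warns about unused parameters only under -Wunused-parameter, which -Wall alone does not enable, so no suppression is needed in a build that does not turn it on. The userspace sketch below illustrates the pattern; thread_fn, init_entry and spawn are hypothetical names, and spawn simply calls the function instead of creating a thread.

```c
#include <stddef.h>

/* Hypothetical thread-entry type: every entry point must take a void *. */
typedef int (*thread_fn)(void *arg);

/* This entry point needs no argument, but must still match thread_fn. */
int init_entry(void *unused)
{
    (void)unused;   /* optional: silences -Wunused-parameter if it is enabled */
    return 42;
}

/* Stand-in for a thread-creation helper: here it just calls fn directly. */
int spawn(thread_fn fn, void *arg)
{
    return fn(arg);
}
```

Naming the parameter unused documents the intent for readers; the compiler attaches no meaning to the name itself.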

↧

How do the likely/unlikely macros in the Linux kernel work and what is their benefit?

April 13, 2020, 1:33 pm

I've been digging through some parts of the Linux kernel, and found calls like this:

    if (unlikely(fd < 0))
    {
        /* Do something */
    }

    if (likely(!err))
    {
        /* Do something */
    }

I've found the definition of them:

    #define likely(x)       __builtin_expect((x),1)
    #define unlikely(x)     __builtin_expect((x),0)

I know that they are for optimization, but how do they work? And how much performance/size decrease can be expected from using them? And is it worth the hassle (and losing the portability probably) at least in bottleneck code (in userspace, of course).
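As a small userspace illustration: __builtin_expect(expr, c) evaluates to expr and only tells the compiler which value to expect, so program behavior is unchanged and only code layout and branch ordering may differ. The sketch below uses the !!(x) variant of the macros, which normalizes any truthy value to 1; sum_nonnegative is a hypothetical function, and any speedup depends on the compiler, the CPU and how predictable the branch really is.

```c
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sums non-negative values; a negative value is treated as the rare error path. */
long sum_nonnegative(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        if (unlikely(v[i] < 0))
            return -1;           /* cold path: the compiler may move it out of line */
        total += v[i];
    }
    return total;
}
```

Comparing the generated assembly (for example with gcc -O2 -S) with and without the hint is the most direct way to see what changes on a given target.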

↧

How does the IOMMU unmap IOVAs coming from different peripherals through DMA?

April 13, 2020, 2:17 pm

I have been trying to find information on this for a long time and still haven't found anything solid. What I have learned so far is that the IOMMU translates the IOVA used by a DMA transfer into a physical address and then reads from or writes to memory. My questions are as follows:

1) Does the IOMMU store a different memory map for every single device? Does each device see an address range starting from zero in its virtual address space?

2) Where are these IOMMU memory maps stored?

3) How does the IOMMU know which device a request is coming from if every device sees virtual addresses starting from zero in its own virtual address space?

4) Does the device also transmit some kind of device-specific ID which the IOMMU recognizes and uses to translate the IOVA and to protect other memory from being read or written by this device?

↧

How to know linux scheduler time slice?

April 13, 2020, 2:38 pm

I'm looking for the value of the time slice (or quantum) of my Linux kernel.

Specific Questions:

Is there a /proc file which exposes this information?
(Or) Is it defined in the Linux headers of my distribution?
(Or) Is there a C function in the Linux API (maybe sysinfo) that exposes this value?
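One existing C interface that touches on this is sched_rr_get_interval(2), which reports the round-robin quantum for a process. It is defined for the SCHED_RR policy; for a process under the default policy the value is kernel-dependent, since the default scheduler computes slices dynamically rather than using one fixed quantum. A minimal sketch, where rr_quantum_ns is a hypothetical helper name:

```c
#include <sched.h>
#include <time.h>

/* Returns the round-robin quantum of the calling process in nanoseconds,
 * or -1 on error. */
long long rr_quantum_ns(void)
{
    struct timespec ts;
    if (sched_rr_get_interval(0, &ts) != 0)   /* pid 0 = calling process */
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

On kernels that provide it, /proc/sys/kernel/sched_rr_timeslice_ms exposes the SCHED_RR quantum as a tunable.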

↧

How to properly disable SMAP from a linux module?

April 13, 2020, 4:19 pm

I am following a tutorial from here.

I have the following code:

    #include <linux/init.h>           // Macros used to mark up functions e.g. __init __exit
    #include <linux/module.h>         // Core header for loading LKMs into the kernel
    #include <linux/device.h>         // Header to support the kernel Driver Model
    #include <linux/kernel.h>         // Contains types, macros, functions for the kernel
    #include <linux/fs.h>             // Header for the Linux file system support
    #include <linux/uaccess.h>        // Required for the copy to user function
    #define  DEVICE_NAME "ebbchar"    ///< The device will appear at /dev/ebbchar using this value
    #define  CLASS_NAME  "ebb"        ///< The device class -- this is a character device driver

    MODULE_LICENSE("GPL");            ///< The license type -- this affects available functionality
    MODULE_AUTHOR("Derek Molloy");    ///< The author -- visible when you use modinfo
    MODULE_DESCRIPTION("A simple Linux char driver for the BBB");  ///< The description -- see modinfo
    MODULE_VERSION("0.1");            ///< A version number to inform users

    static int    majorNumber;                  ///< Stores the device number -- determined automatically
    static char   message[256] = {0};           ///< Memory for the string that is passed from userspace
    static short  size_of_message;              ///< Used to remember the size of the string stored
    static int    numberOpens = 0;              ///< Counts the number of times the device is opened
    static struct class*  ebbcharClass  = NULL; ///< The device-driver class struct pointer
    static struct device* ebbcharDevice = NULL; ///< The device-driver device struct pointer

    static int     dev_open(struct inode *, struct file *);
    static int     dev_release(struct inode *, struct file *);
    static ssize_t dev_read(struct file *, char *, size_t, loff_t *);
    static ssize_t dev_write(struct file *, const char *, size_t, loff_t *);

    static struct file_operations fops =
    {
       .open = dev_open,
       .read = dev_read,
       .write = dev_write,
       .release = dev_release,
    };

    static int __init ebbchar_init(void){
       printk(KERN_INFO "EBBChar: Initializing the EBBChar LKM\n");

       // Try to dynamically allocate a major number for the device -- more difficult but worth it
       majorNumber = register_chrdev(0, DEVICE_NAME, &fops);
       if (majorNumber<0){
          printk(KERN_ALERT "EBBChar failed to register a major number\n");
          return majorNumber;
       }
       printk(KERN_INFO "EBBChar: registered correctly with major number %d\n", majorNumber);

       // Register the device class
       ebbcharClass = class_create(THIS_MODULE, CLASS_NAME);
       if (IS_ERR(ebbcharClass)){                // Check for error and clean up if there is
          unregister_chrdev(majorNumber, DEVICE_NAME);
          printk(KERN_ALERT "Failed to register device class\n");
          return PTR_ERR(ebbcharClass);          // Correct way to return an error on a pointer
       }
       printk(KERN_INFO "EBBChar: device class registered correctly\n");

       // Register the device driver
       ebbcharDevice = device_create(ebbcharClass, NULL, MKDEV(majorNumber, 0), NULL, DEVICE_NAME);
       if (IS_ERR(ebbcharDevice)){               // Clean up if there is an error
          class_destroy(ebbcharClass);           // Repeated code but the alternative is goto statements
          unregister_chrdev(majorNumber, DEVICE_NAME);
          printk(KERN_ALERT "Failed to create the device\n");
          return PTR_ERR(ebbcharDevice);
       }
       printk(KERN_INFO "EBBChar: device class created correctly\n"); // Made it! device was initialized
       return 0;
    }

    static void __exit ebbchar_exit(void){
       device_destroy(ebbcharClass, MKDEV(majorNumber, 0));     // remove the device
       class_unregister(ebbcharClass);                          // unregister the device class
       class_destroy(ebbcharClass);                             // remove the device class
       unregister_chrdev(majorNumber, DEVICE_NAME);             // unregister the major number
       printk(KERN_INFO "EBBChar: Goodbye from the LKM!\n");
    }

    static int dev_open(struct inode *inodep, struct file *filep){
       numberOpens++;
       printk(KERN_INFO "EBBChar: Device has been opened %d time(s)\n", numberOpens);
       return 0;
    }

    static ssize_t dev_read(struct file *filep, char *buffer, size_t len, loff_t *offset){
       int error_count = 0;
       // copy_to_user has the format ( * to, *from, size) and returns 0 on success
       error_count = copy_to_user(buffer, message, size_of_message);

       if (error_count==0){            // if true then have success
          printk(KERN_INFO "EBBChar: Sent %d characters to the user\n", size_of_message);
          return (size_of_message=0);  // clear the position to the start and return 0
       }
       else {
          printk(KERN_INFO "EBBChar: Failed to send %d characters to the user\n", error_count);
          return -EFAULT;              // Failed -- return a bad address message (i.e. -14)
       }
    }

    static ssize_t dev_write(struct file *filep, const char *buffer, size_t len, loff_t *offset){
       sprintf(message, "%s(%zu letters)", buffer, len);   // appending received string with its length
       size_of_message = strlen(message);                 // store the length of the stored message
       printk(KERN_INFO "EBBChar: Received %zu characters from the user\n", len);
       return len;
    }

    static int dev_release(struct inode *inodep, struct file *filep){
       printk(KERN_INFO "EBBChar: Device successfully closed\n");
       return 0;
    }

    module_init(ebbchar_init);
    module_exit(ebbchar_exit);

I also have a small test program from the tutorial. The problem is that when the test program runs, the process ends up being killed. The log files say this is due to a supervisor-mode access and that a page fault exception was raised.

After some research and a look through the log files, it came down to a compatibility problem with Supervisor Mode Access Prevention: kernel code cannot directly access user memory because of the SMAP feature of newer CPUs.

After disabling SMAP at boot time with the nosmap option the testing code works just fine.

I am looking for a way to disable/circumvent SMAP properly in module code. Since this application could run on multiple CPUs, I don't think that changing the CR4 register is the proper way.

I think the copy_to_user() function is a good lead. The problem arises when write is called. Could anyone point to me what is the proper way to code the write() function for this module?
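For context, the sprintf() call in dev_write reads the user-space pointer buffer directly, which is the kind of access SMAP is designed to trap. The conventional pattern is to bound the length and copy the data in with copy_from_user() from <linux/uaccess.h> rather than disabling SMAP. The sketch below is a userspace model of that handler body only: copy_from_user_stub is a stand-in written for this example (the real function can fail on bad pointers), and store_message and MSG_MAX are hypothetical names.

```c
#include <errno.h>
#include <string.h>
#include <stddef.h>

#define MSG_MAX 256
static char message[MSG_MAX];

/* Userspace stand-in for the kernel's copy_from_user(); returns the number
 * of bytes that could NOT be copied (0 on success), like the real one. */
static unsigned long copy_from_user_stub(void *to, const void *from, unsigned long n)
{
    memcpy(to, from, n);
    return 0;
}

/* Shape of a write handler body: bound the length, copy, then terminate. */
long store_message(const char *ubuf, size_t len)
{
    if (len > MSG_MAX - 1)
        len = MSG_MAX - 1;                 /* never overflow message[] */
    if (copy_from_user_stub(message, ubuf, len) != 0)
        return -EFAULT;
    message[len] = '\0';                   /* user data may lack a terminator */
    return (long)len;
}
```

In the module itself the stub would be replaced by the real copy_from_user(), and any formatting (such as appending the length) would be done on the kernel-side copy afterwards.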

↧

Why doesn't TCP timer use min-heap

April 13, 2020, 7:43 pm

Using a min-heap for timers is a common choice for achieving high performance; libevent and libev do this. But when reading the TCP source code, I found that the Linux kernel doesn't use a min-heap to manage the timers for its events/sockets.

What is the upper limit of a min-heap timer?
Why doesn't the TCP timer use this data structure?

↧

Linux Kernel Modules for Listing Tasks - BFS

April 13, 2020, 11:01 pm

I am trying to implement breadth-first search to display the kernel task list from a Linux module, but have been unable to do so. Below is the module code for DFS; can anyone suggest how to do BFS?

    void dfs(struct task_struct *task)
    {
        struct task_struct *child; //Pointer to the next child
        struct list_head *list; //Children

        //task->comm is the task's name
        //task->state is the task's state (-1 unrunnable, 0 runnable, >0 stopped)
        //task->pid is the task's process ID
        printk(KERN_INFO "Name: %-20s State: %ld\tProcess ID: %d\n", task->comm, task->state, task->pid);
        list_for_each(list, &task->children) { //Loop over children
            child = list_entry(list, struct task_struct, sibling); //Get child
            /* child points to the next child in the list */
            dfs(child); //DFS from child
        }
    }
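The general difference from the DFS above is that BFS replaces recursion with a FIFO queue: visit a node, enqueue all of its children, then take the next node from the front of the queue. The following is a userspace sketch of that idea only. struct node is a hypothetical stand-in for task_struct with child/sibling links, and the queue is a caller-supplied array; in a module the children would be walked with list_for_each over &task->children and the queue would need kernel allocation.

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct: pid plus child/sibling links. */
struct node {
    int pid;
    struct node *first_child;
    struct node *next_sibling;
};

/* Breadth-first walk: writes pids into out[] in visit order and returns the
 * number of nodes visited. queue[] and out[] must each hold cap entries. */
size_t bfs(struct node *root, struct node **queue, int *out, size_t cap)
{
    size_t head = 0, tail = 0;
    if (!root || cap == 0)
        return 0;
    queue[tail++] = root;
    while (head < tail) {
        struct node *n = queue[head];
        out[head++] = n->pid;              /* visit the node at the front */
        for (struct node *c = n->first_child; c && tail < cap; c = c->next_sibling)
            queue[tail++] = c;             /* enqueue children, visit them later */
    }
    return head;
}
```

For a tree where node 1 has children 2 and 3 and node 2 has child 4, this visits 1, 2, 3, 4, whereas the recursive DFS would visit 1, 2, 4, 3.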

↧

Embedded C in SystemTap - dereferencing pointer to incomplete type

April 14, 2020, 12:31 am

I am following this tutorial: https://blog.lexfo.fr/cve-2017-11176-linux-kernel-exploitation-part1.html

To see what the state field of netlink_sock contains, I use this embedded C code:

    %{
        #include <net/sock.h>
        #include <linux/netlink.h>
    %}

    function dump_netlink_sock:long (arg_sock:long)
    %{
        struct sock *sk = (void*) STAP_ARG_arg_sock;
        struct netlink_sock * nlk = (void*) sk;

        _stp_printf("-={ dump_netlink_sock: %p }=-\n", nlk);
        _stp_printf("- sk = %p\n", sk);
        _stp_printf("- sk->sk_rmem_alloc = %d\n", sk->sk_rmem_alloc);
        _stp_printf("- sk->sk_rcvbuf = %d\n", sk->sk_rcvbuf);
        _stp_printf("- sk->sk_refcnt = %d\n", sk->sk_refcnt);
        _stp_printf("- nlk->state = %x\n", (nlk->state & 0x1));
        _stp_printf("-={ dump_netlink_sock: END}=-\n");
    %}

    probe kernel.function("netlink_attachskb")
    {
        if (execname() == "exploit")
        {
            printf("(%d - %d) >>> netlink_attachskb (%s)\n", pid(), tid(), $$parms)
        }
        dump_netlink_sock($sk);
    }

I checked the Linux kernel source code myself: state does exist in netlink_sock.

This is my result:

    shahar@debian:~/exploitation$ sudo stap -v -g mq_notify.stp
    [sudo] password for shahar:
    Pass 1: parsed user script and 95 library script(s) using 83352virt/28420res/4880shr/24252data kb, in 0usr/80sys/78real ms.
    Pass 2: analyzed script: 699 probe(s), 15 function(s), 5 embed(s), 0 global(s) using 278348virt/102604res/6696shr/97036data kb, in 420usr/730sys/1153real ms.
    Pass 3: translated to C into "/tmp/stapFmyHer/stap_cc49251867b5bd20ade8fc721d5f8895_209103_src.c" using 275848virt/102252res/6468shr/97036data kb, in 20usr/10sys/33real ms.
    /tmp/stapFmyHer/stap_cc49251867b5bd20ade8fc721d5f8895_209103_src.c: In function ‘function_dump_netlink_sock’:
    /tmp/stapFmyHer/stap_cc49251867b5bd20ade8fc721d5f8895_209103_src.c:2517:41: error: dereferencing pointer to incomplete type
      _stp_printf("- nlk->state = %x\n", (nlk->state & 0x1));
                                             ^
    make[3]: *** [/tmp/stapFmyHer/stap_cc49251867b5bd20ade8fc721d5f8895_209103_src.o] Error 1
    make[2]: *** [_module_/tmp/stapFmyHer] Error 2
    make[1]: *** [sub-make] Error 2
    make: *** [all] Error 2
    WARNING: kbuild exited with status: 2
    Pass 4: compiled C into "stap_cc49251867b5bd20ade8fc721d5f8895_209103.ko" in 130usr/380sys/740real ms.
    Pass 4: compilation failed.  [man error::pass4]
    Tip: /usr/share/doc/systemtap/README.Debian should help you get started.

In addition, I tried to create my own struct (basically copied netlink_sock from the Linux source), but I could not get it to compile. I am not sure where to place my struct in the .stp file.

↧

end kernel panic - not syncing: attempted to kill init! exitcode = 0x00007f00

April 14, 2020, 1:17 am

I cannot access terminal and recovery mode is also not working. Please, can someone help me?

↧

Memory consumption for sockets in linux

April 14, 2020, 2:53 am

Our system's memory footprint was growing gradually. After a lot of debugging with profilers, we could not pinpoint the issue. After checking various other things on the system, we narrowed it down to the websockets we use.

Those sockets had a lot of unread messages in their queues. Memory usage was directly proportional to the number of messages. Clearing the messages in the queues reclaimed a huge amount of memory.

Problem:

Tested OS version: CentOS 7.5

I checked the memory occupied by the sockets using /proc/net/sockstat: the mem column showed ~300 MB.
Total memory for the recv-Q was ~6 MB, measured using netstat -tunp (treating the numbers in recv-Q as bytes).

But when I cleaned up the unread messages, ~1.5 GB of memory was reclaimed (according to the free command).

Is there anything else I should check to get the right memory usage for sockets?

Is this unwanted memory usage by Linux? How can I debug the memory used by sockets further?

Why don't Linux tools like top list the memory used by sockets? They show the memory for processes, cache and buffers, but not for sockets.

Additional details: Changing the memory allocator to jemalloc didn't stop this memory growth, so it is not an issue associated with glibc.

↧

Memory consumption for sockets in linux & compaction behavior

April 14, 2020, 7:06 am

Those sockets had a lot of unread messages in their queues. Memory usage was directly proportional to the number of messages. Clearing the messages in the queues reclaimed a huge amount of memory.

Problem:

Tested OS version: CentOS 7.5

I checked the memory occupied by the sockets using /proc/net/sockstat: the mem column showed ~300 MB.
Total memory for the recv-Q was ~6 MB, measured using netstat -tunp (treating the numbers in recv-Q as bytes).

But when I cleaned up the unread messages, ~1.5 GB of memory was reclaimed (according to the free command).

Is there anything else I should check to get the right memory usage for sockets?

Is this unwanted memory usage by Linux? How can I debug the memory used by sockets further?

Why don't Linux tools like top list the memory used by sockets? They show the memory for processes, cache and buffers, but not for sockets.

Additional details: Changing the memory allocator to jemalloc didn't stop this memory growth, so it is not an issue associated with glibc.

=================================================================

Edited Info: After doing some work with the test application

I converted our problem into a simple test program and ran it on servers with different kernel versions.

The test program: 5000 sockets, with 4 incoming messages (3 bytes per message) to each socket every minute. I also used ss -tm to get a clearer picture of the buffer memory behavior.

Machine 1: Kernel: 2.6.32
/proc/sys/net/core/rmem_max=124928

At start: Free mem: 2.5gb
For every incoming message, mem in ss -tm grew by 512 bytes per socket.
At some point, there was a sudden drop in memory usage by sockets.

Before memory drop:

free -m : free memory: 1.1G
sockstat: TCP: inuse 6 orphan 1 tw 161 alloc 5265 mem 114138
ss -tm : mem:(r112784,w0,f1904,t0)

After memory drop:

free -m free memory: 2.3G
sockstat TCP: inuse 6 orphan 1 tw 157 alloc 5266 mem 8042
ss -tm mem:(r9528,w0,f952,t0)

The values in recv-Q kept increasing at the expected rate.

The drop occurred at the point where the "r" value became approximately equal to core/rmem_max. It seemed like a compaction process happened there.

Machine 2: Kernel: 3.10.0
/proc/sys/net/core/rmem_max=212992

Here I expected the memory to drop at ~212992. But this machine had a newer version of ss, which showed the buffer size itself as rb=367360, so I waited for the actual compaction to happen.

At start:

ss -tm : skmem:(r0,rb367360,t0,tb87040,f53248,w0,o0,bl0)
sockstat: TCP: inuse 4 orphan 0 tw 97 alloc 5042 mem 4992

Here too the memory kept increasing at the expected rate, and there was a memory drop at a particular point in time.

At memory drop point 1:
Before memory drop:

free : free memory: 2.1gb
sockstat : TCP: inuse 4 orphan 0 tw 89 alloc 5097 mem 354398
ss -tm : skmem:(r290560,rb367360,t0,tb87040,f256,w0,o0,bl0)

After memory drop:

free : free memory: 3.1gb
sockstat : TCP: inuse 4 orphan 0 tw 93 alloc 5099 mem 187542
As for ss -tm, I saw different behavior this time:
50% of the sockets had compacted values,
skmem:(r4352,rb367360,t0,tb87040,f3840,w0,o0,bl0)
and the remaining had actual values (not compacted)
skmem:(r291072,rb367360,t0,tb87040,f3840,w0,o0,bl0)

So compaction happened shortly before the "r" value reached "rb".

Next, I waited until the "r" value reached "rb".

Memory drop point 2:
The next memory drop happened there. All the socket buffers were compacted (except for 100 sockets) and a large amount of memory was reclaimed.

=================================================================

Now my understanding:

The actual problem we faced on our servers: the memory footprint was growing continuously, and the machine started using swap space and slowed down. After running the test programs, I understand that the free memory available on our servers was not enough to last until the point of compaction.

My Questions:

Is this compaction a built-in behavior of socket buffers?
If yes, when does it happen? On Machine 2 I saw different behavior than on Machine 1. Which value should be tuned to make the compaction happen earlier?
Do the "mem" value in sockstat and the sum of the "r" values in ss add up to give the total memory occupied by the sockets? Or are they the same value reported by different tools?

(As per my tests, (mem value in sockstat + skmem buffer value) equals the memory that gets freed.)
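One detail that may help when comparing these numbers: to my knowledge the "mem" field in /proc/net/sockstat is counted in pages, not bytes, while the "r" value in ss -tm is in bytes per socket. Under that assumption, and assuming 4 KiB pages as on typical x86-64 systems, the conversion can be sketched as below; both function names are hypothetical.

```c
#include <unistd.h>

/* Converts the "mem" field of /proc/net/sockstat (assumed to be counted
 * in pages) to bytes for a given page size. */
long long sockstat_mem_bytes(long long pages, long long page_size)
{
    return pages * page_size;
}

/* Same conversion using the running system's page size. */
long long sockstat_mem_bytes_here(long long pages)
{
    return sockstat_mem_bytes(pages, (long long)sysconf(_SC_PAGESIZE));
}
```

With 4096-byte pages, the Machine 1 values work out to 114138 pages = 467,509,248 bytes (about 446 MiB) before the drop and 8042 pages = 32,940,032 bytes (about 31 MiB) after it.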

↧

dereferencing pointer to incomplete type ‘const struct cred’

April 14, 2020, 8:46 am

I want to understand this error. This is the code that prints the UID of a process:

printk(KERN_INFO "User ID = %d\n", (task)->cred->uid);

The error:

error: dereferencing pointer to incomplete type ‘const struct cred’

↧