How to get list of deprecated kernel functions for different kernel versions?

October 26, 2019, 6:15 am

While compiling an external Linux kernel module for a 5.0 kernel (x64 module on a native x64 system) that uses the deprecated function 'timespec_sub' (which still compiled at least until kernel version 4.15), I found it difficult to find suitable replacements or updated kernel functions for the deprecated ones.

Is there a list or other summary of deprecated kernel functions and recommended function replacements/updates for different kernel versions?

↧

userfaultfd seems slow - is it expected?

October 26, 2019, 2:00 pm

When I first learned about userfaultfd, my initial reaction was: wow, finally! This is a very useful feature, and I can see some interesting applications of it outside of QEMU/KVM kinds of scenarios.

So, I finally tried it to see if it works for my case. Functionally - yes, it's perfect, and it trivially solves a tricky problem that otherwise would require kernel changes.

However, it seems very slow. A quick benchmark shows that my test program takes ~50 microseconds to handle a page fault. That is far longer than what the kernel needs for, say, private anonymous mapping updates, which consistently take below ~2 microseconds per page fault on the same machine.

I wonder if I am missing something very obvious here and somehow don't set it up correctly?
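For reference, a baseline such as the ~2 microsecond figure for private anonymous mappings could be measured along the following lines. This is only a minimal sketch and not the benchmark used above: the helper name `anon_fault_ns` is hypothetical, and it assumes that the first write to each page of a fresh mapping triggers one page fault. With transparent huge pages enabled, a single touch may populate many pages at once, which would skew the average.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): maps npages of private
 * anonymous memory, writes to each page once, and returns the average
 * nanoseconds per first touch, or -1.0 on error.
 */
double anon_fault_ns(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    struct timespec t0, t1;
    volatile unsigned char *p;
    void *map;
    size_t len, i;
    double elapsed;

    if (npages == 0 || page <= 0)
        return -1.0;
    len = npages * (size_t)page;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1.0;
    p = map;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < npages; i++)
        p[i * (size_t)page] = 1;    /* first write to a page faults it in */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    munmap(map, len);

    elapsed = (double)(t1.tv_sec - t0.tv_sec) * 1e9
            + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed / (double)npages;
}
```

Calling `anon_fault_ns(100000)` a few times and comparing the result with the per-fault time of the userfaultfd handler gives a like-for-like comparison on the same machine.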

↧

How is the page table of the page fault handler loaded into the MMU?

October 27, 2019, 7:08 am

Can someone please explain how the page fault handler's page table is loaded into the MMU?

Consider a process A that is executing, with its page table currently loaded into the MMU from memory. During execution a page fault occurs. To execute the page fault handler, the corresponding page table has to be loaded into the MMU from memory, but since process A's page table is in the MMU, how does the page fault handler even execute in this situation?

Does the processor know the physical address of the page fault handler, so that it disables the MMU and starts executing the handler from its physical location? I searched around but could not find the answer.

Platforms - Linux on x86, ARM architecture

How is this handled on both architectures?

↧

Permission denied on Nova Server

October 27, 2019, 2:19 pm

I am using a Linux-based server called Nova as the portal for my assignments at UMUC. To access this portal, I am prompted to enter my UMUC credentials. After I enter my password, I receive this message: "Permission denied (publickey,gssapi-keyex,gssapi-with-mic,keyboard-interactive)." What can I do to solve this problem?

I am using a MacBook.

↧

How to write dummy ALSA compliant device driver?

October 27, 2019, 5:08 pm

I want to write a dummy ALSA-compliant driver as a loadable kernel module. When accessed by aplay/arecord through ALSA-lib, it must behave as, let's say, a normal 7.1-channel audio device providing at least all the basic controls: sampling rates, number of channels, format, etc. Underneath, it will just take every channel from the audio stream and send it over the network as a UDP packet stream. It must be possible to load it multiple times, and ultimately it should expose as many audio devices under /dev as we want. That way we will have multiple virtual sound cards in the system.

What should be the minimal structure of such a kernel module? Can you give me an example skeleton (at least the interfaces) to be 100% ALSA compliant? ALSA driver examples are so poor...

↧

How to know which core sent the inter-processor interrupt?

October 27, 2019, 10:42 pm

I am working on ARM64. I added an inter-processor interrupt (IPI) handler to void handle_IPI(int ipinr, struct pt_regs *regs) in linux/arch/arm64/kernel/smp.c.

When I get an IPI, I want to know which core sent it. For example, if core 3 sends an IPI to core 0, then when core 0 gets the IPI, I want the IPI handler to know that it came from core 3.

Is there a way to do it?

Thanks a lot.

↧

When will getsockname() set ENOBUFS?

October 27, 2019, 11:05 pm

The manual says getsockname() can set errno to ENOBUFS, but I have checked the related source code in the kernel (3.10, 4.9, 5.2):

inet_getname
inet6_getname
unix_getname
netlink_getname
ipx_getname
move_addr_to_user

None of them return ENOBUFS.

Can anybody tell me if there is anything I missed?
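For context, this is roughly how a caller would see such an error from user space. It is a minimal sketch, assuming an IPv4 socket; the helper name `local_port` is hypothetical. It simply forwards whatever errno getsockname() sets, which per the manual could include ENOBUFS, without claiming that any kernel code path actually produces it.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the local port of an IPv4 socket in host
 * byte order, or -errno if getsockname() fails.
 */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -errno;          /* e.g. -EBADF, -ENOTSOCK, -ENOBUFS */
    return ntohs(addr.sin_port);
}
```

A caller would compare the return value against `-ENOBUFS` to detect the case the manual describes.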

↧

What are the most common busmaster operations, and how are they better than regular DMA?

October 28, 2019, 5:03 am

Can someone list the most common operations that use the bus mastering provision of the host bus? I can list a few..

1) The GPU transfers the overall framebuffer to the video card using bus-mastering over PCI-e (in recent x86).

2) The ethernet card transfers a received packet to main-memory using bus-mastering.

3) I assume the hard-disk too uses bus-mastering to transfer blocks.

In this context, when do these devices/drives use bus-mastering vs. 3rd party DMA?

Recently, it seems the Linux kernel has started supporting something called P2P DMA within PCIe, where devices communicate directly among themselves. How is P2P DMA fundamentally different from regular bus-mastering DMA? I guess that, until now, bus-mastering was only used by the device to transfer to or from a buffer created by the DMA subsystem, and that buffer was always in main memory, right? P2P DMA, I guess, is a provision that allows one to bypass main memory altogether. I also read somewhere that such provisions were being used by some of the proprietary graphics drivers in high-end gaming systems and that Linux is somewhat of a latecomer to the party.

Can someone provide a broad overview of the varieties of DMA available in modern systems, and some way to conceptually understand them, if there is one?

Edit: regular DMA changed to 3rd party DMA

↧

'Bad page state' kernel error appearing randomly on azure vm when SAS is running

October 28, 2019, 5:30 am

For the past few months, we have been seeing Bad page state errors in /var/log/messages.

Here is the exact stack trace:

Sep 27 15:14:11 az-pro-sas1 kernel: BUG: Bad page state in process sas  pfn:1a49ff
Sep 27 15:14:11 az-pro-sas1 kernel: page:ffffd9a146927fc0 count:0 mapcount:1 mapping:          (null) index:0x7f48e7fff
Sep 27 15:14:11 az-pro-sas1 kernel: page flags: 0x2fffff00080018(uptodate|dirty|swapbacked)
Sep 27 15:14:11 az-pro-sas1 kernel: page dumped because: nonzero mapcount
Sep 27 15:14:11 az-pro-sas1 kernel: Modules linked in: binfmt_misc iptable_security bridge stp llc nf_conntrack_netlink nfnetlink ext4 mbcache jbd2 nfsv3 nfs_acl nfs lockd grace drbg fscache ansi_cprng cmac arc4 md4 nls_utf8 cifs ccm dns_resolver overlay(T) ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sunrpc dm_mirror dm_region_hash dm_log dm_mod joydev sb_edac iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg hv_utils ptp hv_balloon pps_core pcspkr i2c_piix4 ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic hv_netvsc hv_storvsc scsi_transport_fc hyperv_keyboard scsi_tgt hid_hyperv crct10dif_pclmul crct10dif_common crc32c_intel ata_generic
Sep 27 15:14:11 az-pro-sas1 kernel: pata_acpi floppy hyperv_fb ata_piix serio_raw libata hv_vmbus
Sep 27 15:14:11 az-pro-sas1 kernel: CPU: 2 PID: 117797 Comm: sas Tainted: G               ------------ T 3.10.0-957.12.2.el7.x86_64 #1
Sep 27 15:14:11 az-pro-sas1 kernel: Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
Sep 27 15:14:11 az-pro-sas1 kernel: Call Trace:
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a963041>] dump_stack+0x19/0x1b
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a95dcf3>] bad_page.part.76+0xdc/0xf9
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3bf100>] free_pages_prepare+0x170/0x190
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3bfb74>] free_hot_cold_page+0x74/0x160
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3c4a13>] __put_single_page+0x23/0x30
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3c4a65>] put_page+0x45/0x60
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a42ca37>] __split_huge_page+0x357/0x880
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a42cfd6>] split_huge_page_to_list+0x76/0xf0
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a42de30>] __split_huge_page_pmd+0x1d0/0x5c0
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3e781d>] unmap_page_range+0xbdd/0xc30
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3e78f1>] unmap_single_vma+0x81/0xf0
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3e8d2d>] zap_page_range+0x11d/0x190
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a3e3c1d>] SyS_madvise+0x49d/0xac0
Sep 27 15:14:11 az-pro-sas1 kernel: [<ffffffff8a975ddb>] system_call_fastpath+0x22/0x27
Sep 27 15:14:11 az-pro-sas1 kernel: Disabling lock debugging due to kernel taint
Sep 27 15:14:12 az-pro-sas1 sh: abrt-dump-oops: Found oopses: 1disa

We managed to replicate the problem on other VMs. We tried upgrading and downgrading the kernel, and we also tried disabling transparent huge pages, but without luck.

It's a CentOS 7 VM running on Azure with the following versions:

CentOS Linux release 7.6.1810 (Core)
Linux 3.10.0-957.12.2.el7.x86_64

The error appears randomly, but always while SAS is running. When it occurs, the SAS process just hangs forever, and after some time the VM's CPUs start burning and the VM becomes non-responsive.

Any help would be greatly appreciated!

↧

How to know linux scheduler time slice?

October 28, 2019, 6:06 am

I'm looking for the value of the time slice (or quantum) of my Linux kernel.

Is there a /proc file which exposes such information?

(Or) Is it well-defined in the Linux headers of my distribution?

(Or) Is there a C function of the Linux API (maybe sysinfo) that exposes this value?

Thanks in advance.
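One C call that may be relevant here is the POSIX function sched_rr_get_interval(), which is specified to report the round-robin quantum for a process scheduled under SCHED_RR. The sketch below wraps it; the helper name `rr_interval_ns` is hypothetical, and whether the value it reports corresponds to the "time slice" meant above for processes under the default scheduling policy is an open assumption, not something this sketch establishes.

```c
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: returns the interval reported by
 * sched_rr_get_interval() for the given pid (0 means the calling process)
 * in nanoseconds, or -1 if the call fails (errno is set by the call).
 */
long long rr_interval_ns(pid_t pid)
{
    struct timespec ts;

    if (sched_rr_get_interval(pid, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + (long long)ts.tv_nsec;
}
```

For example, `rr_interval_ns(0)` queries the calling process; a result of -1 means the call was rejected and errno should be inspected.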

↧

Inconsistent values of ARM PMU cycles counter

October 28, 2019, 7:27 am

I'm trying to measure the performance of my code in the Linux kernel with the PMU. First I wanted to test the PMU itself, so I created a simple loop of a couple of operations in the kernel. I placed it under a spin lock with interrupts disabled so that my test code can't be preempted. Then I printed the cycle counter to check how many CPU cycles this loop takes. But I see very different values at each print: 100, 500, 1000, 200, ...

My question is: why do I see such different values every time?

PS: in contrast to the cycle counter, the PMU's instruction counter is stable and I see the same values every time. I also tried to use the ARM timer, but it also shows varying values, similar to the PMU's cycle counter. Here is how I use the ARM timer to measure performance:

unsigned long long ticks_start, ticks_end;
int i = 0, j;
unsigned long flags;

spin_lock_irqsave(&lock, flags);
while (i++ < 100) {
   j = 0;
   asm volatile("mrs %0, CNTPCT_EL0" : "=r" (ticks_start)); 
   while (j++ < 10000) {
      asm volatile ("nop");
   }
   asm volatile("mrs %0, CNTPCT_EL0" : "=r" (ticks_end));
   printk("ticks %d are: %llu\n", i, ticks_end - ticks_start);
}
spin_unlock_irqrestore(&lock, flags);

and the output on a real device (Cortex-A57) is:

...
ticks 31 are: 2287
ticks 32 are: 2287
ticks 33 are: 2287
ticks 34 are: 1984
ticks 35 are: 457
ticks 36 are: 1604
ticks 37 are: 2287
...

↧

Why can moving the mutex_unlock() after dev_kfree_skb() eliminate the use-after-free bug?

October 28, 2019, 7:31 am

Recently, I found a paper named "Effective Static Analysis of Concurrency Use-After-Free Bugs in Linux Device Drivers". It uses a motivating example (which is also a kernel patch: https://github.com/torvalds/linux/commit/4f68ef64cd7f). I investigated the related kernel source code and found that the function dev_kfree_skb(frame.skb) first decrements the reference count of frame.skb; only if the count reaches 0 is frame.skb really freed. However, the read operation in the other function first obtains frame.skb, which increments the reference count. So I think that, even without this kernel patch, the memory of frame.skb will not be freed before the read operation. I don't know whether this kernel patch is really necessary.

int cw1200_hw_scan(...) { 
  ......
  mutex_lock(&priv->conf_mutex); 
  ......
  mutex_unlock(&priv->conf_mutex);
  if (frame.skb)
    dev_kfree_skb(frame.skb); // FREE 
  ......
}


void cw1200_bss_info_changed(...) { 
  ......
  mutex_lock(&priv->conf_mutex); 
  ......
  cw1200_upload_beacon(...); // read frame.skb in this function
  ......
  mutex_unlock(&priv->conf_mutex); 
  ......
}
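The general pattern named in the title, releasing a shared buffer while the lock is still held so that a reader in another critical section can never observe a freed pointer, can be illustrated in user space. The sketch below is only an analogy using pthreads: `conf_mutex`, `shared_frame` and the three helper functions are hypothetical stand-ins, not the cw1200 code, and it says nothing about whether skb reference counting already rules out the problem in the driver.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for priv->conf_mutex and the shared frame. */
static pthread_mutex_t conf_mutex = PTHREAD_MUTEX_INITIALIZER;
static char *shared_frame;

/* Publish a copy of data as the shared frame. Returns 0 on success. */
int frame_alloc(const char *data)
{
    char *copy = malloc(strlen(data) + 1);

    if (copy == NULL)
        return -1;
    strcpy(copy, data);

    pthread_mutex_lock(&conf_mutex);
    free(shared_frame);             /* drop any previous frame */
    shared_frame = copy;
    pthread_mutex_unlock(&conf_mutex);
    return 0;
}

/* Free the frame inside the critical section, then unlock. */
void frame_release(void)
{
    pthread_mutex_lock(&conf_mutex);
    free(shared_frame);
    shared_frame = NULL;            /* readers see NULL, never a stale pointer */
    pthread_mutex_unlock(&conf_mutex);
}

/* Reader: returns the frame length, or -1 if the frame has been released. */
long frame_length(void)
{
    long n = -1;

    pthread_mutex_lock(&conf_mutex);
    if (shared_frame != NULL)
        n = (long)strlen(shared_frame);
    pthread_mutex_unlock(&conf_mutex);
    return n;
}
```

If `frame_release()` instead unlocked first and freed afterwards, a concurrent `frame_length()` could run between the two steps and read memory that is about to be freed, which is the kind of window the question is asking about.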

↧

If I do not acquire a spinlock in a softirq, is it then alright to sleep?

October 28, 2019, 8:14 am

If I do not acquire a spinlock in softirq context, is it then alright to sleep?

I understand that it is incorrect to sleep after acquiring a spinlock. The process put to sleep might wake up on another CPU, while the spinlock was acquired on a different CPU. Interrupts would have been disabled on the CPU where the spinlock was acquired, while the restore would happen on another CPU. (Will this cause a kernel panic immediately?)

But if there is no spinlock involved, will the sleep (or schedule() call) work fine? Or is this wrong due to the way context saving/restoring works?

If I call schedule() in a softirq, where does the interrupt context/stack get saved? On the kernel mode stack of the current process? When the process gets to run again, will it be able to continue from where it left off? Or is this where things go wrong? That is, schedule() does not know about the softirq/interrupt stack and will not save it, so when the process gets to run again, it would have no idea about the softirq?

↧

Linux kernel thread address space

October 28, 2019, 2:05 pm

I read that Linux kernel threads don't have their own address space; their mm field is set to NULL. I know that all kernel threads share an address space, but they still have their own stacks, right? They need to describe that somehow, and without mm, how do they do that? And other lists, like open files: where do they keep them? Also, what's the point of setting the active_mm field to the previous user task's mm? Thanks in advance.

↧

What is TCP socket status 8A, as shown in /proc/net/tcp?

October 28, 2019, 2:41 pm

I'm trying to interpret the status ('st') column of output from /proc/net/tcp, and I'm seeing unexpected values.

I've seen previous questions like "List of possible internal socket statuses from /proc". That references the kernel docs, but those only seem to document the statuses up to a maximum of 0C, whilst I'm seeing 8A.

This is my full output:

  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
   0: 00000000:A6D8 00000000:0000 8A 00000002:00000000 00:00000000 00000000  1001        0 11240259 1 0000000000000000 100 0 0 2 0                   
   1: 00000000:9D3E 00000000:0000 8A 00000000:00000000 00:00000000 00000000  1001        0 11956055 1 0000000000000000 100 0 0 10 0                  
   2: 00000000:9F7E 00000000:0000 8A 00000012:00000000 00:00000000 00000000  1001        0 73658 1 0000000000000000 100 0 0 2 0                      
   3: 00000000:A702 00000000:0000 8A 00000012:00000000 00:00000000 00000000  1001        0 73654 1 0000000000000000 100 0 0 2 0                      
   4: 00000000:A905 00000000:0000 8A 00000012:00000000 00:00000000 00000000  1001        0 73666 1 0000000000000000 100 0 0 2 0                      
   5: 00000000:A926 00000000:0000 8A 00000000:00000000 00:00000000 00000000  1001        0 11370549 1 0000000000000000 100 0 0 10 0                  
   6: 00000000:AACA 00000000:0000 8A 00000000:00000000 00:00000000 00000000  1001        0 11357036 1 0000000000000000 100 0 0 10 0                  
   7: 00000000:A8EC 00000000:0000 8A 00000000:00000000 00:00000000 00000000  1001        0 11319108 1 0000000000000000 100 0 0 10 0                  
   8: 00000000:AAD3 00000000:0000 8A 00000000:00000000 00:00000000 00000000  1001        0 11418384 1 0000000000000000 100 0 0 10 0                  
   9: 00000000:AEB3 00000000:0000 8A 00000012:00000000 00:00000000 00000000  1001        0 73662 1 0000000000000000 100 0 0 2 0                      
  10: 00000000:9E54 00000000:0000 8A 00000002:00000000 00:00000000 00000000  1001        0 11121735 1 0000000000000000 100 0 0 2 0                   
  11: 00000000:9ED5 00000000:0000 8A 00000000:00000000 00:00000000 00000000  1001        0 11504164 1 0000000000000000 100 0 0 10 0                  
  12: 6700000A:B53F 026AE00D:01BB 01 00000000:00000000 02:000000AA 00000000 10307        0 12005540 2 0000000000000000 22 4 1 10 1400                
  13: 6700000A:BE36 EF292834:01BB 06 00000000:00000000 03:00000C3F 00000000     0        0 0 3 0000000000000000                                      
  14: 6700000A:9930 0E11D9AC:01BB 08 00000000:0000026D 00:00000000 00000000 10037        0 11976223 1 0000000000000000 26 4 30 10 1400

Given the context, it seems likely that 8A is some special case of LISTEN, but I can't find any documentation for that. Is the upper byte used for some extra set of flags, so this is LISTEN + something else? Every other example I can find has 0 in the upper byte though.

Not sure if it's relevant, but this output is coming from an unrooted Android device.
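To make the guess above concrete, the 'st' field can be split into its low nibble and any remaining high bits. This is a minimal sketch: the state names for values 01 to 0C follow the kernel's TCP state numbering, but treating the low nibble of 8A as the state and 0x80 as an extra flag is exactly the unconfirmed assumption from the question, and the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* TCP state names, indexed by state number (01 to 0C in /proc/net/tcp). */
static const char *const tcp_state_names[] = {
    "UNKNOWN", "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1",
    "FIN_WAIT2", "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK",
    "LISTEN", "CLOSING", "NEW_SYN_RECV"
};

/* Parses the hex 'st' column, e.g. "8A". Returns 0 on success, -1 on error. */
int parse_st(const char *field, unsigned int *st)
{
    return sscanf(field, "%x", st) == 1 ? 0 : -1;
}

/* ASSUMPTION: the state lives in the low nibble of the field. */
const char *tcp_state_name(unsigned int st)
{
    unsigned int low = st & 0x0Fu;

    if (low == 0 || low > 12)
        return "UNKNOWN";
    return tcp_state_names[low];
}

/* Whatever is left above the low nibble (0x80 for the value 8A). */
unsigned int tcp_state_extra_bits(unsigned int st)
{
    return st & ~0x0Fu;
}
```

Under that assumption, the value 8A decodes to the name LISTEN with extra bits 0x80, while the ordinary rows (01, 06, 08) decode to ESTABLISHED, TIME_WAIT and CLOSE_WAIT with no extra bits.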

↧

Which process runs first when a fork() is called

October 28, 2019, 2:46 pm

I wrote this program

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;

    pid = fork();
    if (pid == 0)
        printf("\nI am child\n");
    else
        printf("\nI am parent\n");

    return 0;
}

Whose output when executed is

 ./a.out 

I am parent

I am child

When I run it under strace, the output is (last part):

arch_prctl(ARCH_SET_FS, 0x7fd1bbf52700) = 0
mprotect(0x7fd1bbd47000, 16384, PROT_READ) = 0
mprotect(0x7fd1bbf70000, 4096, PROT_READ) = 0
munmap(0x7fd1bbf54000, 103886)          = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd1bbf529d0) = 5109

I am child
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd1bbf6d000
write(1, "\n", 1
)                       = 1
write(1, "I am parent\n", 12I am parent
)           = 12
exit_group(13)                          = ?

The output shows that the parent runs first, but the strace output seems to show that the child runs first, since it's printed first.

What is the rule?
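POSIX does not specify which of the two processes runs first after fork(), so a program that needs a particular order has to synchronize explicitly. The sketch below is one way to force "child first": the parent waits for the child before printing. The helper name `child_then_parent` and its status parameter are hypothetical, introduced only for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: forks, lets the child print and exit with
 * child_status, and makes the parent wait for it before printing.
 * Returns the child's exit status, or -1 on error.
 */
int child_then_parent(int child_status)
{
    int status;
    pid_t pid;

    fflush(stdout);                 /* avoid duplicating buffered output */
    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        printf("I am child\n");
        fflush(stdout);             /* _exit() does not flush stdio buffers */
        _exit(child_status);
    }

    /* Parent: block until the child has terminated. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    printf("I am parent\n");
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the waitpid() call in place, "I am child" is always printed before "I am parent", whatever the scheduler decides to run first.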

↧

current->mm gives NULL in linux kernel

October 28, 2019, 6:56 pm

I would like to walk the page table, so I accessed current->mm, but it is NULL.

I'm working on Linux kernel 3.9 and I don't understand how current->mm can be NULL.

Is there something I am missing here?

↧

Calling user-space program functions from kernel modules [duplicate]

October 29, 2019, 1:32 am

This question is an exact duplicate of:

How can I Execute/Call a user-space defined function from Linux kernel space module? 2 answers

First of all, I am working on an embedded board and want to control its RGB LED using a push button; both are on the board. The LED file path is "/sys/class/leds" and the button file path is "/dev/input/event0".

For that, I have developed a user-space C program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

#include <sys/select.h>
#include <sys/time.h>
#include <errno.h>

#include <linux/input.h>

#define BTN_FILE_PATH "/dev/input/event0"
#define LED_PATH "/sys/class/leds"
#define red "red"
#define green "green"

void change_led_state(char *led_path, int led_value)
{
    char    lpath[64];
    FILE    *led_fd;

    strncpy(lpath, led_path, sizeof(lpath) - 1);
    lpath[sizeof(lpath) - 1] = '\0';

    led_fd = fopen(lpath, "w");

    if (led_fd == NULL) {
        fprintf(stderr, "simplekey: unable to access led\n");
        return;
    }

    fprintf(led_fd, "%d\n", led_value);

    fclose(led_fd);
}

void reset_leds(void)
{
    change_led_state(LED_PATH "/" red "/brightness", 0);
    change_led_state(LED_PATH "/" green "/brightness", 0);
}

int configure_leds(void)
{
    FILE    *l_fd;
    FILE    *r_fd;
    char    *none_str = "none";

    /* Configure leds for hand control */
    l_fd = fopen(LED_PATH "/" red "/trigger", "w");
    r_fd = fopen(LED_PATH "/" green "/trigger", "w");

    if (l_fd == NULL || r_fd == NULL) {
        perror("simplekey: unable to configure led");
        return -EACCES;
    }

    fprintf(r_fd, "%s\n", none_str);
    fprintf(l_fd, "%s\n", none_str);

    fclose(r_fd);
    fclose(l_fd);

    /* Switch off leds */
    reset_leds();

    return 0;
}

void eval_keycode(int code)
{
    static int red_state = 0;
    static int green_state = 0;

    switch (code) {
    case 260:
        printf("BTN left pressed\n");

        /* figure out red state */
        red_state = red_state ? 0 : 1;

        change_led_state(LED_PATH "/" red "/brightness", red_state);
        break;

    case BTN_RIGHT:
        printf("BTN right pressed\n");

        /* figure out green state */
        green_state = green_state ? 0 : 1;

        change_led_state(LED_PATH "/" green "/brightness", green_state);
        break;
    }
}


int main(void)
{
    int file;
    /* how many bytes were read */
    size_t  rb;
    int ret;
    int yalv;
    /* the events (up to 64 at once) */
    struct input_event  ev[64];
    char    *str = BTN_FILE_PATH;

    printf("Starting simplekey app\n");

    ret = configure_leds();
    if (ret < 0)
        exit(1);

    printf("File Path: %s\n", str);

    if((file = open(str, O_RDONLY)) < 0) {
        perror("simplekey: File can not open");
        exit(1);
    }

    for (;;) {
        /* Blocking read */
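        rb = read(file, ev, sizeof(ev));
        if ((ssize_t) rb < 0) {
            /* read() returns -1 on error; stop instead of looping on garbage */
            perror("simplekey: read failed");
            break;
        }
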
        for (yalv = 0;
            yalv < (int) (rb / sizeof(struct input_event));
            yalv++) {
            if (ev[yalv].type == EV_KEY) {


                /* Change state on button pressed */
                if (ev[yalv].value == 0)
                    eval_keycode(ev[yalv].code);
            }
        }
    }

    close(file);
    reset_leds();
    exit(0);
}

This code works fine and I can now switch on and off the LED using the BUTTON.

As a second step, I was asked to develop a Linux kernel module in order to run the above code in kernel mode, so this is the module that I have created:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/device.h>  
#include <linux/kernel.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
#include <linux/input.h>



MODULE_LICENSE("GPL");      
MODULE_AUTHOR("Gaston");  
MODULE_DESCRIPTION("A simple Linux char driver"); 
MODULE_VERSION("0.1"); 

#define BTN_FILE_PATH "/dev/input/event0"


int file;
char *str = BTN_FILE_PATH;



int exer_open(struct inode *pinode, struct file *pfile) {
    struct file *f;

    /* filp_open() takes (filename, flags, mode) */
    f = filp_open(str, O_RDONLY, 0);
    if (IS_ERR(f)) {
        printk(KERN_ERR "simplekey: File can not open\n");
        return PTR_ERR(f);
    }
    pfile->private_data = f;

    printk(KERN_INFO "Device has been opened\n");
    return 0;
}



void eval_keycode(int code);

ssize_t exer_read(struct file *pfile, char __user *buffer, size_t length, loff_t *offset) {
    struct file *f = pfile->private_data;
    enum { MAX_BUF_SIZE = 4096 };
    size_t buf_size;
    struct input_event *ev;
    ssize_t total = 0;
    ssize_t rc = 0;
    int yalv;

    /* Nothing to read into a zero-length buffer. */
    if (length == 0)
        return 0;

    /* Allocate temporary buffer. */
    buf_size = min_t(size_t, MAX_BUF_SIZE, length);
    ev = kmalloc(buf_size, GFP_KERNEL);
    if (ev == NULL)
        return -ENOMEM;

    /* Read file to buffer in chunks. */
    do {
        size_t amount = min_t(size_t, length, buf_size);

        /* kernel_read() advances *offset itself. */
        rc = kernel_read(f, ev, amount, offset);
        if (rc > 0) {
            /* Have read some data from file. */
            if (copy_to_user(buffer, ev, rc) != 0) {
                /* Bad user memory! */
                rc = -EFAULT;
            } else {
                /* Update totals. */
                total += rc;
                buffer += rc;
                length -= rc;

                /* Evaluate key events with value 0, as in the user-space program. */
                for (yalv = 0; yalv < (int) (rc / sizeof(struct input_event)); yalv++) {
                    if (ev[yalv].type == EV_KEY && ev[yalv].value == 0)
                        eval_keycode(ev[yalv].code);
                }

                if (rc < amount) {
                    /* Didn't read the full amount, so terminate early. */
                    rc = 0;
                }
            }
        }
    } while (rc > 0 && length > 0);

    /* Free temporary buffer. */
    kfree(ev);

    if (total > 0)
        return total;
    return rc;
}



ssize_t exer_write(struct file *pfile, const char __user *buffer, size_t length, loff_t *offset) {

    return 0;

}   


int exer_close(struct inode *pinode, struct file *pfile) {
    struct file *f = pfile->private_data;
    int rc;

    rc = filp_close(f, NULL);
    if (rc == 0) {
        printk(KERN_INFO "Device successfully closed\n");
    }
    return rc;
}



struct file_operations exer_file_operations = { 
    .owner = THIS_MODULE,
    .open = exer_open,
    .read = exer_read,
    .write = exer_write,
    .release = exer_close,
};




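int exer_simple_module_init(void) {
    int rc;

    printk(KERN_INFO "Initializing the LKM\n");
    rc = register_chrdev(240, "Simple Char Drv", &exer_file_operations);
    if (rc < 0) {
        printk(KERN_ERR "simplekey: unable to register char device\n");
        return rc;
    }
    return 0;
}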



void exer_simple_module_exit(void) {

    unregister_chrdev(240, "Simple Char Drv");
}


module_init(exer_simple_module_init);
module_exit(exer_simple_module_exit);

At the moment, I don't know whether the module code is correct and will let me control the LED, or whether I am doing something wrong. The problem I am facing now is that there is a function called eval_keycode(), which is declared in my main user-space program and which I must use in the kernel module, as you can see.

I cannot copy the code of this function into the kernel module, as I do not have the required libraries for it. So I was wondering if there is some way to call a user-space-defined function from the kernel module, or to link the two, so that I can use it without copying it there.

I know there are user-space memory access API functions like copy_to_user, strnlen_user, get_user ... but I do not think they can help me here.

Could anyone help me, please?

Thank you!

↧

What is "the kernel address space"?

October 29, 2019, 6:29 am

From Understanding The Linux Kernel, here is some discussion about kernel thread vs user process i.e. regular process:

Besides user processes, Unix systems include a few privileged processes called kernel threads with the following characteristics:
• They run in Kernel Mode in the kernel address space.
• They do not interact with users, and thus do not require terminal devices.
• They are usually created during system startup and remain alive until the system is shut down.
...
In Linux, kernel threads differ from regular processes in the following ways:
• Kernel threads run only in Kernel Mode, while regular processes run alternatively in Kernel Mode and in User Mode.
• Because kernel threads run only in Kernel Mode, they use only linear addresses greater than PAGE_OFFSET. Regular processes, on the other hand, use all four gigabytes of linear addresses, in either User Mode or Kernel Mode.

I have heard about the virtual address space of a user process i.e. regular process, and a portion of the address space is mapped to the kernel code and data.

My Questions:

I was wondering what "the kernel address space" in the above quote means.
Is it not part of the virtual address space of a user process?
Does it mean that the kernel has its own virtual address space, just like a user process has its own virtual address space?

↧

what's the execution order of multiple child processes

October 29, 2019, 9:28 am

I use the following code to fork four child processes:

#define _GNU_SOURCE
#include  <stdio.h>
#include  <unistd.h>
#include  <sys/wait.h>
#include  <stdlib.h>

pid_t pid;
int child(int id);

int main(void)
{
    long cpus = 0; /* sysconf() returns long, matching the %ld below */
    cpus = sysconf(_SC_NPROCESSORS_CONF);
    printf("cpu number %ld\n",cpus);

    if ((pid = fork())==0)
        child(1);  
    if ((pid = fork())==0)
        child(2);
    if ((pid = fork())==0)
        child(3);
    if ((pid = fork())==0)
        child(4);

    printf("fork over\n");
    int i,status;  
    for (i=0;i<4;i++)
    { 
    pid = wait(&status);
    }
    exit(0);
}

int child(int id)
{
    printf("child %d is running!pid :%d\n",id,getpid());
    exit(id);
}

I run that code in VMware with the number of CPUs set to 1.
I expected the first child to be the first to execute, but the real output is the reverse.
output:
fork over
child 4 is running!pid :4258
child 3 is running!pid :4257
child 2 is running!pid :4256
child 1 is running!pid :4255

They are normal processes, so the scheduling algorithm is CFS.
The process which has the minimum vruntime should run first.
Obviously, child 1's vruntime is smaller than the others'.
So it is confusing to get such output. I hope someone can help me figure it out.

The Linux release version is Ubuntu 16.04
The kernel version is 4.4

↧