#linuxcnc-devel | Logs for 2014-08-23

[01:48:30] <memleak> kms on radeon r9 290 (volcanic islands) has no bad latency spikes..
[01:49:20] <memleak> so if you want 3D + good RTAI performance, get a 500 dollar GPU... :/
[08:04:19] <jepler> memleak: not really, you still copied parts of the non-distributable file verbatim
[09:02:40] <jepler> memleak: new pull request offered. this one I built and got no modpost undefined symbol errors
[09:03:28] <memleak> jepler you're an amazing coder, thanks
[09:04:39] <memleak> will merge
[09:23:51] <jepler> I'm an experienced coder who knew what to grep for in the gcc manual pages
[09:24:23] <jepler> memleak: anyhow, you're welcome; I hope this iteration works better than the last one
[09:26:42] <jepler> overnight latency-histogram on the odroid is +-40us @ 1ms
[09:26:57] <jepler> having run for 40k seconds
[09:32:42] <jepler> "Kernel panic - not syncing: timer doesn't work through interrupt-remapped IO-APIC"
[09:33:08] <jepler> memleak: so my kernel built from 3.14.16 plus the patch from your RTAI doesn't boot :-/
[10:43:58] <jepler> pretty sure it's spidev itself that has bad latency. with the odroid as the master, merely not having a properly behaving slave couldn't change the timing of spidev.
[10:44:52] <jepler> so I wrote a component that does simple 64-bit spi transactions on /dev/spidev without expecting anything in particular in response.. and predictably, it gets realtime deadline errors
[10:45:24] <jepler> the function's max-time rises to nearly 4ms after just a few seconds
[10:46:08] <jepler> +'
[10:47:48] <pcw_home> SPI masters dont have any data dependence so they dont even know if the slave is there
[10:47:54] <jepler> pcw_home: right
[10:48:30] <jepler> hostmot2 *does* do something exceptional if it starts getting crazy data read back, though, so I needed this simpler component to confirm it was the kernel spi device alone that causes the problem
[10:50:54] <pcw_home> where everything is encapsulated in one packet like Ethernet, there probably should be some sanity checks built in
[10:51:57] <pcw_home> (say read the cookie or write a sequence number on writes thats checked on reads)
[10:52:49] <pcw_home> not sure if the overhead is worth it or not on SPI
[10:53:16] <jepler> in spi, a basic minimum check would be if that first returned word is all AAs
[10:53:42] <jepler> when the connection is bad, spi reads as all bits one, which seems to cause an immediate watchdog bite report
[10:53:51] <Tom_itx> why AA and not FF?
[10:54:30] <jepler> Tom_itx: when properly working, the first word the SPI-attached mesa card shifts out is AAAAAAAA
[10:54:33] <jepler> just because it is
[10:54:46] <Tom_itx> gotcha.. figured that out after the enter key :)
[10:55:11] <pcw_home> its left over debug code (it could actually be anything)
[10:56:44] <pcw_home> better for the driver to complain about SPI connection issues than proceeding with garbage data
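A minimal C sketch of the sanity check discussed above, assuming the received frame is available as an array of 32-bit words. Only the 0xAAAAAAAA first word and the all-ones readback on a bad connection come from the discussion; the names `HM2_SPI_COOKIE`, `spi_frame_status` and `check_spi_frame` are hypothetical and are not taken from the hostmot2 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First word shifted out by the card when the link works (per the log).
 * The macro name is hypothetical. */
#define HM2_SPI_COOKIE 0xAAAAAAAAu

enum spi_frame_status {
    SPI_FRAME_OK,
    SPI_FRAME_EMPTY,      /* nothing was received */
    SPI_FRAME_NO_SLAVE,   /* every bit reads as one: bad or missing connection */
    SPI_FRAME_BAD_COOKIE  /* data present, but the first word is not the cookie */
};

/* Classify a received frame before any register data in it is trusted. */
enum spi_frame_status check_spi_frame(const uint32_t *rx, size_t nwords)
{
    if (nwords == 0)
        return SPI_FRAME_EMPTY;
    assert(rx != NULL);

    bool all_ones = true;
    for (size_t i = 0; i < nwords; i++) {
        if (rx[i] != 0xFFFFFFFFu) {
            all_ones = false;
            break;
        }
    }
    if (all_ones)
        return SPI_FRAME_NO_SLAVE;
    if (rx[0] != HM2_SPI_COOKIE)
        return SPI_FRAME_BAD_COOKIE;
    return SPI_FRAME_OK;
}
```

A driver could report anything other than `SPI_FRAME_OK` as a connection error instead of passing the words on as if they were valid.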
[11:03:13] <jepler> pcw_home: would you have room for a crc32?
[11:04:17] <jepler> the idea's not quite baked, but: Imagine that the reply to each command after the first is the crc of the whole transaction up to that point
[11:05:07] <jepler> now the linuxcnc side could just tack an extra "read zero bytes at 0000" command at the end and check that the CRC matches
[11:06:57] <jepler> but you'd have to have the CRC available really quick
[11:07:48] <pcw_home> CRC done bit by bit is available immediately
[11:09:25] <pcw_home> currently I think the 0xAAAAAAAA is only at start of frame
[11:09:29] <jepler> yes
[11:09:43] <jepler> elsewhere it reads back zeros for the words that are commands
[11:10:23] <jepler> maybe instead it could do something like read back the CRC32 of the received data XOR CRC32 of the sent data, or the CRC16 of the received data concatenated with the CRC16 of the sent data
[11:11:33] <jepler> at 32MHz, reading back that checksum would add just another 1us to a transaction
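A rough C sketch of the two pieces of arithmetic in this exchange: a CRC computed one bit at a time, and the time cost of shifting one more 32-bit word at 32 MHz. The polynomial (the common reflected CRC-32, 0xEDB88320) and the function names are assumptions; the log does not say which CRC or framing the firmware would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Update a running CRC-32 one bit at a time, the way a shift-register
 * implementation would. Pass 0 as the initial value; the result of one
 * call can be fed into the next to extend the CRC over more data. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out one bit; XOR in the polynomial if that bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Extra time, in nanoseconds, needed to clock `bits` more bits at `hz`. */
uint64_t extra_transfer_ns(uint32_t bits, uint32_t hz)
{
    assert(hz != 0);
    return (uint64_t)bits * 1000000000u / hz;
}
```

With these definitions, `extra_transfer_ns(32, 32000000)` gives 1000 ns, which matches the "another 1us" estimate for reading back one 32-bit checksum word.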
[11:15:07] <jepler> ugh ugh ugh gross: every spi transaction maps and unmaps dma memory
[11:17:38] <pcw_home> Yeah, using the general purpose SPI driver is like watch repair with a Swiss army knife
[11:21:18] <jepler> though in the case of my simple transfer program, the amount of data transferred is just 8 bytes per direction and the code reads like it's supposed to use a FIFO instead of DMA when the amount of data to transfer is small
[11:22:20] <jepler> fifo is maybe 0x1ff or 0x7f .. bytes? words?
[11:24:50] <pcw_home> I would expect that the SPI hardware has a small FIFO (the Allwinner A10/20 and TI Sitara SPI hardware is like that)
[11:25:42] <pcw_home> I think the A10 is 64 bytes, beyond that it can use demand mode DMA
[11:26:30] <pcw_home> (to keep the TX FIFO filled and RX FIFO emptied)
[11:27:12] <CaptHindsight> same depth 64 in the Exynos
[11:27:22] <jepler> > Two independent 32-bits wide transmit and receive FIFOs: depth 64 in port 0 and depth 16 in port 1 and 2
[11:28:02] <jepler> > SPI controls the number of packets to be received in master mode. Set SFR (PACKET_CNT_REG) to receive any
[11:28:05] <jepler> number of packets. SPI stops generating SPICLK if the number of packets is similar to PACKET_CNT_REG.
[11:28:08] <jepler> "similar to" ??
[11:28:27] <pcw_home> :-)
[11:29:46] <CaptHindsight> "in my mind" it probably means "the same" or =
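A small C sketch of the FIFO arithmetic, assuming the quoted "depth" counts 32-bit entries. Under that assumption port 0 holds 64 x 4 = 256 bytes and ports 1 and 2 hold 16 x 4 = 64 bytes, so an 8-byte transfer fits easily. That would also be consistent with level masks of 0x1ff and 0x7f if the levels are counted in bytes, but the log leaves that question open. The function names are hypothetical, and the real driver's rule for choosing FIFO or DMA may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Capacity in bytes of a FIFO that is `depth` entries deep and
 * `width_bits` wide (width must be a whole number of bytes). */
size_t fifo_bytes(size_t depth, size_t width_bits)
{
    assert(width_bits % 8 == 0);
    return depth * (width_bits / 8);
}

/* True if a transfer of `len` bytes fits in the FIFO in one go, so that
 * no DMA mapping would be needed for it. */
bool fits_in_fifo(size_t len, size_t depth, size_t width_bits)
{
    return len <= fifo_bytes(depth, width_bits);
}
```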
[11:55:55] <memleak> jepler, i had that problem too
[11:58:30] <memleak> one sec. its a kernel config issue
[11:59:22] <jepler> of course it is
[11:59:46] <memleak> i know which options roughly
[12:00:10] <memleak> however i dont know which one specifically so i disable a set
[12:01:54] <jepler> if there are some options that are forbidden when ipipe is selected, can't that be expressed in the Kconfig language?
[12:02:07] <jepler> rather than just carrying around information in your head
[12:02:24] <memleak> im going to post it
[12:02:30] <memleak> and adjust .patch soon
[12:02:56] <memleak> give me a minute and ill write it up
[12:04:50] <memleak> dpaste.com/02AFGWJ
[12:05:28] <memleak> i turned all those off at once and it fixed the issue for me. i think its just IOMMU and the PCI options
[12:05:53] <memleak> it will need trail and error which im really not in the mood for atm
[12:05:58] <memleak> *trial
[12:07:11] <memleak> i dont want to adjust .patch file until i know which exact options cause it
[12:28:03] <memleak> just turning off PCI IOAPIC doesnt do it
[12:43:23] <memleak> its IOMMU that causes it
[13:00:36] <jepler> memleak: "Disable IOMMU Hardware Support" ?
[13:02:46] <memleak> git pull and patch kernel source with updated hal-linux patch
[13:05:16] <memleak> ill do the same for 3.10 if successful for you
[13:05:28] <memleak> (works here)
[13:07:21] <memleak> jepler, are you using SMP?
[13:08:21] <jepler> memleak: yes
[13:09:04] <memleak> 32 or 64?
[13:09:27] <memleak> also you'll need SSE2
[13:09:29] <cradek> > if you just want to make a rip cut you must have a DXF of a straight line to import
[13:10:11] <memleak> i havent tested compilation on 32-bit yet
[13:16:46] <jepler> well, it did turn off a bunch of config options
[13:17:14] <jepler> I notice one of them is -CONFIG_IRQ_REMAP=y
[13:17:27] <jepler> which could be related to "timer doesn't work through interrupt-remapped IO-APIC"
[13:18:46] <memleak> thats related to IOMMU
[13:18:52] <memleak> all IOMMU must be off
[13:26:15] <jepler> do you understand why it's incompatible, or just that it *is* incompatible?
[13:29:27] <memleak> https://lkml.org/lkml/2012/8/7/558
[13:29:57] <memleak> that "fix" either isnt present or doesnt do anything when ipipe is enabled
[13:30:08] * memleak checks source
[13:30:42] <memleak> its present.
[13:32:13] <memleak> http://www.xenomai.org/pipermail/xenomai/2012-September/026228.html
[13:33:05] <memleak> ah i see it i think..
[13:33:36] <memleak> in your ipipe enabled kernel look at line 1251 in arch/x86/kernel/io_apic.c
[13:33:57] <memleak> compare that code to the code in the xenomai email i posted
[13:35:37] <memleak> "mark the free vectors" is handled differently
[13:38:35] <jepler> ok thanks for the explanation
[13:40:01] <memleak> you betcha!
[14:19:39] <memleak> oh. i see what you meant.. sorry..
[15:06:52] <jepler> pcw_home: is there some reason I couldn't use a 10P IDC cable & connectors for 7i90?
[15:09:01] <jepler> I guess the 26p cable is now plentiful as a "raspberry pi gpio cable"
[15:17:23] <pcw_home> IDC connectors smaller than the deign size run into adjacent pins
[15:17:32] <pcw_home> design size
[15:18:28] <pcw_home> (due to the added width of the mechanism that holds the 2 pieces together)
[15:18:37] <jepler> aha
[15:18:50] <jepler> I'm surprised I don't recall having found that out the hard way
[15:18:56] <pcw_home> you can use the individual wire blocks but then you have to get the right crimper
[15:20:01] <pcw_home> no reason you cant crimp 10 pin cable in a 26 pin IDC connector though
[15:20:39] <jepler> true
[15:22:24] <jepler> so the general shape of the odroid u3 is the same as their "IO shield", with these two 2mm connectors at the top http://dn.odroid.com/homebackup/U3%20IO%20SHIELD%20REV0.3.png
[15:22:43] <pcw_home> Thats how I tested the 7I90 SPI interface
[15:22:44] <pcw_home> 7I80HD --> 50 hdr --> 10 pin cable --> 26 hdr --> 7I90
[15:23:33] <jepler> I'm trying to decide whether, to get the mechanical part OK, I need to make my board as big as that (to use 4 holes for stability), or whether to just make the board big enough to go to the top two screws (smaller, cheaper board)
[15:24:02] <archivist> compromise, 3 screws
[15:24:19] <jepler> archivist: the board house I'll use costs by the smallest enclosing rectangle, so I don't think I save
[15:24:35] <pcw_home> the 4 and 8 pin 2mm headers are the I/O connectors?
[15:24:39] <jepler> pcw_home: right
[15:25:05] <archivist> or something I saw inside a month or two ago, two screws and an application of a glue gun
[15:25:24] <jepler> they're on the bottom of the main board, so my board sits under it and the .100" header sticks out from that edge of the board
[15:25:45] <archivist> it was the high voltage flash supply in kids camera
[15:27:06] <jepler> archivist: nice work, they must have saved pennies that way
[15:27:14] <pcw_home> Hmm long enough for a couple 50 pin headers and a FPGA
[15:27:34] <jepler> pcw_home: don't start seeing a product in your mind's eye yet
[15:29:11] <jepler> anyway, the first tiny little board without any mounting provision ended up sitting like this, so it's easy to understand in retrospect why the female connectors on the u3 aren't happy anymore: http://emergent.unpythonic.net/files/sandbox/IMG_20140823_150751_027.jpg
[15:29:24] <jepler> so I'm thinking .. yes, enough board so it mounts at 4 points
[15:30:56] <pcw_home> Yeah misalignment is tough on the connectors
[15:31:50] <jepler> too bad shrouded 2mm male headers don't seem to be a thing
[15:32:54] <pcw_home> they are available but usually for IDC cables
[15:34:33] <pcw_home> not common for board-to-board (and dont help lengthwise alignment since they are sized for the longer IDC connectors)
[16:29:20] <jepler> I feel very guilty about all this unused board area http://emergent.unpythonic.net/files/sandbox/oddspi.png
[16:30:42] <pcw_home> needs a FPGA
[16:32:59] <jepler> from time to time I contemplate the idea of putting a hostmot2 firmware on a non-mesa card
[16:33:02] <jepler> .. but why would I?
[16:33:46] <jepler> "so that there's an Open Hardware mesa board" is an idea that tickles my fancy a bit
[16:33:55] <jepler> er, "open hardware hostmot2 board" I should say
[16:34:07] <pcw_home> its probably quite portable to any Xilinx FPGA
[16:34:51] <pcw_home> I am looking at trying to port it to the lattice ICE 40 series for lower cost simple things
[16:35:38] <jepler> are there many xilinx-specific bits?
[16:36:03] <pcw_home> Yes there are some xilinx primitives
[16:36:23] <pcw_home> SRL16s and DCMs mainly
[16:37:41] <pcw_home> probably some ways to genericise them
[16:41:57] <jepler> http://opensource.zylin.com/zpu.htm 442 LUT @ 95 MHz after P&R with 32 bit datapath and 32kBytes BRAM(example using Xilinx part).
[16:43:49] <jepler> stack based with some ops via microcode
[16:44:52] <jepler> bbl
[16:52:40] <pcw_home> Ive looked at the ZPU but its really really slow
[17:02:01] <cmorley> pcw_home: do some 7i77 and 7i76 not have encoder counters in a different mode?
[17:15:37] <pcw_home> the MPG inputs on the field I/O default to 1x mode
[17:16:47] <pcw_home> the normal high speed hostmot2 encoder counters have 4x and step/dir mode (no 1x mode)
[17:18:17] <pcw_home> 1x mode makes sense for MPGs since you may want 1 count per detent
[17:18:18] <pcw_home> (and count between detents so its hard to "tease")
[17:20:22] <pcw_home> I think all 24 possible count modes are available via per channel EEPROM setup variables
[17:20:23] <pcw_home> But I think this is somewhat purposely undocumented
[17:34:25] <pcw_home> If you are asking about whether the MPG encoders are there, firmware versions before 12 didnt have the encoders
[17:34:27] <pcw_home> so the 7I76/77 remote firmware should be upgraded
[17:35:03] <pcw_home> (there is a script for doing this)
[17:46:39] <memleak> my email isnt going through >:(
[17:46:49] <memleak> wrong channel
[18:47:53] <cmorley> pcw_home: I suppose the firmware version is shown in dmesg. is the script to upgrade the 'mesaflash' program?
[18:48:44] <cmorley> It would be nice to have x1 mode on hi-speed encoders too.
[18:50:04] <cmorley> I am updating pncconf trying to add the modes that include the field IO MPG counters - when I went to test on my hardware - there are no counters. I guess I need to update them. Thanks.
[20:47:46] <pcw_home> the script is in here, it uses setsserial which is in linuxcnc 2.6 or later
[20:47:47] <pcw_home> http://www.mesanet.com/software/parallel/sserial.zip
[21:45:24] <pcw_home> Oh and the remote firmware version is a HAL parameter
[22:01:16] <micges> pcw_home: hi
[22:02:22] <micges> eta of 7i92 and 7i54?
[22:04:44] <pcw_home> kits being put together now so maybe a couple weeks for 7I54 and a month for 7I92
[22:05:37] <micges> good
[22:07:52] <jepler> what's i54?
[22:09:22] <micges> 6x small 3A H bridge
[22:09:42] <micges> iirc 30V
[22:10:16] <micges> jepler: btw great work with spi
[22:14:08] <jepler> micges: it was easy with hm2_eth to study, and the queued I/O stuff helped
[22:14:19] <jepler> micges: still, it's not giving good RT performance on my box so there's more work ahead
[22:21:30] <micges> glad it helps
[22:24:55] <jepler> I guess I'll just put some art in the blank area http://emergent.unpythonic.net/files/sandbox/art.png
[22:26:07] <micges> haha
[22:30:59] <jepler> wow these are art http://boldport.blogspot.com/2014/02/so-you-want-to-manufacture-your-printed.html
[22:41:20] <CaptHindsight> jepler: we're getting some odd latency test results with isolcpus with 3.14 64b RTAI http://wiki.linuxcnc.org/cgi-bin/wiki.pl?The_Isolcpus_Boot_Parameter_And_GRUB2#Latency_test_results_for_isolcpus
[22:42:42] <CaptHindsight> we also need to find a utility that accurately tells us what core each process is actually running on
[22:46:34] <jepler> CaptHindsight: for userspace processes, cpuset can display the mask of CPUs where the process is allowed to run.
[22:48:51] <jepler> one of the fields in /proc/<PID>/stat is "which CPU the task is scheduled on"
[22:50:02] <jepler> field number 39 by my count
[22:51:34] <jepler> (1-based)
[22:52:02] <jepler> er taskset, not cpuset
[22:52:48] <jepler> and setting process affinity via taskset and then retrieving it via /proc/PID/stat matches
[22:54:14] <jepler> and of course the default affinity mask matches isolcpus=
[22:55:19] <jepler> and running a CPU-bound program when RT is (de)tuned to use 50% CPU showed 50% lower throughput of userspace program when bound to the same CPU that the rtai realtime thread was bound to
[22:55:37] <jepler> I'm pretty sure the CPU numbers in taskset, /proc, and rtai really do agree
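A C sketch of pulling field 39 (1-based) out of a /proc/<PID>/stat line, as described above. The process name in field 2 is wrapped in parentheses and may itself contain spaces or parentheses, so this version counts fields from the last `)`. The helper names `stat_field` and `last_cpu_of` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Store numeric field `field` (1-based, 3 or higher) of a stat line in
 * *out. Returns 0 on success, -1 if the field is missing or not numeric. */
int stat_field(const char *line, int field, long *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strrchr(line, ')');  /* end of the comm field */
    if (p == NULL || field < 3)
        return -1;
    p++;
    for (int i = 3; ; i++) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            return -1;                   /* ran out of fields */
        if (i == field) {
            char *end;
            long v = strtol(p, &end, 10);
            if (end == p)
                return -1;               /* not a number */
            *out = v;
            return 0;
        }
        while (*p != '\0' && *p != ' ')
            p++;                         /* skip this field */
    }
}

/* Read a stat file such as "/proc/self/stat" and report field 39,
 * the CPU the task last ran on. */
int last_cpu_of(const char *stat_path, long *cpu)
{
    char buf[1024];
    FILE *f = fopen(stat_path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok != NULL ? stat_field(buf, 39, cpu) : -1;
}
```

Calling `last_cpu_of("/proc/self/stat", &cpu)` after pinning the process with taskset is one way to repeat the comparison described above.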
[22:56:48] <jepler> and even in your new table, it looks like the advice "isolate both CPUs in a pair in bulldozer+" correctly predicts the latency figures you got
[22:57:17] <jepler> you must have misplaced a digit in quoting some of those numbers, though
[22:57:41] <jepler> > 172,081,81
[22:59:10] <CaptHindsight> let me fix those
[23:01:19] <CaptHindsight> updated
[23:03:00] <CaptHindsight> memleak also tested without the isolcpus parameter and every few reboots latency would be as low as if isolcpus was used and set to =2,3
[23:04:02] <CaptHindsight> so the proposed master branch might randomly choose cores for real time if it's not forced
[23:04:14] <CaptHindsight> by isolcpus=
[23:04:59] <jepler> no it doesn't
[23:05:01] <jepler> read the code
[23:05:11] <jepler> the code does not look at isolcpus= in selecting what CPU to bind to
[23:05:17] <jepler> the code binds to the highest numbered CPU
[23:05:41] <memleak> what section of the code?
[23:05:50] <memleak> file, line area?
[23:06:10] <jepler> rtai_rtapi.c in the linuxcnc source tree
[23:06:14] <memleak> ok
[23:06:15] <jepler> 187 /* on SMP machines, we want to put RT code on the last CPU */
[23:06:16] <jepler> 188 n = NR_CPUS-1;
[23:06:16] <jepler> 189 while ( ! cpu_online(n) ) {
[23:06:16] <jepler> 190 n--;
[23:06:16] <jepler> 191 }
[23:06:18] <jepler> 192 rtapi_data->rt_cpu = n;
[23:06:43] <jepler> also while linuxcnc is running, if you have working *rtai* /proc, you can read there the CPU that the realtime task actually bound to
[23:06:53] <jepler> it matches what the algorithm's comment says: last (highest numbered) CPU
[23:07:05] <memleak> ok
[23:07:11] <jepler> (a cpu is "online" even if it is in isolcpus, at least in kernels I tested)
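The selection rule quoted from rtai_rtapi.c can be tried outside the kernel with a small stand-in. This C sketch replaces `cpu_online()` with a plain array of flags and adds a lower bound that the quoted loop does not have; `pick_rt_cpu` is a hypothetical name. Note that isolcpus= is not an input, which is the point being made above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the highest-numbered online CPU, or -1 if none is online.
 * online[n] stands in for the kernel's cpu_online(n). */
int pick_rt_cpu(const bool *online, size_t nr_cpus)
{
    assert(online != NULL || nr_cpus == 0);
    for (size_t n = nr_cpus; n-- > 0; ) {
        if (online[n])
            return (int)n;
    }
    return -1;
}
```

On a four-CPU machine with every CPU online this always returns 3, whether or not CPU 3 was isolated at boot.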
[23:08:30] <jepler> (mostly 3.2 preempt-rt and 3.4 rtai in this go-around)
[23:09:12] <jepler> /proc/interrupts is interesting too, it'll tell you how often each IRQ has been delivered to each CPU
[23:10:02] <jepler> goodnight
[23:10:38] <CaptHindsight> so does setting isolcpus=3 keep other (non real time) threads from using core 3?
[23:10:51] <CaptHindsight> tomorrow
[23:11:44] <jepler> boot without isolcpus, and then with isolcpus=3, and look at the cpu mask of your general linux processes
[23:11:47] <jepler> they'll exclude CPU 3
[23:12:00] <CaptHindsight> latency will change
[23:12:01] <jepler> some processes are still allowed to be scheduled on CPU 3 though, because linux wants to
[23:12:21] <CaptHindsight> 3 out of 4 tries it will be poor and then it will be really low
[23:12:22] <jepler> e.g., taskset -pc $$ or taskset -pc 1
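As a rough guide to what the affinity check should show, this C sketch computes the mask that ordinary processes would be expected to have: all CPUs minus the isolated ones. It is only arithmetic on bit masks, not a query of the running system, and `default_affinity_mask` is a hypothetical name. As noted above, some kernel threads may still be allowed on an isolated CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected affinity mask (bit n set means CPU n is allowed) for a machine
 * with `nr_cpus` CPUs, 1 to 64, after removing the isolated CPUs. */
uint64_t default_affinity_mask(unsigned nr_cpus,
                               const unsigned *isolated, size_t n_isolated)
{
    assert(nr_cpus >= 1 && nr_cpus <= 64);
    assert(isolated != NULL || n_isolated == 0);

    uint64_t mask = (nr_cpus == 64) ? UINT64_MAX
                                    : ((UINT64_C(1) << nr_cpus) - 1);
    for (size_t i = 0; i < n_isolated; i++) {
        if (isolated[i] < nr_cpus)
            mask &= ~(UINT64_C(1) << isolated[i]);
    }
    return mask;
}
```

For four CPUs this gives 0xf with nothing isolated, 0x7 for isolcpus=3 and 0x3 for isolcpus=2,3.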
[23:12:55] <CaptHindsight> thats another odd behavior
[23:14:12] <CaptHindsight> g'nite