#avr Logs

Sep 02 2017

12:05 AM Casper: I don't use it but what's your actual issue?
12:13 AM enh: the code I'm debugging is driven by the spi interrupt
12:13 AM enh: otherwise it loops
12:33 AM day__ is now known as daey
06:09 AM Emil: enh Casper of course not
06:10 AM Emil: Casper: the thing is picky as fuck
06:10 AM Emil: I hate it
06:10 AM Emil: Also I hate dacs with passion
06:12 AM Emil: My code is solid though
06:59 AM _ami_: comptroller: hey, did u solve your spi dma problem?
05:44 PM Lambda_Aurigae: https://www.maximintegrated.com/en/products/power/battery-management/MAX77950.html?utm_source=Eloqua&utm_medium=email&utm_content=MAX77950&utm_campaign=FY18_Q1_2017_AUG_Mobile-Power-AMER-PowerReceiver_EN
05:44 PM Lambda_Aurigae: super nifty chip.
05:44 PM Lambda_Aurigae: wireless power transmitter/receiver unit...
08:24 PM _enhering_ is now known as enh
08:27 PM enh: hi
08:27 PM Lambda_Aurigae: iH
08:29 PM NoHitWonder: hi
08:30 PM enh: you guys all right?
08:31 PM enh: do any of you know how to run avr-gdb?
08:32 PM NoHitWonder: no
08:33 PM Lambda_Aurigae: nope.
08:33 PM Lambda_Aurigae: I just write code that I know.
08:35 PM enh: I'm using simavr as an avr simulator, driven remotely by avr-gdb
08:35 PM Casper: I debug with my own code
08:35 PM Casper: and simulators suck hard, don't use them
08:36 PM enh: I'd like to have an option.
08:36 PM enh: My option is installing windows and avrstudio
08:36 PM Lambda_Aurigae: understand your code and debug the old-fashioned way.
08:37 PM enh: tried for two weeks now
08:38 PM enh: I need a memory map
08:38 PM enh: I just do not know how to trigger an interrupt using gdb
08:38 PM Lambda_Aurigae: don't know that you can.
08:38 PM Lambda_Aurigae: you trigger it with the simulator.
08:39 PM Lambda_Aurigae: not sure how though.
08:39 PM enh: simavr has no documentation on that. or on anything else
08:39 PM Lambda_Aurigae: one must read the documentation.
08:39 PM cehteh: haha .. i told you that yesterday already
08:40 PM Lambda_Aurigae: use debug on the real hardware and feed it through gdb.
08:40 PM cehteh: i bet we'll talk in 1-2 weeks about the same issue without your code fixed :)
08:40 PM enh: hope not, cehteh
08:40 PM cehteh: the way you approach it, yes
08:41 PM cehteh: you don't need more tools, you need to look at your code, which is already there and broken
08:41 PM enh: you must be right, then
08:41 PM Lambda_Aurigae: break the code down into discrete segments... work on one bit at a time.
08:41 PM enh: some bugs cannot be found by looking at code, cehteh
08:41 PM Lambda_Aurigae: make one thing work at a time.
08:42 PM Lambda_Aurigae: all bugs can be found by looking at the code if you know the hardware you are working with and have the proper documentation on said hardware.
08:42 PM cehteh: all bugs can be found by looking at (and perhaps instrumenting) the code
08:42 PM enh: everything works. Just erratically.
08:42 PM enh: you are WRONG, cehteh
08:42 PM enh: definitely wrong.
08:43 PM cehteh: well it needs some experience and skill
08:43 PM cehteh: but you gain that over time
08:43 PM enh: compiler implementation problems, for example, are not your code's problem.
08:43 PM cehteh: i mean software bugs .. happen with one commit and disappear when you remove that commit, not flaky hardware bugs of course
08:43 PM enh: i have some programming experience, i must say.
08:44 PM Lambda_Aurigae: so, look at the assembly the compiler puts out
08:44 PM Lambda_Aurigae: the problem is in there.
08:44 PM cehteh: even if it's not my fault (which happens extremely rarely) i can isolate the cause in my code and prolly work around it
08:45 PM enh: like when you change the position of a statement and the behavior changes?
08:45 PM cehteh: it's easy to blame faults on the compiler/os/lib .. but 99.999999999999% of the time it's your bug (or mine ..)
08:45 PM cehteh: then there is a problem
08:45 PM enh: yep. there is.
08:45 PM cehteh: i mean problem in your code
08:45 PM cehteh: depending on what statement
08:46 PM enh: unfruitful discussion this one
08:46 PM cehteh: of course the order of statements matters quite a lot
08:46 PM cehteh: and when you have some cli/sei around you better read the avr programmers manual once more and understand memory barriers
08:46 PM enh: really?
08:47 PM cehteh: and much more to learn
08:47 PM enh: always
08:47 PM Lambda_Aurigae: always more to learn.
08:47 PM Lambda_Aurigae: I'm constantly re-reading datasheets.
08:47 PM Lambda_Aurigae: not just skimming but cover to cover reading.
08:48 PM cehteh: some registers have to be used in a specific order, when you have threads, signals or interrupts there can always be race conditions
08:48 PM cehteh: and 'volatile' is often misunderstood too
08:49 PM Lambda_Aurigae: underused and overused
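A minimal host-side sketch of the race cehteh is pointing at, assuming a 16-bit counter shared between an interrupt handler and the main loop. `volatile` forces every access to go to memory, but on an 8-bit AVR a 16-bit read is still two separate byte loads, so an interrupt can land between them. The names `host_cli`, `host_sei`, `g_rx_count` and `on_spi_byte_received` are hypothetical; on real hardware the critical section would use `cli()`/`sei()` from `<avr/interrupt.h>` or `ATOMIC_BLOCK(ATOMIC_RESTORESTATE)` from `<util/atomic.h>`.

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-ins (hypothetical) so this compiles on a PC. They only
// track whether "interrupts" are enabled; they do not touch any hardware.
static bool g_interrupts_enabled = true;
static void host_cli() { g_interrupts_enabled = false; }
static void host_sei() {
    assert(!g_interrupts_enabled && "sei without a matching cli");
    g_interrupts_enabled = true;
}

// Shared between the interrupt handler and the main loop. volatile keeps the
// compiler from caching it, but does not make the 16-bit access atomic.
static volatile uint16_t g_rx_count = 0;

// What the interrupt handler would do on each received byte.
void on_spi_byte_received() { g_rx_count = static_cast<uint16_t>(g_rx_count + 1); }

// Main-loop read: take a consistent copy with interrupts disabled, then
// restore the previous state instead of enabling interrupts unconditionally.
uint16_t read_rx_count() {
    const bool was_enabled = g_interrupts_enabled;  // plays the role of saving SREG
    host_cli();
    const uint16_t copy = g_rx_count;  // two byte loads on an AVR
    if (was_enabled) host_sei();
    return copy;
}
```

Restoring the saved state rather than enabling interrupts unconditionally matters if such a helper is ever called from code that already runs with interrupts off.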
08:49 PM enh: This problem does not make sense. No sense at all. An important variable value is being changed in memory, not by any segment of my code.
08:49 PM cehteh: do you have a case where changing the order of (obviously unrelated) statements triggers that bug? .. then you have a start
08:49 PM cehteh: well yes that makes sense
08:50 PM enh: stack overflow, buffer overflow...
08:50 PM cehteh: that tells me that you either overrun a buffer or fuck with pointers
08:50 PM cehteh: and yes .. you do, don't say you checked that you don't
08:50 PM enh: i checked those many times. Can be a stack/heap collision
08:50 PM cehteh: because if you don't, then the program should work, right?
08:50 PM cehteh: add a canary
08:50 PM Lambda_Aurigae: happens so many times.
08:51 PM cehteh: use asserts
08:51 PM Lambda_Aurigae: limited ram.
08:51 PM enh: i suspect that
08:51 PM Lambda_Aurigae: that's why I keep larger chips around for testing stuff like that..
08:51 PM cehteh: and don't use C++ :D
08:51 PM Lambda_Aurigae: atmega1284p is a great tool.
08:51 PM Lambda_Aurigae: 16K sram.
08:51 PM enh: give me a good reason for not using c++ and I will abandon it.
08:51 PM enh: Have not read a good reason yet
08:52 PM cehteh: also you may try to compile your program on your pc (kind of a simulation) with stubs for hardware-specific things
08:52 PM enh: I do not have one here, Lambda_Aurigae. I'd like to
08:52 PM cehteh: this way you can a) learn to write portable code which works on 8, 32 and 64 bit platforms
08:52 PM enh: I can find this problem now on the simavr if I can trigger the SPI interrupt.
08:52 PM cehteh: and b) more easily test and implement algorithms
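A small sketch of what cehteh suggests: keep the processing free of register access so it builds on a PC, and feed it from a stub there. `StubPressureSource` and `average_readings` are hypothetical names, and the averaging is only a placeholder for whatever processing the real module does. The source is passed as a template parameter rather than through a virtual interface, so the AVR build would not need a vtable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host stub that replays canned raw readings instead of talking to a sensor.
// On the AVR, a class with the same read_raw() member would do the bus I/O.
struct StubPressureSource {
    const int32_t* samples;
    size_t count;
    size_t next;

    int32_t read_raw() {
        assert(next < count && "test asked for more samples than provided");
        return samples[next++];
    }
};

// Hardware-independent processing: integer average of n readings.
// Works with any type that provides int32_t read_raw().
template <typename Source>
int32_t average_readings(Source& src, size_t n) {
    assert(n > 0);
    int64_t sum = 0;  // wide accumulator so summing many readings cannot overflow
    for (size_t i = 0; i < n; ++i) sum += src.read_raw();
    return static_cast<int32_t>(sum / static_cast<int64_t>(n));
}
```

The same `average_readings` call compiles unchanged against a real driver class. Building it on a 64-bit host also tends to expose code that silently assumes a particular `int` width, which is 16 bits on AVR.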
08:52 PM enh: but there is no documentation on how to do it.
08:53 PM cehteh: there are symbols for stack and heap end, i forgot the names, read the avr-libc doc
08:53 PM Lambda_Aurigae: documentation on compiling code for different platforms?
08:54 PM cehteh: now add 2 functions one which fills the space between heap and stack with some canary (0xdeadbabe) ...
08:54 PM cehteh: another one which checks that these canaries are still ok
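A host-side sketch of the two functions cehteh describes, assuming the free gap between the end of the heap and the stack can be handed to them as a plain byte range. On the AVR that range would come from the avr-libc memory-layout symbols he mentions, stopping a safe margin below the current stack pointer. Here it is just an array, and `fill_canary`/`canary_intact` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Canary word from the discussion, written byte by byte (low byte first)
// so the check does not depend on alignment.
constexpr uint32_t kCanary = 0xDEADBABE;

static uint8_t canary_byte(size_t i) {
    return static_cast<uint8_t>(kCanary >> (8 * (i % 4)));
}

// Fill [begin, end) with the repeating canary pattern.
void fill_canary(uint8_t* begin, uint8_t* end) {
    assert(begin <= end);
    for (size_t i = 0; begin + i < end; ++i) begin[i] = canary_byte(i);
}

// Return true if every byte in [begin, end) still holds the pattern.
bool canary_intact(const uint8_t* begin, const uint8_t* end) {
    assert(begin <= end);
    for (size_t i = 0; begin + i < end; ++i)
        if (begin[i] != canary_byte(i)) return false;
    return true;
}
```

Call `fill_canary` once at startup and `canary_intact` from the main loop; the first `false` means something wrote into memory nothing should touch. Checking only a guard band right above the heap is a cheaper variant that still catches the stack running into it.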
08:54 PM enh: no, docs on triggering hw interrupts on avr-gdb
08:54 PM cehteh: or just do bean counting .. how much heap do you need, how much stack do you need
08:55 PM cehteh: and don't use C++ ...
08:55 PM cehteh: you cannot program a microcontroller with truckloads of C++
08:55 PM enh: cehteh: I have thousands of lines of code in c++.
08:55 PM enh: don't expect me to move them anywhere
08:55 PM cehteh: yes? .. and you have 1000 problems :)
08:56 PM enh: nope. I got one.
08:56 PM enh: and I will find it.
08:56 PM cehteh: sure .. just do
08:56 PM enh: boring, you are.
08:56 PM cehteh: nah, i am serious, i am positive this one can be fixed
08:57 PM enh: me too.
08:58 PM cehteh: but i am not so sure that your way of doing it is long lasting, you may fix it this time, optimize some bits now .. but the same problem may/will appear once again soon
08:58 PM cehteh: depends on what it is of course
08:58 PM enh: i disagree
08:58 PM cehteh: maybe it's just some race condition or logic error
08:58 PM cehteh: and did you put all your static strings and data in progmem?
08:59 PM enh: i have almost no strings or data
09:00 PM cehteh: calling functions is somewhat expensive too, look at the assembler output and check how much data it puts on the stack
09:00 PM enh: the module reads sensors, stores the read data (20 floats per sensor), does some calculations and makes them available for SPI transfer when requested.
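For fixed buffers like these, cehteh's "how much heap, how much stack" question can be answered at compile time. A sketch assuming the 20 samples per sensor mentioned above, stored as `int32_t` (4 bytes, the same size as a `float` on AVR), with a hypothetical sensor count and a hypothetical rule that sample storage may use at most a quarter of the ATmega328P's 2048 bytes of SRAM. `SampleRing` and the constants are illustrative names, not the project's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kSamplesPerSensor = 20;          // from the discussion
constexpr size_t kNumSensors = 4;                 // hypothetical
constexpr size_t kSramBytes = 2048;               // ATmega328P
constexpr size_t kSampleBudget = kSramBytes / 4;  // hypothetical policy: a quarter of SRAM

// Fixed-size ring of the last kSamplesPerSensor readings, stored as integers.
struct SampleRing {
    int32_t data[kSamplesPerSensor] = {};
    uint8_t head = 0;  // index of the slot written next

    void push(int32_t value) {
        assert(head < kSamplesPerSensor);
        data[head] = value;
        head = static_cast<uint8_t>((head + 1) % kSamplesPerSensor);
    }
};

// Statically allocated, so the cost shows up in the linker's data/bss
// figures instead of being discovered at run time.
static SampleRing g_samples[kNumSensors];

static_assert(sizeof(g_samples) <= kSampleBudget,
              "sample buffers exceed their share of SRAM");
```

Whatever SRAM remains after globals is all the stack gets, so every buffer added here directly reduces stack headroom.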
09:01 PM cehteh: when function calls are nested a little deeper you easily eat a lot of ram
09:01 PM tpw_rules: are you using atomic sections correctly?
09:01 PM cehteh: and when you have some unbounded recursion then it eventually explodes for sure
09:01 PM enh: no recursions
09:01 PM tpw_rules: i bet this is an interrupt race condition
09:02 PM cehteh: ah and using floats on an AVR is the other case :D
09:02 PM cehteh: what sensors are that?
09:02 PM enh: the interrupt calls only trigger flags. On the main cycle the flags are processed. I minimized all processing in interrupts.
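The pattern enh describes (the interrupt only records and flags, the main loop does the work), sketched for a single pending SPI request. All names are hypothetical. The `assert` in the handler body is a debugging aid for the kind of race tpw_rules suspects: it fires if a second request arrives before the main loop has handled the first, which with a one-slot flag means a lost request. One caveat worth checking on an SPI slave: the master keeps clocking, so anything that must be in `SPDR` for the next byte may go out stale if it waits for the main loop.

```cpp
#include <cassert>
#include <cstdint>

// Shared with the interrupt handler. Single bytes are read and written with
// one instruction on AVR, so volatile is sufficient for these two.
static volatile uint8_t g_spi_request = 0;
static volatile uint8_t g_spi_last_byte = 0;

// Body of the (hypothetical) SPI interrupt handler: record and flag only.
void spi_isr_body(uint8_t received) {
    assert(g_spi_request == 0 && "previous request not handled yet");
    g_spi_last_byte = received;
    g_spi_request = 1;
}

static uint16_t g_handled = 0;  // host-side bookkeeping for the example

// One pass of the main loop: do the actual work outside interrupt context.
void main_loop_step() {
    if (g_spi_request) {
        const uint8_t cmd = g_spi_last_byte;  // read the value before releasing the slot
        g_spi_request = 0;
        (void)cmd;  // application-specific handling of cmd would go here
        ++g_handled;
    }
}
```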
09:02 PM cehteh: no sensor i know outputs floats
09:02 PM tpw_rules: floats are sloooooooooooow
09:02 PM enh: tpw_rules: it is not
09:02 PM Lambda_Aurigae: floats work just fine...they are just expensive in time and memory.
09:02 PM tpw_rules: are you doing the hardware correctly?
09:02 PM cehteh: Lambda_Aurigae: and space :D
09:02 PM tpw_rules: like have you messed up some sequence in hardware registers and you're waiting for a flag that can't happen?
09:03 PM tpw_rules: that ate up like a week of my life with an stm32
09:03 PM enh: i do not wait for flags
09:03 PM cehteh: eh yes memory :)
09:03 PM tpw_rules: i think it was a silicon bug
09:03 PM tpw_rules: do you share any of your code?
09:03 PM cehteh: why do you use floats anyway .. what sensors are that?
09:03 PM Lambda_Aurigae: floats take a lot of ram....and floating point math routines take a lot of flash on avr.
09:03 PM enh: all open source, tpw_rules
09:03 PM tpw_rules: neat
09:04 PM enh: https://bitbucket.org/enhering/yauvc
09:04 PM enh: Lambda_Aurigae: i agree. I can change them for longs very easily.
09:05 PM enh: some sensors output floats, cehteh.
09:05 PM enh: others don't
09:05 PM cehteh: which ones?
09:05 PM cehteh: tell me
09:05 PM cehteh: i don't know any
09:05 PM enh: you are like a dog sometimes, cehteh
09:05 PM cehteh: i just want to know
09:06 PM enh: bmp085
09:06 PM cehteh: maybe i am wrong, but i haven't seen any sensors outputting floats yet
09:06 PM enh: you read longs, but the calibration it suggests output floats
09:06 PM * cehteh checks datasheet
09:07 PM cehteh: on ARM you may use floats, these have enough power and most have hardware floating point
09:07 PM enh: we are missing the point here.
09:08 PM cehteh: otherwise you can (and should) do calibration in integer on avr
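For the BMP085 specifically, the datasheet gives the compensation in integer form, as tpw_rules also points out a bit later. A sketch of the temperature part only, following the datasheet's formulas with 32-bit intermediates; the struct and function names are illustrative, and reading the coefficients over I2C is left out. With the worked-example values from the datasheet (AC5 = 32757, AC6 = 23153, MC = -8711, MD = 2868, UT = 27898) it returns 150, i.e. 15.0 °C.

```cpp
#include <cassert>
#include <cstdint>

// Temperature-related calibration coefficients as read from the BMP085 EEPROM.
struct Bmp085TempCal {
    uint16_t ac5;
    uint16_t ac6;
    int16_t mc;
    int16_t md;
};

// Integer temperature compensation following the BMP085 datasheet.
// ut: uncompensated temperature reading. Returns degrees C in 0.1 steps.
// b5_out (optional) receives B5, which the pressure compensation reuses.
int32_t bmp085_temperature(const Bmp085TempCal& c, int32_t ut, int32_t* b5_out) {
    // The datasheet writes the scaling steps as divisions by powers of two;
    // they are right shifts here (GCC shifts negative values arithmetically).
    const int32_t x1 = ((ut - static_cast<int32_t>(c.ac6)) * static_cast<int32_t>(c.ac5)) >> 15;
    const int32_t denom = x1 + c.md;
    assert(denom != 0 && "bad calibration data");
    const int32_t x2 = (static_cast<int32_t>(c.mc) * 2048) / denom;  // MC * 2^11
    const int32_t b5 = x1 + x2;
    if (b5_out != nullptr) *b5_out = b5;
    return (b5 + 8) >> 4;
}
```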
09:08 PM enh: I can process all the data as floats or as longs. That is not my problem now.
09:08 PM cehteh: no, when you overflow the stack, the floats may already be the cause
09:08 PM enh: I implemented both ways
09:08 PM enh: a float takes 4 bytes
09:08 PM enh: a long takes four bytes
09:08 PM cehteh: m(
09:08 PM tpw_rules: write a small asm stub that runs at __init or so then fills memory with a pattern
09:09 PM tpw_rules: dump it periodically to see usage
09:09 PM Lambda_Aurigae: processing floats takes more than processing longs
09:09 PM enh: much more
09:09 PM cehteh: AVR has no native floating point unit; you pull in a massive library which emulates that
09:09 PM enh: like 10 times slower
09:09 PM tpw_rules: and ram too
09:09 PM cehteh: calling functions for each floating point operation
09:09 PM enh: that may be a reason.
09:10 PM cehteh: see... first step would be to avoid floats
09:10 PM Lambda_Aurigae: I still suspect you are ending up with a stack/heap crash that is triggering a reboot.
09:10 PM enh: but now, for example, I disabled all sensor code, and the problem persists.
09:10 PM enh: no reboots.
09:10 PM cehteh: yes i suspect some other bug too
09:10 PM tpw_rules: is your power clean?
09:10 PM cehteh: trashing your memory
09:10 PM enh: I have a signal analyzer attached.
09:10 PM enh: power is clean
09:10 PM enh: i can see the SPI protocol working
09:10 PM cehteh: but still floats on avr are almost a no-go
09:11 PM enh: but the transfer size byte is getting corrupted, or, when it works, the transfer data is corrupted.
09:11 PM cehteh: so debug your code step by step. analyze that faulty commit
09:11 PM cehteh: do you have a link to the diff of that commit?
09:11 PM cehteh: you do spi in interrupts?
09:12 PM enh: i said before. it was a fucking major commit, because all was working before.
09:12 PM cehteh: yes still
09:12 PM enh: my slave modules are spi interrupt driven.
09:12 PM enh: the master module is not.
09:12 PM cehteh: no matter if the commit was 3 lines or 3000 .. somewhere there lies the bug
09:12 PM cehteh: the more you refuse to look at it the less you find it
09:12 PM enh: sure?
09:14 PM cehteh: often enough just looking at the diff helps
09:15 PM enh: I'm sure you have been through a condition in which you look at something a thousand times and cannot see the problem
09:16 PM enh: i feel no pleasure in this situation
09:16 PM cehteh: https://bitbucket.org/enhering/yauvc/commits/aa90bfae47c258b56f421b2024ba3249e5834c51/raw
09:16 PM cehteh: thats the commit?
09:16 PM cehteh: i told you yesterday i'd even bin such a commit (well, maybe tag it) .. go back
09:17 PM cehteh: restart, step by step, test everything
09:19 PM tpw_rules: you've deepened a bunch of calls
09:19 PM tpw_rules: i'm sure you're using far more stack
09:19 PM cehteh: bool COM::BuildModuleCapacityMatrix() {
09:19 PM cehteh: for (volatile uint8_t nSlot = 1; nSlot <= NUM_SLOTS; nSlot++) {
09:19 PM cehteh: the volatile there looks fishy btw :D
09:20 PM cehteh: char achNumber[NUMBER_BUFFER_SIZE];
09:20 PM cehteh: what is NUMBER_BUFFER_SIZE ?
09:22 PM enh: #define NUMBER_BUFFER_SIZE 15
09:23 PM enh: is a buffer I use to translate numbers to strings and send them via uart
09:23 PM tpw_rules: this is a 328p?
09:23 PM cehteh: your code looks a bit flaky there (gut feeling, i haven't checked it)
09:23 PM cehteh: isolate that function and write tests
09:23 PM cehteh: yes i see that
09:24 PM tpw_rules: this just feels like way too much for one
09:24 PM cehteh: yes :)
09:24 PM enh: sorry, tpw_rules. Wayy too much for one what?
09:24 PM cehteh: commit
09:24 PM tpw_rules: an atmega328p
09:24 PM enh: yep
09:24 PM tpw_rules: the complexity of the code and solution
09:25 PM tpw_rules: i still think stack is a problem. you've added a lot more pointers
09:25 PM enh: you are looking at the COM module code. This module is behaving well
09:25 PM tpw_rules: yeah but when others call into it, it's using more stack
09:25 PM cehteh: well how can we know
09:26 PM enh: i believe that too. tpw_rules.
09:26 PM enh: each module runs on an avr
09:26 PM enh: the AMGP module is the problem
09:27 PM tpw_rules: why didn't you say that earlier
09:27 PM cehteh: you just converted a lot shit to floats? :D
09:27 PM cehteh: i'd recommend to scrap that commit and start over
09:27 PM cehteh: AVRs are not meant to handle floats well, esp not for what you are doing
09:27 PM Lambda_Aurigae: I don't see that that chip puts out floats...puts out longs, yes, but no floats that I saw in a quick peruse of the datasheet.
09:28 PM tpw_rules: one float doesn't consume any more data than a long, but it's far more generated code per manipulation, and far more stack depth
09:29 PM enh: that may be the problem
09:29 PM cehteh: a flight controller loop is somewhat timing critical and you want headroom
09:29 PM tpw_rules: the datasheet even helpfully offers integer math solutions
09:29 PM enh: i need to be sure it is a stack/heap collision
09:29 PM cehteh: i doubt you will ever get a decent loop time with floating point
09:29 PM cehteh: nah you don't .. you need to find the bug
09:30 PM tpw_rules: enh: hook __init and write a little asm routine to fill memory with a pattern
09:30 PM cehteh: (or go back that commit and start anew)
09:30 PM enh: I'm getting 10 HZ with floats. With longs, I get 100 HZ
09:30 PM tpw_rules: then dump the memory
09:30 PM cehteh: yes i told already you can check the stack
09:30 PM cehteh: that costs a bit but is possible
09:30 PM tpw_rules: or switch to a cortex m3 for like 20 cents more and you can use floats all you want
09:31 PM enh: tpw_rules: I don't know how to write such a routine. I'll have to find out how.
09:31 PM tpw_rules: although hm i don't think many of the smaller cortex cores have hardware floats
09:31 PM tpw_rules: but they're far more performant
09:31 PM cehteh: http://www.nongnu.org/avr-libc/user-manual/index.html
09:31 PM cehteh: there is a page on the memory layout somewhere there
09:31 PM tpw_rules: https://pastebin.com/VGM8v02A
09:32 PM enh: i have PCBs ready for atmega328p now. When this thing lifts off, I'll invest in stm32 modules.
09:32 PM tpw_rules: just link that in
09:32 PM cehteh: http://www.nongnu.org/avr-libc/user-manual/malloc.html tada
09:32 PM enh: tpw_rules: https://hackaday.io/project/11724-yauvec-yet-another-unmanned-vehicle-controller
09:32 PM cehteh: http://www.nongnu.org/avr-libc/user-manual/malloc-std.png
09:33 PM enh: cehteh: I've seen that.
09:34 PM enh: see, the module talks to other modules via SPI, when called.
09:34 PM cehteh: so you fill the space between heap and stack first maybe as easy as memset(.. 0x5a)
09:34 PM enh: I can set it to give me the stack/heap difference
09:34 PM cehteh: and you add a routine which checks if these 0x5a vanish or not
09:34 PM enh: but it is not sending the data out...
09:34 PM cehteh: you don't need to send data out
09:35 PM enh: ...
09:35 PM tpw_rules: enh: https://pastebin.com/VGM8v02A link in this code
09:35 PM cehteh: as soon you detect a fault you just go into a while(1){blink_led} loop
09:35 PM tpw_rules: it fills up the memory at boot
09:35 PM cehteh: this is only debugging
09:35 PM tpw_rules: then just dump out memory
09:36 PM cehteh: you can also regularly check the heap and stack pointer to see if they get close together and then issue a warning
09:36 PM tpw_rules: cehteh: i don't think that's a useful metric
09:36 PM tpw_rules: he's not doing malloc, i don't think
09:36 PM enh: no mallocs
09:37 PM cehteh: and .. other question: do you use new/delete or malloc()/free() ?
09:37 PM tpw_rules: and the stack pointer is going to be way shallower than in the gnarly data processing stuff
09:37 PM tpw_rules: enh: https://pastebin.com/VGM8v02A link in this asm file and then dump the memory after doing hard work
09:37 PM cehteh: well then the heap pointer stays static (no allocation)
09:37 PM enh: I will, tpw_rules.
09:37 PM tpw_rules: look for where that 0xDEADF00D pattern doesn't exist to see what memory was used
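A host-side sketch of reading such a dump, assuming the region between the end of the static data/heap and the top of RAM was painted with the 0xDEADF00D pattern at boot. This is not the linked pastebin file (its contents are not reproduced here); `paint` and `untouched_bytes` are hypothetical names. Because the stack grows downward toward the heap, counting intact bytes from the bottom of the region gives the headroom the stack never reached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 0xDEADF00D as four bytes; the order only has to match between paint and scan.
constexpr uint8_t kPaint[4] = {0xDE, 0xAD, 0xF0, 0x0D};

// Paint the whole region once, before the stack has been used much.
void paint(uint8_t* begin, size_t len) {
    assert(begin != nullptr || len == 0);
    for (size_t i = 0; i < len; ++i) begin[i] = kPaint[i % 4];
}

// Count bytes from the bottom of the region that still hold the pattern.
// That is the headroom never used; the stack's peak use is len - result.
size_t untouched_bytes(const uint8_t* begin, size_t len) {
    size_t n = 0;
    while (n < len && begin[n] == kPaint[n % 4]) ++n;
    return n;
}
```

A live stack byte that happens to equal the pattern makes the estimate a few bytes optimistic, so a margin of only a handful of bytes should be treated as none.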
09:37 PM cehteh: that will be easy
09:37 PM cehteh: yes
09:38 PM enh: sorry... I do two mallocs. I redefined new.
09:38 PM tpw_rules: if you don't have many left, get suspicious
09:38 PM cehteh: much easier
09:38 PM cehteh: redefined to what?
09:38 PM tpw_rules: mallocs are "fine" as long as you don't free. but there's more overhead
09:38 PM enh: void * operator new(size_t size) {
09:38 PM enh: return malloc(size);
09:38 PM enh: }
09:38 PM enh: i do not free them.
09:38 PM cehteh: how often do you call them?
09:38 PM enh: I got two classes that must be shared between sensors. TWI and MSPIM
09:39 PM enh: call them once. No more.
09:39 PM cehteh: ok
09:39 PM enh: no fragmentation.
09:39 PM tpw_rules: statically allocate them and you'll save some bytes
09:39 PM cehteh: well then write that canary after the allocation and check it regularly
09:40 PM cehteh: and yes static alloc .. saves some memory (no need to link malloc no datastructures for managing memory)
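A sketch of the static-allocation alternative suggested here for the two shared objects. `TWI` is a hypothetical stand-in with just enough members to compile, not the project's class, and `Sensor` is likewise illustrative. The objects live in static storage with a size known at link time, and the sensors keep the same pointer-based sharing that `new` gave them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shared TWI driver (MSPIM would be analogous).
class TWI {
public:
    void write(uint8_t /*b*/) { ++count_; }  // real code would drive the bus
    uint16_t count() const { return count_; }
private:
    uint16_t count_ = 0;
};

// A sensor that uses, but does not own, a shared bus object.
class Sensor {
public:
    explicit Sensor(TWI* bus) : bus_(bus) { assert(bus_ != nullptr); }
    void poll(uint8_t reg) { bus_->write(reg); }
private:
    TWI* bus_;
};

// Statically allocated instead of `new TWI`: no malloc() call, and if nothing
// else uses malloc() the allocator does not need to be linked in at all.
// g_twi is defined first, so it is constructed before the sensors.
static TWI g_twi;
static Sensor g_baro(&g_twi);
static Sensor g_imu(&g_twi);
```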
09:40 PM tpw_rules: i wonder what causes it to reboot when you run over the stack
09:40 PM tpw_rules: it seems very good about that
09:40 PM tpw_rules: is there some built in protection?
09:45 PM cehteh: huh? most bugs lead (coincidentally) to a reboot
09:45 PM tpw_rules: i mean yeah but like that never happens on a nes
09:45 PM tpw_rules: which is what i've programmed a lot of
09:45 PM Lambda_Aurigae: and in your reboot, you can check things to see what caused the reboot/bootup.
09:46 PM tpw_rules: crashes almost always end in an infinite loop
09:46 PM cehteh: can happen on AVR too
09:46 PM Lambda_Aurigae: hence watchdog timer.
09:46 PM tpw_rules: i mean but like the core extremely reliably ends up resetting cleanly
09:46 PM cehteh: but often you end up with some illegal thing which causes a jump to the reset vector
09:47 PM tpw_rules: oh so that is the core behavior
09:47 PM cehteh: there is no special magic on AVR's
09:47 PM Lambda_Aurigae: illegal operation,,,reset.
09:47 PM cehteh: well the harvard architecture prevents code from getting overwritten
09:47 PM tpw_rules: yeah
09:47 PM cehteh: but you run with bad/undefined data which eventually pukes
09:48 PM tpw_rules: but how does "puke" always end up at the reset vector
09:48 PM cehteh: so no illegal operation but something else, stack overflow, out-of-bounds access, division by zero etc
09:48 PM cehteh: just coincidence
09:48 PM cehteh: it is not always
09:48 PM cehteh: unless you have the watchdog running
09:49 PM tpw_rules: yeah
09:49 PM cehteh: all undefined vectors point either into nirvana or to reset
09:49 PM cehteh: either causes a reboot
09:50 PM cehteh: but you can have hung loops as well
09:50 PM tpw_rules: why would nirvana cause a reboot
09:50 PM cehteh: that's an illegal instruction (out of address range) or the middle of some other code then
09:50 PM tpw_rules: does it not have open bus or something?
09:51 PM cehteh: dunno have to check the details
09:51 PM tpw_rules: well so it must be defined that illegal instructions cause a processor reset
09:51 PM cehteh: but i agree that it most often goes into reboot, but thats (maybe intentional) coincidence
09:51 PM cehteh: there aren't many illegal instructions
09:52 PM tpw_rules: in 6502, illegal instructions either do something weird but continue normally, or hang the instruction decoder, forcing a reset
09:52 PM cehteh: some instructions take more than 1 word so you can accidentally jump in between
09:52 PM tpw_rules: jumping into nirvana either gets you open bus, or a mirror of somewhere else
09:52 PM cehteh: that will crash
09:52 PM tpw_rules: or large data tables in rom
09:52 PM cehteh: or that
09:52 PM tpw_rules: i implemented a scientific calculator and yeah i ran into out of stack space
09:52 PM cehteh: but otherwise there is some code .. which then operates on undefined data
09:53 PM tpw_rules: i had to revamp the data architecture
09:53 PM tpw_rules: (that was an ultra fun project btw. 40 digits of trig precision from like 400 bytes!)
09:53 PM tpw_rules: cordic
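For reference, a minimal CORDIC sketch in 32-bit fixed point (Q30, so 1.0 is 2^30), computing sine and cosine with only shifts, adds and a small arctangent table. This is a generic rotation-mode CORDIC, not tpw_rules' 6502 code. The table and gain are computed with `<cmath>` on first use purely for convenience on a PC; on an AVR they would be precomputed constants, ideally in PROGMEM.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kIters = 24;             // each iteration adds roughly one bit
constexpr double kOne = 1073741824.0;  // 2^30, i.e. 1.0 in Q30

struct CordicTable {
    int32_t atan_q30[kIters];  // atan(2^-i) in Q30 radians
    int32_t gain_q30;          // 1 / product of sqrt(1 + 2^-2i), pre-applied to x
    int32_t half_pi_q30;       // allowed input range is [-pi/2, pi/2]
};

static CordicTable make_table() {
    CordicTable t{};
    double k = 1.0;
    for (int i = 0; i < kIters; ++i) {
        t.atan_q30[i] = static_cast<int32_t>(std::lround(std::atan(std::ldexp(1.0, -i)) * kOne));
        k /= std::sqrt(1.0 + std::ldexp(1.0, -2 * i));
    }
    t.gain_q30 = static_cast<int32_t>(std::lround(k * kOne));
    t.half_pi_q30 = static_cast<int32_t>(std::lround(std::acos(0.0) * kOne));
    return t;
}

// Rotation-mode CORDIC: angle_q30 in radians (Q30), results in Q30.
// Right shifts of negative values rely on GCC's arithmetic shifting.
void cordic_sincos(int32_t angle_q30, int32_t* sin_q30, int32_t* cos_q30) {
    static const CordicTable t = make_table();
    assert(angle_q30 >= -t.half_pi_q30 && angle_q30 <= t.half_pi_q30);
    int32_t x = t.gain_q30;
    int32_t y = 0;
    int32_t z = angle_q30;
    for (int i = 0; i < kIters; ++i) {
        const int32_t xs = x >> i;
        const int32_t ys = y >> i;
        if (z >= 0) {  // rotate counter-clockwise by atan(2^-i)
            x -= ys; y += xs; z -= t.atan_q30[i];
        } else {       // rotate clockwise by atan(2^-i)
            x += ys; y -= xs; z += t.atan_q30[i];
        }
    }
    *sin_q30 = y;
    *cos_q30 = x;
}
```

Each extra iteration costs one more table entry and roughly one more bit of precision, which is the trade-off that makes CORDIC attractive on cores without hardware floating point.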
09:54 PM tpw_rules: float math sucks :)
11:02 PM woddy2 is now known as woddy