    Lots of ECC errors in dmesg

    sponka | # 1 | 2013-08-28, 22:24 | Report

    Hello,

    I ran dmesg and got quite a lot of errors like this:

    Code:
    correctable ECC error = 0x5555, addr1 0xa, addr8 0x0
    Complete output is here: https://dl.dropboxusercontent.com/u/1420887/dmesg.txt

    Rebooted approx. 3 times and output is always like this.
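
To see how many of these messages there are and whether they always report the same values, the log can be summarized with a short script. This is only a minimal sketch: the regular expression is based on the single line quoted above, and `summarize_ecc` is a hypothetical helper name, not part of any existing tool.

```python
import re
from collections import Counter

# Pattern built from the one message quoted in the post above.
# The "(?<!un)" guard skips any line that says "uncorrectable".
ECC_LINE = re.compile(
    r"(?<!un)correctable ECC error = ((?:0x)?[0-9a-fA-F]+), "
    r"addr1 ((?:0x)?[0-9a-fA-F]+), addr8 ((?:0x)?[0-9a-fA-F]+)"
)


def summarize_ecc(dmesg_text):
    """Count correctable-ECC messages, grouped by (status, addr1, addr8)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = ECC_LINE.search(line)
        if match:
            key = tuple(int(value, 16) for value in match.groups())
            counts[key] += 1
    return counts


# Small demo using the line from the post, repeated three times.
sample = "correctable ECC error = 0x5555, addr1 0xa, addr8 0x0\n" * 3
for (status, addr1, addr8), n in summarize_ecc(sample).most_common():
    print(f"{n} x status={status:#06x} addr1={addr1:#x} addr8={addr8:#x}")
```

To run it against a real log, pass the saved dmesg text to `summarize_ecc` in place of `sample`. A single repeated (status, addr1, addr8) triple points to the same location being reported over and over rather than to errors spread across the chip.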

    My N9 is a bit older than a year and I really hope that doesn't mean it's failing?

    Thanks,
    b.


     
    rainisto | # 2 | 2013-08-29, 08:24 | Report

    It's a feature; just ignore those lines.


     
    Fuzzillogic | # 3 | 2013-08-29, 16:45 | Report

    I've had them too, but they disappeared. Flash memory will fail eventually; that's why the ECC is there. I guess at some point the controller swaps the faulty block for a fresh spare one, so the errors go away.


     
    wicket | # 4 | 2013-08-30, 01:50 | Report

    Originally Posted by Fuzzillogic:
    I've had them too, but they disappeared. Flash memory will fail eventually; that's why the ECC is there. I guess at some point the controller swaps the faulty block for a fresh spare one, so the errors go away.

    Those errors relate to main memory, not flash memory. It basically means that a bit flip was detected and corrected. It does not mean the memory is failing, and it won't affect performance unless you are getting somewhere in the region of tens of thousands of errors a day. There are a number of reasons why correctable memory errors may occur. They can even be caused by cosmic rays! Don't worry about them.
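
For readers wondering what "detected and corrected" means in practice, here is a toy Hamming(7,4) example. It is only an illustration of the principle: the ECC scheme actually used by the hardware is not described in this thread and works on much larger blocks of data. The function names are made up for this sketch.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, error position), position 0 = no error."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome = 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return c, position


stored = hamming74_encode([1, 0, 1, 1])
read_back = list(stored)
read_back[4] ^= 1  # simulate a single bit flip
fixed, position = hamming74_correct(read_back)
print(fixed == stored, position)
```

A single flipped bit is located by the syndrome and repaired, so the data comes back intact. That is the situation a "correctable" message reports. Two flipped bits in the same codeword would defeat this particular toy code.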


     
    Fuzzillogic | # 5 | 2013-08-30, 12:20 | Report

    Correct me if I'm wrong, but I looked for those errors in code and they originated from a piece of code used for Samsung's OneNAND, which is flash memory.
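
As a rough illustration of how such a status value might be read, the sketch below splits the 16-bit value into two-bit fields. Everything about the layout here is an assumption: the mask values 0x5555 and 0xAAAA are recalled from the Linux OneNAND register definitions and should be checked against the kernel source for your device, and the function names are hypothetical.

```python
# ASSUMPTION: mask values recalled from the Linux OneNAND register header,
# not confirmed in this thread. Verify before relying on them.
ECC_1BIT_ALL = 0x5555  # assumed: low bit of each 2-bit field = corrected 1-bit error
ECC_2BIT_ALL = 0xAAAA  # assumed: high bit of each 2-bit field = uncorrectable error


def classify_ecc_status(status):
    """Classify a 16-bit ECC status value under the assumed masks."""
    if status & ECC_2BIT_ALL:
        return "uncorrectable"
    if status & ECC_1BIT_ALL:
        return "correctable"
    return "clean"


def split_fields(status):
    """Split the status into eight 2-bit fields, lowest bits first."""
    return [(status >> shift) & 0b11 for shift in range(0, 16, 2)]


print(classify_ecc_status(0x5555), split_fields(0x5555))
```

Under these assumptions, the value 0x5555 from the log means every field reports a corrected single-bit error and none reports an uncorrectable one.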

    AFAIK there's OMAP's 512MiB internal/embedded flash, and the 16/64GB "external". Is that what you meant?

    I've read that worn-out flash cells can be revitalized by heating them. You could try putting your device in the oven (kidding, of course, but heating flash is a valid way to fix it).


     
    mikecomputing | # 6 | 2013-08-30, 22:41 | Report

    Originally Posted by Fuzzillogic:
    Correct me if I'm wrong, but I looked for those errors in code and they originated from a piece of code used for Samsung's OneNAND, which is flash memory.

    AFAIK there's OMAP's 512MiB internal/embedded flash, and the 16/64GB "external". Is that what you meant?

    I've read that worn-out flash cells can be revitalized by heating them. You could try putting your device in the oven (kidding, of course, but heating flash is a valid way to fix it).

    That technology will never appear, simply because manufacturers do not want to sell products that live until the "end of human civilization". That would kill their business, because they need us to buy new products all the time...


     
    juiceme | # 7 | 2013-08-31, 11:54 | Report

    Originally Posted by mikecomputing:
    That technology will never appear, simply because manufacturers do not want to sell products that live until the "end of human civilization". That would kill their business, because they need us to buy new products all the time...

    Well, I'd say this does not apply to flash memory technologies, fortunately.
    See, densities are growing anyway, so manufacturers will keep offering larger-capacity devices, obsoleting the smaller ones. There's no need to obsolete devices by building in faults...


     
    wicket | # 8 | 2013-08-31, 20:57 | Report

    Originally Posted by Fuzzillogic:
    Correct me if I'm wrong, but I looked for those errors in code and they originated from a piece of code used for Samsung's OneNAND, which is flash memory.

    Thanks for pointing that out. They do indeed originate from OneNAND. I should really have looked at the attached dmesg output before posting. My post came from previous experience having seen main memory ECC errors in hundreds of servers (before flash memory was commonplace) and it never occurred to me that ECC would now be available in flash memory devices.

    The same ECC principles should still apply though regardless of whether the memory is volatile or non-volatile.

    Interestingly enough, OneNAND is actually known as "fusion" memory which not only consists of flash memory but also includes a 5KB SRAM buffer (as well as controller logic and hardware ECC) on the same chip so it's possible (but not likely) that the errors come from the SRAM buffer.


     