Posts: 330 | Thanked: 556 times | Joined on Oct 2012
#11
An update on my latest tests, and it is quite positive.

The key (which in hindsight is quite logical) is to run fsck.ext3 after mkfs.ext3, and to run it with the -c flag (very important).

What the -c flag in fsck.ext3 does is check for bad blocks and add them to the filesystem's bad-block list. This means you are "blanking out" the bad blocks from the point of view of the filesystem, thereby creating a virtual eMMC without errors.

My mistake was believing that simply running mkfs.ext3 with the -c flag would be enough. It is not. You need to run fsck.ext3 with the -c flag a few times, until no bad blocks are reported.

Then, run fsck.ext3 a few more times without the -c flag. Reboot the N900, and run again.

The exact invocation line I used is:
fsck.ext3 -vfc /dev/<your_partition>

Then, when no bad blocks are reported,
fsck.ext3 -vf /dev/<your_partition>

(NOTE: Omitting the -a flag in both command lines above is on purpose, as I didn't want to automatically answer "yes" to every question; I wanted to see what was going on. Doing this, however, will be time-consuming and tedious when a large number of errors is found. Once you are comfortable, it's probably better to add the -a flag as well.)
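The repeat-until-clean procedure above can be sketched as a small shell loop. This is only a sketch: the device path is a placeholder, and it relies on fsck's documented convention that a nonzero exit status means errors were found (and possibly corrected), so the loop naturally repeats until one pass comes back clean:

```shell
# Run fsck.ext3 with the bad-block scan (-c) repeatedly until a pass
# exits clean (status 0, i.e. nothing left to fix).
run_until_clean() {
    dev="$1"
    passes=0
    until fsck.ext3 -vfc "$dev"; do
        passes=$((passes + 1))
    done
    echo "clean after $((passes + 1)) pass(es)"
}

# Usage (device path is a placeholder):
# run_until_clean /dev/mmcblk0p2
```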

In my experience, you can get to a point where this last command keeps reporting the same error (for example, invalid inode_resize). When that happens, reboot the N900 and run it again.

I don't know if this kind of thing is implemented to run in the background in the default configuration. But if you change the partition table (especially if you add custom partitions that are not in the default configuration) you should probably run these checks.

Out of the 2 ext3 partitions I was testing yesterday, I could only copy 1 instance of the large file in the second partition, and none in the first partition.

Now, after the procedure above, I copied 2 instances of the large 3GB file to each partition, and successfully ran md5sum on each of those instances.

And I'd like to add that it's not a bad idea to do this, because if you get bad sectors that your filesystem can't account for, you will lose data (and you won't even realize it's happening).

To check if you have bad blocks (just using the read test), the command is (as root):
badblocks -b 1024 -sv /dev/<your_partition>

If for some reason your partition has blocks of a size other than 1024 bytes, adjust the -b value accordingly. -sv is what I use in order to get verbose feedback in the terminal as the command is running. Don't be startled if you see bad blocks. But if you do, it's time for some fscking.
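If you are not sure what block size your filesystem uses, you can read it from the superblock instead of guessing. A small helper, assuming e2fsprogs' tune2fs is available (the device path is a placeholder):

```shell
# Print the filesystem block size recorded in the ext superblock,
# so the value passed to badblocks -b matches it.
fs_block_size() {
    tune2fs -l "$1" | awk -F: '/^Block size:/ { gsub(/ /, "", $2); print $2 }'
}

# Usage (device path is a placeholder):
# badblocks -b "$(fs_block_size /dev/mmcblk0p2)" -sv /dev/mmcblk0p2
```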

And if your ext3 filesystem reports no bad blocks, it doesn't mean the eMMC has no bad blocks. It just means that the filesystem has added them to its list and is avoiding them. I think it's likely that every single N900 in existence has some bad blocks, even from the factory.
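To see what the filesystem has actually recorded, e2fsprogs' dumpe2fs can print the stored bad-block list (one block number per line). A small counting helper, as a sketch (the device path is a placeholder):

```shell
# Count the bad blocks recorded in the filesystem's own bad-block list.
# dumpe2fs -b prints one recorded bad block number per line.
count_recorded_badblocks() {
    dumpe2fs -b "$1" 2>/dev/null | wc -l
}

# Usage (device path is a placeholder):
# count_recorded_badblocks /dev/mmcblk0p2
```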

Last edited by malfunctioning; 2014-10-24 at 00:01.
 

javispedro | Posts: 2,355 | Thanked: 5,249 times | Joined on Jan 2009 @ Barcelona
#12
Originally Posted by malfunctioning View Post
It just means that the filesystem has added them to its list and is avoiding them. I think it's likely that every single N900 in existence has some bad blocks, even from the factory.
Yes, every N900 has bad blocks from the factory. However, as said, the eMMC is the one doing wear leveling and error correction. If you're seeing "bad blocks" at the filesystem level, something is broken, period. No one's N900 is "silently corrupting" data every so often.

You're seeing what looks very much like random read errors, which points to cables or contact points or something else like that. The kernel message log (i.e. dmesg) will have additional info.

Last edited by javispedro; 2014-10-24 at 15:12.
 

Posts: 330 | Thanked: 556 times | Joined on Oct 2012
#13
Originally Posted by javispedro View Post
Yes, every N900 has bad blocks from the factory. However, as said, the eMMC is the one doing wear leveling and error correction. If you're seeing "bad blocks" at the filesystem level, something is broken, period. No one's N900 is "silently corrupting" data every so often.

You're seeing what looks very much like random read errors, which points to cables or contact points or something else like that. The kernel message log (i.e. dmesg) will have additional info.
I think you are absolutely right, thank you for pointing this out.

My N900 has been so reliable for the past 3 years that I wasn't expecting this sort of problem. This is my main N900, in constant use. It has only been flashed a couple of times.

I did drop the phone onto concrete twice. The Otterbox case protected the phone to the point that it looks as new externally, but the possibility of a microfracture somewhere in the eMMC chip is obviously real.

I ran badblocks on my 8GB MyDocs FAT32 partition, and I got 3474 bad blocks (about 3.5MB out of the 8GB). They come in clusters (mainly a few hundred bad blocks at a time), so flash wear doesn't fit the pattern.

Also, I ran the same test on another N900's 27GB MyDocs partition, and I got 0 bad blocks. That one is running CSSU Thumb and is mildly overclocked (500 MHz min / 805 MHz max).

For now, what I will do is attempt to get around this issue by manually running fsck.ext3 and tagging bad blocks into the filesystem table. I will monitor the progression of bad blocks as well, to see if new ones develop (which I expect to be the case).

Thank you again for clarifying that in a healthy N900 the eMMC chip itself manages bad blocks transparently to the filesystem.
 
Posts: 330 | Thanked: 556 times | Joined on Oct 2012
#14
I might end up putting all this information in a wiki page for reference.

The two ext3 partitions that were giving me trouble are now reporting 0 bad blocks. I have copied large files covering these whole partitions several times, and no new bad blocks have appeared yet.

A couple things worth noting:
1. Sometimes, fsck.ext3 doesn't report the same number of bad blocks as badblocks, even though from my reading of the manpages it looked as if fsck.ext3 was calling badblocks behind the scenes.

2. Given (1.), the preferred approach would seem to be to first run:
badblocks -b 1024 -o ./badblocklist /dev/<my_partition>
and then:
fsck.ext3 -l ./badblocklist /dev/<my_partition>
(Note that fsck.ext3's -b flag selects an alternate superblock rather than a block size, so it is omitted here.)

However, this does not always work. Sometimes I have gotten a "bad block out of range" error from fsck.ext3, which kind of explains why the number of bad blocks reported by fsck.ext3 and badblocks is not necessarily the same.

Now I'm fixing my /home partition, and I'm testing a different approach: First run badblocks and save the bad block list to a file, then instead of running fsck.ext3, run mkfs.ext3, providing it the bad block file generated by badblocks:
mkfs.ext3 -b 1024 -l ./badblocklist /dev/<my_partition>
We'll see if this approach works.

3. If you intend to fix your MyDocs or /home partitions, first back up all the data:
cp -ax /home/user/MyDocs/* /path/to/directory1/
cp -ax /home/* /path/to/directory2/
(Where the backup directories are on your Linux computer.)
Note: The -x flag is not necessary if you are running these commands from a Linux computer, since the external computer only mounts the filesystems corresponding to the partitions themselves, and none of the other filesystems the N900 mounts.
If you back up the data from the N900 itself, things get a little more complicated, since
cp -ax /home/* /path/to/directory2/
wouldn't copy /home/opt, because that is mounted under /opt by the N900. So you would have to copy /opt as well.

Then, when the partitions are fixed, copy the data back from your Linux computer to your N900.
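The backup step above can be wrapped in a small helper. This is a sketch with placeholder paths; it assumes GNU cp, and the -a/-x flags do exactly what the note above describes (preserve attributes, stay on one filesystem):

```shell
# backup_tree SRC DST: copy a tree with cp -a (preserve permissions,
# timestamps, symlinks) and -x (stay on one filesystem, so other
# mounts are not dragged in).
backup_tree() {
    src="$1"; dst="$2"
    mkdir -p "$dst"
    cp -ax "$src"/. "$dst"/
}

# On the N900 itself, /home/opt is mounted at /opt, so back it up separately:
# backup_tree /home /path/to/directory2
# backup_tree /opt  /path/to/directory2/opt
```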

4. When all else fails, since bad blocks resulting from eMMC damage (at least in my case) seem to come in well-defined clusters, you can repartition to avoid using the bad areas, leaving the bad block clusters in unallocated space.

I'm in the process of running (2.)/(3.). We'll see how that goes.

Last edited by malfunctioning; 2014-10-25 at 16:42.
 
Posts: 1,258 | Thanked: 672 times | Joined on Mar 2009
#15
As for typical dd and emmc, keep in mind that the typical block size is much larger than filesystem blocksize. The physical block, when erased, can only be written to once, before it must be erased again.

What does this mean for badblocks? If a read test finds 1k bad, and you avoid using that sector, but write anywhere else inside the same physical block, the eMMC will put that physical block aside and copy its contents to a new one, along with the requested modification. The bad sector is now invisible, but will reappear elsewhere once that same physical block comes up in rotation again.


What's the size of the physical block? That's vendor proprietary and secret information. On the order of 512k to 16M though.
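To put those numbers in perspective, here is the arithmetic. The 4 MiB erase-block size is an assumption picked for illustration from the 512k–16M range mentioned above:

```shell
erase_block=$((4 * 1024 * 1024))   # assumed 4 MiB physical erase block
fs_block=1024                      # the 1 KiB filesystem block used in this thread
# Number of filesystem blocks sharing one physical erase block:
# tagging a single 1 KiB block bad still leaves thousands of neighbors
# that map to the same physical block.
echo $((erase_block / fs_block))
```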

As for ext2 vs ext3, in my experience, ext2 is slower, which suggests it's triggering many read-modify-write cycles, causing excessive write amplification.
 

Posts: 330 | Thanked: 556 times | Joined on Oct 2012
#16
Originally Posted by shadowjk View Post
As for typical dd and emmc, keep in mind that the typical block size is much larger than filesystem blocksize. The physical block, when erased, can only be written to once, before it must be erased again.

What does this mean for badblocks? If a read test finds 1k bad, and you avoid using that sector, but write anywhere else inside the same physical block, the eMMC will put that physical block aside and copy its contents to a new one, along with the requested modification. The bad sector is now invisible, but will reappear elsewhere once that same physical block comes up in rotation again.


What's the size of the physical block? That's vendor proprietary and secret information. On the order of 512k to 16M though.

As for ext2 vs ext3, in my experience, ext2 is slower, which suggests it's triggering many read-modify-write cycles, causing excessive write amplification.
Thank you for explaining the difference between physical block and filesystem block. When fsck marks a block as a bad block, that's obviously just the filesystem block.

From what you say, if tagging a block as bad at the filesystem level doesn't prevent the eMMC logic from using the containing physical block, then it looks as if tagging bad filesystem blocks on eMMC memory is pointless. As a matter of fact, even if we could tag every single filesystem block within that same physical block as bad, it wouldn't make any difference to the eMMC, since it would still use that physical block. Did I understand correctly?

If so, then badblocks seems pointless (except to determine whether your eMMC is bad, and if so, to what degree). Also, the implication is that if a single bad block is reported at the filesystem level, then, as javispedro pointed out, there is no way to avoid it.

Thank you again for explaining how this works, shadowjk. I won't waste my time anymore.

I am going to continue using my N900, however. Obviously, after the previous discussion I don't need to point out that my attempts to rid my /home partition of (filesystem) bad blocks were futile. I have a very small fraction of bad blocks, which means that I could experience sporadic program failures or data corruption. As long as I understand and accept that, I think I can use the phone satisfactorily for many tasks.
 
Posts: 330 | Thanked: 556 times | Joined on Oct 2012
#17
One more question, shadowjk, about your comment on typical dd. Since dd presumably does not buffer write operations (and doesn't even know what the physical block size is anyway), does this mean dd would cause a huge number of physical block write operations, and that it's therefore probably not the best thing to run on eMMC?
 
pichlo | Posts: 6,445 | Thanked: 20,981 times | Joined on Sep 2012 @ UK
#18
That depends on the block size (the bs= parameter). It is generally a good idea to set it as large as possible, MMC or not.

BTW I am not sure about erase blocks going as large as shadowjk indicated. Wikipedia mentions sizes up to 512kB, although it may not be entirely up to date. 16MB seems a bit excessive.

I usually set the block size (bs=) to something between 1 and 4 MB, depending on how I feel on the day. YMMV.
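As a sketch of the advice above (assuming GNU dd; paths are placeholders): a large bs= keeps each write() call big, so the eMMC firmware can work in whole erase blocks instead of doing a read-modify-write per small write.

```shell
# write_image SRC DST: stream an image with a 4 MiB block size,
# flushing data to the target when done (conv=fsync).
write_image() {
    dd if="$1" of="$2" bs=4M conv=fsync 2>/dev/null
}

# Example (paths are placeholders, not run here):
# write_image ./backup.img /dev/mmcblk0p2
```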
 

Posts: 330 | Thanked: 556 times | Joined on Oct 2012
#19
Originally Posted by pichlo View Post
That depends on the block size (the bs= parameter). It is generally a good idea to set it as large as possible, MMC or not.

BTW I am not sure about erase blocks going as large as shadowjk indicated. Wikipedia mentions sizes up to 512kB, although it may not be entirely up to date. 16MB seems a bit excessive.

I usually set the block size (bs=) to something between 1 and 4 MB, depending on how I feel on the day. YMMV.
How careless of me to forget about the block size flag. Thanks for reminding me of it!
 
Posts: 1,258 | Thanked: 672 times | Joined on Mar 2009
#20
Dd was a typo.

Although we don't really know the exact algorithm used by the eMMC, I believe that if we actually tag the whole physical block, it's unlikely it would be used again.

A bigger issue is the blocks not currently in use, which might or might not have bad areas.
 
