Sunday, April 27, 2008

Cleverly Repair Large Corrupted Files with BitTorrent Client Checksum Hash Scans


I admit, I have a very fast broadband connection at home. My link speed is more than 6 Mbps. You’d think I can’t complain. But some files still seem to take an eternity to download. Take, for instance, the Windows Vista Beta. On my connection, that 4 GB bloatware *.ISO mammoth took about two hours to download. Even when I used DownThemAll!, it took about an hour and a half! That is way too much time to spend downloading.

Corrupted *.ISO Image File
When I was writing one of my most recent articles on free T-Mobile Hotspots and Microsoft Windows Beta virtualization, I needed a copy of Windows Vista Beta. I wasn’t exactly looking forward to another two-hour download. So I checked my stack of burned CDs and DVDs for an old copy. Luckily, I found the original copy of Vista. I just had to rip it to my hard drive for faster virtualization. Unfortunately, there were about 5 unrecoverable disk sector read errors. “Maybe,” I rationalized, “Vista won’t really care if there are just a couple of corrupted data bytes. Besides, I just need it for one small task.” I was wrong. The Windows Vista installation program performs a corruption test on its files before proceeding with the installation.

To think that a handful of tiny sectors on the DVD hampered my schemes of exploiting T-Mobile! Would I have to redownload the ISO image to continue with my plans? No! Fortunately, I found a forum thread on Lockergnome (bless Chris Pirillo) that dealt with error code 80070241 from a corrupted Vista ISO image. The solution was simply ingenious!

BitTorrent Checksum Hash Scans
To repair large corrupted files, you have to understand how hash scans and torrents work. A *.torrent file contains two important pieces of information: the torrent tracker address and a list of piece hashes. The tracker keeps track of the IP addresses of the peers that have pieces of the desired file. The torrent client downloads pieces of the file from the different peers the tracker refers it to. Depending on the size of the file (in my case 4 GB) and the piece size chosen when the torrent was created, there can be hundreds or even thousands of pieces to download. After each piece is downloaded, the torrent client computes its SHA-1 hash and compares it with the hash stored in the *.torrent file, so each piece is exactly the same as the one in the original file. This ensures that no corrupted data, dummy data, or malicious data is mixed into the final product.
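To make the per-piece check concrete, here is a minimal Python sketch. It is not how any particular client is written; the function names and the 256 KiB piece size are assumptions for illustration (a real torrent declares its own piece size). It splits data into fixed-size pieces, hashes each one with SHA-1, and checks a piece against its expected hash.

```python
import hashlib

# Assumed piece size for illustration; a real torrent declares its own.
PIECE_LENGTH = 256 * 1024


def piece_count(file_size: int, piece_length: int = PIECE_LENGTH) -> int:
    """Number of pieces needed to cover file_size bytes (last piece may be short)."""
    return -(-file_size // piece_length)  # ceiling division


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Return the 20-byte SHA-1 digest of each fixed-size piece of data."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def piece_is_valid(piece: bytes, expected_digest: bytes) -> bool:
    """True if the piece hashes to the digest recorded for it."""
    return hashlib.sha1(piece).digest() == expected_digest
```

With these assumed numbers, a 4 GiB file cut into 256 KiB pieces gives 16,384 pieces, and a single flipped byte anywhere in a piece makes that one piece fail its check while all the others still pass.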

Repair Files with the Checksum
This hash check occurs several times in a torrent client, but the one that matters here is the check of data that is already on disk. Before a client begins to download, upload, or seed a torrent, it does a complete hash check of the data already downloaded and available. If your client does not do this automatically when it starts, look for a force recheck option. With large corrupted files, you can take advantage of this hash check.

  1. Just find a torrent of the exact file on The Pirate Bay, ISOHunt, or TorrentSpy and download it for about a minute.
  2. Then, stop and close the client.
  3. Replace the partially downloaded data file with the corrupted file, keeping the same file name and location.
  4. Reopen the client and start the download. The client will perform a SHA-1 hash check on the current data to see which pieces still need to be downloaded. If you correctly found a torrent of the exact file, the client will redownload only the pieces containing corrupted data and rebuild the file. In the end, you should be good to go!
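What the client does in step 4 can be sketched roughly as follows. This is a simplified illustration, not a real BitTorrent client: `find_bad_pieces`, `repair`, and the `fetch_piece` callback (which stands in for downloading a piece from a peer) are hypothetical names, and the expected hashes are assumed to have been read from the *.torrent file already.

```python
import hashlib
from typing import Callable


def find_bad_pieces(path: str, expected: list[bytes], piece_length: int) -> list[int]:
    """Return the indexes of pieces whose SHA-1 does not match the expected digest."""
    bad = []
    with open(path, "rb") as f:
        for index, digest in enumerate(expected):
            piece = f.read(piece_length)  # the last piece may be shorter
            if hashlib.sha1(piece).digest() != digest:
                bad.append(index)
    return bad


def repair(path: str, expected: list[bytes], piece_length: int,
           fetch_piece: Callable[[int], bytes]) -> list[int]:
    """Overwrite only the bad pieces with data from fetch_piece (hypothetical peer download)."""
    bad = find_bad_pieces(path, expected, piece_length)
    with open(path, "r+b") as f:
        for index in bad:
            piece = fetch_piece(index)
            # Never trust downloaded data: verify it before writing it to disk.
            if hashlib.sha1(piece).digest() != expected[index]:
                raise ValueError(f"piece {index} from peer failed its hash check")
            f.seek(index * piece_length)
            f.write(piece)
    return bad
```

The point of the trick is visible in `repair`: only the pieces that fail the check are fetched again, so a 4 GB image with a few bad sectors needs only a few pieces of download instead of the whole file.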

Cons
There are a couple of caveats for using this method to repair corrupted files.

  • It’s kinda shady. Depending on the file, the legality of torrenting it may be questionable.
  • You have to find a torrent of the exact file. If even one byte differs from your copy, the hashes will not match.
  • The torrent must still have active peers. You may find the torrent of the exact file, but that torrent may also be dead.

Leave a comment on your experience with this method or post any other suggestions!
