After Using PhotoRec
From CGSecurity
Sorting the files recovered by PhotoRec can be difficult. This page collects some ideas to help with the process.
Sort files by extension
- builtBackwards created an open source standalone executable script for Windows with AutoIt v3 called PhotoRec Sorter.
PhotoRec Sorter is run from the directory that contains the “recup_dir” folders. It moves each file into a new folder named after the file extension in upper case (e.g. PDF, DOC, PPT), so all recovered files end up sorted into folders by extension.
Download Source and Compiled Executable: PhotoRec Sorter Project Page --BuiltBackwards 02:10, 25 October 2008 (UTC)
- You can use this Python script to sort found files by extension:
- Save the following code as a file (recovery.py) and then run it with the 'source' and 'destination' directories as parameters.
Example: $ python3 recovery.py /home/me/recovered_files /home/me/sorted_files
#!/usr/bin/env python3
import os
import shutil
import sys

source = sys.argv[1]
destination = sys.argv[2]

# Keep asking until both directories exist.
while not os.path.exists(source):
    source = input('Enter a valid source directory\n')
while not os.path.exists(destination):
    destination = input('Enter a valid destination directory\n')

for root, dirs, files in os.walk(source, topdown=False):
    for file in files:
        # Upper-case extension without the leading dot, e.g. "JPG".
        extension = os.path.splitext(file)[1][1:].upper()
        destinationPath = os.path.join(destination, extension)
        if not os.path.exists(destinationPath):
            os.mkdir(destinationPath)
        # Never overwrite a file that is already in the destination folder.
        if os.path.exists(os.path.join(destinationPath, file)):
            print('WARNING: this file was not copied: ' + os.path.join(root, file))
        else:
            shutil.copy2(os.path.join(root, file), destinationPath)
Jpeg
- JPEG file sorting using Exif meta-data.
- Canon PowerShot models store their image sequence numbers in the Exif data, so using a program that can dump Exif data to text like jhead, and the following Perl script, you can essentially restore all the JPG files to their original names. --Vees 01:59, 8 January 2007 (CET)
$working_dir = '.';
$jhead_bin = '/usr/local/bin/jhead';
@recovered_files = `ls $working_dir`;
foreach $file (@recovered_files) {
    chomp $file;
    @exif = `$jhead_bin -v $working_dir/$file`;
    foreach $line (@exif) {
        if ($line =~ /Canon maker tag 0008 Value = 100(\d{1,8})$/) {
            system("mv $working_dir/$file $working_dir/IMG_$1.JPG");
            print "IMG_$1.JPG from $file\n";
            last;
        }
    }
}
- The following Windows batch file recreates the original directory layout and file names present on the card (for Canon cameras, tested with numerous photos from an EOS 20D). It uses the file number stored in the Exif data, read with ExifTool, much like the Perl script above. --Joey 08:36, 17 July 2008 (CEST)
@echo off
for %%f in (*.jpg) do call :process %%f
goto :eof

:process
for /f "usebackq delims=- tokens=1,2" %%a in (`exiftool -p ^"^$FileNumber^" %1`) do set gnum=%%a&set fnum=%%b
if "%gnum%"=="" goto :eof
if "%fnum%"=="" goto :eof
if not exist %gnum%CANON (
    echo Creating directory %gnum%CANON
    mkdir %gnum%CANON
)
echo Moving %1 to %gnum%CANON\_mg_%fnum%.jpg
ren %1 _mg_%fnum%.jpg>NUL
move _mg_%fnum%.jpg %gnum%CANON>NUL
goto :eof
Remove the junk (not visible) at the end of recovered images
- Under Linux, for file types that ImageMagick can handle, you can remove the junk data after the end of each file by running something like
for file in recup_dir*/*; do convert "$file" "$file"; done
- Under Linux (or with perl and 'convert'), you can automate the above 'for' loop and do many other batch image processing with fix_img
Finding duplicates
- Under Linux, md5sum can be used to find duplicate files, possibly by checksumming only the first x bytes of each file.
In this example, we check the first 80k of recup_dir*/*.sib
for file in recup_dir.*/*.sib; do MD5=`dd count=20 bs=4k if="$file" 2> /dev/null|md5sum`; echo "$MD5 $file"; done|sort
1a07198de3486ff2ecab7859612fe7ba - Box Clever.sib
33105f4a7997b2e2681e404b3ac895f2 - Random, Matching - 2 bars.sib
376e0c53e78e56ba6f2858d9680f8c6b - 01aIdentifyCommonInst.sib
b0b40a516a1e26660748a0a09cdf3207 - 01ArticulationFlashcards.sib
Each checksum is unique, so there are no duplicates.
- Under Linux (or with perl and 'sum'), you can find duplicates in a hierarchy using find_dup or finddup from fslint.
- On Windows you can use the fc utility to find duplicates. The following batch file (does not work on Win9x/ME) might help: --Joey 08:36, 17 July 2008 (CEST)
@echo off
SETLOCAL ENABLEEXTENSIONS ENABLEDELAYEDEXPANSION
SET FILELIST=
FOR %%i IN (*) DO (
    FOR %%j IN (!FILELIST!) DO (
        IF %%~zi EQU %%~zj (
            fc /b "%%~i" "%%~j">NUL && echo "%%~i" = "%%~j"
        )
    )
    SET FILELIST=!FILELIST! "%%~i"
)
ENDLOCAL
- To search subdirectories as well, add "/r" (without the quotes) after both "for"s in the above batch file.
- On Unix machines, you can use fdupes and the following script to generate a shell script with rm statements to remove all duplicate files:
#!/bin/sh
OUTF='rm-dups.sh'
if [ -e $OUTF ]; then
    echo "File $OUTF already exists."
    exit 1;
fi
echo "#!/bin/sh" > $OUTF
fdupes -r -f . |sed -r 's/(.+)/rm \1/' >> $OUTF
chmod +x $OUTF
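The partial-checksum idea from the md5sum example can also be sketched in Python, which may be convenient on systems without dd and md5sum. This is only a minimal sketch: the function names `partial_md5` and `find_candidate_duplicates` are made up for illustration, and the 80k default simply mirrors the example above. Files that share a partial checksum are only candidates; they can still differ further on and should be compared in full before anything is deleted.

```python
import hashlib
import os
from collections import defaultdict


def partial_md5(path, nbytes=80 * 1024):
    """Return the MD5 hex digest of the first nbytes of the file."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read(nbytes)).hexdigest()


def find_candidate_duplicates(top, nbytes=80 * 1024):
    """Group all files below top by partial MD5.

    Returns a dict mapping digest -> sorted list of paths, keeping only
    digests shared by more than one file.
    """
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            groups[partial_md5(path, nbytes)].append(path)
    return {
        digest: sorted(paths)
        for digest, paths in groups.items()
        if len(paths) > 1
    }
```

Calling `find_candidate_duplicates("/home/me/recovered_files")` (the path is just an example) returns an empty dict when every partial checksum is unique.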
MP3, Ogg vorbis...
Most mp3 and ogg files have embedded information about Title, Album and Author. You can use EasyTag to automatically rename the recovered mp3 and ogg using this information.
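To illustrate what such embedded information looks like, here is a minimal Python sketch that reads an ID3v1 tag (the fixed 128-byte block starting with "TAG" at the very end of some MP3 files) and proposes a file name. It is not how EasyTag works, and it makes several assumptions: it ignores ID3v2 tags and Ogg Vorbis comments entirely, the function names and the "Artist - Title.mp3" naming scheme are hypothetical, and a recovered file whose end is truncated or followed by junk will not have a readable ID3v1 tag.

```python
def read_id3v1(path):
    """Return title/artist/album from an ID3v1 tag, or None if there is none."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)          # move to the end to learn the file size
        if handle.tell() < 128:
            return None
        handle.seek(-128, 2)       # the tag occupies the last 128 bytes
        tag = handle.read(128)
    if tag[:3] != b"TAG":
        return None

    def field(start, length):
        # Fields are fixed width and padded with NUL bytes or spaces.
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").strip()

    return {
        "title": field(3, 30),
        "artist": field(33, 30),
        "album": field(63, 30),
    }


def suggested_name(path):
    """Propose 'Artist - Title.mp3', or None when no usable tag is found."""
    tag = read_id3v1(path)
    if not tag or not tag["title"]:
        return None
    base = f"{tag['artist']} - {tag['title']}"
    # Replace characters that are awkward in file names.
    safe = "".join(c if c.isalnum() or c in " -_.()" else "_" for c in base)
    return safe + ".mp3"
```

A caller could pass each recovered .mp3 file to `suggested_name` and rename only those for which a name is returned, leaving the rest for a full tag editor.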
MS Office
- To read a broken MS Office document (doc/xls/ppt/...) that MS Office fails to open, you can try OpenOffice. OpenOffice.org is a multiplatform and multilingual office suite and an open-source project. Compatible with all other major office suites, the product is free to download, use, and distribute.
- Some MS Office documents (xls/ppt/...) may be recovered with a Word .doc extension; you may need to rename these files.
MS Outlook
- To recover a broken Outlook PST file, try Microsoft Scanpst.