Removing PNFS (Chimera) Ghosts

The Chimera DB can become out of sync with the actual files stored on disk. The t_dirs table holds the "tree" of the /pnfs namespace in Chimera. The t_locationinfo table holds the physical location for a specific ipnfsid. "Ghosts" are entries in the t_dirs table that have no corresponding location in the t_locationinfo table.

To find the ghosts, I proceed as follows:

  • Connect to the chimera DB and output the ipnfsid values from both t_dirs and t_locationinfo:
chimera=> \o /tmp/pnfsid_dirs.log
chimera=> select ipnfsid from t_dirs;
chimera=> \o /tmp/pnfsid_locationinfo.log
chimera=> select ipnfsid from t_locationinfo;
chimera=> \q
  • Check the output files
root@head02 ~# wc /tmp/pnfsid_dirs.log
 3517723  3517723 94580872 /tmp/pnfsid_dirs.log
root@head02 ~# wc /tmp/pnfsid_locationinfo.log
 2979423  2979423 79808840 /tmp/pnfsid_locationinfo.log
  • Edit the above output files to remove the header and blank lines
  • Sort the output files for use with the comm utility on Linux:
sort -u /tmp/pnfsid_dirs.log > /tmp/pnfsid_dirs.log.sorted
sort -u /tmp/pnfsid_locationinfo.log > /tmp/pnfsid_locationinfo.log.sorted
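
The header-removal and sort steps above could also be scripted rather than done by hand. The following is a minimal sketch, not part of the original procedure: the function name clean_and_sort is hypothetical, and it assumes every PNFSID is a single token of hexadecimal digits, so that the psql column header, the dashed separator, the "(N rows)" footer, and blank lines are all filtered out.

```shell
#!/bin/sh
# Sketch: reduce a raw psql output file to a sorted, de-duplicated list of
# bare PNFSIDs. ASSUMPTION: a PNFSID is one token of hexadecimal digits.
clean_and_sort() {
    # $1 = raw psql output file, $2 = destination for the sorted list
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" \
        | grep -E '^[0-9A-Fa-f]+$' \
        | LC_ALL=C sort -u > "$2"
}

# Example invocation (paths as used on this page):
# clean_and_sort /tmp/pnfsid_dirs.log /tmp/pnfsid_dirs.log.sorted
```

Sorting under LC_ALL=C gives a byte-wise order; if this is used, running the later comm commands under the same locale setting keeps the two tools in agreement about what "sorted" means.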

Check the unique count:
root@head02 ~# wc /tmp/pnfsid_dirs.log.sorted
 3164344  3164344 84768704 /tmp/pnfsid_dirs.log.sorted
root@head02 ~# wc /tmp/pnfsid_locationinfo.log.sorted
 2978103  2978103 79773966 /tmp/pnfsid_locationinfo.log.sorted

NOTE: At this point we have two sets of PNFSIDs to reconcile, BUT we can't simply assume that an ipnfsid present in t_dirs and NOT present in t_locationinfo is a "ghost". The reason is that directories have no entry in t_locationinfo. So we need to find the list of POSSIBLE "ghosts" and then remove any directories contained in that list.

We can get a list of PNFSIDs for "directories" as follows:
chimera=> \o /tmp/pnfsid_directories.log
chimera=> select ipnfsid from t_inodes where itype=16384;
Next sort these:
sort -u /tmp/pnfsid_directories.log > /tmp/pnfsid_directories.log.sorted

Check the count:
root@head02 ~# wc /tmp/pnfsid_directories.log.sorted
 176727  176727 4747829 /tmp/pnfsid_directories.log.sorted

We can use this list to remove directory entries from any potential "ghost" list.

  • Next get a list of POTENTIAL ghosts:
comm -2 -3 /tmp/pnfsid_dirs.log.sorted /tmp/pnfsid_locationinfo.log.sorted > /tmp/pnfsid_ghosts_v1.log

  • Use the list of possible ghosts to create the final list of "ghosts" by removing the PNFSIDs of the directories:
comm -2 -3 /tmp/pnfsid_ghosts_v1.log /tmp/pnfsid_directories.log.sorted > /tmp/pnfsid_ghosts_final.log
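
To make the two comm steps concrete, here is a toy illustration using made-up four-digit IDs in temporary files (none of these values come from the real database). With -2 -3, comm suppresses the lines unique to the second file and the lines common to both, leaving only the lines unique to the first file.

```shell
#!/bin/sh
# Toy illustration of the two-stage set difference; the IDs are invented.
work=$(mktemp -d)
printf '%s\n' 0001 0002 0003 0004 > "$work/dirs.sorted"           # all namespace entries
printf '%s\n' 0002                > "$work/locationinfo.sorted"   # entries with a location
printf '%s\n' 0001                > "$work/directories.sorted"    # entries that are directories

# Stage 1: IDs in the t_dirs list that are not in the t_locationinfo list
comm -2 -3 "$work/dirs.sorted" "$work/locationinfo.sorted" > "$work/ghosts_v1"
# Stage 2: drop the directory IDs from the candidates
comm -2 -3 "$work/ghosts_v1" "$work/directories.sorted" > "$work/ghosts_final"

cat "$work/ghosts_final"   # lists 0003 and 0004
```

Here 0002 is dropped in stage 1 because it has a location, and 0001 is dropped in stage 2 because it is a directory.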

Here are the actual results:

root@head02 ~# wc /tmp/pnfsid_directories.log.sorted                            
176727  176727 4747829 /tmp/pnfsid_directories.log.sorted
root@head02 ~# comm -2 -3 /tmp/pnfsid_dirs.log.sorted /tmp/pnfsid_locationinfo.log.sorted > /tmp/pnfsid_ghosts_v1.log
root@head02 ~# wc /tmp/pnfsid_ghosts_v1.log
 186261  186262 4995475 /tmp/pnfsid_ghosts_v1.log
root@head02 ~# comm -2 -3 /tmp/pnfsid_ghosts_v1.log /tmp/pnfsid_directories.log.sorted > /tmp/pnfsid_ghosts_final.log
root@head02 ~# wc /tmp/pnfsid_ghosts_final.log
  9572   9573 249029 /tmp/pnfsid_ghosts_final.log

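So we have about 9572 "ghosts" in our PNFS space. (The word count is one higher than the line count, so one line is probably not a bare PNFSID, for example a leftover psql "(N rows)" footer; the removal run below reports 9571 records.)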
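
Before deleting anything, it may be worth checking the final list for stray lines that are not PNFSIDs, such as a psql header or row-count footer that survived the editing step. The following is a minimal sketch; the function name count_suspect_lines is hypothetical, and it assumes valid PNFSIDs consist only of hexadecimal digits.

```shell
#!/bin/sh
# Sketch: count the lines in a ghost list that are NOT a single hexadecimal
# token. ASSUMPTION: valid PNFSIDs contain only hexadecimal digits.
count_suspect_lines() {
    # -c prints the number of selected lines, -v selects non-matching lines.
    # grep exits non-zero when the count is 0, so its status is ignored.
    grep -c -v -E '^[[:space:]]*[0-9A-Fa-f]+[[:space:]]*$' "$1" || true
}

# Example: count_suspect_lines /tmp/pnfsid_ghosts_final.log
```

A result of 0 means every line looks like a PNFSID; anything else is worth inspecting by hand before running the removal script.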

We then use a simple Perl/DBI script to remove these entries from the t_dirs, t_level_2 AND t_inodes tables, in that order.

root@head02 ~# perl remove_pnfsid_ghosts.pl /tmp/pnfsid_ghosts_final.log
 Starting at Fri May 22 15:38:18 2009
 Processed 1 entries at Fri May 22 15:38:18 2009
 Processed 101 entries at Fri May 22 15:40:09 2009
 Processed 201 entries at Fri May 22 15:42:10 2009
...
 Processed 9001 entries at Fri May 22 18:36:03 2009
 Processed 9101 entries at Fri May 22 18:38:01 2009
 Processed 9201 entries at Fri May 22 18:39:59 2009
 Processed 9301 entries at Fri May 22 18:41:57 2009
 Processed 9401 entries at Fri May 22 18:43:54 2009
 Processed 9501 entries at Fri May 22 18:45:52 2009
 Finished deleting 9571 records from CHIMERA DB at Fri May 22 18:47:17 2009

The script looks like:
#!/usr/bin/perl
#
# remove_pnfsid_ghosts.pl  -  This script reads from a file the list
#   of ghost pnfsids to be removed from the chimera DB
#
# Shawn McKee <smckee@umich.edu> on May 22, 2009
####################################################

use strict;
use warnings;
use DBI;
use DBD::Pg;

# RaiseError => 1 makes DBI die with its own error message on any failed
# call, so no explicit "or die" is needed on prepare/execute.
my $dbh=DBI->connect("DBI:Pg:dbname=chimera;host=head02.aglt2.org","<user>","",{ RaiseError => 1});
# One prepared delete per table; they are executed in this order.
my $sth1=$dbh->prepare('delete from t_dirs where ipnfsid=?');
my $sth2=$dbh->prepare('delete from t_level_2 where ipnfsid=?');
my $sth3=$dbh->prepare('delete from t_inodes where ipnfsid=?');

print " Starting at ".localtime(time())."\n";
my $infile=$ARGV[0] or die "Usage: $0 <file of ghost pnfsids>\n";
# The input file contains a list of PNFSIDs (1/line) of ghost PNFS entries
# to be removed from the set of CHIMERA tables.
open(my $in,'<',$infile) or die "Unable to open $infile: $!\n";

my $cnt=0;
while (my $line=<$in>) {
    # Take the first whitespace-delimited token; skip blank lines so that
    # a stale match is never reused.
    next unless $line =~ /^\s*(\S+)/;
    my $pnfsid=$1;
    # Progress report on entries 1, 101, 201, ...
    if ( ($cnt++ % 100) == 0 ) {
        print " Processed $cnt entries at ".localtime(time())."\n";
    }
#    print " Found PNFSID=$pnfsid\n";
    $sth1->execute($pnfsid);
    $sth2->execute($pnfsid);
    $sth3->execute($pnfsid);
}
close($in);
$dbh->disconnect;
print " Finished deleting $cnt records from CHIMERA DB at ".localtime(time())."\n";
exit;
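
As an alternative to executing the deletes one row at a time, the ghost list could first be turned into a single SQL script that can be reviewed before it is applied. The following is only a sketch under stated assumptions, not the method used above: the function name make_delete_sql and the output path are hypothetical, PNFSIDs are assumed to be hexadecimal tokens (any other line is skipped), and the table order matches the Perl script.

```shell
#!/bin/sh
# Sketch: emit one reviewable SQL script, wrapped in a transaction, from a
# ghost list. ASSUMPTION: PNFSIDs are hexadecimal tokens; other lines are skipped.
make_delete_sql() {
    echo "BEGIN;"
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" \
        | grep -E '^[0-9A-Fa-f]+$' \
        | while read -r id; do
              # Same table order as the Perl script: t_dirs, t_level_2, t_inodes
              for table in t_dirs t_level_2 t_inodes; do
                  echo "DELETE FROM $table WHERE ipnfsid='$id';"
              done
          done
    echo "COMMIT;"
}

# Example (hypothetical output path):
# make_delete_sql /tmp/pnfsid_ghosts_final.log > /tmp/remove_ghosts.sql
```

The generated file could then be inspected and fed to psql with its -f option. Wrapping the deletes in BEGIN/COMMIT means a failure part-way through leaves the tables unchanged, at the cost of holding one long transaction.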

-- ShawnMcKee - 22 May 2009
Topic revision: r2 - 22 May 2009, ShawnMcKee