The executables are in /moira/bin/ on the Moira server, with sources
in /mit/moiradev/src/afssync/. Most of the commands are run on the
Moira server.

#### Set up a workspace ####

mkdir -p /moira/sync
cd /moira/sync

#### This is preparation for the resync, to save non-Moira users. ####
First, get a recent copy of the prdb, and extract non-Moira entries:

    /moira/bin/udebug aggy -port 7002
    rcp -px root@aggy:/usr/afs/db/prdb.DB0 prdb.old
    /moira/bin/udebug aggy -port 7002
If the two udebugs show that the version changed, lather-rinse-repeat.
(udebug can be found in afsuser; "aggy" here and below is some DB server)
(Also check for "0 of them for write" at the end. It might matter.)
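
The copy-and-compare loop above could be sketched roughly as follows.
This is only an illustration: copy_when_stable and its arguments are
made-up names, and how the version is pulled out of the udebug output
is left to whatever command you pass in.

```shell
# Sketch only. $1 names a command that prints the current db version,
# $2 names a command that copies the database, $3 is an optional retry
# limit. All three are hypothetical and not part of the Moira tools.
copy_when_stable() {
    get_version=$1
    do_copy=$2
    max_tries=${3:-5}
    try=1
    while [ "$try" -le "$max_tries" ]; do
        before=$($get_version) || return 1
        $do_copy || return 1
        after=$($get_version) || return 1
        if [ "$before" = "$after" ]; then
            # The version did not move while we copied, so the copy is usable.
            echo "stable at version $before after $try tries"
            return 0
        fi
        try=$((try + 1))
    done
    echo "version kept changing, giving up" >&2
    return 1
}
```

The two commands would typically be small shell functions wrapping the
udebug and rcp invocations shown above.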

    /moira/bin/pt_util -x -m -u -g -d prdb.extra -p prdb.old
    perl /moira/bin/pt_util.pl < prdb.extra > prdb.extra.sort
to extract and prepare the personal groups and special user entries in
the old prdb for being reincorporated into the new prdb.

    awk -F\| '$9 == 3 {print $1}' /backup/backup_1/users > /tmp/deactivated
    perl -e 'for(`cat /tmp/deactivated`) { chop; $ex{$_}=1;} \
        $punt=0; foreach $L (`cat prdb.extra.sort`){ \
        @w=split(/ /,$L); $_=$w[0]; if ( /:/ ) \
        {@x=split(/:/,$w[0]); if($ex{$x[0]}) {$punt=1;}else{$punt=0;}} \
        print $L unless $punt==1;}' > prdb.extra.trimmed
to remove the personal groups for users who are deactivated.
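
For readers who find the perl one-liner hard to follow, the same
filtering logic might be written in awk as below. It is a sketch, not a
replacement for the command above: trim_deactivated is a made-up name,
and the sample entries in the usage note are invented for illustration.

```shell
# Sketch only. $1: file of deactivated usernames, one per line.
# $2: sorted pt_util dump. Prints the dump minus the personal groups
# (and their member lines) owned by deactivated users.
trim_deactivated() {
    awk -v list="$1" '
        BEGIN { while ((getline line < list) > 0) ex[line] = 1 }
        {
            # Text before the first space; empty for indented member lines.
            first = $0
            sub(/ .*/, "", first)
            if (index(first, ":") > 0) {
                # A group entry such as "owner:group": decide by its owner.
                owner = first
                sub(/:.*/, "", owner)
                punt = (owner in ex) ? 1 : 0
            }
            # Lines without a colon inherit the decision of the last group.
            if (!punt) print
        }
    ' "$2"
}
```

With "bob" listed as deactivated, an entry "bob:pals ..." and the
indented member lines under it are dropped, while "alice:friends ..."
and its members pass through unchanged.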

    awk '/^[^ ][^:]*@/ {printf "KERBEROS:%s\n",$1}' prdb.extra.trimmed \
        > foreign
    blanche afs-foreign-users -f foreign
Get a list of all the @andrew.cmu.edu type (non-athena.mit.edu cell)
users, and sync the Moira list afs-foreign-users to this list.
Moira then adds those entries to the group system:afs-foreign-users,
thus keeping them from being lost in the prdb resync.
Sanity checking the diffs before running the blanche command is recommended.
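
One way to do that sanity check is sketched here. It assumes you have
already saved the current membership of the list to a file, one member
per line in the same KERBEROS:name form; membership_diff and the file
names in the usage note are made up.

```shell
# Sketch only. $1: current members, one per line. $2: proposed members.
# Prints "- member" for entries that would be removed and "+ member"
# for entries that would be added.
membership_diff() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    LC_ALL=C sort -u "$1" > "$old_sorted"
    LC_ALL=C sort -u "$2" > "$new_sorted"
    # comm -23 keeps lines found only in the first file.
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/- /'
    # comm -13 keeps lines found only in the second file.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+ /'
    rm -f "$old_sorted" "$new_sorted"
}
```

For example, "membership_diff current.members foreign" prints nothing
when the list already matches the new file. A long run of "-" lines is
a hint that something went wrong upstream.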

    awk '/^[^ 0-9][^:@]*$/ {printf "KERBEROS:%s@ATHENA.MIT.EDU\n",$1}' \
        prdb.extra.trimmed > oddities
    awk '/^[^ ][0-9.]* .*$/ {printf "KERBEROS:%s\n",$1}' prdb.extra.trimmed \
        >> oddities
    echo "LIST:afs-foreign-users" >> oddities
    blanche afs-odd-entities -f oddities
Do the equivalent of afs-foreign-users for domestic users. We make
the afs-foreign-users list a member of the more general afs-odd-entities.
Sanity checking the diffs before running the blanche command is recommended.

WAIT for the incremental updates from the `blanche` changes to complete.

#### Now the actual resync begins. Incremental updates must stop. ####

    touch /moira/afs/noafs
to disable AFS incremental updates during the synchronization. The
afs.incr (?) will wait 30 minutes on an incremental update before
timing out, so the resync should complete in that time, or list
changes in Moira might need to be propagated by hand.
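
Since everything after this point runs against that 30-minute window,
it can help to note when the lock file was created and check the
remaining time now and then. A minimal sketch, assuming the limit really
is 30 minutes as stated above; noafs_minutes_left is a made-up name.

```shell
# Sketch only. $1: epoch seconds when noafs was touched.
# $2: current epoch seconds. Assumes a 30-minute (1800 second) window.
noafs_minutes_left() {
    limit=1800
    elapsed=$(( $2 - $1 ))
    left=$(( limit - elapsed ))
    if [ "$left" -le 0 ]; then
        echo "expired"
        return 1
    fi
    echo "$(( left / 60 )) minutes left"
}
```

On systems where "date +%s" is available, save its output when you
touch the lock file and pass it as the first argument, with a fresh
"date +%s" as the second.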

    /moira/bin/afssync prdb.moira
to dump the prdb data that is in Moira (users, groups, and group
memberships). This step takes about ten minutes, but can be done
concurrently with the next few steps.

REPEAT the above commands (the prdb copy, the pt_util extraction and
the trimming), thus regenerating prdb.extra.trimmed from a now
completely-up-to-date prdb.

*** Make sure the "afssync" command has completed ***

    cp prdb.moira prdb.new
    /moira/bin/pt_util -w -d prdb.extra.trimmed -p prdb.new \
        >& prdb.extra.err
This use of pt_util will presumably log errors about failed user
creations and list additions. (To start over, do both the `cp` and
`pt_util` again.) You can filter out the "User or group doesn't exist"
type of lines that were caused by a user deactivation with something
like:
    awk -F\| '$9 == 3 {print $1}' /backup/backup_1/users > /tmp/deactivated
    perl -e 'for(`cat /tmp/deactivated`){ chop; $ex{$_}=1;} \
        foreach $L (`cat prdb.extra.err`){ $f=0; \
        @w=split(/[ :]/,$L); for(@w){ $f=1 if $ex{$_}; } \
        next if $f; print $L; }'
Now, back to the resync.

The only remaining errors should be errors creating system:foo groups,
because they already exist. These generally mean that the group has
an odd user on it (root instance, IP acl, etc.) and can safely be
ignored.

Errors of the form:
Error while creating dcctdw:foo: Badly formed name (group prefix doesn't match owner?)
are probably an indication that a user with personal groups had a
username change. (In the past they have also meant that a user with
personal groups was deactivated and the uid was re-used; this was
because we didn't trim the prdb.extra.sort file in the past.)
Assuming these errors are due to a username change, the groups should
be renamed, and you should regenerate prdb.extra.trimmed starting with
a fresh prdb from aggy. (You may want to abort and
rm /moira/afs/noafs and try again later.)

    pts listmax > prdb.listmax
    foreach i ( <db servers> )
        rsh $i -l root -x /bin/athena/detach -a    # detach packs
        rsh $i -l root -x rm -f /usr/afs/db/{prdb.new,pre-resync-prdb}
        rcp -px prdb.new root@${i}:/usr/afs/db/prdb.new
    end    # staging
    foreach i ( <db servers> )
        bos shutdown $i ptserver -wait
        bos exec $i "mv /usr/afs/db/prdb.DB0 /usr/afs/db/pre-resync-prdb; rm /usr/afs/db/prdb.DB*; mv /usr/afs/db/prdb.new /usr/afs/db/prdb.DB0"
    end
    foreach i ( <db servers> )
        bos restart $i ptserver
    end

    /moira/bin/udebug prill -port 7002
to watch the status of the servers to make sure things are going well,
where "prill" is the preferred db server (the sync site).

Make sure the beacons are working, and that once quorum is established
(~90 seconds) the servers are resynchronizing their notions of
the databases and that the "dbcurrent" and "up" fields all become set
and the state goes to "1f". Also, if "sdi" isn't running, watch out
for large rx packet queues on port 7002 using rxdebug, as the
fileservers may get excessively backlogged, and restart servers, if
necessary, if the congestion remains excessive.
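
The state check described above could be automated roughly like this.
The sketch assumes udebug prints a line of the form "Recovery state 1f"
and, for each remote server, a line containing "dbcurrent=1, up=1".
That layout is an assumption and should be compared against real
output before relying on it. check_udebug is a made-up name.

```shell
# Sketch only. Reads udebug output on stdin. Succeeds when the recovery
# state is 1f and every remote server line shows dbcurrent=1 and up=1.
check_udebug() {
    awk '
        /Recovery state/ { state = $3 }
        /dbcurrent=/ {
            servers++
            if ($0 !~ /dbcurrent=1/ || $0 !~ /up=1/) bad++
        }
        END {
            if (state == "1f" && bad == 0) {
                printf "ok: state 1f, %d remote servers current and up\n", servers
                exit 0
            }
            printf "not ready: state=%s, %d of %d remote servers not current/up\n", state, bad, servers
            exit 1
        }
    '
}
```

It would be used by piping the udebug command above into check_udebug
and repeating until it reports ok.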

    pts listmax
    cat prdb.listmax
and if the id maxima are lower than the saved ones, reset them
appropriately to the saved ones using `pts setmax`.
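
The comparison can be sketched as below. Two assumptions are baked in
and should be checked locally: that `pts listmax` prints a line of the
form "Max user id is N and max group id is -M.", and that `pts setmax`
takes -user and -group arguments (see `pts help setmax`). Group ids are
negative, so "lower" for groups is taken to mean closer to zero. The
function names are made up, and the commands are only printed, not run.

```shell
# Sketch only. Extracts "user_max group_max" from listmax output on stdin.
parse_listmax() {
    sed -n 's/^Max user id is \(-\{0,1\}[0-9][0-9]*\) and max group id is \(-\{0,1\}[0-9][0-9]*\)\.\{0,1\}$/\1 \2/p'
}

# $1: saved listmax output file. $2: current listmax output file.
# Prints the pts setmax commands that would restore the saved maxima.
suggest_setmax() {
    saved=$(parse_listmax < "$1")
    current=$(parse_listmax < "$2")
    if [ -z "$saved" ] || [ -z "$current" ]; then
        echo "could not parse listmax output" >&2
        return 1
    fi
    saved_user=${saved% *}
    saved_group=${saved#* }
    cur_user=${current% *}
    cur_group=${current#* }
    if [ "$cur_user" -lt "$saved_user" ]; then
        echo "pts setmax -user $saved_user"
    fi
    # Group ids grow downward, so a less negative value means a lower maximum.
    if [ "$cur_group" -gt "$saved_group" ]; then
        echo "pts setmax -group $saved_group"
    fi
}
```

Saving the fresh `pts listmax` output to a second file and running
suggest_setmax on prdb.listmax and that file prints nothing when no
reset is needed.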

    pts ex system:administrators
as a good spot check, especially since it has special people.
(Also spot check one of the personal groups and perhaps something like
the membership of rcmd.reynelda.)

    rm /moira/afs/noafs
to remove the lock file and let Moira's afs incrementals continue.

The afssync program doesn't deal with null instance KERBEROS
members of lists which are groups (example: if LIST zacheiss contains
KERBEROS zacheiss@ATHENA.MIT.EDU). To get around this, run:

/moira/bin/sync.pl

which will create /var/tmp/sync.out, containing the pts commands
needed to add all the null instance KERBEROS members back to the pts
groups they belong in. If it looks sane, run:

sh /var/tmp/sync.out

Any failed additions are probably from lists that contain both USER
username and KERBEROS username@ATHENA.MIT.EDU.

NOTES

1. Don't do this when you're tired... With certain mistakes, there may
be no cleanup procedure available.

2. /moira/afs/noafs is only good for 30 minutes. Keep track of the
critical log, and you may have to do some operations by hand when the
operation is complete. Also, if requests depend on other requests, they
may be processed out of order, and fail, and may need to be done by hand.