The executables are in /moira/bin/ on the Moira server, with sources
in /mit/moiradev/src/afssync/.  Most of the commands are run on the
Moira server.

#### Set up a workspace ####

mkdir -p /moira/sync
cd /moira/sync

#### This is preparation for the resync, to save non-Moira users. ####
First, get a recent copy of the prdb, and extract non-Moira entries:

    /moira/bin/udebug aggy -port 7002
    rcp -px root@aggy:/usr/afs/db/prdb.DB0 prdb.old
    /moira/bin/udebug aggy -port 7002
If the two udebug runs show that the database version changed during the
copy, repeat all three commands until it does not.
(udebug can be found in afsuser; "aggy" here and below is some DB server.)
(Also check for "0 of them for write" at the end; a nonzero count
suggests a write was in progress while you copied.)
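The copy-and-compare step above can be scripted. This is a minimal sketch, not part of the original procedure. It assumes udebug prints a line of the form "Local db version is <version>". The names db_version, copy_until_stable and the UDEBUG variable are hypothetical. The copy command is passed as arguments so the rcp above can be used unchanged.

```shell
# Print the database version found in udebug output read from stdin.
# Assumption: the output has a line like "Local db version is 1146589.14".
db_version() {
    awk '/^Local db version is/ { print $5; exit }'
}

# copy_until_stable HOST COMMAND...
# Query HOST with udebug, run COMMAND, query again, and repeat until
# both queries report the same version. Fails if no version is found
# or if COMMAND fails.
copy_until_stable() {
    host=$1; shift
    udebug=${UDEBUG:-/moira/bin/udebug}
    while :; do
        before=$($udebug "$host" -port 7002 | db_version)
        "$@" || return 1
        after=$($udebug "$host" -port 7002 | db_version)
        [ -n "$before" ] && [ -n "$after" ] || return 1
        [ "$before" = "$after" ] && return 0
        echo "version changed ($before -> $after), retrying" >&2
    done
}
```

A call would look like `copy_until_stable aggy rcp -px root@aggy:/usr/afs/db/prdb.DB0 prdb.old`. The "0 of them for write" check is not covered here and still needs a look by eye.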

    /moira/bin/pt_util -x -m -u -g -d prdb.extra -p prdb.old
    perl /moira/bin/pt_util.pl < prdb.extra > prdb.extra.sort
to extract and prepare the personal groups and special user entries in
the old prdb for being reincorporated into the new prdb.

Then run

    awk -F\| '$9 == 3 {print $1}' /backup/backup_1/users > /tmp/deactivated

and the following perl script:

#!/usr/athena/bin/perl -w

open(OUT, ">prdb.extra.trimmed") or die "prdb.extra.trimmed: $!";

# Load the deactivated usernames into a lookup table.
for ( `cat /tmp/deactivated` ) {
    chomp;
    $ex{$_} = 1;
}

$punt = 0;

foreach $L ( `cat prdb.extra.sort` ) {
    @w = split(/ /,$L);
    $_ = $w[0];
    if ( /:/ ) {
        # A prefixed group: punt it if its prefix is a deactivated user.
        @x = split(/:/,$w[0]);
        if ($ex{$x[0]}) {
            $punt=1;
        } else {
            $punt=0;
        }
    } else {
        # If we got here, we're either a user, a prefixless
        # group, or a group member.
        $punt = 0 if $w[0];
    }
    print OUT $L unless $punt == 1;
}

close(OUT);
exit 0;

to remove the personal groups for users who are deactivated.
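For comparison, the same trimming rule can be expressed as a single awk filter. This is a sketch of an alternative, not the script the procedure uses. It assumes, as the perl script does, that entry lines start in column one and member lines start with a space. The function name trim_deactivated is hypothetical.

```shell
# trim_deactivated DEACTIVATED DUMP
# Print DUMP minus every prefixed group (and its indented member lines)
# whose prefix is listed in DEACTIVATED, one username per line.
trim_deactivated() {
    awk '
        FILENAME == ARGV[1] { ex[$0] = 1; next }   # load deactivated names
        !/^ / {                                     # entry line, not a member
            split($0, w, " ")
            n = index(w[1], ":")
            punt = (n > 0 && (substr(w[1], 1, n - 1) in ex))
        }
        !punt                                       # print unless punting
    ' "$1" "$2"
}
```

It would be run as `trim_deactivated /tmp/deactivated prdb.extra.sort > prdb.extra.trimmed`. Member lines inherit the decision made for the entry line above them, which is what the $punt flag does in the perl version.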

    awk '/^[^ ][^:]*@/ {printf "KERBEROS:%s\n",$1}' prdb.extra.trimmed \
        > foreign
    blanche afs-foreign-users -f foreign
Get a list of all the @andrew.cmu.edu type (non-athena.mit.edu cell)
users, and sync the Moira list afs-foreign-users to this list.
Moira then adds those entries to the group system:afs-foreign-users,
thus keeping them from being lost in the prdb resync.
Sanity checking the diffs before running the blanche command is recommended.
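One way to do that sanity check is to compare the current membership with the new file before syncing. This is a minimal sketch with a hypothetical helper, list_diff. It assumes you have already dumped the current members to a file in the same TYPE:name form as the generated list. How to make blanche print them in that form is not covered here.

```shell
# list_diff CURRENT WANTED
# Print "- member" for entries only in CURRENT and "+ member" for
# entries only in WANTED. Both files hold one member per line, in any
# order; duplicates are ignored.
list_diff() {
    cur=$(mktemp) || return 1
    new=$(mktemp) || return 1
    sort -u "$1" > "$cur"
    sort -u "$2" > "$new"
    comm -23 "$cur" "$new" | sed 's/^/- /'
    comm -13 "$cur" "$new" | sed 's/^/+ /'
    rm -f "$cur" "$new"
}
```

For example, `list_diff current-members foreign` shows what the `blanche -f` sync would remove and add. A long run of "-" lines is a sign to stop and look.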

    awk '/^[^ 0-9][^:@]*$/ {printf "KERBEROS:%s@ATHENA.MIT.EDU\n",$1}' \
        prdb.extra.trimmed > oddities
    awk '/^[^ ][0-9.]* .*$/ {printf "KERBEROS:%s\n",$1}' prdb.extra.trimmed \
        >> oddities
    echo "LIST:afs-foreign-users" >> oddities
    blanche afs-odd-entities -f oddities
Do the equivalent of afs-foreign-users for domestic users.  We make
the afs-foreign-users list a member of the more general afs-odd-entities.
Sanity checking the diffs before running the blanche command is recommended.

WAIT for the incremental updates from the `blanche` changes to complete.

#### Now the actual resync begins.  Incremental updates must stop. ####

    touch /moira/afs/noafs
to disable AFS incremental updates during the synchronization.  The
incremental updater (afs.incr?) will wait 30 minutes on an incremental
update before timing out, so the resync should complete in that time, or
list changes in Moira might need to be propagated by hand.
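Since the 30-minute window is easy to lose track of, it can help to note when noafs was created and check the time remaining between steps. This is a small sketch. The helper name noafs_minutes_left is hypothetical, and `date +%s` is assumed to be available on the host.

```shell
# noafs_minutes_left START NOW
# START and NOW are epoch seconds. Prints the minutes left in the
# 30-minute window, rounded toward zero (negative once the window is
# more than a minute past).
noafs_minutes_left() {
    echo $(( (1800 - ($2 - $1)) / 60 ))
}
```

Typical use would be `start=$(date +%s)` right before the touch, then `noafs_minutes_left "$start" "$(date +%s)"` whenever you want to know how much time is left.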

    /moira/bin/afssync prdb.moira
to dump the prdb data that is in Moira (users, groups, and group
memberships).  This step takes about ten minutes, but can be done
concurrently with the next few steps.

REPEAT the preparation commands above (from the first udebug through the
perl script), thus regenerating prdb.extra.trimmed from a now
completely-up-to-date prdb.

*** Make sure the "afssync" command has completed ***

    cp prdb.moira prdb.new
    /moira/bin/pt_util -w -d prdb.extra.trimmed -p prdb.new \
        >& prdb.extra.err
This use of pt_util will presumably log errors about failed user
creations and list additions.  (To start over, do both the `cp` and
`pt_util` again.)  You can filter out the "User or group doesn't exist"
type of lines that were caused by a user deactivation with something
like:
    awk -F\| '$9 == 3 {print $1}' /backup/backup_1/users > /tmp/deactivated
    perl -e 'for(`cat /tmp/deactivated`){ chop; $ex{$_}=1;} \
        foreach $L (`cat prdb.extra.err`){ $f=0; \
        @w=split(/[ :]/,$L); for(@w){ $f=1 if $ex{$_}; } \
        next if $f; print $L; }'
Now, back to the resync.

The only remaining errors should be errors creating system:foo groups,
because they already exist.  These generally mean that the group has
an odd user on it (root instance, IP acl, etc.) and can safely be
ignored.

Errors of the form:
Error while creating dcctdw:foo: Badly formed name (group prefix doesn't match owner?)
are probably an indication that a user with personal groups had a
username change.  (In the past they have also meant that a user with
personal groups was deactivated and the uid was re-used; this was
because we didn't trim the prdb.extra.sort file in the past.)
Assuming these errors are due to a username change, the groups should
be renamed, and you should regenerate prdb.extra.trimmed starting with
a fresh prdb from aggy.  (You may want to abort and
rm /moira/afs/noafs and try again later.)

    pts listmax > prdb.listmax
    foreach i ( <db servers> )
        rsh $i -l root -x /bin/athena/detach -a    # detach packs
        rsh $i -l root -x rm -f /usr/afs/db/{prdb.new,pre-resync-prdb}
        rcp -px prdb.new root@${i}:/usr/afs/db/prdb.new
    end                                            # staging
    foreach i ( <db servers> )
        bos shutdown $i ptserver -wait
        bos exec $i "mv /usr/afs/db/prdb.DB0 /usr/afs/db/pre-resync-prdb; rm /usr/afs/db/prdb.DB*; mv /usr/afs/db/prdb.new /usr/afs/db/prdb.DB0"
    end
    foreach i ( <db servers> )
        bos restart $i ptserver
    end

    /moira/bin/udebug prill -port 7002
to watch the status of the servers to make sure things are going well,
where "prill" is the preferred db server (the sync site).

Make sure the beacons are working, and that once quorum is established
(~90 seconds) the servers are resynchronizing their notions of
the databases, the "dbcurrent" and "up" fields all become set,
and the state goes to "1f".  Also, if "sdi" isn't running, watch out
for large rx packet queues on port 7002 using rxdebug, as the
fileservers may get excessively backlogged.  Restart servers if the
congestion remains excessive.
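The udebug checks described above can be turned into a quick pass/fail test. This is a sketch only, and it depends on assumptions about udebug's output: a "Recovery state <value>" line, and per-server lines carrying "dbcurrent=" and "up=" fields. The function name ubik_settled is hypothetical. Beacon health and rx queue depth still need to be checked by eye.

```shell
# ubik_settled: read udebug output from the sync site on stdin.
# Succeeds only if the recovery state is 1f and every line carrying
# dbcurrent=/up= fields has both set to 1 (at least one such line
# must be present).
ubik_settled() {
    awk '
        /Recovery state/ { state = $NF }
        /dbcurrent=/ {
            servers++
            if ($0 !~ /dbcurrent=1/ || $0 !~ /up=1/) bad++
        }
        END { exit !(state == "1f" && servers > 0 && bad == 0) }
    '
}
```

For example: `/moira/bin/udebug prill -port 7002 | ubik_settled && echo settled`.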

    pts listmax
    cat prdb.listmax
and if the id maxima are lower than the saved ones, reset them
appropriately to the saved ones using `pts setmax`.
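The comparison can be automated along these lines. This is a sketch that assumes `pts listmax` prints one line of the form "Max user id is N and max group id is -M." The helper names parse_listmax and setmax_needed are hypothetical. The sketch only prints the suggested `pts setmax` commands; it does not run them. Group ids are negative, so a "lower" group maximum is one that is less negative than the saved value.

```shell
# parse_listmax: read "pts listmax" output on stdin, print "USERMAX GROUPMAX".
# Assumption: "Max user id is 12345 and max group id is -6789."
parse_listmax() {
    awk '/^Max user id is/ { sub(/\.$/, "", $11); print $5, $11 }'
}

# setmax_needed SAVED_FILE CURRENT_FILE
# Print the pts setmax command(s) that would restore the saved maxima.
setmax_needed() {
    set -- $(parse_listmax < "$1") $(parse_listmax < "$2")
    [ $# -eq 4 ] || return 1
    # User ids grow upward; group ids grow downward (more negative).
    if [ "$3" -lt "$1" ]; then echo "pts setmax -user $1"; fi
    if [ "$4" -gt "$2" ]; then echo "pts setmax -group $2"; fi
    return 0
}
```

With the live output saved first (`pts listmax > prdb.listmax.now`), `setmax_needed prdb.listmax prdb.listmax.now` prints nothing when no reset is needed.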

    pts ex system:administrators
as a good spot check, especially since it has special people.
(Also spot check one of the personal groups and perhaps something like
the membership of rcmd.reynelda.)

    rm /moira/afs/noafs
to remove the lock file and let Moira's afs incrementals continue.

The afssync program doesn't deal with null instance KERBEROS
members of lists which are groups (example: if LIST zacheiss contains
KERBEROS zacheiss@ATHENA.MIT.EDU).  To get around this, run:

/moira/bin/sync.pl

which will create /var/tmp/sync.out, which contains the pts commands
needed to add all the null instance KERBEROS members back to the pts
groups they belong in.  If it looks sane, run:

sh /var/tmp/sync.out

Any failed additions are probably from lists that contain both USER
username and KERBEROS username@ATHENA.MIT.EDU.
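A cheap first pass at "looks sane" is to confirm that sync.out contains nothing but pts commands before handing it to sh. This is a sketch with a hypothetical helper name, non_pts_lines. It assumes the file has one command per line, each starting with "pts ".

```shell
# non_pts_lines FILE
# Print every non-blank line of FILE that does not start with "pts ".
# Exit status follows grep: 0 if suspicious lines were printed,
# 1 if there were none.
non_pts_lines() {
    grep -v -e '^pts ' -e '^[[:space:]]*$' "$1"
}
```

`non_pts_lines /var/tmp/sync.out` should print nothing. Reading through the pts commands themselves is still worthwhile, since this only catches lines that are not pts commands at all.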

NOTES

1. Don't do this when you're tired...  With certain mistakes, there may be
   no cleanup procedure available.

2. /moira/afs/noafs is only good for 30 minutes.  Keep track of the
   critical log, and you may have to do some operations by hand when the
   operation is complete.  Also, if requests depend on other requests, they
   may be processed out of order, and fail, and may need to be done by hand.