Commit | Line | Data |
---|---|---|
26efe406 | 1 | The executables are in /moira/bin/ on the moira server, with sources |
2 | in /mit/moiradev/src/afssync/. Most of the commands are run on the | |
3 | Moira server. | |
4 | ||
a46edefa | 5 | #### Set up a workspace #### |
6 | ||
7 | mkdir -p /moira/sync | |
8 | cd /moira/sync | |
9 | ||
26efe406 | 10 | #### This is preparation for the resync, to save non-Moira users. #### |
11 | First, get a recent copy of the prdb, and extract non-Moira entries: | |
12 | ||
8381b984 | 13 | /moira/bin/udebug prill -port 7002 |
14 | rcp -px root@prill:/usr/afs/db/prdb.DB0 prdb.old | |
15 | /moira/bin/udebug prill -port 7002 | |
26efe406 | 16 | If the two udebugs show that the version changed, lather-rinse-repeat. |
8381b984 | 17 | (udebug can be found in /usr/athena/bin; "prill" here and below is some |
18 | DB server) | |
26efe406 | 19 | (Also check for "0 of them for write" at the end. It might matter.) |
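The copy-and-compare loop above can be sketched as a small shell helper. This is only an illustration: it assumes udebug prints a line containing "db version is <number>", and the function names (db_version, same_version, fetch_stable_prdb) and the retry limit of 5 are hypothetical, not part of the Moira tools.

```shell
# Sketch only. ASSUMPTION: udebug output contains a line such as
# "... db version is 1034022041.7"; adjust the pattern if yours differs.

# Read udebug output on stdin and print the first db version it mentions.
db_version() {
    sed -n 's/.*db version is \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed only when both captured versions are non-empty and identical.
same_version() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical driver: copy the prdb until the version holds still.
# The limit of 5 attempts is arbitrary.
fetch_stable_prdb() {
    server=$1
    tries=0
    while [ "$tries" -lt 5 ]; do
        tries=$((tries + 1))
        before=$(/moira/bin/udebug "$server" -port 7002 | db_version)
        rcp -px "root@$server:/usr/afs/db/prdb.DB0" prdb.old || return 1
        after=$(/moira/bin/udebug "$server" -port 7002 | db_version)
        same_version "$before" "$after" && return 0
        echo "prdb version changed ($before -> $after); copying again" >&2
    done
    return 1
}
```

With these definitions, `fetch_stable_prdb prill` would stand in for the three commands above. It does not check the "0 of them for write" line, which still needs a manual look.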
20 | ||
a46edefa | 21 | /moira/bin/pt_util -x -m -u -g -d prdb.extra -p prdb.old |
22 | perl /moira/bin/pt_util.pl < prdb.extra > prdb.extra.sort | |
26efe406 | 23 | to extract and prepare the personal groups and special user entries in |
24 | the old prdb for being reincorporated into the new prdb. | |
25 | ||
d93d3bb6 | 26 | awk -F\| '$9 == 3 {print $1}' /backup/backup_1/users > /tmp/deactivated |
e3d53038 | 27 | |
28 | and the following perl script: | |
29 | ||
30 | #!/usr/athena/bin/perl -w | |
31 | ||
32 | open(OUT, ">prdb.extra.trimmed") or die "cannot write prdb.extra.trimmed: $!"; | |
33 | ||
34 | for ( `cat /tmp/deactivated` ) { | |
35 | chop; | |
36 | $ex{$_} = 1; | |
37 | } | |
38 | ||
39 | $punt = 0; | |
40 | ||
41 | foreach $L ( `cat prdb.extra.sort` ) { | |
42 | @w = split(/ /,$L); | |
43 | $_ = $w[0]; | |
44 | if ( /:/ ) { | |
45 | @x = split(/:/,$w[0]); | |
46 | if ($ex{$x[0]}) { | |
47 | $punt=1; | |
48 | } else { | |
49 | $punt=0; | |
50 | } | |
51 | } else { | |
52 | # If we got here, we're either a user, a prefixless | |
53 | # group, or a group member. | |
54 | $punt = 0 if $w[0]; | |
55 | } | |
56 | print OUT $L unless $punt == 1; | |
57 | } | |
58 | ||
59 | close(OUT); | |
60 | exit 0; | |
61 | ||
7827a830 | 62 | to remove the personal groups for users who are deactivated |
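For readers who want to see the trimming rule in isolation, here is a rough awk rendering of the same logic. It is a sketch, not a replacement for the script above: the function name trim_deactivated is hypothetical, and it assumes (as the Perl does) that group-member lines begin with a space while user and group lines do not.

```shell
# Sketch only. Usage: trim_deactivated DEACTIVATED_FILE < prdb.extra.sort
# ASSUMPTION: member lines start with a space; all other lines start with
# the entry name, and personal groups are named "owner:group".
trim_deactivated() {
    awk -v list="$1" '
        # Load the deactivated usernames, one per line.
        BEGIN { while ((getline u < list) > 0) ex[u] = 1 }

        # Member line: inherits the fate of the group above it.
        /^ / { if (!punt) print; next }

        # User or group line: drop it only if it is a personal group
        # whose owner prefix is a deactivated user.
        {
            punt = 0
            if (index($1, ":")) {
                split($1, p, ":")
                if (p[1] in ex) punt = 1
            }
            if (!punt) print
        }
    '
}
```

Note that a deactivated user who is merely a member of someone else's group is left alone; only groups owned by that user, together with their member lines, are dropped.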
63 | ||
9dbd9d2b | 64 | awk '/^[^ ][^:]*@/ {printf "KERBEROS:%s\n",$1}' prdb.extra.trimmed \ |
65 | > foreign | |
26efe406 | 66 | blanche afs-foreign-users -f foreign |
67 | Get a list of all the @andrew.cmu.edu type (non-athena.mit.edu cell) | |
68 | users, and sync the Moira list afs-foreign-users to this list. | |
69 | Moira then adds those entries to the group system:afs-foreign-users, | |
70 | thus keeping them from being lost in the prdb resync. | |
7827a830 | 71 | Sanity checking the diffs before running the blanche command is recommended. |
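To make the pattern in the awk command concrete, the sketch below wraps the same awk program in a function so it can be tried on sample input. The wrapper name foreign_users is hypothetical, and the sample lines in the comment are made up; only the leading name and the leading-space convention for member lines matter.

```shell
# Sketch only: the awk program is the one from the step above, wrapped in
# a hypothetical function so it can be exercised on test input.
# A line matches when it starts with a non-space character and an "@"
# appears before any ":" (so personal groups and member lines are skipped).
foreign_users() {
    awk '/^[^ ][^:]*@/ {printf "KERBEROS:%s\n",$1}' "$@"
}

# Example with made-up input:
#   printf 'carol@andrew.cmu.edu 0 500 1 1\nalice 0 100 1 1\n' | foreign_users
# prints only the entry for carol@andrew.cmu.edu, prefixed with "KERBEROS:".
```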
26efe406 | 72 | |
7827a830 | 73 | awk '/^[^ 0-9][^:@]*$/ {printf "KERBEROS:%s@ATHENA.MIT.EDU\n",$1}' \ |
9dbd9d2b | 74 | prdb.extra.trimmed > oddities |
75 | awk '/^[^ ][0-9.]* .*$/ {printf "KERBEROS:%s\n",$1}' prdb.extra.trimmed\ | |
76 | >> oddities | |
26efe406 | 77 | echo "LIST:afs-foreign-users" >> oddities |
78 | blanche afs-odd-entities -f oddities | |
79 | Do the equivalent of afs-foreign-users for domestic users. We make | |
80 | the afs-foreign-users list a member of the more general afs-odd-entities. | |
7827a830 | 81 | Sanity checking the diffs before running the blanche command is recommended. |
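One way to do the sanity check is to compare the list's current members against the new file before handing it to blanche. The sketch below assumes you have already saved the current membership, one TYPE:name entry per line, in a file; the file name oddities.current and the function name list_diff are hypothetical.

```shell
# Sketch only. Usage: list_diff CURRENT_FILE NEW_FILE
# Prints "- entry" for members that would be removed and "+ entry" for
# members that would be added. Both files hold one entry per line.
list_diff() {
    old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
    sort -u "$1" > "$old_sorted"
    sort -u "$2" > "$new_sorted"
    comm -23 "$old_sorted" "$new_sorted" | sed 's/^/- /'   # only in current
    comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+ /'   # only in new
    rm -f "$old_sorted" "$new_sorted"
}

# Hypothetical usage:
#   list_diff oddities.current oddities
```

A long run of "-" lines is the thing to look for, since those entries would drop out of the group and be lost in the resync.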
82 | ||
26efe406 | 83 | WAIT for the incremental updates from the `blanche` changes to complete. |
84 | ||
85 | #### Now the actual resync begins. Incremental updates must stop. #### | |
86 | ||
87 | touch /moira/afs/noafs | |
88 | to disable AFS incremental updates during the synchronization. The | |
89 | afs.incr (?) will wait 30 minutes on an incremental update before | |
90 | timing out, so the resync should complete in that time, or list | |
91 | changes in Moira might need to be propagated by hand. | |
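Because the 30-minute window is easy to lose track of, it can help to record when the lock file was created and check the remaining time as you go. A minimal sketch, assuming `date +%s` is available on the server; the variable and function names are hypothetical.

```shell
# Sketch only. 1800 seconds is the 30-minute timeout described above.
NOAFS_LIMIT=1800

# seconds_left START NOW: both arguments are epoch seconds.
# A negative result means the window has already passed.
seconds_left() {
    echo $(( $NOAFS_LIMIT - ($2 - $1) ))
}

# Hypothetical usage:
#   start=$(date +%s); touch /moira/afs/noafs
#   ... later ...
#   echo "$(seconds_left "$start" "$(date +%s)") seconds left"
```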
92 | ||
a46edefa | 93 | /moira/bin/afssync prdb.moira |
26efe406 | 94 | to dump the prdb data that is in Moira (users, groups, and group |
95 | memberships). This step takes about ten minutes, but can be done | |
96 | concurrently with the next few steps. | |
97 | ||
7827a830 | 98 | REPEAT the above commands, thus regenerating prdb.extra.trimmed from a now |
99 | completely-up-to-date prdb. | |
3d8d4b36 | 100 | |
101 | *** Make sure the "afssync" command has completed *** | |
3d8d4b36 | 102 | |
a46edefa | 103 | cp prdb.moira prdb.new |
7827a830 | 104 | /moira/bin/pt_util -w -d prdb.extra.trimmed -p prdb.new \ |
105 | >& prdb.extra.err | |
26efe406 | 106 | This use of pt_util will presumably log errors about failed user |
107 | creations and list additions. (To start over, do both the `cp` and | |
108 | `pt_util` again.) You can filter out the "User or group doesn't exist" | |
109 | type of lines that were caused by a user deactivation with something | |
110 | like: | |
d93d3bb6 | 111 | awk -F\| '$9 == 3 {print $1}' /backup/backup_1/users > /tmp/deactivated |
7827a830 | 112 | perl -e 'for(`cat /tmp/deactivated`){ chop; $ex{$_}=1;} \ |
26efe406 | 113 | foreach $L (`cat prdb.extra.err`){ $f=0; \ |
114 | @w=split(/[ :]/,$L); for(@w){ $f=1 if $ex{$_}; } \ | |
115 | next if $f; print $L; }' | |
116 | Now, back to the resync. | |
117 | ||
7827a830 | 118 | The only remaining errors should be errors creating system:foo groups, |
119 | because they already exist. These generally mean that that group has | |
120 | an odd user on it (root instance, IP acl, etc.) and can safely be | |
121 | ignored. | |
122 | ||
123 | Errors of the form: | |
124 | Error while creating dcctdw:foo: Badly formed name (group prefix doesn't match owner?) | |
125 | are probably an indication that a user with personal groups had a | |
126 | username change (in the past they have also meant that a user with | |
127 | personal groups was deactivated and the uid was re-used (this was | |
128 | because we didn't trim the prdb.extra.sort file in the past.)) | |
129 | Assuming these errors are due to a username change, the groups should | |
130 | be renamed, and you should regenerate prdb.extra.trimmed starting with | |
8381b984 | 131 | a fresh prdb from prill. (You may want to abort and |
7827a830 | 132 | rm /moira/afs/noafs and try again later.) |
133 | ||
9eba5bbc | 134 | pts listmax > prdb.listmax |
26efe406 | 135 | foreach i ( <db servers> ) |
7827a830 | 136 | rsh $i -l root -x /bin/athena/detach -a # detach packs |
137 | rsh $i -l root -x rm -f /usr/afs/db/{prdb.new,pre-resync-prdb} | |
138 | rcp -px prdb.new root@${i}:/usr/afs/db/prdb.new | |
139 | end # staging | |
140 | foreach i ( <db servers> ) | |
141 | bos shutdown $i ptserver -wait | |
142 | bos exec $i "mv /usr/afs/db/prdb.DB0 /usr/afs/db/pre-resync-prdb; rm /usr/afs/db/prdb.DB*; mv /usr/afs/db/prdb.new /usr/afs/db/prdb.DB0" | |
26efe406 | 143 | end |
144 | foreach i ( <db servers> ) | |
145 | bos restart $i ptserver | |
146 | end | |
147 | ||
148 | /moira/bin/udebug prill -port 7002 | |
149 | to watch the status of the servers to make sure things are going well, | |
150 | where "prill" is the preferred db server (the sync site). | |
151 | ||
d93d3bb6 | 152 | Make sure the beacons are working, and that once quorum is established |
26efe406 | 153 | (~90 seconds) the servers are resynchronizing their notions of |
154 | the databases and that the "dbcurrent" and "up" fields all become set | |
155 | and the state goes to "1f". Also, if "sdi" isn't running, watch out | |
156 | for large rx packet queues on port 7002 using rxdebug, as the | |
157 | fileservers may get excessively backlogged, and restart servers, if | |
158 | necessary, if the congestion remains excessive. | |
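Checking for state "1f" on every server can be scripted loosely as follows. This is a sketch under the assumption that udebug prints a line of the form "Recovery state <hex>"; the function names are hypothetical, and the "dbcurrent" and "up" fields still need to be read by eye.

```shell
# Sketch only. ASSUMPTION: udebug output contains "Recovery state <hex>".

# Read udebug output on stdin and print the recovery state, e.g. "1f".
recovery_state() {
    sed -n 's/.*Recovery state \([0-9a-fA-F][0-9a-fA-F]*\).*/\1/p' | head -n 1
}

# Succeed when the given state is the fully recovered one.
is_settled() {
    [ "$1" = "1f" ]
}

# Hypothetical driver: report the state of each db server given as argument.
report_states() {
    for server in "$@"; do
        state=$(/moira/bin/udebug "$server" -port 7002 | recovery_state)
        if is_settled "$state"; then
            echo "$server: ok (1f)"
        else
            echo "$server: state '${state:-unknown}', keep watching"
        fi
    done
}
```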
159 | ||
160 | pts listmax | |
9eba5bbc | 161 | cat prdb.listmax |
26efe406 | 162 | and if the id maxima are lower than the saved ones, reset them |
163 | appropriately to the saved ones using `pts setmax`. | |
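The comparison can be sketched in shell. Two assumptions are baked in and should be checked against your own output: that `pts listmax` prints a line of the form "Max user id is N and max group id is M", and that, since group ids are negative, a "lower" group maximum means one that is smaller in magnitude. The function names are hypothetical; the sketch only prints a suggested `pts setmax` command and runs nothing.

```shell
# Sketch only. ASSUMPTION: pts listmax prints
#   "Max user id is <N> and max group id is <M>."

# Read listmax output on stdin and print "USER GROUP".
parse_listmax() {
    sed -n 's/.*[Mm]ax user id is \(-\{0,1\}[0-9][0-9]*\) and max group id is \(-\{0,1\}[0-9][0-9]*\).*/\1 \2/p'
}

# setmax_hint SAVED_USER SAVED_GROUP CUR_USER CUR_GROUP
# Prints a suggested "pts setmax" command if either maximum fell back.
# Group ids are negative, so "fell back" means closer to zero.
setmax_hint() {
    args=""
    [ "$3" -lt "$1" ] && args="$args -user $1"
    [ "$4" -gt "$2" ] && args="$args -group $2"
    [ -n "$args" ] && echo "pts setmax$args"
    return 0
}

# Hypothetical usage:
#   set -- $(parse_listmax < prdb.listmax) $(pts listmax | parse_listmax)
#   setmax_hint "$1" "$2" "$3" "$4"
```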
164 | ||
165 | pts ex system:administrators | |
166 | as a good spot check, especially since it has special people. | |
3d8d4b36 | 167 | (also spot check one of the personal groups and perhaps something like |
d93d3bb6 | 168 | the membership of rcmd.reynelda) |
3d8d4b36 | 169 | |
26efe406 | 170 | rm /moira/afs/noafs |
171 | to remove the lock file and let Moira's afs incrementals continue. | |
3d8d4b36 | 172 | |
983926ee | 173 | The afssync program doesn't deal with null instance KERBEROS |
174 | members of lists which are groups (example: if LIST zacheiss contains | |
175 | KERBEROS zacheiss@ATHENA.MIT.EDU). To get around this, run: | |
176 | ||
177 | /moira/bin/sync.pl | |
178 | ||
179 | This creates /var/tmp/sync.out, which contains the pts commands | |
180 | needed to add all the null instance KERBEROS members back to the pts | |
181 | groups they belong in. If it looks sane, run: | |
182 | ||
183 | sh /var/tmp/sync.out | |
184 | ||
185 | Any failed additions are probably from lists that contain both USER | |
186 | username and KERBEROS username@ATHENA.MIT.EDU. | |
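As a quick mechanical check before running the generated file, you can list any line that is not a pts command. This is a sketch that relies only on the statement above that sync.out contains pts commands; the function name non_pts_lines is hypothetical, and it is no substitute for reading the file.

```shell
# Sketch only. Usage: non_pts_lines FILE
# Prints every non-blank line that does not start with "pts ".
# Returns 0 whether or not such lines exist; fails only on a grep error.
non_pts_lines() {
    grep -v -e '^pts ' -e '^[[:space:]]*$' "$1" || [ $? -eq 1 ]
}

# Hypothetical usage:
#   if [ -z "$(non_pts_lines /var/tmp/sync.out)" ]; then
#       echo "only pts commands found"
#   fi
```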
3d8d4b36 | 187 | |
26efe406 | 188 | NOTES |
3d8d4b36 | 189 | |
26efe406 | 190 | 1. Don't do this when you're tired... After certain mistakes, there may be |
3d8d4b36 | 191 | no cleanup procedure available. |
192 | ||
26efe406 | 193 | 2. /moira/afs/noafs is only good for 30 minutes. Keep track of the |
3d8d4b36 | 194 | critical log, and you may have to do some operations by hand when the |
195 | operation is complete. Also, if requests depend on other requests, they | |
196 | may be processed out of order, and fail, and may need to be done by hand. |