User:Razzi/deployment train 5-18
Let's follow the procedure below to restart a specific job: virtualpageview hourly
When the .properties file has changed
You need to kill the existing job and spawn a new one.
Here is the procedure to follow (so as not to forget any step ;)
- Finding the new start_time
One important thing to do before killing the old job is to check the new job start_time you should use.
- If the job to restart is a coordinator, pick the last finished time + 1 coordinator frequency unit (coordinators define their own frequency units; we usually have hour, day or month, and seldom use weeks).
- If the job to restart is a bundle, apply the coordinator method to every coordinator the bundle runs, and pick the oldest time you find.
This check is needed for two reasons. First, jobs take time to finish, so when you kill a job, the chances that you kill a currently running job are almost 100%, and you should rerun it. Second, our oozie jobs depend on data being present, and it is natural for jobs to wait for some time before their data is available (for instance, waiting for a previous job to finish). Both in-flight and waiting jobs are present in a coordinator job queue, and once the coordinator is killed, it is more difficult to know which of them have actually finished. For this reason, checking for finished jobs before killing the parent coordinator/bundle is best practice.
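For an hourly coordinator such as virtualpageview hourly, the "+1 frequency unit" step is small enough to script. This is a minimal sketch, assuming GNU `date` (for `-d` and `@epoch` input) and times in UTC. It only handles an hourly frequency; day or month coordinators would need their own step. The function names are hypothetical.

```shell
# Hypothetical helpers. Assumptions: GNU date, UTC times, hourly frequency only.

# next_hourly_start "YYYY-MM-DD HH:MM": last finished hour + 1 hour,
# printed in the YYYY-MM-DDTHH:MMZ form used for -Dstart_time.
next_hourly_start() {
    # -u makes date both parse and print in UTC.
    last_epoch=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$((last_epoch + 3600))" +%Y-%m-%dT%H:%MZ
}

# oldest_start TIME...: the earliest candidate start_time (bundle case).
# Fixed-width ISO timestamps sort correctly as plain strings.
oldest_start() {
    printf '%s\n' "$@" | LC_ALL=C sort | head -n 1
}
```

For example, `next_hourly_start "2021-05-18 11:00"` prints `2021-05-18T12:00Z`, the start_time used for virtualpageview hourly on this page. For a bundle, compute one candidate per coordinator and pass them all to `oldest_start`.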
Ok, so gotta find the running job. Let's look in Hue for virtualpageview.
Ok, so 5-18-11 (2021-05-18, 11:00) is the last hour that is done.
5-18-12 will be the next, so start_time is 2021-05-18T12:00Z.
- Kill the existing job
How do I find the existing job?
Going to one of the jobs I found, I found its metadata:
oozie.job.id 0034706-210426062240701-oozie-oozi-W
But that's a W (workflow), not a C for coordinator or B for bundle.
Ok, in the breadcrumbs part, I see the parent coordinator:
0026935-210426062240701-oozie-oozi-C
Ok, I killed it by clicking Kill (https://hue.wikimedia.org/hue/jobbrowser/#!id=0026935-210426062240701-oozie-oozi-C),
and it threw a gnarly error...
- If you use Hue, click the kill button (left panel, Manage part at the bottom, red button).
- If you prefer the CLI: find the job id (something like 0014015-161020124223818-oozie-oozi-C, for instance; notice that coordinator ids end with a C while bundle ids end with a B). Run oozie job -kill <job_id>
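As the W/C/B confusion above shows, it is easy to grab a workflow id instead of its parent coordinator. A minimal sketch that checks the id suffix before killing, assuming `OOZIE_URL` is set in the environment (the oozie CLI falls back to it when `-oozie` is not given) and that on an-launcher the call probably needs the same `sudo -u analytics kerberos-run-command analytics` prefix as the restart command. The function names and the `DRY_RUN` switch are hypothetical.

```shell
# Hypothetical helpers. Assumptions: OOZIE_URL is set; on an-launcher the
# oozie call probably needs the sudo/kerberos-run-command prefix (not shown).

# Classify an oozie id by its suffix: -C coordinator, -B bundle, -W workflow.
oozie_job_type() {
    case "$1" in
        *-oozie-oozi-C) echo coordinator ;;
        *-oozie-oozi-B) echo bundle ;;
        *-oozie-oozi-W) echo workflow ;;
        *) echo unknown ;;
    esac
}

# Kill only coordinators or bundles; DRY_RUN=1 just prints the command.
kill_oozie_job() {
    job_type=$(oozie_job_type "$1")
    case "$job_type" in
        coordinator|bundle) ;;
        *)
            echo "refusing to kill $1 ($job_type); kill its parent coordinator/bundle instead" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "oozie job -kill $1"
    else
        oozie job -kill "$1"
    fi
}
```

With the ids from this page, `DRY_RUN=1 kill_oozie_job 0026935-210426062240701-oozie-oozi-C` prints the kill command, while `kill_oozie_job 0034706-210426062240701-oozie-oozi-W` refuses and returns 1.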
- Restart a new replacing job
While Hue provides a way to define/run oozie jobs, we do it with files and the CLI, and the two don't collaborate well. So you'll have to go for the CLI :) The two values that change in a production job oozie command in our environment are start_time and the path of the .properties file to run (this file actually defines which job will be started). The .properties file to use is most probably in /srv/deployment/analytics/refinery/oozie. Also, notice there is no = sign between -config and the .properties file path.
sudo -u analytics kerberos-run-command analytics oozie job --oozie $OOZIE_URL \
  -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/$(date +"%Y")* | tail -n 1 | awk '{print $NF}') \
  -Dqueue_name=production \
  -Doozie_launcher_queue_name=production \
  -Dstart_time=<YOUR_START_TIME> \
  -config <YOUR_PROPERTIES_PATH> \
  -run
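Since only start_time and the .properties path change, they can be sanity-checked before pasting them into the command above. The same sketch also isolates what the -Drefinery_directory substitution computes, so it can be tried on sample input. The function names are hypothetical. The start_time format (YYYY-MM-DDTHH:MMZ) is the one used on this page. It is an assumption that `hdfs dfs -ls -d` lists refinery directories sorted by name, with names starting with a date, so that the last line is the latest deployment.

```shell
# Hypothetical helpers for the two values that change, plus the refinery lookup.

# start_time must look like 2021-05-18T12:00Z.
check_start_time() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]T[0-2][0-9]:[0-5][0-9]Z) return 0 ;;
        *) echo "bad start_time: $1 (expected YYYY-MM-DDTHH:MMZ)" >&2; return 1 ;;
    esac
}

# The -config argument must be an existing .properties file.
check_properties_path() {
    case "$1" in
        *.properties) ;;
        *) echo "not a .properties file: $1" >&2; return 1 ;;
    esac
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
}

# stdin: output of `hdfs dfs -ls -d /wmf/refinery/$(date +"%Y")*`.
# Like the one-liner: keep the last line and its last field (the path).
# Unlike it: fail on empty input instead of emitting "hdfs://analytics-hadoop".
latest_refinery_dir() {
    path=$(tail -n 1 | awk '{print $NF}')
    [ -n "$path" ] || return 1
    echo "hdfs://analytics-hadoop$path"
}
```

For example: `check_start_time 2021-05-18T12:00Z && check_properties_path /srv/deployment/analytics/refinery/oozie/virtualpageview/hourly/coordinator.properties`, and `hdfs dfs -ls -d /wmf/refinery/$(date +"%Y")* | latest_refinery_dir` to see which refinery directory the job would use.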
Almost there.
Properties file: /srv/deployment/analytics/refinery/oozie/virtualpageview/hourly/coordinator.properties
sudo -u analytics kerberos-run-command analytics oozie job --oozie $OOZIE_URL \
  -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/$(date +"%Y")* | tail -n 1 | awk '{print $NF}') \
  -Dqueue_name=production \
  -Doozie_launcher_queue_name=production \
  -Dstart_time=2021-05-18T12:00Z \
  -config /srv/deployment/analytics/refinery/oozie/virtualpageview/hourly/coordinator.properties \
  -run
razzi@an-launcher1002:/srv/deployment/analytics/refinery$ sudo -u analytics kerberos-run-command analytics oozie job --oozie $OOZIE_URL \
> -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/$(date +"%Y")* | tail -n 1 | awk '{print $NF}') \
> -Dqueue_name=production \
> -Doozie_launcher_queue_name=production \
> -Dstart_time=2021-05-18T12:00Z \
> -config /srv/deployment/analytics/refinery/oozie/virtualpageview/hourly/coordinator.properties \
> -run
job: 0035056-210426062240701-oozie-oozi-C
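The `-run` output ends with the new job id (`job: 0035056-...-C` above). It is worth keeping that id for the next restart. A minimal sketch that extracts it from the output and checks that it is a coordinator id, which is what a coordinator.properties file should start. The function name is hypothetical.

```shell
# Hypothetical helper: read `oozie job ... -run` output on stdin, print the
# new job id, and fail unless it is a coordinator id (ends in -C).
new_job_id() {
    # Keep only lines starting with "job:", stripped of that prefix.
    id=$(sed -n 's/^job: *//p' | tail -n 1)
    case "$id" in
        *-oozie-oozi-C) echo "$id" ;;
        *) echo "unexpected job id: '$id'" >&2; return 1 ;;
    esac
}
```

For example, piping the restart command's output into `new_job_id` prints `0035056-210426062240701-oozie-oozi-C`. That id can then be passed to `oozie job -info` (assuming `OOZIE_URL` is set, and probably with the same sudo prefix) to check that the new coordinator is running.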