Gerrit/Operations
Restarting
Restarting Gerrit is a last resort. We used to restart it often because we misunderstood some of its behavior and because of a nasty memory leak. As of February 2021, do not restart it without first reviewing the current behavior thoroughly and taking traces. Those traces are a great help in identifying a potential bug or a configuration setting to tune.
If, after all investigations, you are out of ideas or really have no other option, you can restart Gerrit through systemd:
sudo systemctl restart gerrit
The service takes a few seconds to come back, during which any end-user operations will error out (some Puppet catalogues, CI, developers).
Monitoring
JavaMelody monitors the state of the Gerrit JVM. Its metrics are collected by Prometheus from https://gerrit.wikimedia.org/r/monitoring?prometheus
- JavaMelody in Gerrit (only accessible to logged-in Gerrit Administrators/Gerrit Managers)
- Grafana "Gerrit" folder
Important Graphs
- Gerrit overview dashboard
- Memory Usage
- GC Timing
- Grafana GC Timing
- Garbage collection metrics. Times in the hundreds of milliseconds, rather than the tens of milliseconds, can indicate a problem (running low on memory)
- Active Threads
- Gerrit Active Threads
- Grafana Active Threads
- Usually there are fewer than 20 active threads at any given time. More than that typically means that you should take a thread dump and restart.
Gerrit metrics
On top of the JavaMelody data, Gerrit has internal metrics.
For users with the viewCaches or View Metrics capabilities, various internal Gerrit metrics can be retrieved via:
This requires authentication and complements gerrit show-caches.
We use the metrics-reporter-prometheus plugin, which exposes these metrics for collection by Prometheus from https://gerrit.wikimedia.org/r/plugins/metrics-reporter-prometheus/metrics . Those Gerrit metrics can also be seen on the JavaMelody MBeans page under the metrics branch.
See Gerrit Grafana dashboards folder.
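When checking by hand what the Prometheus endpoint exposes, it can help to strip the comment lines and keep only one family of metrics. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text format. The function name filter_metrics and the prefix in the usage comment are hypothetical, and the authentication options are not shown.

```shell
# Keep only sample lines (no "# HELP" / "# TYPE" comments) whose metric name
# starts with the given prefix. Reads Prometheus text format on stdin.
# filter_metrics is a hypothetical helper, not part of Gerrit or Prometheus.
filter_metrics() {
    prefix="$1"
    grep -v '^#' | awk -v p="$prefix" 'index($0, p) == 1'
}

# Usage (authentication options for curl are deliberately left out,
# and "some_prefix" is a placeholder):
#   curl -s 'https://gerrit.wikimedia.org/r/plugins/metrics-reporter-prometheus/metrics' \
#       | filter_metrics 'some_prefix'
```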
Logs
Logs are consumed by our logging infrastructure and are available in the Kibana dashboard for Gerrit (application logs) and Apache access logs.
Main logs
Logs are available on the Gerrit servers at /var/log/gerrit/. There are a number of log files:
- gerrit.log: This is the main log file and will show stacktraces and errors.
- gerrit.json: Like gerrit.log but not really human readable. For sending structured logs to logstash.
- sshd_log: Log of sshd events.
- gc_log: Logs for git gc, not the JVM garbage collection (those logs are available in /srv/gerrit/jvmlogs).
- plugin_log: Info about plugins being loaded and reloaded; this information is also in gerrit.log.
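For a quick look at how noisy the main log is, a tally of ERROR and WARN lines is often enough. This is a rough sketch that assumes the log level appears as a space-delimited word on each entry (the usual log4j layout). That layout and the helper name count_log_levels are assumptions, not something the Gerrit configuration guarantees.

```shell
# Count ERROR and WARN entries in a log read from stdin.
# Assumes the level is a space-delimited word on the line, e.g.
#   [2021-02-01 12:00:00,000] [main] ERROR com.example.Foo : message
# count_log_levels is a hypothetical helper name.
count_log_levels() {
    awk '
        / ERROR / { err++ }
        / WARN /  { warn++ }
        END { printf "ERROR=%d WARN=%d\n", err, warn }
    '
}

# Usage:
#   count_log_levels < /var/log/gerrit/gerrit.log
```

Stack trace continuation lines do not carry a level, so they are not counted twice.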
HTTP Logs
Gerrit sits behind Apache. Access and error logs are both in /var/log/apache2:
- gerrit.wikimedia.org.https.access.log
- gerrit.wikimedia.org.https.error.log
In Kibana, find the Gerrit logs by searching with type:log4j.
JVM
Thread Dump
A thread dump is often useful in troubleshooting. To capture one, use jstack. This command should be safe to run at any time, and is run frequently while Gerrit is running:
sudo -u gerrit2 jstack -l "$(pgrep -u gerrit2 java)" > "/srv/gerrit/jstack-$(date +%Y-%m-%d-%H%M%S).dump"
It's often useful to upload the resulting file to https://fastthread.io/ to detect problems.
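Before uploading a dump anywhere, a per-state thread count gives a quick first impression. The sketch below relies on the "java.lang.Thread.State:" lines that jstack prints for each Java thread. The helper name summarize_thread_states is hypothetical, and RUNNABLE counts are not necessarily the same number as the "active threads" shown by JavaMelody.

```shell
# Print "STATE=count" for every thread state found in a jstack dump on stdin.
# summarize_thread_states is a hypothetical helper name.
summarize_thread_states() {
    grep 'java.lang.Thread.State:' \
        | awk '{ print $2 }' \
        | sort \
        | uniq -c \
        | awk '{ print $2 "=" $1 }'
}

# Usage (replace <timestamp> with the name of an existing dump):
#   summarize_thread_states < "/srv/gerrit/jstack-<timestamp>.dump"
```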
Java trace
Display a summary of garbage collection statistics every 1000 ms:
sudo -u gerrit2 /usr/lib/jvm/java-8-openjdk-amd64/bin/jstat -gcutil "$(pgrep -u gerrit2 java)" 1000
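To relate the jstat output to the GC timing guidance in the Monitoring section (tens versus hundreds of milliseconds), the cumulative young-generation GC time can be divided by the number of collections. This is a sketch under the assumption that the output has a header row containing YGC and YGCT columns, as jstat -gcutil prints on Java 8. The helper name avg_young_gc_ms is hypothetical, and the result is an average since JVM start, so it will smooth over recent spikes.

```shell
# Read `jstat -gcutil` output on stdin and print the average young GC
# duration in milliseconds, computed from the last data row as YGCT / YGC.
# Columns are located by header name rather than by fixed position.
# avg_young_gc_ms is a hypothetical helper name.
avg_young_gc_ms() {
    LC_ALL=C awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "YGC")  count_col = i
                if ($i == "YGCT") time_col = i
            }
            next
        }
        count_col && time_col && $count_col > 0 {
            avg = ($time_col / $count_col) * 1000   # YGCT is in seconds
            have = 1
        }
        END { if (have) printf "%.1f\n", avg }
    '
}

# Usage (one-shot sample, no interval argument):
#   sudo -u gerrit2 /usr/lib/jvm/java-8-openjdk-amd64/bin/jstat -gcutil "$(pgrep -u gerrit2 java)" \
#       | avg_young_gc_ms
```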
Java heap usage
Requires openjdk-X-dbg for the debugging symbols:
sudo /usr/lib/jvm/java-8-openjdk-amd64/bin/jmap -heap "$(pgrep -u gerrit2 java)"
Access h2 account_patch_reviews
On copies of account_patch_reviews* files:
java -cp h2-1.3.176.jar org.h2.tools.Shell -url jdbc:h2:/home/hashar/account_patch_reviews
Which gives you a sql prompt:
sql> show columns from ACCOUNT_PATCH_REVIEWS
...> ;
FIELD        | TYPE         | NULL | KEY | DEFAULT
ACCOUNT_ID   | INTEGER(10)  | NO   | PRI | 0
CHANGE_ID    | INTEGER(10)  | NO   | PRI | 0
PATCH_SET_ID | INTEGER(10)  | NO   | PRI | 0
FILE_NAME    | VARCHAR(255) | NO   | PRI | ''
(4 rows, 16 ms)
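The JDBC URL in the command above is the path to the copied database without its file suffix. A small helper can derive it from a file name. This sketch assumes the copies carry the usual H2 suffixes (.h2.db for H2 1.3, .mv.db for newer versions). The helper name h2_url_for and the file name in the usage comment are illustrative only. The -sql option of the H2 Shell tool runs a statement without opening the interactive prompt.

```shell
# Print the JDBC URL for a copied H2 database file by stripping the
# on-disk suffix. h2_url_for is a hypothetical helper name.
h2_url_for() {
    path="$1"
    case "$path" in
        *.h2.db) path="${path%.h2.db}" ;;
        *.mv.db) path="${path%.mv.db}" ;;
    esac
    printf 'jdbc:h2:%s\n' "$path"
}

# Usage (the .h2.db file name is an assumption about the copy):
#   java -cp h2-1.3.176.jar org.h2.tools.Shell \
#       -url "$(h2_url_for /home/hashar/account_patch_reviews.h2.db)" \
#       -sql 'SELECT COUNT(*) FROM ACCOUNT_PATCH_REVIEWS'
```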
Firewalling Gerrit
On gerrit1001.wikimedia.org there are /root/firewall.sh and /root/unfirewall.sh, which shut off and restore access to Gerrit HTTP(S) and SSH, respectively.
Opsen will find them fairly self-explanatory.
Blocking misbehaving bots / IPs
If necessary, misbehaving IP addresses or user agents can be blocked by editing modules/profile/templates/gerrit/apache.erb in the operations/puppet public git repository and merging the change.
Throttling IPs
Since September 2024 (implemented in phab:T365259), there is another method of throttling abusive traffic, using nftables. See Firewall#Throttling_with_nftables and the profile::firewall::nftables_throttling keys in Hiera.
You can also observe data related to this on the Grafana dashboard for Gerrit.
Killing ssh connections
It can happen that a user reaches the limit of 8 concurrent SSH connections and then reports that they can no longer push to Gerrit over SSH.
A member of the Gerrit admins can run commands like these to close connections for them:
ssh user@gerrit.wikimedia.org -p 29418 gerrit show-connections
ssh user@gerrit.wikimedia.org -p 29418 gerrit close-connection <connection ID>
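When a user holds several stale connections, the two commands can be chained. The sketch below assumes the default show-connections layout, in which the session ID is the first column and the user name is the fourth. Check the actual output before relying on it. The helper name sessions_for_user and the user name jdoe are hypothetical.

```shell
# Print the session IDs belonging to one user, reading the output of
# `gerrit show-connections` on stdin. Assumes column 1 is the session ID
# and column 4 is the user name. sessions_for_user is a hypothetical helper.
sessions_for_user() {
    awk -v user="$1" 'NF >= 5 && $4 == user { print $1 }'
}

# Usage (jdoe is a placeholder; ssh -n stops ssh from swallowing the
# remaining IDs on the loop's stdin):
#   ssh user@gerrit.wikimedia.org -p 29418 gerrit show-connections \
#       | sessions_for_user jdoe \
#       | while read -r id; do
#             ssh -n user@gerrit.wikimedia.org -p 29418 gerrit close-connection "$id"
#         done
```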
Switch over
Process to switch over to a replica or to migrate to a new host.
Example migration ticket (cobalt -> gerrit1001): https://phabricator.wikimedia.org/T222391
Topic branch from gerrit1001 migration: https://gerrit.wikimedia.org/r/q/topic:%22gerrit1001%22+(status:open%20OR%20status:merged)
Follow these steps to migrate from one server to another:
Make sure you run rsync with --delete, so that files removed on <old_host> are also removed on <new_host>.
- Rsync /srv/gerrit/git/, /srv/gerrit/plugins and /var/lib/gerrit2/review_site/ from <old_host> to <new_host>:
rsync --archive --verbose --delete /srv/gerrit/git/ rsync://<new_host>.wikimedia.org/gerrit-data/git/
rsync --archive --verbose --delete /srv/gerrit/plugins/ rsync://<new_host>.wikimedia.org/gerrit-data/plugins/
rsync --archive --verbose --delete /var/lib/gerrit2/review_site/ rsync://<new_host>.wikimedia.org/gerrit-var-lib/
Note: We are using the rsync protocol directly. gerrit-data and gerrit-var-lib are the names of rsync modules, not file system paths. They are defined in the Puppet class profile::gerrit::migration.
- Stop gerrit && disable puppet on <new_host>
- Create something similar to https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/535966/
- Stop puppet on <old_host> + <replica>
- Create something similar to https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/541110/
- Create something similar to https://gerrit.wikimedia.org/r/#/c/operations/dns/+/541111/
- Stop gerrit on <old_host>
- Repeat the rsync commands above.
- Rename /var/lib/gerrit2/review_site/data/javamelody/r_<old_host> to /var/lib/gerrit2/review_site/data/javamelody/r_<new_host> on <new_host>.
- Run puppet on <new_host> + <old_host>
- Start gerrit on <new_host>
- Hack DNS authdns-update to clone from gerrit-replica temporarily, deploy DNS change
- Manually copy apache2 site config for gerrit.wm.org with scp from <old_host> to <new_host>, restart apache
- Manually run command from list_mediawiki_extensions cron to create /var/www/mediawiki-extensions.txt
- Run the online reindexer
- Decom <old_host> (create a ticket like https://phabricator.wikimedia.org/T236187)
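Because the three rsync commands are run twice during the procedure (once ahead of time and once after Gerrit is stopped on <old_host>), it can be convenient to generate them from one place. The following sketch only prints the commands for review and does not run them. The helper name print_sync_commands and the host name in the usage comment are hypothetical. The optional second argument lets you add rsync's --dry-run flag for a rehearsal.

```shell
# Print the rsync commands for the migration, one per line, for the given
# new host. An optional second argument (e.g. --dry-run) is added to each
# command. print_sync_commands is a hypothetical helper; it prints only.
print_sync_commands() {
    new_host="$1"
    extra="${2:-}"
    while read -r src module_path; do
        printf 'rsync --archive --verbose --delete %s%s rsync://%s.wikimedia.org/%s\n' \
            "${extra:+$extra }" "$src" "$new_host" "$module_path"
    done <<'EOF'
/srv/gerrit/git/ gerrit-data/git/
/srv/gerrit/plugins/ gerrit-data/plugins/
/var/lib/gerrit2/review_site/ gerrit-var-lib/
EOF
}

# Usage (gerritNNNN is a placeholder host name):
#   print_sync_commands gerritNNNN --dry-run   # review a rehearsal run
#   print_sync_commands gerritNNNN | sh        # run for real, on <old_host>
```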