Gerrit/Operations

From Wikitech

Restarting

Restarting Gerrit is a last resort. We used to have to restart it often due to a misunderstanding of some of its behavior as well as a nasty memory leak. As of February 2021, a restart should not be conducted without first thoroughly reviewing the current behavior and taking traces. Those traces are a great help in identifying a potential bug or a configuration setting that needs tuning.

If, after all investigations, you are out of ideas or really have no other option, you can restart Gerrit through systemd: sudo systemctl restart gerrit.

The service takes a few seconds to come back, during which any end user operation errors out (some Puppet catalogues, CI, developers).
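Since the service is unavailable for a few seconds after a restart, it can help to wait for it explicitly before declaring the restart done. The following is a minimal sketch, not an established procedure: wait_until is a hypothetical helper name, and using systemctl is-active as the readiness check is an assumption (it only tells you the unit is active, not that Gerrit is serving requests yet).

```shell
# wait_until TRIES DELAY CMD...
# Hypothetical helper: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds after each failed attempt.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
wait_until() {
    local tries="$1" delay="$2" i=0
    shift 2
    while [ "$i" -lt "$tries" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (not executed here; assumes the systemd unit is named "gerrit"):
#   sudo systemctl restart gerrit \
#     && wait_until 30 2 systemctl is-active --quiet gerrit \
#     && echo "gerrit unit is active again"
```

Take the thread dump described in the JVM section before running the restart, since the traces are lost once the process exits.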

Monitoring

JavaMelody monitors the state of the Gerrit JVM. Its metrics are collected by Prometheus from https://gerrit.wikimedia.org/r/monitoring?prometheus

Important Graphs

Gerrit metrics

On top of the JavaMelody data, Gerrit has internal metrics.

For users having the viewCaches or View Metrics capabilities, various internal Gerrit metrics can be retrieved via:

This requires authentication. It complements gerrit show-caches.

We use the metrics-reporter-prometheus plugin, which exposes metrics that are collected by Prometheus from https://gerrit.wikimedia.org/r/plugins/metrics-reporter-prometheus/metrics . Those Gerrit metrics can also be seen on the JavaMelody MBeans page under the metrics branch.
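When looking at a single metric by hand, the Prometheus text output can be filtered locally. This is a minimal sketch: prom_metric is a hypothetical helper, the metric name in the usage line is a placeholder, and whether the endpoint needs credentials from your location is not covered here.

```shell
# prom_metric NAME
# Hypothetical helper: read Prometheus text exposition format on stdin and
# print only the samples whose metric name is exactly NAME.
prom_metric() {
    awk -v name="$1" '
        /^#/ { next }                 # skip "# HELP" and "# TYPE" lines
        {
            metric = $1
            sub(/[{].*/, "", metric)  # drop a {label="..."} suffix, if any
            if (metric == name) print
        }'
}

# Usage (not executed here; <metric_name> is a placeholder):
#   curl -s https://gerrit.wikimedia.org/r/plugins/metrics-reporter-prometheus/metrics \
#     | prom_metric <metric_name>
```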

See Gerrit Grafana dashboards folder.

Logs

Logs are consumed by our logging infrastructure and are available in the Kibana dashboard for Gerrit (application logs) and Apache access logs.

Main logs

Logs are available on the gerrit servers at: /var/log/gerrit/. There are a number of logfiles:

  • gerrit.log: This is the main log file and will show stacktraces and errors
  • gerrit.json: Like gerrit.log but not really human readable. Used for sending structured logs to Logstash.
  • sshd_log: Log of sshd events
  • gc_log: Logs for git gc, not the JVM garbage collection (those logs are available in /srv/gerrit/jvmlogs)
  • plugin_log: Info about plugins being loaded and reloaded; this information is also in gerrit.log
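For a quick first look at gerrit.log, counting lines per log level can show whether errors are spiking. This is a rough sketch that assumes each log entry carries its level (ERROR, WARN, INFO or DEBUG) as a standalone whitespace-separated token; log_level_counts is a hypothetical helper, and stack trace continuation lines are simply ignored.

```shell
# log_level_counts FILE
# Hypothetical helper: count log entries per level, most frequent first.
# Assumption: the level appears as its own token on each entry's first line.
log_level_counts() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^(ERROR|WARN|INFO|DEBUG)$/) {
                count[$i]++
                break            # only the first level token on a line counts
            }
        }
    }
    END { for (level in count) print count[level], level }' "$1" | sort -rn
}

# Usage (not executed here):
#   log_level_counts /var/log/gerrit/gerrit.log
```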

HTTP Logs

Gerrit sits behind Apache. Access and error logs are both in /var/log/apache2:

  • gerrit.wikimedia.org.https.access.log
  • gerrit.wikimedia.org.https.error.log
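When hunting for a misbehaving client, a per-address request count from the access log is often the first thing to look at. The sketch below assumes the client address is the first whitespace-separated field of each access log line (as in the common and combined Apache formats); the actual LogFormat on the Gerrit hosts may differ, and top_clients is a hypothetical helper.

```shell
# top_clients FILE [N]
# Hypothetical helper: print the N (default 10) most frequent values of the
# first field of an access log, as "<count> <client>".
# Assumption: the first field is the client address.
top_clients() {
    awk '{ print $1 }' "$1" \
        | sort | uniq -c | sort -rn \
        | head -n "${2:-10}" \
        | awk '{ print $1, $2 }'   # normalize the padding added by uniq -c
}

# Usage (not executed here):
#   top_clients /var/log/apache2/gerrit.wikimedia.org.https.access.log 20
```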

In Kibana, find Gerrit's logs by searching with type:log4j.

JVM

Thread Dump

A thread dump is often useful in troubleshooting. To capture a thread dump, use jstack. This command should be safe to run at any time, and is run frequently while Gerrit is running:

sudo -u gerrit2 jstack -l "$(pgrep -u gerrit2 java)" > "/srv/gerrit/jstack-$(date +%Y-%m-%d-%H%M%S).dump"

It's often useful to upload the resulting file to https://fastthread.io/ to detect problems.
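Before uploading anything, a quick local summary of thread states can already hint at a problem (for example, many BLOCKED threads). This is a minimal sketch relying on the "java.lang.Thread.State:" lines that jstack prints for each Java thread; thread_state_summary is a hypothetical helper.

```shell
# thread_state_summary FILE
# Hypothetical helper: count threads per state in a jstack dump,
# most frequent state first, as "<count> <STATE>".
thread_state_summary() {
    grep -F 'java.lang.Thread.State:' "$1" \
        | awk '{ print $2 }' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'   # normalize the padding added by uniq -c
}

# Usage (not executed here; the file name is a placeholder):
#   thread_state_summary /srv/gerrit/jstack-<timestamp>.dump
```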

Java trace

This command isn't run very often and it is unclear how safe it is to run; it is kept here for folks who are familiar with jstat.

Display a summary of garbage collection statistics every 1000 ms:

sudo -u gerrit2 /usr/lib/jvm/java-8-openjdk-amd64/bin/jstat -gcutil "$(pgrep -u gerrit2 java)" 1000
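The -gcutil output is a table of percentages, so it can be filtered for rows where the old generation is nearly full. The sketch below is an illustration only: old_gen_alerts is a hypothetical helper, the 90% default threshold is arbitrary, and it assumes the header row names the old generation column "O", as jstat -gcutil does on Java 8.

```shell
# old_gen_alerts [LIMIT]
# Hypothetical helper: read "jstat -gcutil" output on stdin and print a line
# for every sample whose old generation usage (column "O") exceeds LIMIT percent.
old_gen_alerts() {
    awk -v limit="${1:-90}" '
        NR == 1 {
            # locate the "O" column from the header row
            for (i = 1; i <= NF; i++) if ($i == "O") col = i
            next
        }
        col && ($col + 0) > (limit + 0) { print "old gen at " $col "%" }'
}

# Usage (not executed here):
#   sudo -u gerrit2 /usr/lib/jvm/java-8-openjdk-amd64/bin/jstat -gcutil \
#     "$(pgrep -u gerrit2 java)" 1000 | old_gen_alerts 90
```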

Java heap usage

Requires the openjdk-X-dbg package for the debugging symbols:

sudo /usr/lib/jvm/java-8-openjdk-amd64/bin/jmap -heap "$(pgrep -u gerrit2 java)"

Access h2 account_patch_reviews

On copies of account_patch_reviews* files:

java -cp h2-1.3.176.jar org.h2.tools.Shell -url jdbc:h2:/home/hashar/account_patch_reviews

This gives you an SQL prompt:

sql> show columns from ACCOUNT_PATCH_REVIEWS
...> ;
FIELD        | TYPE         | NULL | KEY | DEFAULT
ACCOUNT_ID   | INTEGER(10)  | NO   | PRI | 0
CHANGE_ID    | INTEGER(10)  | NO   | PRI | 0
PATCH_SET_ID | INTEGER(10)  | NO   | PRI | 0
FILE_NAME    | VARCHAR(255) | NO   | PRI | ''
(4 rows, 16 ms)

Firewalling Gerrit

On gerrit1001.wikimedia.org there are two scripts, /root/firewall.sh and /root/unfirewall.sh, for shutting off access to Gerrit HTTP(S) and SSH. Opsen will find them fairly self-explanatory.

Blocking misbehaving bots / IPs

If necessary, misbehaving IP addresses or user agents can be blocked by editing modules/profile/templates/gerrit/apache.erb in the operations/puppet public git repository and merging the change.

example change

Throttling IPs

Since September 2024 (implemented in phab:T365259), there is another method of throttling abusive traffic, using nftables. See Firewall#Throttling_with_nftables and the profile::firewall::nftables_throttling keys in Hiera.

You can also observe data related to this on the Grafana dashboard for Gerrit.

Killing ssh connections

A user can reach the limit of 8 concurrent ssh connections and then find they can no longer push to Gerrit over ssh.

A Gerrit administrator can run commands like these to kill connections for them:


ssh user@gerrit.wikimedia.org -p 29418 gerrit show-connections
ssh user@gerrit.wikimedia.org -p 29418 gerrit close-connection <connection ID>
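To avoid picking session IDs out of the table by eye, the show-connections output can be filtered by user. This sketch assumes the column layout "Session Start Idle User Remote Host" (session ID first, user name fourth); check the actual output before relying on it. sessions_for_user and the user name jdoe are hypothetical.

```shell
# sessions_for_user USER
# Hypothetical helper: read "gerrit show-connections" output on stdin and
# print the session ID of every connection belonging to USER.
# Assumption: columns are Session, Start, Idle, User, Remote Host.
sessions_for_user() {
    awk -v user="$1" 'NF >= 5 && $4 == user { print $1 }'
}

# Usage (not executed here; "jdoe" is a placeholder user name):
#   ssh user@gerrit.wikimedia.org -p 29418 gerrit show-connections \
#     | sessions_for_user jdoe \
#     | while read -r id; do
#         ssh -n user@gerrit.wikimedia.org -p 29418 gerrit close-connection "$id"
#       done
```

The -n flag in the loop keeps ssh from consuming the remaining session IDs on standard input.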

Switch over

Process to switch over to a replica or to migrate to a new host.

Example migration ticket (cobalt -> gerrit1001): https://phabricator.wikimedia.org/T222391

Topic branch from gerrit1001 migration: https://gerrit.wikimedia.org/r/q/topic:%22gerrit1001%22+(status:open%20OR%20status:merged)

Follow these steps to migrate from one server to another:

Make sure you run rsync with --delete

  • Rsync /srv/gerrit/git/, /srv/gerrit/plugins/ and /var/lib/gerrit2/review_site/ from <old_host> to <new_host>
    • rsync --archive --verbose --delete /srv/gerrit/git/ rsync://<new_host>.wikimedia.org/gerrit-data/git/
    • rsync --archive --verbose --delete /srv/gerrit/plugins/ rsync://<new_host>.wikimedia.org/gerrit-data/plugins/
    • rsync --archive --verbose --delete /var/lib/gerrit2/review_site/ rsync://<new_host>.wikimedia.org/gerrit-var-lib/

Note: We are using the rsync protocol directly. gerrit-data and gerrit-var-lib are the names of rsync modules, not file system paths. They are defined in the Puppet class profile::gerrit::migration.
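Because --delete removes files on the destination, it can be worth reviewing what rsync would do before running it for real. The sketch below only prints the three commands for a given host, optionally with an extra flag such as rsync's --dry-run; print_rsync_commands is a hypothetical helper and the host name in the usage line is a placeholder.

```shell
# print_rsync_commands NEW_HOST [EXTRA_FLAG]
# Hypothetical helper: print (not run) the three rsync commands for NEW_HOST,
# inserting EXTRA_FLAG (for example --dry-run) when given.
print_rsync_commands() {
    local new_host="$1" extra="${2:-}"
    # Each line below is "<source path> <rsync module path>".
    while read -r src dest; do
        printf 'rsync --archive --verbose --delete %s%s rsync://%s.wikimedia.org/%s\n' \
            "${extra:+$extra }" "$src" "$new_host" "$dest"
    done <<'EOF'
/srv/gerrit/git/ gerrit-data/git/
/srv/gerrit/plugins/ gerrit-data/plugins/
/var/lib/gerrit2/review_site/ gerrit-var-lib/
EOF
}

# Usage (not executed here; <new_host> is a placeholder):
#   print_rsync_commands <new_host> --dry-run   # review, then run the printed commands
```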