User:Ottomata/Notes
Coding preferences
Organize your code!
Misc
kubernetes / helm
get k8s node metrics
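# 10255 is the kubelet's read-only port; /metrics/cadvisor exposes per-container metrics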
akosiaris@prometheus1006:~$ curl dse-k8s-worker1005.eqiad.wmnet:10255/metrics/cadvisor
get network context for container
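# -t <pid> is the container process's PID on the host; -n runs the command inside its network namespace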
sudo nsenter -t 3548838 -n netstat -nlpt
get container/application logs for all pods
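# kube_env is a WMF helper that points kubectl at the given service's namespace and cluster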
kube_env eventgate-analytics eqiad
kubectl logs -c eventgate-analytics -l app=eventgate --max-log-requests=50 --since 5m
MediaWiki
Timo's amazing page undelete explanation
https://phabricator.wikimedia.org/T351411#9338177
Running phpunit tests with docker-compose
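# (run from a MediaWiki core checkout with the docker-compose dev environment configured)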
docker-compose up
# ...
docker-compose exec mediawiki composer phpunit:entrypoint extensions/EventBus/tests/phpunit/
Troubleshooting
Refine
When a strange failure happens for a Refine job, I often find it easiest to launch a Spark Scala shell, manually inspect the input dataset, and run some Refine code to see if I can reproduce the error.
// 15:18:09 [@stat1004:/home/otto] $ spark3-shell --jars /srv/deployment/analytics/refinery/artifacts/refinery-job-shaded.jar
import org.wikimedia.analytics.refinery.job.refine._
// Create a Spark schemaLoader. There are other ways to do this,
// e.g. for old eventlogging metawiki schemas, or explicit schemas.
// See Refine.scala getRefineTargetsFromFS.
val schemaLoader = EventSparkSchemaLoader(
    Seq(
        "https://schema.discovery.wmnet/repositories/primary/jsonschema",
        "https://schema.discovery.wmnet/repositories/secondary/jsonschema"
    ),
    true, // loadLatest
    Some(Refine.Config.default.schema_field)
)
/*
* The RefineTarget companion object's apply method is a helper for instantiating a single RefineTarget.
* If you have the input and output paths of a failed hourly dataset, you can use these
* to create a RefineTarget.
*
* E.g. a Refine failure alert email might say:
*
* The following 1 of 2 dataset partitions for output table `event`.`mediawiki_content_translation_event` failed refinement:
* org.wikimedia.analytics.refinery.job.refine.RefineTargetException:
* Failed refinement of
* hdfs://analytics-hadoop/wmf/data/raw/event/codfw.mediawiki.content_translation_event/year=2023/month=06/day=16/hour=22 ->
* `event`.`mediawiki_content_translation_event`
* /wmf/data/event/mediawiki_content_translation_event/datacenter=codfw/year=2023/month=6/day=16/hour=22.
* Original exception: org.wikimedia.eventutilities.core.json.JsonLoadingException:
* Failed reading JSON/YAML data from /analytics/mediawiki/content_translation_event/latest
*
* So you've got the input and output paths from that email.
*/
val refineTarget = RefineTarget(
    spark,
    "hdfs://analytics-hadoop/wmf/data/raw/event/codfw.mediawiki.content_translation_event/year=2023/month=06/day=16/hour=22",
    "/wmf/data/event/mediawiki_content_translation_event/datacenter=codfw/year=2023/month=6/day=16/hour=22",
    schemaLoader
)
// With this RefineTarget, you can try to load and inspect the input dataframe,
// which is usually where Refine errors happen, e.g. due to corrupt records or bad schemas.
val df = refineTarget.inputDataFrame
// Once you have the DataFrame of the input dataset, you can examine it with
// the usual Spark DataFrame API.
// https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html
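// For example, a few standard inspection calls (a minimal sketch; nothing here
// is Refine-specific, and which columns matter depends on the event's schema):
df.printSchema()   // did the loaded schema match what you expected?
df.count()
df.show(5, false)  // print a few full (untruncated) records
// From there, df.select(...) / df.where(...) can help narrow down bad records.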