From 7fae5350528474c29b664ebb4df5bbc8104b48ec Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Tue, 20 Jul 2021 00:32:02 -0700
Subject: [PATCH] add troubleshooting docs (#12791)

---
 .circleci/TROUBLESHOOT.md         | 7 +++++++
 .github/workflows/TROUBLESHOOT.md | 9 +++++++++
 2 files changed, 16 insertions(+)
 create mode 100644 .circleci/TROUBLESHOOT.md
 create mode 100644 .github/workflows/TROUBLESHOOT.md

diff --git a/.circleci/TROUBLESHOOT.md b/.circleci/TROUBLESHOOT.md
new file mode 100644
index 0000000000..c662a921ba
--- /dev/null
+++ b/.circleci/TROUBLESHOOT.md
@@ -0,0 +1,7 @@
+# Troubleshooting
+
+This document explains how to deal with various issues on CircleCI. The entries may include actual solutions or pointers to Issues that cover them.
+
+## Circle CI
+
+* pytest worker runs out of resident RAM and gets killed by `cgroups`: https://github.com/huggingface/transformers/issues/11408
diff --git a/.github/workflows/TROUBLESHOOT.md b/.github/workflows/TROUBLESHOOT.md
new file mode 100644
index 0000000000..616ba8e55b
--- /dev/null
+++ b/.github/workflows/TROUBLESHOOT.md
@@ -0,0 +1,9 @@
+# Troubleshooting
+
+This document explains how to deal with various issues on the GitHub Actions self-hosted CI. The entries may include actual solutions or pointers to Issues that cover them.
+
+## GitHub Actions (self-hosted CI)
+
+* Deepspeed
+
+  - if the JIT build hangs, clear the extension cache with `rm -rf ~/.cache/torch_extensions/`; reference: https://github.com/huggingface/transformers/pull/12723