6 changes: 0 additions & 6 deletions content/docs/01.getting-started/01.quickstart.md
@@ -67,12 +67,6 @@ tasks:

This flow uses the [Kestra Log plugin](/plugins/core/tasks/log/io.kestra.plugin.core.log.log) to log a message to the console. Click **Save**, then click **Execute** to start your first execution.


:::next-link
[For a more detailed introduction to Kestra, check the tutorial](../03.tutorial/index.md)
:::


## Next steps

Congratulations! You've just installed Kestra and executed your first flow! :clap:
1 change: 0 additions & 1 deletion content/docs/03.tutorial/03.outputs.md
@@ -120,7 +120,6 @@ This example flow passes data between tasks using outputs. The `inputFiles` argu

![Preview](/docs/tutorial/outputs/preview.png)


To sum up, our flow extracts data from an API, uses that data in a Python script, executes a SQL query, and generates a downloadable artifact.

:::alert{type="info"}
3 changes: 0 additions & 3 deletions content/docs/03.tutorial/06.errors.md
@@ -57,7 +57,6 @@ To get notified on a workflow failure, you can leverage Kestra's built-in notifi
- [Microsoft Teams](/plugins/plugin-notifications/teams/io.kestra.plugin.notifications.teams.teamsexecution)
- [Email](/plugins/plugin-notifications/mail/io.kestra.plugin.notifications.mail.mailexecution)


For centralized namespace-level alerting, we recommend adding a dedicated monitoring workflow with one of the above-mentioned notification tasks and a Flow trigger. Below is an example workflow that automatically sends a Slack alert as soon as any flow in the namespace `company.team` fails or finishes with warnings.

```yaml
@@ -130,8 +129,6 @@ errors:
format: This will never be executed as retries will fix the issue
```



### Adding a retry configuration to our tutorial workflow

Returning to the example from the [Fundamentals](./01.fundamentals.md) section, we will add a retry configuration to the `api` task. API calls are prone to transient errors, so we will retry that task up to 10 times, for at most 1 hour of total duration, every 10 seconds (i.e., with a constant interval of 10 seconds in between retry attempts).
5 changes: 0 additions & 5 deletions content/docs/03.tutorial/index.md
@@ -17,10 +17,5 @@ This tutorial guides you through **key concepts** in Kestra. We start with the s

We then dive into `parallel` task execution, error handling, as well as custom scripts and microservices running in isolated containers. Let's get started!


:::next-link
[Fundamentals: build a "Hello World" flow](./01.fundamentals.md)
:::

:::ChildCard
:::
114 changes: 114 additions & 0 deletions content/docs/15.how-to-guides/github-repo-backup.md
@@ -0,0 +1,114 @@
---
title: Back Up Kestra GitHub Repositories to Google Cloud Storage
icon: /docs/icons/github.svg
stage: Intermediate
topics:
- Integrations
- Object Storage
- Version Control
---

Clone every repository in the `kestra-io` GitHub organization, zip each repo, and upload the archives to Google Cloud Storage (GCS) for safekeeping.

---

## Why run this backup?

Organizations often mirror source control data outside GitHub to satisfy compliance, enable disaster recovery drills, or seed analytics and search workloads. This flow collects every repository, produces portable zip artifacts, and stores them in GCS so you have an off-platform copy you can restore or inspect independently of GitHub.

---

## Prerequisites

- GitHub personal access token stored as the Kestra secret `GITHUB_TOKEN`.
- GCP service account JSON key stored as the Kestra secret `GCP_SERVICE_ACCOUNT`.
- A target bucket such as `gs://your_gcs_bucket/kestra-backups/`.
- The [Google Cloud Storage plugin](/plugins/plugin-gcp/gcs/io.kestra.plugin.gcp.gcs.upload) available to your workers.
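
How the two secrets are stored depends on your secret backend. As a minimal sketch, assuming an open-source Kestra deployment that reads secrets from base64-encoded environment variables prefixed with `SECRET_`, the helpers below encode a literal value or a key file. The function names, the sample token, and the key path are hypothetical placeholders, not part of the flow.

```shell
# Encode a value for a SECRET_-prefixed environment variable
# (assumption: the environment-variable secret backend is in use).
encode_secret() {
  # printf avoids the trailing newline that echo would add
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Encode a file, such as a service account JSON key, onto a single line.
encode_secret_file() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical usage; the token value and key path are placeholders:
#   export SECRET_GITHUB_TOKEN="$(encode_secret 'ghp_example')"
#   export SECRET_GCP_SERVICE_ACCOUNT="$(encode_secret_file ./service-account.json)"
```

If your deployment uses a different secret manager, store the same two values there under the names the flow references.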

---

## Flow Definition

```yaml
id: github_repo_backup
namespace: company.team
description: Clones Kestra GitHub repositories and backs them up to Google Cloud Storage.

tasks:
  - id: search_kestra_repos
    type: io.kestra.plugin.github.repositories.Search
    description: Search for all repositories under the 'kestra-io' GitHub organization.
    query: "user:kestra-io"
    oauthToken: "{{ secret('GITHUB_TOKEN') }}"

  - id: for_each_repo
    type: io.kestra.plugin.core.flow.ForEach
    description: Iterate over each found repository.
    values: "{{ outputs.search_kestra_repos.uri | internalStorage.get() | jq('.items[].clone_url') }}"
    tasks:
      - id: working_dir
        type: io.kestra.plugin.core.flow.WorkingDirectory
        description: Create a temporary working directory for cloning and zipping each repository.
        tasks:
          - id: clone_repo
            type: io.kestra.plugin.git.Clone
            description: Clone the current repository from GitHub.
            url: "{{ taskrun.value }}"
            directory: "{{ taskrun.value | split('/') | last | split('.') | first }}"

          - id: zip_repo
            type: io.kestra.plugin.scripts.shell.Commands
            description: Zip the cloned repository's contents.
            beforeCommands:
              - apk add zip > /dev/null 2>&1 || true
            commands:
              - |
                REPO_DIR="{{ outputs.clone_repo.directory }}"
                # REPO_DIR is a shell variable, so derive the name in the shell, not in a template expression
                REPO_NAME="$(basename "${REPO_DIR}")"
                cd "${REPO_DIR}"
                zip -r "../${REPO_NAME}.zip" .
            outputFiles:
              - "{{ outputs.clone_repo.directory | split('/') | last }}.zip"

          - id: upload_to_gcs
            type: io.kestra.plugin.gcp.gcs.Upload
            description: Upload the zipped repository to Google Cloud Storage.
            from: "{{ outputs.zip_repo.outputFiles['' ~ (outputs.clone_repo.directory | split('/') | last) ~ '.zip'] }}"
            to: "gs://your_gcs_bucket/kestra-backups/{{ outputs.clone_repo.directory | split('/') | last }}.zip"
            serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT') }}"
```

---

## How It Works

The flow stages data once and reuses it through internal storage to avoid repeated GitHub calls. The search task writes its JSON response to storage, and the ForEach loop reads only the `clone_url` fields, keeping the payload small. Each iteration runs in a fresh working directory so cloned files and archives stay isolated. The zip command emits a predictable filename via `outputFiles`, which the GCS upload consumes directly, reducing chances of path mismatches. Secrets supply tokens to GitHub and GCP at runtime without embedding credentials in the flow definition.

1. `search_kestra_repos` fetches all repositories in the `kestra-io` organization and stores the JSON results in internal storage.
2. `for_each_repo` loops over each `clone_url` extracted via `jq` from the stored JSON.
3. `working_dir` isolates each iteration, keeping cloned data and archives scoped to a temporary folder.
4. `clone_repo` clones the current repository URL.
5. `zip_repo` compresses the cloned repository and exposes the zip file through `outputFiles` so the next task can read it.
6. `upload_to_gcs` uploads each archive to the chosen bucket path using the GCP service account key.
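
To make steps 4 and 5 concrete, here is a rough local equivalent of one loop iteration, written as plain shell rather than as Kestra tasks. It is only a sketch: the function names are hypothetical, it assumes `git` and `zip` are installed, and it skips the upload step.

```shell
# Derive the repository name from a clone URL, e.g.
# https://github.com/kestra-io/kestra.git -> kestra
repo_name_from_url() {
  basename "$1" .git
}

# Clone one repository and write <name>.zip next to the clone.
# Assumes git and zip are available; nothing is uploaded.
backup_repo_locally() {
  clone_url="$1"
  repo_name="$(repo_name_from_url "${clone_url}")"
  git clone --quiet "${clone_url}" "${repo_name}" || return 1
  # Run zip in a subshell so the caller's working directory is unchanged
  (cd "${repo_name}" && zip -q -r "../${repo_name}.zip" .)
}
```

Running `backup_repo_locally` on a single clone URL in an empty scratch directory is a quick way to check the archive layout before executing the full flow.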

For a smaller dry run, narrow the search query (for example, add `topic:cli`) or slice the list inside the `jq` extraction, for example `jq('.items[:2][].clone_url')`, to process only a few repositories.
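
You can try the slicing filter outside Kestra with the `jq` command-line tool. The sketch below assumes `jq` is installed and uses a hypothetical three-item sample shaped like the `items` array the flow reads. Note the `[]` after the slice: the slice returns an array, and `[]` iterates over it so that `.clone_url` is applied to each object.

```shell
# Print the clone URLs of the first N repositories from a search
# response read on stdin. Assumes jq is installed.
first_clone_urls() {
  jq -r --argjson n "$1" '.items[:$n][].clone_url'
}

# Hypothetical sample shaped like the `items` array referenced in the flow
sample='{"items":[{"clone_url":"https://github.com/kestra-io/a.git"},{"clone_url":"https://github.com/kestra-io/b.git"},{"clone_url":"https://github.com/kestra-io/c.git"}]}'

# Usage: printf '%s' "$sample" | first_clone_urls 2
```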

---

## Use Playground to Test Safely

Playground mode helps you validate expensive steps incrementally. Start with the search task to confirm authentication and inspect the JSON before any cloning. When refining the zip or upload steps, slice the list to a single repository so you can replay those tasks without hitting GitHub or GCS repeatedly. Because Playground keeps prior task outputs, you can iterate on shell commands and storage paths while reusing the same search result and clone, keeping feedback fast and low-risk.

To try this, follow [the Playground guide](../08.ui/10.playground.md) and work through this flow as follows:

1. Toggle Playground mode in the editor.
2. Run only `search_kestra_repos` to confirm your GitHub token works and inspect the search output.
3. Temporarily limit the `values` expression (for example, `jq('.items[:1][].clone_url')`) so downstream tasks operate on a single repository while you iterate.
4. Play `zip_repo` and `upload_to_gcs` individually inside Playground; Kestra reuses outputs from previously played tasks, so you avoid recloning every repository.
5. When satisfied, revert any temporary limits and use **Run all tasks** for a full backup execution.

This approach prevents unnecessary GitHub calls and GCS writes while you refine the flow logic.

---

You now have a reusable flow that backs up the `kestra-io` GitHub organization to GCS each time it runs, with secrets-managed authentication and a safe Playground workflow for testing.