2 changes: 1 addition & 1 deletion source/en/user_guide/internnav/quick_start/installation.md
@@ -190,7 +190,7 @@ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
--index-url https://download.pytorch.org/whl/cu118

# install InternNav with model dependencies
pip install -e .[model]
pip install -e .[model] --no-build-isolation

```

50 changes: 45 additions & 5 deletions source/en/user_guide/internnav/quick_start/train_eval.md
@@ -11,20 +11,39 @@ The training pipeline is currently under preparation and will be open-sourced so
Before evaluation, download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. The InternVLA-N1 model weights can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).
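
If you prefer to script these downloads, here is a minimal sketch using `huggingface_hub`; the target directories are illustrative assumptions, so place the files wherever your `data/` and checkpoint layout expects them:

```python
# Sketch only: download the robot assets and InternVLA-N1 weights programmatically.
# The local_dir values below are assumptions, not a required layout.
from huggingface_hub import snapshot_download

# Robot assets (a dataset repo) -> data/
snapshot_download(
    repo_id="InternRobotics/Embodiments",
    repo_type="dataset",
    local_dir="data/Embodiments",
)

# InternVLA-N1 model weights
snapshot_download(
    repo_id="InternRobotics/InternVLA-N1",
    local_dir="checkpoints/InternVLA-N1",
)
```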

#### Evaluation on Isaac Sim
[UPDATE] We now support running the local model and Isaac Sim in a single process. Evaluate on Single-GPU:

```bash
python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

For multi-GPU inference, we currently support environments that expose a torchrun-compatible runtime (e.g., torchrun or Aliyun DLC).

```bash
# for torchrun
./scripts/eval/bash/torchrun_eval.sh \
--config scripts/eval/configs/h1_internvla_n1_async_cfg.py

# for alicloud dlc
./scripts/eval/bash/eval_vln_distributed.sh \
internutopia \
--config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

The whole-system evaluation adopts a client-server architecture. The client specifies the evaluation configuration (a `*_cfg.py` file), which includes settings such as the scenarios to be evaluated, the robots, the models, and the parallelization parameters. The client sends requests to the server, which submits tasks to the Ray distributed framework according to that configuration, driving the entire evaluation process.
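
For orientation, here is a rough sketch of what such a config file groups together. Only the `EvalCfg`/`AgentCfg` fields mirror the fragment shown later on this page; the import path and the commented placeholders are assumptions rather than the actual API, so refer to the provided files under `scripts/eval/configs/` for the authoritative structure:

```python
# Sketch of a *_cfg.py evaluation config. Field names other than those in the
# AgentCfg fragment shown later on this page are placeholders, not the real API.
from internnav.configs import EvalCfg, AgentCfg  # import path is an assumption

eval_cfg = EvalCfg(
    agent=AgentCfg(
        model_name='internvla_n1',
        model_settings={
            "mode": "dual_system",                     # dual_system or system2
            "model_path": "checkpoints/InternVLA-N1",  # path to the downloaded weights
        },
    ),
    # Scenario, robot, and parallelization settings are declared alongside the
    # agent block; see scripts/eval/configs/ for the concrete field names.
)
```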

First, set `model_path` in the config file to the path of the InternVLA-N1 weights. Then start the evaluation server:
```bash
# from one process
conda activate <model_env>
python scripts/eval/start_server.py --config scripts/eval/configs/h1_internvla_n1_cfg.py
python scripts/eval/start_server.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

Then, start the client to run evaluation:
```bash
# from another process
conda activate <internutopia>
MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_cfg.py
MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_async_cfg.py
```

The evaluation results will be saved in the `eval_results.log` file in the `output_dir` of the config file. The whole evaluation process takes about 10 hours on an RTX 4090.
@@ -36,13 +55,23 @@ The simulation can be visualized by setting `vis_output=True` in the eval config.
Evaluate on Single-GPU:

```bash
python scripts/eval/eval_habitat.py --model_path checkpoints/InternVLA-N1 --continuous_traj --output_path result/InternVLA-N1/val_unseen_32traj_8steps
python scripts/eval/eval.py --config scripts/eval/configs/habitat_dual_system_cfg.py
```

For multi-gpu inference, currently we only support inference on SLURM.
For multi-GPU inference, we currently support SLURM as well as environments that expose a torchrun-compatible runtime (e.g., Aliyun DLC).

```bash
# for slurm
./scripts/eval/bash/eval_dual_system.sh

# for torchrun
./scripts/eval/bash/torchrun_eval.sh \
--config scripts/eval/configs/habitat_dual_system_cfg.py

# for alicloud dlc
./scripts/eval/bash/eval_vln_distributed.sh \
habitat \
--config scripts/eval/configs/habitat_dual_system_cfg.py
```


@@ -125,7 +154,18 @@ Currently, we only support evaluating a single System2 on Habitat:
Evaluate on Single-GPU:

```bash
python scripts/eval/eval_habitat.py --model_path checkpoints/InternVLA-N1-S2 --mode system2 --output_path results/InternVLA-N1-S2/val_unseen \
python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py

# set config with the following fields
eval_cfg = EvalCfg(
agent=AgentCfg(
model_name='internvla_n1',
model_settings={
"mode": "system2", # inference mode: dual_system or system2
"model_path": "checkpoints/<s2_checkpoint>", # path to model checkpoint
}
)
)
```

For multi-GPU inference, we currently only support inference on SLURM.
34 changes: 23 additions & 11 deletions source/en/user_guide/internnav/tutorials/env.md
@@ -1,4 +1,4 @@
# Customizing Environments and Tasks in InternNav
# Environments Design in InternNav

This tutorial provides a step-by-step guide to defining a new environment and a new navigation task within the InternNav framework.

Expand All @@ -17,26 +17,24 @@ Because of this separation:

- We can run the same agent in simulation (Isaac / InternUtopia) or on a real robot, as long as both environments implement the same API.

- We can benchmark different tasks (VLN, PointGoalNav, etc.) in different worlds without rewriting the agent.
- We can benchmark different tasks in different worlds without rewriting the agent.

InternNav already ships with two major environment backends:
![img.png](../../../_static/image/internnav_process.png)

InternNav already ships with three major environment backends:

- **InternUtopiaEnv**:
Simulated environment built on top of InternUtopia / Isaac Sim. This supports complex indoor scenes, object semantics, RGB-D sensing, and scripted evaluation loops.
- **HabitatEnv** (WIP): Simulated environment built on top of Habitat Sim.

- **HabitatEnv**: Simulated environment built on top of Habitat Sim. This supports a gym-style workflow and handles distributed episode setup.

- **RealWorldEnv**:
Wrapper around an actual robot platform and its sensors (e.g. RGB camera, depth, odometry). This lets you deploy the same agent logic in the physical world.

All of these are children of the same base [`Env`](https://github.com/InternRobotics/InternNav/blob/main/internnav/env/base.py) class.
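
To make the separation concrete, below is a minimal sketch of a custom backend. The gym-style method names (`reset`, `step`, `close`) and the observation keys are illustrative assumptions; the authoritative interface is the base [`Env`](https://github.com/InternRobotics/InternNav/blob/main/internnav/env/base.py) class.

```python
# Illustrative sketch only: the method names and signatures are assumptions,
# not the actual InternNav Env API (see internnav/env/base.py).
from typing import Any, Dict, Tuple


class MyRobotEnv:
    """A hypothetical backend wrapping a real or simulated robot."""

    def reset(self) -> Dict[str, Any]:
        """Start a new episode and return the first observation."""
        return {"rgb": None, "depth": None, "pose": None}

    def step(self, action: Any) -> Tuple[Dict[str, Any], bool]:
        """Apply an action and return (observation, done)."""
        obs = {"rgb": None, "depth": None, "pose": None}
        done = False
        return obs, done

    def close(self) -> None:
        """Release simulator or hardware resources."""
```

Because the agent only talks to this interface, the same policy can be driven by InternUtopiaEnv, HabitatEnv, or RealWorldEnv without modification.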

## Evaluation Task (WIP)
For the VLN-PE benchmark, we build the task on top of InternUtopia. Here is a diagram of the task structure:

![img.png](../../../_static/image/agent_definition.png)


## Evaluation Metrics (WIP)
### Evaluation Metrics in VLN-PE
For the VLN-PE benchmark in InternUtopia, InternNav provides comprehensive evaluation metrics:
- **Success Rate (SR)**: The proportion of episodes in which the agent successfully reaches the goal location within a 3-meter radius.
- **Success Rate weighted by Path Length (SPL)**: Measures both efficiency and success. It is defined as the ratio of the shortest-path distance to the actual trajectory length, weighted by whether the agent successfully reaches the goal (see the formula after this list).
@@ -47,4 +45,18 @@ A higher SPL indicates that the agent not only succeeds but does so efficiently,
- **Fall Rate (FR)**: The frequency at which the agent falls or loses balance during navigation.
- **Stuck Rate (StR)**: The frequency at which the agent becomes immobile or trapped (e.g., blocked by obstacles or unable to proceed).

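For reference, the standard SPL formulation used across embodied-navigation benchmarks is given below (our notation; the metric registered in InternNav may differ in implementation details):

$$
\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)}
$$

where $N$ is the number of episodes, $S_i \in \{0, 1\}$ indicates success in episode $i$, $\ell_i$ is the shortest-path (geodesic) distance from the start to the goal, and $p_i$ is the length of the agent's actual trajectory.
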
> **Reviewer:** Can keep the introduction to metrics?
>
> **Collaborator (Author):** Yes, explanations for metrics registered in vlnpe and vlnce are added.


The implementation is under `internnav/env/utils/internutopia_extensions`; we highly suggest following the guide of [InternUtopia](../../internutopia).
### Evaluation Metrics in VLN-CE
For the VLN-CE benchmark in Habitat, InternNav keeps the original Habitat evaluation configuration and registers the following metrics:

- **Distance to Goal (DistanceToGoal)**: The geodesic distance from the agent’s current position to the goal location.

- **Success (Success)**: A binary indicator of whether the agent stops within **3 meters** of the goal.

- **Success weighted by Path Length (SPL)**: Measures both success and navigation efficiency. It is defined as the ratio of the shortest-path distance to the actual trajectory length, weighted by whether the agent successfully reaches the goal.
A higher SPL indicates that the agent not only succeeds but does so efficiently, without taking unnecessarily long routes.

- **Oracle Success Rate (OracleSuccess)**: The proportion of episodes in which **any point** along the agent’s trajectory comes within **3 meters** of the goal, representing potential success if the agent were to stop optimally.

- **Oracle Navigation Error (OracleNavigationError)**: The minimum geodesic distance between the agent and the goal over the entire trajectory.

- **Normalized Dynamic Time Warping (nDTW)**: Measures how closely the agent’s trajectory follows the ground-truth demonstration path; only registered in the RxR benchmarks (see the formula below).
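
For reference, the commonly used nDTW definition is given below (our notation; the metric registered in Habitat may differ in implementation details):

$$
\mathrm{nDTW}(R, Q) = \exp\!\left(-\frac{\mathrm{DTW}(R, Q)}{|R| \cdot d_{th}}\right)
$$

where $R$ is the reference (ground-truth) path, $Q$ is the agent’s path, $\mathrm{DTW}(R, Q)$ is their dynamic time warping distance, $|R|$ is the number of nodes in the reference path, and $d_{th}$ is the success distance threshold.
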
1 change: 0 additions & 1 deletion source/en/user_guide/internnav/tutorials/index.md
@@ -12,7 +12,6 @@ myst:
:caption: Tutorials
:maxdepth: 2

core
dataset
model
training