[feat] fov, immersive, pass-through mode.

4 months ago · 562a9ca27d
6 changed files with 449 additions and 142 deletions
--- a/README.md
+++ b/README.md
@ -4,13 +4,32 @@ The TeleVuer library is a specialized version of the [Vuer](https://github.com/v

 Currently, this module serves as a core component of the [xr_teleoperate](https://github.com/unitreerobotics/xr_teleoperate) library, offering advanced functionality for teleoperation tasks. It supports various XR devices, including Apple Vision Pro, Meta Quest3, Pico 4 Ultra Enterprise etc., ensuring compatibility and ease of use for robotic teleoperation applications.

-## Release Note
-V3.0 brings updates:
+The image input of this library works in conjunction with the [teleimager](https://github.com/silencht/teleimager) library. We recommend using both libraries together.
+
+## 0. 🔖 Release Note
+
+### V4.0 🏷️ brings updates:
+
+1. Improved Display Modes
+
+    Replaced the old `pass_through` option with the `display_mode` parameter, which supports three modes:
+
+    - immersive: fully immersive mode; VR shows the robot's first-person view (zmq or webrtc must be enabled).
+
+    - pass-through: VR shows the real world through the VR headset cameras; no image from zmq or webrtc is displayed (even if enabled).
+
+    - fov: a small window in the center shows the robot's first-person view, while the surrounding area shows the real world.
+
+2. Enhanced Immersion
+
+    Adjusted the image plane height for immersive and fov modes to provide a more natural and comfortable VR experience.
+
+### V3.0 🏷️ brings updates:
 1. Added `pass_through` interface to enable/disable the pass-through mode.
 2. Support `webrtc` interface to enable/disable the webrtc streaming mode.
 3. Use `render_to_xr` method (adjust from `set_display_image`) to send images to XR device.

-V2.0 brings updates:
+### V2.0 🏷️ brings updates:

 1. Image transport is now by reference instead of external shared memory.
 2. Renamed the get-data function from `get_motion_state_data` to `get_tele_data`.
@ -19,7 +38,7 @@ V2.0 brings updates:
 5. Streamlined the data structure: removed the nested `TeleStateData` and return everything in the unified `TeleData`.
 6. Added new image-transport interfaces such as `set_display_image`.

-## 1. Diagram
+## 1. 🗺️ Diagram

 <p align="center">
  <a href="https://oss-global-cdn.unitree.com/static/5ae3c9ee9a3d40dc9fe002281e8aeac1_2975x3000.png">
@ -27,9 +46,9 @@ V2.0 brings updates:
  </a>
 </p>

-## 2. Install
+## 2. 📦 Install

-### 2.1 Install televuer repository
+### 2.1 📥 Install televuer repository

 ```bash
 git clone https://github.com/silencht/televuer
@ -38,7 +57,7 @@ pip install -e . # or pip install .
 ```


-### 2.2 Generate Certificate Files
+### 2.2 🔑 Generate Certificate Files

 The televuer module requires SSL certificates to allow XR devices (such as Pico / Quest / Apple Vision Pro) to connect securely via HTTPS / WebRTC.

@ -73,13 +92,13 @@ build  cert.pem  key.pem  LICENSE  pyproject.toml  README.md  rootCA.key  rootCA
 # Use AirDrop to copy rootCA.pem to your Apple Vision Pro device and install it manually as a trusted certificate.
 ```

-3. Allow Firewall Access
+3. 🧱 Allow Firewall Access

 ```bash
 sudo ufw allow 8012
 ```

-### 2.3 Configure Certificate Paths (Choose One Method)
+### 2.3 🔐 Configure Certificate Paths (Choose One Method)

 You can tell televuer where to find the certificate files using either environment variables or a user config directory.

@ -105,7 +124,7 @@ cp cert.pem key.pem ~/.config/xr_teleoperate/

 If neither of the above methods is used, televuer will look for the certificate files from the function parameters or fall back to the default paths within the module.

-## 3. Test
+## 3. 🧐 Test

 ```bash
 python test_televuer.py 
@ -120,7 +139,7 @@ python test_tv_wrapper.py
 # Press Enter in the terminal to launch the program.
 ```

-## 4. Version History
+## 4. 📌 Version History

 `vuer==0.0.32rc7`
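
The V4.0 release note in this README replaces the boolean `pass_through` argument with `display_mode`. For callers that still build V3-style keyword arguments, a minimal migration sketch could look like the following. `migrate_v3_kwargs` is a hypothetical helper and not part of televuer. It assumes that the V3 `render_to_xr` path (used when `webrtc` was off) corresponds to `zmq=True` in V4, which is how the constructor changes in `televuer.py` read, and it always picks "immersive" because V3 had no equivalent of "fov".

```python
def migrate_v3_kwargs(v3_kwargs: dict) -> dict:
    """Translate V3-style TeleVuer kwargs into V4-style kwargs (hypothetical helper)."""
    kwargs = dict(v3_kwargs)  # work on a copy so the caller's dict is untouched
    pass_through = kwargs.pop("pass_through", False)
    webrtc = kwargs.get("webrtc", False)

    if pass_through:
        # In V3, pass_through=True suppressed every image, whatever webrtc was set to.
        kwargs["display_mode"] = "pass-through"
    else:
        kwargs["display_mode"] = "immersive"
        # Assumption: the V3 shared-memory path is what V4 selects with zmq=True.
        kwargs.setdefault("zmq", not webrtc)
    return kwargs


print(migrate_v3_kwargs({"use_hand_tracking": True, "pass_through": True}))
```

The returned dictionary is meant to be passed as keyword arguments to `TeleVuer` or `TeleVuerWrapper`; set `display_mode` to "fov" by hand if the windowed view is wanted.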

--- a/pyproject.toml
+++ b/pyproject.toml
@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "televuer"
-version = "3.9.0"
+version = "4.0.0"
 description = "XR vision and hand/controller teleoperate interface for unitree robotics"
 authors = [
    { name = "silencht", email = "silencht@qq.com" }
--- a/src/televuer/televuer.py
+++ b/src/televuer/televuer.py
@ -7,38 +7,57 @@ import threading
 import cv2
 import os
 from pathlib import Path
+from typing import Literal


 class TeleVuer:
-    def __init__(self, use_hand_tracking: bool, pass_through:bool=False, binocular: bool=True, img_shape: tuple=None, 
-                       cert_file: str=None, key_file: str=None, webrtc: bool=False, webrtc_url: str=None, display_fps: float=30.0):
+    def __init__(self, use_hand_tracking: bool, binocular: bool=True, img_shape: tuple=None, display_fps: float=30.0,
+                       display_mode: Literal["immersive", "pass-through", "fov"]="immersive", zmq: bool=False, webrtc: bool=False, webrtc_url: str=None, 
+                       cert_file: str=None, key_file: str=None):
        """
        TeleVuer class for OpenXR-based XR teleoperate applications.
        This class handles the communication with the Vuer server and manages image and pose data.

        :param use_hand_tracking: bool, whether to use hand tracking or controller tracking.
-        :param pass_through: bool, controls the VR viewing mode.
-
-            Note:
-            - if pass_through is True, the XR user will see the real world through the VR headset cameras.
-            - if pass_through is False, the XR user will see the images provided by webrtc or render_to_xr method:
-            - webrtc is prior to render_to_xr. if webrtc is True, the class will use webrtc for image transmission.
-            - if webrtc is False, the class will use render_to_xr for image transmission.
-    
        :param binocular: bool, whether the application is binocular (stereoscopic) or monocular.
        :param img_shape: tuple, shape of the head image (height, width).
+        :param display_fps: float, target frames per second for display updates (default: 30.0).
+        
+        :param display_mode: str, controls the VR viewing mode. Options are "immersive", "pass-through", and "fov".
+        :param zmq: bool, whether to use zmq for image transmission.
+        :param webrtc: bool, whether to use webrtc for real-time communication.
+        :param webrtc_url: str, URL for the webrtc offer. Must be provided if webrtc is True.
        :param cert_file: str, path to the SSL certificate file.
        :param key_file: str, path to the SSL key file.
-        :param webrtc: bool, whether to use WebRTC for real-time communication. if False, use ImageBackground.
-        :param webrtc_url: str, URL for the WebRTC offer.
-        :param display_fps: float, target frames per second for display updates (default: 30.0).
+
+        Note:
+
+        - display_mode controls what the VR headset displays:
+            * "immersive": fully immersive mode; VR shows the robot's first-person view (zmq or webrtc must be enabled).
+            * "pass-through": VR shows the real world through the VR headset cameras; no image from zmq or webrtc is displayed (even if enabled).
+            * "fov": Field-of-View mode; a small window in the center shows the robot's first-person view, while the surrounding area shows the real world.
+        
+        - Only one image mode is active at a time.
+        - Image transmission to VR occurs only if display_mode is "immersive" or "fov" and the corresponding zmq or webrtc option is enabled.
+        - If zmq and webrtc are both enabled, webrtc takes priority.
+
+        --------------              -------------------           --------------       -----------------                     -------
+         display_mode       |        display behavior         |    image to VR     |      image source        |               Notes
+        --------------              -------------------           --------------       -----------------                     ------- 
+           immersive        |   fully immersive view (robot)  |     Yes (full)     |     zmq or webrtc        |   if both enabled, webrtc prioritized
+        --------------              -------------------           --------------       -----------------                     -------
+         pass-through       |       Real world view (VR)      |         No         |          N/A             |  even if image source enabled, don't display
+        --------------              -------------------           --------------       -----------------                     -------
+              fov           |      FOV view (robot + VR)      |    Yes (small)     |     zmq or webrtc        |   if both enabled, webrtc prioritized
+        --------------              -------------------           --------------       -----------------                     -------
+
        """
        self.use_hand_tracking = use_hand_tracking
-        self.display_fps = display_fps
-        self.pass_through = pass_through
-
        self.binocular = binocular
+        if img_shape is None:
+            raise ValueError("[TeleVuer] img_shape must be provided.")
        self.img_shape = (img_shape[0], img_shape[1], 3)
+        self.display_fps = display_fps
        self.img_height = self.img_shape[0]
        if self.binocular:
            self.img_width  = self.img_shape[1] // 2
@ -76,16 +95,29 @@ class TeleVuer:
        else:
            self.vuer.add_handler("CONTROLLER_MOVE")(self.on_controller_move)

+        self.display_mode = display_mode
+        self.zmq = zmq
        self.webrtc = webrtc
        self.webrtc_url = webrtc_url

-        if self.webrtc:
-            if self.binocular:
-                self.vuer.spawn(start=False)(self.main_image_binocular_webrtc)
+        if self.display_mode == "immersive":
+            if self.webrtc:
+                fn = self.main_image_binocular_webrtc if self.binocular else self.main_image_monocular_webrtc
+            elif self.zmq:
+                self.img2display_shm = shared_memory.SharedMemory(create=True, size=np.prod(self.img_shape) * np.uint8().itemsize)
+                self.img2display = np.ndarray(self.img_shape, dtype=np.uint8, buffer=self.img2display_shm.buf)
+                self.latest_frame = None
+                self.new_frame_event = threading.Event()
+                self.stop_writer_event = threading.Event()
+                self.writer_thread = threading.Thread(target=self._xr_render_loop, daemon=True)
+                self.writer_thread.start()
+                fn = self.main_image_binocular_zmq if self.binocular else self.main_image_monocular_zmq
            else:
-                self.vuer.spawn(start=False)(self.main_image_monocular_webrtc)
-        else:
-            if self.pass_through is False:
+                raise ValueError("[TeleVuer] immersive mode requires zmq=True or webrtc=True.")
+        elif self.display_mode == "fov":
+            if self.webrtc:
+                fn = self.main_image_binocular_webrtc_fov if self.binocular else self.main_image_monocular_webrtc_fov
+            elif self.zmq:
                self.img2display_shm = shared_memory.SharedMemory(create=True, size=np.prod(self.img_shape) * np.uint8().itemsize)
                self.img2display = np.ndarray(self.img_shape, dtype=np.uint8, buffer=self.img2display_shm.buf)
                self.latest_frame = None
@ -93,10 +125,15 @@ class TeleVuer:
                self.stop_writer_event = threading.Event()
                self.writer_thread = threading.Thread(target=self._xr_render_loop, daemon=True)
                self.writer_thread.start()
-            if self.binocular:
-                 self.vuer.spawn(start=False)(self.main_image_binocular)
+                fn = self.main_image_binocular_zmq_fov if self.binocular else self.main_image_monocular_zmq_fov
            else:
-                 self.vuer.spawn(start=False)(self.main_image_monocular)
+                raise ValueError("[TeleVuer] fov mode requires zmq=True or webrtc=True.")
+        elif self.display_mode == "pass-through":
+            fn = self.main_pass_through
+        else:
+            raise ValueError(f"[TeleVuer] Unknown display_mode: {self.display_mode}")
+        
+        self.vuer.spawn(start=False)(fn)

        self.head_pose_shared = Array('d', 16, lock=True)
        self.left_arm_pose_shared = Array('d', 16, lock=True)
@ -162,7 +199,7 @@ class TeleVuer:
            self.img2display[:] = latest_frame
    
    def render_to_xr(self, image):
-        if self.webrtc or self.pass_through:
+        if self.webrtc or self.display_mode == "pass-through":
            print("[TeleVuer] Warning: render_to_xr is ignored when webrtc is enabled or pass_through is True.")
            return
        self.latest_frame = image
@ -171,7 +208,7 @@ class TeleVuer:
    def close(self):
        self.process.terminate()
        self.process.join(timeout=0.5)
-        if not self.webrtc and not self.pass_through:
+        if self.display_mode in ("immersive", "fov") and not self.webrtc:
            self.stop_writer_event.set()
            self.new_frame_event.set()
            self.writer_thread.join(timeout=0.5)
@ -274,7 +311,8 @@ class TeleVuer:
        except:
            pass
    
-    async def main_image_binocular(self, session):
+    ## IMMERSIVE MODE
+    async def main_image_binocular_zmq(self, session):
        if self.use_hand_tracking:
            session.upsert(
                Hands(
@ -296,41 +334,40 @@ class TeleVuer:
                to="bgChildren",
            )
        while True:
-            if self.pass_through is False:
-                session.upsert(
-                    [
-                        ImageBackground(
-                            self.img2display[:, :self.img_width],
-                            aspect=self.aspect_ratio,
-                            height=1,
-                            distanceToCamera=1,
-                            # The underlying rendering engine supported a layer binary bitmask for both objects and the camera. 
-                            # Below we set the two image planes, left and right, to layers=1 and layers=2. 
-                            # Note that these two masks are associated with left eye’s camera and the right eye’s camera.
-                            layers=1,
-                            format="jpeg",
-                            quality=80,
-                            key="background-left",
-                            interpolate=True,
-                        ),
-                        ImageBackground(
-                            self.img2display[:, self.img_width:],
-                            aspect=self.aspect_ratio,
-                            height=1,
-                            distanceToCamera=1,
-                            layers=2,
-                            format="jpeg",
-                            quality=80,
-                            key="background-right",
-                            interpolate=True,
-                        ),
-                    ],
-                    to="bgChildren",
-                )
+            session.upsert(
+                [
+                    ImageBackground(
+                        self.img2display[:, :self.img_width],
+                        aspect=self.aspect_ratio,
+                        height=1,
+                        distanceToCamera=1,
+                        # The underlying rendering engine supported a layer binary bitmask for both objects and the camera. 
+                        # Below we set the two image planes, left and right, to layers=1 and layers=2. 
+                        # Note that these two masks are associated with left eye’s camera and the right eye’s camera.
+                        layers=1,
+                        format="jpeg",
+                        quality=80,
+                        key="background-left",
+                        interpolate=True,
+                    ),
+                    ImageBackground(
+                        self.img2display[:, self.img_width:],
+                        aspect=self.aspect_ratio,
+                        height=1,
+                        distanceToCamera=1,
+                        layers=2,
+                        format="jpeg",
+                        quality=80,
+                        key="background-right",
+                        interpolate=True,
+                    ),
+                ],
+                to="bgChildren",
+            )
            # 'jpeg' encoding should give you about 30fps with a 16ms wait in-between.
            await asyncio.sleep(1.0 / self.display_fps)

-    async def main_image_monocular(self, session):
+    async def main_image_monocular_zmq(self, session):
        if self.use_hand_tracking:
            session.upsert(
                Hands(
@ -353,22 +390,21 @@ class TeleVuer:
            )

        while True:
-            if self.pass_through is False:
-                session.upsert(
-                    [
-                        ImageBackground(
-                            self.img2display,
-                            aspect=self.aspect_ratio,
-                            height=1,
-                            distanceToCamera=1,
-                            format="jpeg",
-                            quality=80,
-                            key="background-mono",
-                            interpolate=True,
-                        ),
-                    ],
-                    to="bgChildren",
-                )
+            session.upsert(
+                [
+                    ImageBackground(
+                        self.img2display,
+                        aspect=self.aspect_ratio,
+                        height=1,
+                        distanceToCamera=1,
+                        format="jpeg",
+                        quality=80,
+                        key="background-mono",
+                        interpolate=True,
+                    ),
+                ],
+                to="bgChildren",
+            )
            await asyncio.sleep(1.0 / self.display_fps)

    async def main_image_binocular_webrtc(self, session):
@ -394,23 +430,113 @@ class TeleVuer:
            )

        while True:
-            if self.pass_through is False:
-                session.upsert(
-                    WebRTCStereoVideoPlane(
-                        src=self.webrtc_url,
-                        iceServer=None,
-                        iceServers=[], 
-                        key="video-quad",
+            session.upsert(
+                WebRTCStereoVideoPlane(
+                    src=self.webrtc_url,
+                    iceServer=None,
+                    iceServers=[], 
+                    key="video-quad",
+                    aspect=self.aspect_ratio,
+                    height=7,
+                    layout="stereo-left-right"
+                ),
+                to="bgChildren",
+            )
+            await asyncio.sleep(1.0 / self.display_fps)
+
+    async def main_image_monocular_webrtc(self, session):
+        if self.use_hand_tracking:
+            session.upsert(
+                Hands(
+                    stream=True,
+                    key="hands",
+                    hideLeft=True,
+                    hideRight=True
+                ),
+                to="bgChildren",
+            )
+        else:
+            session.upsert(
+                MotionControllers(
+                    stream=True, 
+                    key="motionControllers",
+                    left=True,
+                    right=True,
+                ),
+                to="bgChildren",
+            )
+
+        while True:
+            session.upsert(
+                WebRTCVideoPlane(
+                    src=self.webrtc_url,
+                    iceServer=None,
+                    iceServers=[],
+                    key="video-quad",
+                    aspect=self.aspect_ratio,
+                    height=7,
+                ),
+                to="bgChildren",
+            )
+            await asyncio.sleep(1.0 / self.display_fps)
+
+    ## FOV MODE
+    async def main_image_binocular_zmq_fov(self, session):
+        if self.use_hand_tracking:
+            session.upsert(
+                Hands(
+                    stream=True,
+                    key="hands",
+                    hideLeft=True,
+                    hideRight=True
+                ),
+                to="bgChildren",
+            )
+        else:
+            session.upsert(
+                MotionControllers(
+                    stream=True,
+                    key="motionControllers",
+                    left=True,
+                    right=True,
+                ),
+                to="bgChildren",
+            )
+        while True:
+            session.upsert(
+                [
+                    ImageBackground(
+                        self.img2display[:, :self.img_width],
                        aspect=self.aspect_ratio,
-                        height = 7,
-                        layout="stereo-left-right"
+                        height=0.75,
+                        distanceToCamera=2,
+                        # The underlying rendering engine supported a layer binary bitmask for both objects and the camera. 
+                        # Below we set the two image planes, left and right, to layers=1 and layers=2. 
+                        # Note that these two masks are associated with left eye’s camera and the right eye’s camera.
+                        layers=1,
+                        format="jpeg",
+                        quality=80,
+                        key="background-left",
+                        interpolate=True,
                    ),
-                    to="bgChildren",
-                )
-
+                    ImageBackground(
+                        self.img2display[:, self.img_width:],
+                        aspect=self.aspect_ratio,
+                        height=0.75,
+                        distanceToCamera=2,
+                        layers=2,
+                        format="jpeg",
+                        quality=80,
+                        key="background-right",
+                        interpolate=True,
+                    ),
+                ],
+                to="bgChildren",
+            )
+            # 'jpeg' encoding should give you about 30fps with a 16ms wait in-between.
            await asyncio.sleep(1.0 / self.display_fps)

-    async def main_image_monocular_webrtc(self, session):
+    async def main_image_monocular_zmq_fov(self, session):
        if self.use_hand_tracking:
            session.upsert(
                Hands(
@ -433,20 +559,122 @@ class TeleVuer:
            )

        while True:
-            if self.pass_through is False:
-                session.upsert(
-                    WebRTCVideoPlane(
-                        src=self.webrtc_url,
-                        iceServer=None,
-                        iceServers=[],
-                        key="video-quad",
+            session.upsert(
+                [
+                    ImageBackground(
+                        self.img2display,
                        aspect=self.aspect_ratio,
-                        height = 7,
+                        height=0.75,
+                        distanceToCamera=2,
+                        format="jpeg",
+                        quality=80,
+                        key="background-mono",
+                        interpolate=True,
                    ),
-                    to="bgChildren",
-                )
+                ],
+                to="bgChildren",
+            )
+            await asyncio.sleep(1.0 / self.display_fps)
+
+    async def main_image_binocular_webrtc_fov(self, session):
+        if self.use_hand_tracking:
+            session.upsert(
+                Hands(
+                    stream=True,
+                    key="hands",
+                    hideLeft=True,
+                    hideRight=True
+                ),
+                to="bgChildren",
+            )
+        else:
+            session.upsert(
+                MotionControllers(
+                    stream=True, 
+                    key="motionControllers",
+                    left=True,
+                    right=True,
+                ),
+                to="bgChildren",
+            )

+        while True:
+            session.upsert(
+                WebRTCStereoVideoPlane(
+                    src=self.webrtc_url,
+                    iceServer=None,
+                    iceServers=[], 
+                    key="video-quad",
+                    aspect=self.aspect_ratio,
+                    height=3,
+                    layout="stereo-left-right"
+                ),
+                to="bgChildren",
+            )
            await asyncio.sleep(1.0 / self.display_fps)
+
+    async def main_image_monocular_webrtc_fov(self, session):
+        if self.use_hand_tracking:
+            session.upsert(
+                Hands(
+                    stream=True,
+                    key="hands",
+                    hideLeft=True,
+                    hideRight=True
+                ),
+                to="bgChildren",
+            )
+        else:
+            session.upsert(
+                MotionControllers(
+                    stream=True, 
+                    key="motionControllers",
+                    left=True,
+                    right=True,
+                ),
+                to="bgChildren",
+            )
+
+        while True:
+            session.upsert(
+                WebRTCVideoPlane(
+                    src=self.webrtc_url,
+                    iceServer=None,
+                    iceServers=[],
+                    key="video-quad",
+                    aspect=self.aspect_ratio,
+                    height=3,
+                ),
+                to="bgChildren",
+            )
+            await asyncio.sleep(1.0 / self.display_fps)
+
+    ## PASS-THROUGH MODE
+    async def main_pass_through(self, session):
+        if self.use_hand_tracking:
+            session.upsert(
+                Hands(
+                    stream=True,
+                    key="hands",
+                    hideLeft=True,
+                    hideRight=True
+                ),
+                to="bgChildren",
+            )
+        else:
+            session.upsert(
+                MotionControllers(
+                    stream=True, 
+                    key="motionControllers",
+                    left=True,
+                    right=True,
+                ),
+                to="bgChildren",
+            )
+
+        while True:
+            await asyncio.sleep(1.0 / self.display_fps)
+
    # ==================== common data ====================
    @property
    def head_pose(self):
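
The constructor changes in this file pick one of nine coroutines, and the eight image coroutines differ mainly in the image source and the plane size arguments. The sketch below condenses that dispatch into one pure function so the rules are easy to check. `select_image_plane` and `PLANE_PARAMS` are hypothetical names that do not exist in televuer; the numeric values are copied from the coroutines in this diff, and nothing here calls Vuer.

```python
from typing import Optional, Tuple

# Size arguments copied from the coroutines in this diff.
# The table and the function name are hypothetical, not part of televuer.
PLANE_PARAMS = {
    ("immersive", "zmq"): {"height": 1, "distanceToCamera": 1},
    ("fov", "zmq"): {"height": 0.75, "distanceToCamera": 2},
    ("immersive", "webrtc"): {"height": 7},
    ("fov", "webrtc"): {"height": 3},
}


def select_image_plane(display_mode: str, zmq: bool = False,
                       webrtc: bool = False) -> Tuple[Optional[str], dict]:
    """Return (image source, plane size kwargs) for a display mode."""
    if display_mode == "pass-through":
        # Nothing is drawn, even if zmq or webrtc is enabled.
        return None, {}
    if display_mode not in ("immersive", "fov"):
        raise ValueError(f"Unknown display_mode: {display_mode}")
    if webrtc:
        source = "webrtc"  # webrtc wins when both sources are enabled
    elif zmq:
        source = "zmq"
    else:
        raise ValueError(f"{display_mode} mode requires zmq=True or webrtc=True.")
    # Copy the entry so callers cannot mutate the shared table.
    return source, dict(PLANE_PARAMS[(display_mode, source)])


print(select_image_plane("fov", zmq=True, webrtc=True))  # ('webrtc', {'height': 3})
```

Keeping the sizes in one table is only one possible design; it would make the immersive and fov variants easier to compare, at the cost of an extra lookup before each `session.upsert` call.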
--- a/src/televuer/tv_wrapper.py
+++ b/src/televuer/tv_wrapper.py
@ -1,7 +1,7 @@
 import numpy as np
 from .televuer import TeleVuer
 from dataclasses import dataclass, field
-
+from typing import Literal
 """
 (basis) OpenXR Convention : y up, z back, x right. 
 (basis) Robot  Convention : z up, y left, x front.  
@ -193,35 +193,51 @@ class TeleData:


 class TeleVuerWrapper:
-    def __init__(self, use_hand_tracking: bool, pass_through: bool=False, binocular: bool=True, img_shape: tuple=(480, 1280),
-                       cert_file: str=None, key_file: str=None, webrtc: bool=False, webrtc_url: str=None, display_fps: float=30.0,
-                       return_hand_rot_data: bool=False):
+    def __init__(self, use_hand_tracking: bool, binocular: bool=True, img_shape: tuple=(480, 1280), display_fps: float=30.0,
+                       display_mode: Literal["immersive", "pass-through", "fov"]="immersive", zmq: bool=False, webrtc: bool=False, webrtc_url: str=None, 
+                       cert_file: str=None, key_file: str=None, return_hand_rot_data: bool=False):
        """
        TeleVuerWrapper is a wrapper for the TeleVuer class, which handles XR device's data suit for robot control.
        It initializes the TeleVuer instance with the specified parameters and provides a method to get motion state data.

        :param use_hand_tracking: bool, whether to use hand tracking or controller tracking.
-        :param pass_through: bool, controls the VR viewing mode.
-
-            Note:
-            - if pass_through is True, the XR user will see the real world through the VR headset cameras.
-            - if pass_through is False, the XR user will see the images provided by webrtc or render_to_xr method:
-            - webrtc is prior to render_to_xr. if webrtc is True, the class will use webrtc for image transmission.
-            - if webrtc is False, the class will use render_to_xr for image transmission.
-    
        :param binocular: bool, whether the application is binocular (stereoscopic) or monocular.
        :param img_shape: tuple, shape of the head image (height, width).
+        :param display_fps: float, target frames per second for display updates (default: 30.0).
+
+        :param display_mode: str, controls the VR viewing mode. Options are "immersive", "pass-through", and "fov".
+        :param zmq: bool, whether to use ZMQ for image transmission.
+        :param webrtc: bool, whether to use webrtc for real-time communication.
+        :param webrtc_url: str, URL for the webrtc offer. Must be provided if webrtc is True.
        :param cert_file: str, path to the SSL certificate file.
        :param key_file: str, path to the SSL key file.
-        :param webrtc: bool, whether to use WebRTC for real-time communication. if False, use ImageBackground.
-        :param webrtc_url: str, URL for the WebRTC offer.
-        :param display_fps: float, target frames per second for display updates (default: 30.0).
-        :param return_hand_rot_data: bool, whether to return hand rotation data in TeleData
+        :param return_hand_rot_data: bool, whether to return hand rotation data in TeleData.
+
+        Note:
+
+        - display_mode controls what the VR headset displays:
+            * "immersive": fully immersive mode; VR shows the robot's first-person view (zmq or webrtc must be enabled).
+            * "pass-through": VR shows the real world through the VR headset cameras; no image from zmq or webrtc is displayed (even if enabled).
+            * "fov": Field-of-View mode; a small window in the center shows the robot's first-person view, while the surrounding area shows the real world.
+        
+        - Only one image mode is active at a time.
+        - Image transmission to VR occurs only if display_mode is "immersive" or "fov" and the corresponding zmq or webrtc option is enabled.
+        - If zmq and webrtc are both enabled, webrtc takes priority.
+
+        --------------              -------------------           --------------       -----------------                     -------
+         display_mode       |        display behavior         |    image to VR     |      image source        |               Notes
+        --------------              -------------------           --------------       -----------------                     ------- 
+           immersive        |   fully immersive view (robot)  |     Yes (full)     |     zmq or webrtc        |   if both enabled, webrtc prioritized
+        --------------              -------------------           --------------       -----------------                     -------
+         pass-through       |       Real world view (VR)      |         No         |          N/A             |  even if image source enabled, don't display
+        --------------              -------------------           --------------       -----------------                     -------
+              fov           |      FOV view (robot + VR)      |    Yes (small)     |     zmq or webrtc        |   if both enabled, webrtc prioritized
+        --------------              -------------------           --------------       -----------------                     -------
        """
        self.use_hand_tracking = use_hand_tracking
        self.return_hand_rot_data = return_hand_rot_data
-        self.tvuer = TeleVuer(use_hand_tracking=use_hand_tracking, pass_through=pass_through, binocular=binocular,img_shape=img_shape,
-                              cert_file=cert_file, key_file=key_file, webrtc=webrtc, webrtc_url=webrtc_url, display_fps=display_fps)
+        self.tvuer = TeleVuer(use_hand_tracking=use_hand_tracking, binocular=binocular, img_shape=img_shape, display_fps=display_fps,
+                              display_mode=display_mode, zmq=zmq, webrtc=webrtc, webrtc_url=webrtc_url, 
+                              cert_file=cert_file, key_file=key_file)
        
    def get_tele_data(self):
        """
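The selection rules described in the `display_mode` docstring can be sketched as a small pure function. This is an illustrative sketch only, not TeleVuer's actual implementation: `resolve_image_source` and `VALID_MODES` are hypothetical names, and returning `None` when no source is enabled is an assumption, since the diff does not show how the library handles that case.

```python
from typing import Optional

# The three modes named in the docstring.
VALID_MODES = ("immersive", "pass-through", "fov")


def resolve_image_source(display_mode: str, zmq: bool, webrtc: bool) -> Optional[str]:
    """Return which image source would feed the headset, or None if no image is shown.

    Hypothetical helper: it mirrors the documented rules, not the library's code.
    """
    if display_mode not in VALID_MODES:
        raise ValueError(f"unknown display_mode: {display_mode!r}")
    # pass-through never displays a robot image, even if a source is enabled.
    if display_mode == "pass-through":
        return None
    # immersive and fov: webrtc takes priority when both are enabled.
    if webrtc:
        return "webrtc"
    if zmq:
        return "zmq"
    # Assumption: with no source enabled there is nothing to display.
    return None
```

For example, `resolve_image_source("fov", zmq=True, webrtc=True)` picks `"webrtc"`, while any call with `"pass-through"` yields `None` regardless of the flags.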
--- a/test/test_televuer.py
+++ b/test/test_televuer.py
@ -10,14 +10,39 @@ import logging_mp
 logger_mp = logging_mp.get_logger(__name__, level=logging_mp.INFO)

 def run_test_TeleVuer():
-    # xr-mode
    use_hand_track = False
-    tv = TeleVuer(use_hand_tracking = use_hand_track, pass_through=True, binocular=True, img_shape=(480, 1280))
+    # teleimager: to test real image streaming, make sure the teleimager server is running
+    from teleimager.image_client import ImageClient
+    img_client = ImageClient(host="192.168.123.164")
+    camera_config = img_client.get_cam_config()
+    # teleimager + televuer
+    tv = TeleVuer(use_hand_tracking=use_hand_track, 
+                  binocular=camera_config['head_camera']['binocular'],
+                  img_shape=camera_config['head_camera']['image_shape'],
+                  display_fps=camera_config['head_camera']['fps'],
+                  display_mode="immersive",   # "fov" or "immersive" or "pass-through"
+                  zmq=camera_config['head_camera']['enable_zmq'],
+                  webrtc=camera_config['head_camera']['enable_webrtc'],
+                  webrtc_url=f"https://192.168.123.164:{camera_config['head_camera']['webrtc_port']}/offer"
+                  )
+    # pure televuer
+    # tv = TeleVuer(use_hand_tracking=use_hand_track, 
+    #               binocular=True, 
+    #               img_shape=(480, 1280), 
+    #               display_fps=30.0,
+    #               display_mode="fov",      # "fov" or "immersive" or "pass-through"
+    #               zmq=False,
+    #               webrtc=True, 
+    #               webrtc_url="https://192.168.123.164:60001/offer"
+    #               )

    try:
        input("Press Enter to start TeleVuer test...")
        running = True
        while running:
+            img, _ = img_client.get_head_frame()
+            tv.render_to_xr(img)
+
            start_time = time.time()
            logger_mp.info("=" * 80)
            logger_mp.info("Common Data (always available):")
@ -62,7 +87,7 @@ def run_test_TeleVuer():

            current_time = time.time()
            time_elapsed = current_time - start_time
-            sleep_time = max(0, 0.3 - time_elapsed)
+            sleep_time = max(0, 0.016 - time_elapsed)
            time.sleep(sleep_time)
            logger_mp.debug(f"main process sleep: {sleep_time}")
    except KeyboardInterrupt:
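The test loop paces itself by sleeping for whatever remains of a fixed period; this change lowers that period from 0.3 s to 0.016 s, which caps the loop at roughly 60 iterations per second. A minimal sketch of the calculation follows, where `remaining_sleep` and its `now` parameter are hypothetical names introduced here for illustration and are not part of the test file:

```python
import time
from typing import Optional


def remaining_sleep(start_time: float, period: float = 0.016,
                    now: Optional[float] = None) -> float:
    """Return how long to sleep so that one loop iteration lasts about `period` seconds.

    `now` is injectable so the function can be checked without real waiting.
    """
    if now is None:
        now = time.time()
    elapsed = now - start_time
    # Never return a negative duration: if the iteration overran, do not sleep.
    return max(0.0, period - elapsed)
```

In `test_televuer.py` the timer starts after the frame is fetched and rendered, so that time is not counted against the period, whereas `test_tv_wrapper.py` starts the timer before fetching the frame.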
--- a/test/test_tv_wrapper.py
+++ b/test/test_tv_wrapper.py
@ -11,24 +11,43 @@ logger_mp = logging_mp.get_logger(__name__, level=logging_mp.INFO)


 def run_test_tv_wrapper():
-    # xr-mode
    use_hand_track=False
-    tv_wrapper = TeleVuerWrapper(use_hand_tracking=use_hand_track, pass_through=False,
-                                 binocular=True, img_shape=(480, 1280),
-                                #  webrtc=True, webrtc_url="https://192.168.123.164:60001/offer"
+
+    # teleimager: to test real image streaming, make sure the teleimager server is running
+    from teleimager.image_client import ImageClient
+    img_client = ImageClient(host="192.168.123.164")
+    camera_config = img_client.get_cam_config()
+    # teleimager + televuer
+    tv_wrapper = TeleVuerWrapper(use_hand_tracking=use_hand_track, 
+                                binocular=camera_config['head_camera']['binocular'],
+                                img_shape=camera_config['head_camera']['image_shape'],
+                                display_mode="immersive", 
+                                display_fps=camera_config['head_camera']['fps'],
+                                zmq=camera_config['head_camera']['enable_zmq'],
+                                webrtc=camera_config['head_camera']['enable_webrtc'],
+                                webrtc_url=f"https://192.168.123.164:{camera_config['head_camera']['webrtc_port']}/offer"
                                )
+    # pure televuer
+    # tv_wrapper = TeleVuerWrapper(use_hand_tracking=use_hand_track, 
+    #                              binocular=True, 
+    #                              img_shape=(480, 1280),
+    #                              display_fps=30.0,
+    #                              display_mode="fov", 
+    #                              zmq=True,
+    #                              webrtc=True, 
+    #                              webrtc_url="https://192.168.123.164:60001/offer"
+    #                              )
    try:
        input("Press Enter to start tv_wrapper test...")
        running = True
        while running:
            start_time = time.time()
+            img, _ = img_client.get_head_frame()
+            tv_wrapper.render_to_xr(img)
+            
            logger_mp.info("---- TV Wrapper TeleData ----")
            teleData = tv_wrapper.get_tele_data()

-            # import cv2
-            # img = cv2.videoCapture(0).read()[1]
-            # tv_wrapper.render_to_xr(img)
-
            logger_mp.info("-------------------=== TeleData Snapshot ===-------------------")
            logger_mp.info(f"[Head Pose]:\n{teleData.head_pose}")
            logger_mp.info(f"[Left Wrist Pose]:\n{teleData.left_wrist_pose}")
@ -62,9 +81,9 @@ def run_test_tv_wrapper():

            current_time = time.time()
            time_elapsed = current_time - start_time
-            sleep_time = max(0, 0.16 - time_elapsed)
+            sleep_time = max(0, 0.016 - time_elapsed)
            time.sleep(sleep_time)
-            logger_mp.debug(f"main process sleep: {sleep_time}")
+            logger_mp.info(f"main process sleep: {sleep_time}")

    except KeyboardInterrupt:
        running = False