VisTrack v1.0 Release Instruction

The VisTrack positioning system is a newly developed visual positioning system. It uses high-end stereo depth cameras and proprietary positioning algorithms to position people accurately without any wearable devices, and it works in dark environments. The system outputs precise 5-dimensional positioning data (XYWDH) through the OSC protocol and also supports the PSN protocol for three-dimensional positioning data. It is suited to multimedia interactive projects and fixed installations.

Highlights:


1. Hardware: The VisTrack system utilizes two models of depth cameras – the regular version and the enhanced version. The system supports up to 12 cameras for stitching purposes. 

– The regular version has a maximum measuring distance of 10 meters, with a recommended measuring range of 0-5 meters. The recommended mounting height is around 6 meters above the stage. With a low-cut position set at 1 meter, the lateral coverage size is approximately 8 meters, and the vertical coverage size is over 5 meters.

– The enhanced version of the depth camera also has a maximum detection distance of 10 meters, with a recommended measuring range of 0-7 meters. The recommended mounting height is within 8 meters. The camera frame rate is typically 30 frames per second, with a high frame rate mode available at 60 frames per second. In the high frame rate mode, the recommended measuring range decreases accordingly. For the regular version, the recommended measuring range in the high frame rate mode is within 2.5 meters, while for the enhanced version, it is within 5 meters.
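The regular-version figures above (6 meter mount, 1 meter low-cut, roughly 8 by 5 meters of coverage) are consistent with simple pinhole geometry, so the footprint at other heights can be estimated. The following is a minimal sketch under that assumption. The field-of-view angles are back-calculated from those figures and are not official camera specifications, and the function name is hypothetical. Because the text gives the vertical coverage as "over 5 meters", the vertical result is a conservative estimate.

```python
import math

# Assumed fields of view, back-calculated from the regular-version figures
# in the text (about 8 m x 5 m at 5 m below the camera). Not official specs.
ASSUMED_HFOV_DEG = 2 * math.degrees(math.atan(4.0 / 5.0))  # about 77 degrees
ASSUMED_VFOV_DEG = 2 * math.degrees(math.atan(2.5 / 5.0))  # about 53 degrees


def coverage_at_low_cut(mount_height_m, low_cut_m,
                        hfov_deg=ASSUMED_HFOV_DEG, vfov_deg=ASSUMED_VFOV_DEG):
    """Return the (lateral, vertical) footprint in meters on the low-cut plane."""
    distance = mount_height_m - low_cut_m
    if distance <= 0:
        raise ValueError("low-cut plane must be below the camera")
    lateral = 2 * distance * math.tan(math.radians(hfov_deg) / 2)
    vertical = 2 * distance * math.tan(math.radians(vfov_deg) / 2)
    return lateral, vertical


lateral, vertical = coverage_at_low_cut(6.0, 1.0)
print(f"{lateral:.1f} m x {vertical:.1f} m")  # 8.0 m x 5.0 m
```

Raising the low-cut position shortens the distance to the plane and shrinks both dimensions proportionally.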


2. Principles: The depth camera's ranging function measures the real-time distance from the camera to each pixel within its field of view. The system then compares the live image with a previously captured empty reference frame to calculate the position and size of each tracking blob in real time.
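The empty-frame comparison described here can be illustrated with a small sketch. It is not VisTrack's proprietary algorithm: it assumes depth frames are plain grids of distances in meters, treats a pixel as foreground when it is closer to the camera than the empty frame by more than a threshold, and groups foreground pixels by a 4-connected flood fill. The function name, the blob fields, and the 0.2 threshold are illustrative assumptions.

```python
from collections import deque


def find_blobs(reference, live, threshold=0.2):
    """Return blobs found by comparing a live depth grid with an empty frame.

    reference and live are equally sized lists of rows holding the distance
    in meters from the camera to each pixel.
    """
    rows, cols = len(reference), len(reference[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or reference[r][c] - live[r][c] <= threshold:
                continue
            # Flood-fill one 4-connected foreground region.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for nr, nc in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]
                            and reference[nr][nc] - live[nr][nc] > threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "x": (min(xs) + max(xs)) / 2,  # bounding-box center, pixels
                "y": (min(ys) + max(ys)) / 2,
                "w": max(xs) - min(xs) + 1,    # bounding-box size, pixels
                "d": max(ys) - min(ys) + 1,
                # Height above the empty frame at the tallest pixel, meters.
                "h": max(reference[pr][pc] - live[pr][pc] for pr, pc in pixels),
            })
    return blobs


# Empty stage 5 m below the camera, then a 1.7 m tall object 2x2 pixels wide.
empty = [[5.0] * 6 for _ in range(4)]
frame = [row[:] for row in empty]
for r in (1, 2):
    for c in (2, 3):
        frame[r][c] = 3.3
print(find_blobs(empty, frame))
```

A real system would also convert pixel coordinates to meters using the camera height, which this sketch leaves out.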

When stitching multiple cameras, pay attention to their mounting angles. The software provides a remote monitoring function that lets you view the X-axis and Z-axis rotation values in real time on a smartphone while adjusting the camera mounting angle on-site. This ensures that each camera points straight down, perpendicular to the ground.

Regarding the Y-axis angle, it is ideal to mount the cameras horizontally on the same crossbar as other cameras, ensuring parallel alignment between the cameras and crossbar to maintain consistency along the Y-axis. It is also important to ensure that the crossbar is placed horizontally.

3. Compared to wearable positioning systems, pure visual positioning is more convenient to use. It also provides more stable tracking signals and higher positioning accuracy. The typical positioning jitter is within ±1 centimeter. Another advantage is that there is no limit to the number of tracking targets, and the number of targets has little impact on the frame rate. Of course, both approaches have their own advantages. For example, visual positioning cannot ensure that a tracking ID remains fixed on a single object, while wireless wearable positioning systems can. Each therefore has its own suitable scenarios.


4. Compared to radar positioning systems, VisTrack not only provides two-dimensional positioning but also tracks the length, width, and height dimensions of the human body in space in real-time. This offers more possibilities for multimedia interaction, which cannot be achieved by other positioning systems.



System Setup:

1. Before starting, determine the size of the tracking area so you can work out how many cameras are needed to cover it. For open spaces, you can set the camera height in the software directly from the actual height of the location. By setting the camera height to 5 meters (as shown in the figure below) and adjusting the low-cut position to about 20 centimeters above the ground, you can roughly determine the size of the tracking area covered by a single camera. All dimensions in the software are in meters.


As shown in the figure, if you want to perform multi-camera stitching, it is important to ensure a larger horizontal fusion area and a relatively smaller vertical fusion area. This is because stereo cameras have a small portion of invalid areas at their horizontal edges, while the invalid areas at their vertical edges are relatively smaller. In the figure below, the white areas at the edges represent the invalid regions in a single camera’s frame. Therefore, when determining the size of the tracking area, these invalid areas should be excluded. If the height of your venue allows, you can also increase the mounting height of the cameras to obtain a larger tracking area.

2. If there are other props or set pieces in the tracking area that are shorter than the height of a person, you can raise the low-cut position above the height of these props. However, doing so will correspondingly reduce the size of the tracking area.

If these props are taller than a person and remain stationary, you can still maintain the previous low-cut position. In this case, you can use the empty frame capture function to capture a frame without any people present. This enables the system to track dynamic objects entering the tracking area while disregarding the stationary props.

3. Once we have determined the approximate size of the tracking area covered by a single camera, we can calculate how many cameras are needed to cover the entire tracking area when they are stitched together. Adjust the position of each camera in the software to ensure overlap between them.
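The camera count follows from simple arithmetic once the usable single-camera coverage and the desired overlap are known. The sketch below assumes a rectangular tracking area, a regular grid of identical cameras, and a fixed overlap per axis. The function names and the example dimensions are hypothetical; only the 12-camera limit comes from the hardware description.

```python
import math

MAX_CAMERAS = 12  # stitching limit stated in the hardware description


def cameras_along_axis(area_m, coverage_m, overlap_m):
    """Number of cameras needed along one axis of the tracking area."""
    if overlap_m >= coverage_m:
        raise ValueError("overlap must be smaller than single-camera coverage")
    if area_m <= coverage_m:
        return 1
    # Each camera after the first adds (coverage - overlap) of new ground.
    return 1 + math.ceil((area_m - coverage_m) / (coverage_m - overlap_m))


def cameras_needed(area_w, area_d, cov_w, cov_d, overlap_w, overlap_d):
    """Return (columns, rows, total) for a grid covering the tracking area."""
    cols = cameras_along_axis(area_w, cov_w, overlap_w)
    rows = cameras_along_axis(area_d, cov_d, overlap_d)
    total = cols * rows
    if total > MAX_CAMERAS:
        raise ValueError(f"{total} cameras exceed the limit of {MAX_CAMERAS}")
    return cols, rows, total


# 12 m x 8 m area; 7.0 m x 4.5 m usable coverage per camera (invalid edges
# already excluded); 1.5 m horizontal and 0.5 m vertical overlap.
print(cameras_needed(12.0, 8.0, 7.0, 4.5, 1.5, 0.5))  # (2, 2, 4)
```

The horizontal overlap is deliberately larger than the vertical one, matching the advice to keep a larger horizontal fusion area.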

Before adjusting the image stitching in the software, connect all the cameras to the computer and activate them. Activate the CaliMap/DepthMap buttons for each camera to ensure that the invalid areas of each camera are covered and merged by the valid areas of another camera. By doing so, the final three-dimensional positions of each camera can be determined.

While adjusting the camera positions, you can view the coordinates of the four corners of the coverage area for each camera through the “Corner Coordinates” option. This will give you an idea of the size of the coverage area for all the cameras.

4. Next, adjust the image stitching used for tracking, since all tracking data is calculated from this stitched image. The stitched image must be aligned with the cameras' combined image in the 3D space.

Start by opening the stitch window and switching the CaliMap/DepthMap image option for each camera to the calibration image mode. Depending on the arrangement of the cameras, different blend options need to be enabled. For example, in a grid-style arrangement with four cameras, the first camera does not require the left or top blend options. The second camera needs the left blend option because it is blended with the first camera on its left side. The third camera requires the top blend option, and the fourth camera needs both the left and top blend options. If the blend options are not enabled, the images will overlap instead of being blended.
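The four-camera example follows a simple rule: a camera needs the left blend when it has a neighbor to its left and the top blend when it has a neighbor above it. The sketch below applies that rule to a grid of any size. It assumes cameras are numbered row by row, left to right, as in the example; extending the rule beyond 2 by 2 is an assumption, and the function name is hypothetical.

```python
def blend_options(rows, cols):
    """Return per-camera blend flags for a row-major grid of cameras."""
    options = []
    for index in range(rows * cols):
        row, col = divmod(index, cols)
        options.append({
            "camera": index + 1,
            "left": col > 0,  # blend with the neighbor on the left
            "top": row > 0,   # blend with the neighbor above
        })
    return options


for option in blend_options(2, 2):
    print(option)
# {'camera': 1, 'left': False, 'top': False}
# {'camera': 2, 'left': True, 'top': False}
# {'camera': 3, 'left': False, 'top': True}
# {'camera': 4, 'left': True, 'top': True}
```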

Refer to the size of the blend area in the camera images shown in the figure above and adjust the pixel blend options to ensure that the horizontal fusion effect in the stitching window is consistent with the real-world 3D representation.

If there is inconsistency in the vertical fusion, you can activate the MoveDown option and adjust the pixel offset value in the respective camera’s parameters. For a grid arrangement of four cameras, only the MoveDown option needs to be enabled and adjusted for the third and fourth cameras.

Note: To avoid interference from the tracking view during the adjustment of the image stitching, you can temporarily disable the activation buttons for the TrackSet in the right sidebar.

5. Perform on-site measurement and installation based on the 3D positions of all cameras in the software. To replicate the software simulation accurately, start by determining the origin and establishing the entire coordinate system at the site. Use a laser rangefinder to determine the X-axis and Z-axis position of each camera; the height value (Y-axis) can be taken from the CenterDistance parameter. Ensure that the cameras are placed horizontally along the X-axis and that the lenses face vertically downward. During installation, you can enable the remote monitoring function and use your phone to view the real-time X-axis and Z-axis rotation values, as well as the height of the camera above the ground (H, the CenterDistance parameter). Adjust each camera until its X-axis and Z-axis rotation values read 0, and use the height value as a reference when installing the other cameras.

6. Once all the cameras are installed, activate them and switch the calibration map to depth map mode. You can then view the real-time fusion effect of the depth images through the 3D view.

Next, open the track monitor window with the OpenMonitor button and click the active button to enable the tracking function. If there is too much noise in the depth images, you can adjust the camera's exposure and gain options to improve image quality. You can also raise the denoise option to improve tracking stability, but a higher denoise value consumes more CPU resources.

Afterwards, walk into the fusion area and observe if the fusion effect appears normal. If the tracking area appears completely black when unoccupied, there is no need for an empty frame capture. However, if there are tall objects within the tracking area that should not be cut off by the low-cut option, an empty frame capture will be necessary. By clicking the EmFrameCap button, the empty frame image will be saved in the software. From that point on, once a person enters the tracking area, real-time tracking data will be calculated, and corresponding-sized tracking boxes will be displayed in the appropriate positions in 3D space. The color of these boxes can be changed by modifying the tracking color in the TrackSet tab on the right-hand side.

The tracking threshold controls how much contrast between the empty frame image and the real-time depth image is required for an area to be tracked. A smaller threshold value means that even areas with low contrast between the background and the real-time depth image will be tracked. The default value is 0.2. To avoid tracking small noise or unwanted elements, you can increase the TrackMinSize option. The TrackMaxSize option sets the maximum size of objects that can be tracked. If there are larger objects in your tracking area and you only want to track people, you can reduce this option to avoid tracking those larger moving objects.
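The effect of TrackMinSize and TrackMaxSize can be pictured as a size gate applied to each candidate blob. The sketch below is only an illustration: it assumes size means footprint area (width times length) in square meters, which the text does not specify, and the function name and blob fields are hypothetical.

```python
def filter_by_size(blobs, track_min_size, track_max_size):
    """Keep blobs whose footprint lies within [track_min_size, track_max_size]."""
    kept = []
    for blob in blobs:
        size = blob["w"] * blob["d"]  # assumed definition of "size"
        if track_min_size <= size <= track_max_size:
            kept.append(blob)
    return kept


candidates = [
    {"name": "noise", "w": 0.05, "d": 0.05},   # too small, dropped
    {"name": "person", "w": 0.5, "d": 0.4},    # kept
    {"name": "cart", "w": 2.0, "d": 1.5},      # too large, dropped
]
kept = filter_by_size(candidates, track_min_size=0.05, track_max_size=1.0)
print([blob["name"] for blob in kept])  # ['person']
```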

The MaxMoveDistance (per frame) option sets the pixel distance an object may move within one frame. It determines whether detections in consecutive frames are recognized as the same tracking object or as two separate ones.
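One plausible reading of MaxMoveDistance is a gate on frame-to-frame association: a detection inherits an existing ID only if it lies within that distance of the object's previous position; otherwise it is treated as a separate object. The sketch below implements that reading with greedy nearest-neighbor matching. It is an assumption about the behavior, not the software's documented method, and all names are hypothetical.

```python
import math
from itertools import count


def assign_ids(previous, detections, max_move_distance, id_source):
    """Carry track IDs from the previous frame to the current detections.

    previous: dict of track_id -> (x, y) from the last frame.
    detections: list of (x, y) positions in the current frame.
    Returns a dict of track_id -> (x, y) for the current frame.
    """
    pairs = sorted(
        (math.dist(pos, det), track_id, index)
        for track_id, pos in previous.items()
        for index, det in enumerate(detections)
    )
    current, used_tracks, used_dets = {}, set(), set()
    for distance, track_id, index in pairs:
        if distance > max_move_distance:
            break  # remaining pairs are even farther apart
        if track_id in used_tracks or index in used_dets:
            continue
        current[track_id] = detections[index]
        used_tracks.add(track_id)
        used_dets.add(index)
    # Anything that moved too far (or just appeared) becomes a new object.
    for index, det in enumerate(detections):
        if index not in used_dets:
            current[next(id_source)] = det
    return current


ids = count(1)
frame1 = assign_ids({}, [(100, 100), (300, 120)], 40, ids)
frame2 = assign_ids(frame1, [(110, 104), (500, 400)], 40, ids)
print(frame2)  # {1: (110, 104), 3: (500, 400)}
```

In the example, object 1 moved about 11 pixels and keeps its ID, while the detection at (500, 400) is too far from object 2 and receives a new ID.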

The DelNearby option allows you to remove tracking objects that are within the minimum distance of each other. If the tolerance for deletion region is set to 1, other incorrect tracks will still be retained. Reducing this tolerance value will remove tracks that are closer in size and have a more distinct size difference within the minimum distance.

The DelOverlap option removes overlapping tracking objects. When the overlap tolerance value is set to 1, only fully overlapping objects will be deleted. A smaller tolerance value will remove tracking objects with smaller degrees of overlap.
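The overlap tolerance can be illustrated with axis-aligned boxes. The sketch below assumes the overlap is measured as the fraction of the smaller box covered by the larger one, and that the smaller box is the one deleted; neither detail is stated in the text, and the function names are hypothetical. With a tolerance of 1, only a box lying fully inside another is removed.

```python
def area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def overlap_ratio(a, b):
    """Fraction of the smaller box that is covered by the other box."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    return (width * height) / min(area(a), area(b))


def delete_overlaps(boxes, tolerance):
    """Drop the smaller box of each pair whose overlap ratio reaches tolerance."""
    kept = []
    for box in sorted(boxes, key=area, reverse=True):
        if all(overlap_ratio(box, other) < tolerance for other in kept):
            kept.append(box)
    return kept


boxes = [(0, 0, 4, 4), (1, 1, 3, 3), (3, 3, 5, 5)]
print(delete_overlaps(boxes, 1.0))  # [(0, 0, 4, 4), (3, 3, 5, 5)]
print(delete_overlaps(boxes, 0.2))  # [(0, 0, 4, 4)]
```

The second box sits entirely inside the first, so it is removed at any tolerance up to 1; the third overlaps by a quarter of its area and is removed only once the tolerance drops to 0.25 or below.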

7. All tracking data can be output through OSC protocol, with XYZ representing the three-dimensional position of the tracking object, and WDH representing its width, length, and height. By default, if using the Messaging (UDP) protocol, the IP address in the Local Address field will be automatically cleared, and the default local IP address “” will be used. If you are using OSC IN in TD (TouchDesigner), make sure to leave the Local Address field blank. If you switch the protocol to MultiCastMessaging (UDP), ensure that the Network Address is set to

If using the PSN protocol, only XYZ three-dimensional data is output, with the Y-axis representing the height value. You can adjust the tracking height scale option to modify the tracking height value. For example, if you only want to track the chest position of a person, you can set this parameter to around 0.75. Different software may use different coordinate systems, and you can match them using the two options in the Left/Right Hand Coordinate System Switch.
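The height scale and the handedness switch amount to two small transformations on each position. The sketch below assumes the scale multiplies the measured height directly and that switching handedness negates the Z-axis; the software may flip a different axis, and the function name is hypothetical.

```python
def to_psn_position(x, z, height, height_scale=1.0, flip_handedness=False):
    """Return (x, y, z) with Y as the scaled height.

    Negating Z to switch handedness is an assumption; the software's switch
    may flip a different axis.
    """
    y = height * height_scale
    if flip_handedness:
        z = -z
    return (x, y, z)


# A 1.8 m tall person tracked at chest height (scale 0.75).
print(to_psn_position(1.5, 2.0, 1.8, height_scale=0.75))
# Same position for software using the opposite handedness.
print(to_psn_position(1.5, 2.0, 1.8, height_scale=0.75, flip_handedness=True))
```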


8. In the SpaceSet, you can set the stage floor size and other options related to the UI elements.



Application direction:

Various performance venues. If you have any suggestions, you can comment below; if you have any questions, please contact us directly.
