
[Tech] Why do we move from 2D to 3D?

September 13, 2022

by Sanghyeon An (AI research engineer / R&D)

 

The advantages and disadvantages of 2D and 3D, and our direction.

 

More information is available at Home | RebuilderAI (RebuilderAI.github.io).


Table of Contents

1. Strengths of 2D
2. Weaknesses of 2D
3. Advantages of 3D
4. Disadvantages of 3D
5. Our direction

Summary

We are moving from 2D to 3D because we ran into the limits of working purely in 2D while studying Salient Object Detection (SOD). The limitations we encountered in our previous work are as follows.

 

1. Consistency across consecutive video frames cannot be guaranteed.

 

2. The mask cannot be estimated properly when the camera zooms in on the object.

 

These two limitations are a major impediment to our service, so they must be solved. After much deliberation, we realized that 3D knowledge could alleviate or solve these problems. However, working in 3D space has drawbacks of its own.

 

Therefore, we concluded that fusing 2D and 3D, so that each domain compensates for the other's weaknesses, could solve these problems, and we are currently developing such a model.

 


1. Strengths of 2D

(1) All you need is an image. 

A brief structure of a 2D SOD pipeline

A single input image is enough to obtain the detection result, as in the sketch below.
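As an illustration, here is a minimal inference sketch. The model argument is a placeholder for any pretrained SOD network (e.g. a U-Net-style encoder-decoder) that maps an RGB image to a one-channel saliency map; it is not our production network.

```python
# A minimal sketch of single-image SOD inference. The model is assumed to
# output a one-channel saliency logit map for an RGB input.
import torch
import torchvision.transforms as T
from PIL import Image

def predict_saliency(model: torch.nn.Module, image_path: str) -> torch.Tensor:
    """Run a SOD model on a single image and return a binary foreground mask."""
    preprocess = T.Compose([
        T.Resize((320, 320)),  # a typical SOD input resolution
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),  # ImageNet statistics
    ])
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        saliency = torch.sigmoid(model(x))  # (1, 1, H, W), values in [0, 1]
    return (saliency > 0.5).squeeze()       # threshold to a binary mask
```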

 

(2) There are many related studies, and their performance is high. 

Image-based SOD has been actively studied for years, and with well-prepared data we can obtain high-performance results.


2. Weaknesses of 2D

(1) It fails on zoomed-in objects, and consistency across frames is low.

Results of 2D SOD

The photo shows SOD results on the 2D data we currently use. Existing SOD models detect the most prominent region within an image; when the view is zoomed in as in the picture, the model can no longer identify the most prominent region or object, which produces the results above.

We want the same region to be detected consistently throughout the video even as the object's position changes from frame to frame, but the results above show that this is not yet possible. This is a limitation of the algorithm. We mitigate it by attaching refinement networks to the existing models, but as you can see, the results are still imperfect. In particular, the larger the change relative to the previous frame's mask used as a reference, the worse the performance becomes. One simple way to quantify this flicker is sketched below.
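A minimal sketch, assuming per-frame masks are binary NumPy arrays of equal shape, that scores temporal consistency as the mean IoU between masks predicted on consecutive frames:

```python
# Low mean IoU between consecutive masks exposes the flicker that a
# per-frame 2D SOD model suffers from on video.
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 1.0

def temporal_consistency(masks: list) -> float:
    """Mean IoU between each pair of consecutive per-frame SOD masks."""
    ious = [mask_iou(m0, m1) for m0, m1 in zip(masks, masks[1:])]
    return float(np.mean(ious))
```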

 


3. Advantages of 3D

(1) Detection remains possible even when the object is enlarged or moved.

In general, we capture a video and move the camera closer to bring out details for point cloud reconstruction. All of this information is used to build a point cloud, and segmenting the reconstructed 3D point cloud ensures consistency even when the result is viewed from the camera pose of a zoomed-in frame.

The result above shows a segmentation of the point cloud reconstructed from the video; the projection step that makes such a segmentation view-consistent is sketched below.
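Why this guarantees consistency follows from the projection itself: once each 3D point carries a label, every frame's mask is just a reprojection of the same labeled cloud. Below is a minimal sketch under a pinhole camera assumption; K denotes the 3x3 intrinsic matrix and (R, t) the world-to-camera extrinsics of one frame (these names are illustrative, not from our codebase).

```python
# Rasterize fixed per-point labels into the mask seen by one camera pose.
# Because the labels live in world space, zooming or moving the camera
# only changes the projection, not the labeling.
import numpy as np

def project_labels(points: np.ndarray, labels: np.ndarray,
                   K: np.ndarray, R: np.ndarray, t: np.ndarray,
                   h: int, w: int) -> np.ndarray:
    """Project labels (N,) of a point cloud (N, 3) into an (h, w) mask."""
    cam = points @ R.T + t            # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6       # keep points in front of the camera
    uv = cam[in_front] @ K.T
    uv = uv[:, :2] / uv[:, 2:3]       # perspective divide -> pixel coords
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    mask = np.zeros((h, w), dtype=labels.dtype)
    mask[v[valid], u[valid]] = labels[in_front][valid]
    return mask
```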

 


4. Disadvantages of 3D

(1) The point cloud must be reconstructed properly.

This is a precondition for the whole task: 3D segmentation can only be performed once a point cloud has been established. If the scene has few or no visual features, however, the point cloud itself may not form properly, which in turn breaks detection.
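As a rough pre-check for this failure mode, one could count detectable features per frame before attempting reconstruction. Below is a minimal sketch using OpenCV's ORB detector as one possible extractor; the threshold is a hypothetical heuristic, not a value from our pipeline.

```python
# Frames with too few detectable keypoints are likely to break
# point cloud reconstruction downstream.
import cv2

def is_feature_poor(image_path: str, min_features: int = 100) -> bool:
    """Return True if the frame has too few keypoints for reliable
    reconstruction (min_features is a hypothetical heuristic)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=2000)
    keypoints = orb.detect(img, None)
    return len(keypoints) < min_features
```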

 

(2) The size of the data is large.

Because point clouds are large, only a small number can be processed at once. We therefore usually extract representative points first and compute on those, which costs performance due to the loss of information. A common sampling strategy is sketched below.
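One standard way to extract such representative points is farthest point sampling, popularized by the PointNet line of work [2]: iteratively pick the point farthest from everything chosen so far, trading information for a tractable input size. A minimal NumPy sketch:

```python
# Farthest point sampling: a greedy subsampling scheme that spreads the
# kept points evenly over the cloud.
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Return indices of k representative points from an (N, 3) cloud."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=np.int64)
    dist = np.full(n, np.inf)   # squared distance to nearest chosen point
    chosen[0] = 0               # start from an arbitrary point
    for i in range(1, k):
        delta = points - points[chosen[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", delta, delta))
        chosen[i] = int(np.argmax(dist))
    return chosen
```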

 


5. Our direction

We want to tap into both domains, combining their advantages while minimizing their disadvantages.

As a reference architecture, we look at BPNet and its preceding work, shown in the picture below.

Applying these techniques improves performance over using 2D or 3D alone, and we expect an even more detailed and cleaner model once it is further developed and applied to our service. The core fusion idea is sketched below.
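Reduced to a sketch, the idea is to gather 2D image features at the pixels each 3D point projects to and concatenate them with the point's own 3D features. This illustrates the general 2D-3D fusion concept, not BPNet's actual architecture; pixel_coords is assumed to come from a projection step like the one sketched earlier.

```python
# Fuse per-point 3D features with 2D image features sampled at each
# point's projected pixel location.
import torch

def fuse_2d_3d(feat_2d: torch.Tensor,      # (C2, H, W) image features
               feat_3d: torch.Tensor,      # (N, C3) per-point features
               pixel_coords: torch.Tensor  # (N, 2) integer (u, v) per point
               ) -> torch.Tensor:
    """Concatenate each point's 3D feature with the 2D feature at its
    projection, yielding an (N, C2 + C3) fused representation."""
    u, v = pixel_coords[:, 0], pixel_coords[:, 1]
    gathered = feat_2d[:, v, u].T           # (N, C2) features at projections
    return torch.cat([feat_3d, gathered], dim=1)
```

The fused per-point features can then be fed to any point-based segmentation head, which is what lets the 2D and 3D branches compensate for each other's weaknesses.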

 


Reference

1. Papers with Code. https://paperswithcode.com
2. C. R. Qi, H. Su, K. Mo, and L. J. Guibas, "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation," CVPR 2017. https://openaccess.thecvf.com/content_cvpr_2017/papers/Qi_PointNet_Deep_Learning_CVPR_2017_paper.pdf
3. W. Hu, H. Zhao, L. Jiang, J. Jia, and T.-T. Wong, "Bidirectional Projection Network for Cross Dimension Scene Understanding," CVPR 2021. https://wbhu.github.io/projects/BPNet/