View of the Object Detection
The Object Detection has to extract all relevant information from the images taken by the World Viewer. Relevant means: where we are in relation to the track, the opponents, the obstacles, and whatever additional information we need to know (traffic signs, crossings etc.). We also need estimates of the surface properties (asphalt, sand, grass etc.). To make finding the objects easier, it should use feedback from the next step, the Object Tracking. (See Details and proposals for object detection)
View of the Object Tracking
Its job is to help the Object Detection find known objects faster in the next time step. Here we can use information from our driver about the steering, acceleration and braking since the last (vision) timestep, so we can seed the detection with estimates of our translation and rotation relative to the last images. We also have to decide whether our car will get additional information from other sensors (compass, GPS, radar, parking distance sensors etc.).
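The translation/rotation estimate from the driver's steering and acceleration could be done as simple dead reckoning between vision timesteps. A minimal sketch using a kinematic bicycle model; the function name, the wheelbase value, and the assumption of constant speed and steering over one timestep are illustrative, not part of the design:

```python
import math

def predict_pose(x, y, heading, speed, steering_angle, dt, wheelbase=2.5):
    """Dead-reckon the car's pose over one vision timestep.

    Kinematic bicycle model: yaw rate = v / L * tan(steering).
    Speed in m/s, angles in radians, dt in seconds; the wheelbase
    of 2.5 m is an assumed placeholder, not a measured value.
    """
    omega = speed / wheelbase * math.tan(steering_angle)
    # Advance position along the old heading, then update the heading.
    new_x = x + speed * math.cos(heading) * dt
    new_y = y + speed * math.sin(heading) * dt
    new_heading = heading + omega * dt
    return new_x, new_y, new_heading
```

The predicted pose would then be used to warp the previous frame's object positions into the expected positions in the new frame, giving the detector a small search window instead of the whole image.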
Vision Hard Problems
- OCR: Does Intel's OpenCV library supersede Google's Tesseract? (Aleks)
- What is the roadmap for adopting more and more of the Intel (OpenCV) logic? (Aleks)
- For the M1 demo, how do we focus it on finding only the cars in front of us? Just getting the pipeline running will be work; we will also need to configure the right brightness constants so it knows what it is looking at, for example...
- What about training the object detection? Their website says we can hand it lots of pictures of cars and non-cars and it will start to figure things out. How is that data stored? Can we tweak it?
- If we know the size of the car, we could use that to estimate distance without having to deal with 3-D images. (We should sketch out 3-D on the roadmap...)
- I think we should do blob detection first, and then attack object recognition. It might be that blobs turn out to be useless and object recognition is much better. Fine, then blobs will only be used for a few days ;-) We should be working our way up.
- Radar is another type of input to a visual system. It mostly carries less data than a 3-D image, but we could still build filters that preprocess it and call the Intel code in a different way to make sense of it. For now, let's focus only on 2-D and 3-D images; radar, sonar, etc. can come later.
- If we have both visual and radar data, how do we build a system that synthesizes the best of both?
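On the brightness constants for the M1 demo: one crude stand-in for hand-tuned constants is a linear contrast stretch before thresholding. A minimal pure-Python sketch; the `stretch_contrast` name and the 8-bit 0-255 range are assumptions, not project code:

```python
def stretch_contrast(pixels, lo=0, hi=255):
    """Linearly map pixel intensities onto the full [lo, hi] range.

    A crude stand-in for hand-tuned brightness constants: it makes the
    darkest input pixel lo and the brightest hi, so later thresholds
    are less sensitive to overall scene brightness.
    """
    p_min, p_max = min(pixels), max(pixels)
    if p_max == p_min:
        # Flat image: nothing to stretch, map everything to lo.
        return [lo] * len(pixels)
    scale = (hi - lo) / (p_max - p_min)
    return [round(lo + (p - p_min) * scale) for p in pixels]
```

In a real pipeline this would run per frame (or OpenCV's histogram equalization would replace it), so the "right brightness constants" would not need to be hard-coded at all.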
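The known-car-size idea above is just the pinhole camera model, Z = f * W / w. A minimal sketch; the function name and example numbers are illustrative, and the focal length in pixels would come from camera calibration:

```python
def distance_from_width(real_width_m, pixel_width, focal_length_px):
    """Estimate distance to an object of known physical width.

    Pinhole model: Z = f * W / w, where
      real_width_m    - known width of the car in metres (assumed known),
      pixel_width     - width of its bounding box in the image,
      focal_length_px - focal length in pixels, from calibration.
    """
    return focal_length_px * real_width_m / pixel_width
```

For example, a 1.8 m wide car spanning 180 pixels with an 800-pixel focal length would come out at 8 m. The estimate degrades when the car is seen at an angle, since the projected width shrinks; that is where the 3-D roadmap item would help.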
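The "blobs first" approach above could start as simple connected-component labelling on a thresholded frame. A minimal flood-fill sketch on a binary grid; the function name and the 4-connectivity choice are assumptions, and a real system would run this on thresholded camera images:

```python
def find_blobs(grid):
    """Label 4-connected blobs in a binary image (list of 0/1 rows).

    Returns a list of blobs, each a list of (row, col) pixels.
    Iterative flood fill, so large blobs do not hit recursion limits.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                # Flood-fill one blob, collecting its pixels.
                stack, pixels = [(r, c)], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                blobs.append(pixels)
    return blobs
```

Even if blobs turn out to be a dead end for recognition, the blob sizes and positions give the tracker something to follow while the trained object detection is being set up.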
Papers and other references
Have a look at the video "Cognitive Loop Jul 2006". It already shows much of what we need, and what it demonstrates in the real world we could reuse very easily too.