What does “neural network” mean?
Before looking at industry use cases of neural networks, it helps to understand what a neural network is and what the term means.
Take the human brain as an example. The human brain consists of roughly 86 billion neurons, which are interconnected and together form a biological neural network. It looks like the following image.
Basically, a neuron is just a node with many inputs and one output, and a neural network consists of many interconnected neurons. In essence, it is a “simple” device that receives data at its input and produces a response. First, the neural network learns to correlate incoming and outgoing signals with each other; this is called learning (or training). Then the neural network is put to work: it receives input data and generates output signals based on the accumulated knowledge.
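To make the "many inputs, one output" idea concrete, here is a minimal sketch of a single artificial neuron. The weighted sum plus bias is the standard textbook formulation; the sigmoid activation and the example numbers are assumptions chosen for illustration, not something the text above prescribes.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: many inputs, one output."""
    # Weighted sum of the inputs plus a bias term.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation (an assumed choice) squashes z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Three illustrative inputs feeding one neuron.
output = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.0)
print(output)
```

Whatever the inputs are, this neuron produces exactly one number, which can then be passed on as an input to other neurons.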
When a child is born, they don’t know what a car is, because their brain has no notion of what a car looks like. When the child sees a car for the first time and someone tells them it is a car, the child’s neurons, loosely speaking, build up layers of structure around the parts of the car: it has four wheels, it is large, it has glass windows, and so on. The second time the child sees a car, those earlier layers recognize that this object also has four wheels, a large body, and glass windows. The brain then responds that it is a car, and it either stores this new example in the same layers or builds a new layer for it. In this way, the child’s brain builds a model of a car so that it can recognize one better and faster the next time.
A car is just one example; the human brain observes everything around it and builds different layers for the objects it sees.
We can’t say that the human neural network and an artificial neural network are the same, but their working mechanisms are loosely similar. Artificial neural networks (ANNs) are statistical models directly inspired by, and partially modeled on, biological neural networks. They are capable of modeling and processing nonlinear relationships between inputs and outputs in parallel.
Artificial neural networks are characterized by adaptive weights along the paths between neurons. A learning algorithm tunes these weights from observed data in order to improve the model. In addition to the learning algorithm itself, one must choose an appropriate cost function.
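As a rough illustration of weights being tuned against a cost function, the sketch below fits a single linear neuron with plain gradient descent. Mean squared error as the cost, the learning rate, the epoch count, and the toy data (points on y = 2x + 1) are all assumptions made for this example.

```python
def train(data, lr=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # adaptive weights, starting from zero
    n = len(data)
    for _ in range(epochs):
        # Gradients of the cost (mean squared error) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Move each weight a small step against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Observed data generated from y = 2x + 1 (toy values).
data = [(0, 1), (1, 3), (2, 5), (3, 7)]
w, b = train(data)
print(w, b)
```

The learning algorithm (gradient descent) and the cost function (mean squared error) are separate choices; swapping either one changes how, and toward what, the weights are tuned.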
Architecturally, an artificial neural network is modeled using layers of artificial neurons, or computational units able to receive input and apply an activation function along with a threshold to determine if messages are passed along.
In a simple model, the first layer is the input layer, followed by one hidden layer, and lastly by an output layer. Each layer can contain one or more neurons.
Models can become increasingly complex, with greater abstraction and problem-solving capability, by increasing the number of hidden layers, the number of neurons in any given layer, and/or the number of paths between neurons. Note that the chance of overfitting also increases with model complexity.
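One way to see how quickly complexity grows is to count the trainable parameters of a fully connected network. The sketch below assumes every neuron is connected to every neuron in the previous layer and has one bias; the layer sizes are made up for illustration.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the neurons per layer, e.g. [2, 3, 1] means
    2 inputs, one hidden layer of 3 neurons, and 1 output neuron.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per path between the two layers
        total += n_out         # one bias per neuron in the receiving layer
    return total

small = count_parameters([2, 3, 1])      # input, one hidden layer, output
deeper = count_parameters([2, 3, 3, 1])  # one extra hidden layer
print(small, deeper)
```

More parameters give the model more capacity to fit the training data, which is also why the risk of overfitting rises with it.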
Nowadays, who doesn’t know Tesla? Tesla develops and deploys autonomy at scale. In Tesla’s view, an approach based on advanced AI for vision and planning, supported by efficient use of inference hardware, is the only way to achieve a general solution to full self-driving.
Neural networks are the main tool Tesla uses to build the “brain” of its self-driving cars. Between the vehicles, the lane lines, the road curbs, the crosswalks, and all the other environmental variables, Tesla has a lot of work to do. In fact, it must run at least 50 neural networks simultaneously to make it work, which is not practical on standard computers.
Tesla uses a specific architecture called HydraNets, where the backbone is shared.
Similar to transfer learning, where you have a common block and train specific blocks for specific related tasks, HydraNets have backbones trained on all objects, and heads trained on specific tasks. This improves the inference speed as well as the training speed.
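The shared-backbone idea can be sketched in a few lines of plain Python. This is only a structural illustration under stated assumptions: the class name `HydraNetSketch`, the toy backbone, and the head names are hypothetical stand-ins, not Tesla's code or a PyTorch API.

```python
class HydraNetSketch:
    """Toy multi-head model: one shared backbone, several task heads."""

    def __init__(self, backbone, heads):
        self.backbone = backbone  # shared feature extractor
        self.heads = heads        # dict: task name -> head function
        self.backbone_calls = 0   # counts how often the backbone runs

    def forward(self, image):
        # The expensive backbone runs once per image...
        self.backbone_calls += 1
        features = self.backbone(image)
        # ...and every head reuses the same features.
        return {name: head(features) for name, head in self.heads.items()}

def toy_backbone(image):
    # Stand-in "features": the mean and max pixel value.
    return [sum(image) / len(image), max(image)]

heads = {
    "lane_lines": lambda f: f[0] > 0.5,
    "vehicles": lambda f: f[1] > 0.9,
    "crosswalks": lambda f: f[0] + f[1],
}

net = HydraNetSketch(toy_backbone, heads)
outputs = net.forward([0.2, 0.8, 1.0])
print(outputs, net.backbone_calls)
```

Three tasks are answered with a single backbone pass, which is where the inference-speed benefit of sharing comes from.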
The neural networks are trained using PyTorch, a deep learning framework you might be familiar with.
- Each image, of dimension (1280, 960, 3), is passed through this specific neural network.
- The backbone is a modified ResNet-50; the specific modification is the use of “dilated convolutions”.
- The heads are based on semantic segmentation (FPN/DeepLab/UNet architectures). However, segmentation doesn’t seem to be the “end task”, as the conversion between 2D pixels and 3D is prone to errors.
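For readers unfamiliar with dilated convolutions, here is a minimal one-dimensional sketch in plain Python. It is a simplified illustration (1D, no padding, a single channel, and cross-correlation as deep learning frameworks compute it), not the 2D layers of a real ResNet; the signal and kernel values are made up.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D convolution (no padding) with gaps of `dilation` between kernel taps."""
    # Distance between the first and last input sample the kernel touches.
    span = (len(kernel) - 1) * dilation
    out = []
    for start in range(len(signal) - span):
        out.append(sum(kernel[k] * signal[start + k * dilation]
                       for k in range(len(kernel))))
    return out

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # taps cover 3 samples
dilated = dilated_conv1d(signal, kernel, dilation=2)  # taps cover 5 samples
print(dense, dilated)
```

With the same three kernel weights, the dilated version looks at a wider stretch of the input, which is why dilation enlarges the receptive field without adding parameters.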
Something else Tesla uses is Bird’s Eye View
Sometimes the results of a neural network must be interpreted in 3D. The Bird’s Eye View can help estimate distances and provide a much better and more realistic understanding of the world.
Some tasks run on multiple cameras. For example, depth estimation is something we generally do with stereo cameras: having 2 cameras helps estimate distances better. Tesla does this using neural networks that regress depth.
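For background on why two cameras help, the classic geometric relation for a rectified stereo pair is depth = focal length × baseline / disparity. The sketch below shows that textbook formula only; it is not Tesla's learned depth regression, and the focal length, baseline, and disparity values are illustrative assumptions.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Classic stereo depth for a rectified camera pair.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two cameras in meters
    disparity_px: horizontal pixel shift of the same point between images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: 1000 px focal length, cameras 0.5 m apart.
near = depth_from_disparity(1000, 0.5, 100)  # large shift -> close object
far = depth_from_disparity(1000, 0.5, 20)    # small shift -> distant object
print(near, far)
```

Distant objects produce small disparities, so small pixel errors translate into large depth errors, which is one reason a learned regression can be attractive.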
Using this stereo vision and sensor fusion, Tesla doesn’t need LiDAR. They can do distance estimation based on these 2 cameras alone. The only trick is that the cameras don’t use the same lenses: on the right, further distances appear much closer.
Tesla also has recurrent tasks such as road layout estimation. The idea is similar: multiple neural networks run separately, and another neural network is making the connection.
Optionally, this neural network can be recurrent so that it involves time.
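What "recurrent" means here can be sketched with a single-number hidden state that is carried from one time step to the next. The tanh update and the weight values are assumptions for illustration; real recurrent layers use vectors and learned weight matrices.

```python
import math

def recurrent_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent update: the new state mixes the input and the old state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_sequence(inputs, h0=0.0):
    """Feed a sequence through the cell, keeping the state between steps."""
    states = []
    h = h0
    for x_t in inputs:
        h = recurrent_step(x_t, h)
        states.append(h)
    return states

# A single "event" at the first time step, then silence.
states = run_sequence([1.0, 0.0, 0.0, 0.0])
print(states)
```

Even though the later inputs are zero, the state stays nonzero and fades gradually: the network remembers what it saw in earlier frames.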
👉 Tesla’s main problem is that it uses 8 cameras, 16 time steps (recurrent architecture), and a batch size of 32.
It means that for every forward pass, 4096 images are processed. I don’t know about you, but my MacBook Pro could never support this. In fact, a GPU couldn’t do it — not even 2 GPUs!
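The arithmetic is easy to check, and it also gives a feel for the memory involved. The sketch below uses the image dimensions quoted earlier, (1280, 960, 3), and assumes one byte per value for raw images and four bytes per value once converted to float32; it counts input data only, not activations or weights.

```python
cameras = 8
time_steps = 16
batch_size = 32

# Images processed in a single forward pass.
images_per_pass = cameras * time_steps * batch_size

# Values per image, using the (1280, 960, 3) dimensions quoted earlier.
values_per_image = 1280 * 960 * 3

# Input size, assuming 1 byte per value (uint8) or 4 bytes (float32).
input_bytes_uint8 = images_per_pass * values_per_image
input_bytes_float32 = input_bytes_uint8 * 4

print(images_per_pass)
print(input_bytes_uint8 / 1e9, "GB as uint8")
print(input_bytes_float32 / 1e9, "GB as float32")
```

Under these assumptions the inputs alone come to roughly 15 GB as raw bytes and about 60 GB as float32, before a single layer has been computed.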
To solve this problem, Tesla is betting big on the HydraNet architecture. Every camera is processed through a single neural network. Then everything is combined into the middle neural network. The amazing thing is that every task requires only a few parts of this gigantic network.
For example, object detection can require only the front camera, the front backbone, and a second camera. Not everything is processed identically.
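The idea that each task touches only part of the network can be sketched as a dependency lookup. The module names and the dependency graph below are hypothetical, chosen to mirror the example in the text rather than Tesla's actual layout.

```python
# Hypothetical dependency graph: each module lists what it needs as input.
DEPENDS_ON = {
    "object_detection": ["front_backbone", "side_backbone"],
    "road_layout": ["fusion"],
    "fusion": ["front_backbone", "side_backbone", "rear_backbone"],
    "front_backbone": ["front_camera"],
    "side_backbone": ["side_camera"],
    "rear_backbone": ["rear_camera"],
    "front_camera": [],
    "side_camera": [],
    "rear_camera": [],
}

def required_modules(task, graph=DEPENDS_ON):
    """Collect every module a task depends on, directly or indirectly."""
    needed = set()
    stack = [task]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(graph[node])
    return needed

detection = required_modules("object_detection")
layout = required_modules("road_layout")
print(sorted(detection))
print(sorted(layout))
```

In this toy graph, object detection never touches the rear camera or the fusion network, so those parts can be skipped when only that task is needed.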
Tesla says it applies cutting-edge research to train deep neural networks on problems ranging from perception to control. The per-camera networks analyze raw images to perform semantic segmentation, object detection and monocular depth estimation. The bird’s-eye-view networks take video from all cameras and output the road layout, static infrastructure and 3D objects directly in the top-down view. The networks learn from the most complicated and diverse scenarios in the world, iteratively sourced from Tesla’s fleet of nearly 1M vehicles in real time. A full build of Autopilot neural networks involves 48 networks that take 70,000 GPU hours to train 🔥. Together, they output 1,000 distinct tensors (predictions) at each timestep.
I hope this story helped you understand how Tesla’s AI models make use of neural networks. If you liked it, please share it.