Mixture of Experts
“A neural module can also be called an Expert”. A neural module, or expert, realizes a function $f_i(x)$ of the input $x$.
A Gather then assigns a credit to each expert (a measure of how reliable that neural module is), resulting in the final equation:
$$y(x) = \sum_{i=1}^{n} g_i(x)\, f_i(x)$$
Where:
- $g_i(x)$ is the credit assigned to expert $i$ by the Gather
- Partitioned regions: the feature space may be partitioned, with each partition given to a different expert. With this approach the Gather usually gives a credit of $1$ to only one expert at a time (the one that knows about the current region), and all the other credits are equal to $0$.
- Overlapping regions: each expert expresses a “likelihood” of being competent over any input $x$, and the Gather assigns credits $g_i(x)$ according to a pdf, under the conditions $g_i(x) \ge 0$ and $\sum_i g_i(x) = 1$, imposed during both training and test.
- Joint training: instead of training each expert separately, we can train the whole model including the Gather, which can then learn the values of all credits ($g_i(x)$) automatically.
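
The two ways of assigning credits described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the names `hard_gate`, `soft_gate`, `mixture`, the cut points in `boundaries`, the Gaussian-shaped competence scores around `centers`, and the two toy experts are all assumptions made for the example. A softmax is used here as one common way to obtain credits that are non-negative and sum to one.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn arbitrary scores into non-negative credits that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_gate(x: float, boundaries: Sequence[float]) -> List[float]:
    """Partitioned regions: credit 1 to the expert owning x, 0 to the rest.

    `boundaries` holds the sorted cut points between regions (hypothetical),
    so len(boundaries) + 1 experts are assumed.
    """
    owner = sum(1 for b in boundaries if x >= b)
    return [1.0 if i == owner else 0.0 for i in range(len(boundaries) + 1)]


def soft_gate(x: float, centers: Sequence[float], width: float = 1.0) -> List[float]:
    """Overlapping regions: each expert scores its competence at x.

    The score is a Gaussian-shaped log-likelihood around a hypothetical
    center per expert; softmax normalizes the scores into credits.
    """
    scores = [-((x - c) ** 2) / (2.0 * width ** 2) for c in centers]
    return softmax(scores)


def mixture(x: float, experts: Sequence[Expert], credits: Sequence[float]) -> float:
    """Weighted combination: y(x) = sum_i g_i(x) * f_i(x)."""
    return sum(g * f(x) for g, f in zip(credits, experts))


# Two toy experts (illustrative only): one meant for x < 0, one for x >= 0.
experts = [lambda x: -x, lambda x: 2.0 * x]

hard_credits = hard_gate(3.0, boundaries=[0.0])
soft_credits = soft_gate(1.0, centers=[-1.0, 1.0])

y_hard = mixture(3.0, experts, hard_credits)  # only the second expert contributes
y_soft = mixture(1.0, experts, soft_credits)  # both experts contribute
```

With hard credits the output is exactly that of the single expert owning the region, while with soft credits every expert contributes in proportion to its credit. In a jointly trained model, the fixed `centers` used here would instead be parameters of the Gather learned together with the experts.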
