Mixture of Experts

A neural module can also be called an “Expert”. A neural module, or expert, realizes a function $f_i(x)$ of the input $x$.

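Then a Gather (the gating network) assigns a credit to each expert, indicating how reliable that neural module is, which results in the final equation:

$$y(x) = \sum_{i=1}^{N} g_i(x)\, f_i(x)$$

Where:

  • $f_i(x)$ is the function realized by expert $i$
  • $g_i(x)$ is the credit assigned from the Gather to expert $i$
  • $N$ is the number of experts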
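
As a minimal sketch of the weighted combination described above, the snippet below scales each expert's output by its credit and sums the results. The two experts (`expert_linear`, `expert_square`), the helper `mixture_output`, and the credit values are hypothetical choices made only for illustration. In a real model the Gather would produce the credits.

```python
import numpy as np

# Hypothetical experts: each one realizes a function of the input x.
def expert_linear(x):
    return 2.0 * x

def expert_square(x):
    return x ** 2

def mixture_output(x, experts, credits):
    """Return the credit-weighted sum of the expert outputs at x."""
    outputs = np.array([f(x) for f in experts])
    return float(np.dot(credits, outputs))

experts = [expert_linear, expert_square]
# Assumed credits; here they are fixed by hand instead of coming from a Gather.
credits = np.array([0.25, 0.75])

y = mixture_output(3.0, experts, credits)
# 0.25 * 6.0 + 0.75 * 9.0 = 8.25
print(y)
```

Because the credits here are non-negative and sum to 1, the result always lies between the smallest and largest expert output.
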
The credits can be assigned in several ways:

  1. Partitioned regions: The feature space is partitioned and each partition is given to a different expert. With this approach the Gather usually gives a credit of 1 to only one expert at a time (the one that knows about the current region), and all the other credits are equal to 0.
  2. Overlapping regions: Each expert expresses a “likelihood” of being competent over any input $x$. The Gather assigns credits $g_i(x)$ according to a pdf, under the conditions that $\sum_i g_i(x) = 1$ and $g_i(x) \ge 0$, imposed during both training and test.
  3. Joint training: Instead of training each expert separately, we can train the whole model including the Gather, which can then automatically learn the values of all credits ($g_i(x)$).
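
The three approaches above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a reference implementation: the experts are taken to be linear functions of a scalar input, the soft Gather is a softmax over linear scores, and training is plain gradient descent on a squared error. All names (`hard_gate`, `soft_gate`, `forward`, `train`) and all hyperparameters are hypothetical.

```python
import numpy as np

def hard_gate(x, boundaries):
    """Approach 1: credit 1 for the expert owning x's region, 0 elsewhere."""
    credits = np.zeros(len(boundaries) + 1)
    credits[np.searchsorted(boundaries, x)] = 1.0
    return credits

def soft_gate(x, a, c):
    """Approach 2: softmax credits, non-negative and summing to 1."""
    z = a * x + c
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, w, b, a, c):
    """Mixture output for a batch of scalar inputs x (shape (N,))."""
    f = np.outer(x, w) + b                      # expert outputs, shape (N, K)
    z = np.outer(x, a) + c                      # Gather scores, shape (N, K)
    e = np.exp(z - z.max(axis=1, keepdims=True))
    g = e / e.sum(axis=1, keepdims=True)        # credits, each row sums to 1
    y = (g * f).sum(axis=1)                     # credit-weighted sum
    return f, g, y

def train(x, t, k=2, lr=0.1, steps=500, seed=0):
    """Approach 3: train experts and Gather together by gradient descent."""
    rng = np.random.default_rng(seed)
    w, b, a, c = (rng.normal(scale=0.1, size=k) for _ in range(4))
    losses = []
    for _ in range(steps):
        f, g, y = forward(x, w, b, a, c)
        losses.append(np.mean((y - t) ** 2))
        dy = 2.0 * (y - t) / len(x)             # gradient of the loss w.r.t. y
        df = dy[:, None] * g                    # gradient w.r.t. expert outputs
        dz = dy[:, None] * g * (f - y[:, None]) # gradient w.r.t. Gather scores
        w -= lr * (df * x[:, None]).sum(axis=0)
        b -= lr * df.sum(axis=0)
        a -= lr * (dz * x[:, None]).sum(axis=0)
        c -= lr * dz.sum(axis=0)
    return (w, b, a, c), losses

# Approach 1: two regions split at 0; the input 0.3 falls in the second one.
hard = hard_gate(0.3, np.array([0.0]))

# Approach 2: overlapping competence, expressed as a distribution over experts.
soft = soft_gate(0.3, np.array([-1.0, 1.0]), np.zeros(2))

# Approach 3: fit |x| with two linear experts and a jointly trained Gather.
x = np.linspace(-1.0, 1.0, 50)
t = np.abs(x)
params, losses = train(x, t)
print(hard, soft, losses[0], losses[-1])
```

The hard gate is the limiting case of the soft gate in which all of the probability mass sits on one expert. In the joint version, the Gather's gradient for expert i is proportional to how far that expert's output is from the mixture output, so the credits shift toward whichever expert reduces the error in each region.
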
