Multilingual machine translation has a considerable impact on our everyday communication, and many free online tools provide competitive translations. However, many challenges remain in the race towards high-quality, unbiased translation for all languages. In this talk, we will introduce our proposed multilingual spoken machine translation architecture based on language-specific encoders and decoders, which allows new languages to be added incrementally and supports zero-shot spoken translation. We evaluate this architecture in terms of both translation quality and gender bias.
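
As a rough illustration of why language-specific encoders and decoders make incremental extension and zero-shot translation possible, the following minimal sketch treats each language as an independent encoder/decoder pair that meets in a shared intermediate representation. Everything here is an assumption made for illustration: the `ModularTranslator` class, the `toy_encoder` and `make_toy_decoder` helpers, and the use of text rather than speech as input are hypothetical stand-ins, not the architecture presented in the talk.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a shared intermediate representation.
Shared = List[float]
Encoder = Callable[[str], Shared]
Decoder = Callable[[Shared], str]


class ModularTranslator:
    """Toy registry of per-language encoders and decoders (illustrative only)."""

    def __init__(self) -> None:
        self.encoders: Dict[str, Encoder] = {}
        self.decoders: Dict[str, Decoder] = {}

    def add_language(self, lang: str, encoder: Encoder, decoder: Decoder) -> None:
        # Registering a language leaves every existing module untouched.
        self.encoders[lang] = encoder
        self.decoders[lang] = decoder

    def translate(self, text: str, src: str, tgt: str) -> str:
        # Any source encoder can be composed with any target decoder,
        # including pairs that were never set up together.
        return self.decoders[tgt](self.encoders[src](text))

    def pairs(self) -> List[Tuple[str, str]]:
        # All translation directions available without pair-specific parts.
        return [(s, t) for s in self.encoders for t in self.decoders if s != t]


def toy_encoder(text: str) -> Shared:
    # Placeholder: maps characters to code points instead of learned features.
    return [float(ord(c)) for c in text]


def make_toy_decoder(lang: str) -> Decoder:
    def decode(shared: Shared) -> str:
        # Placeholder: tags the output with the target language, no real translation.
        return f"[{lang}] " + "".join(chr(int(v)) for v in shared)
    return decode


system = ModularTranslator()
for lang in ("en", "es"):
    system.add_language(lang, toy_encoder, make_toy_decoder(lang))

# Adding a third language later requires no change to "en" or "es".
system.add_language("de", toy_encoder, make_toy_decoder("de"))

# "es" -> "de" works by composition alone, with no es-de specific component.
output = system.translate("hola", "es", "de")
```

In this toy setting, registering one new language adds translation directions to and from every existing language at once. In a real system, the encoders and decoders would be trained neural modules, and the quality of such unseen directions would depend on how well they share a common representation.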