To what extent do popular voice assistants understand and answer complex questions?

The key to a satisfactory user experience with speech technologies is accuracy of comprehension, achieved through testing and training.
To probe the response capabilities of speech technologies more deeply, Bespoken, an American company specializing in AI testing and training, benchmarked the NLU (Natural Language Understanding) capabilities of the three most popular voice assistants on the market.
To establish the benchmark, nearly 1,000 questions were drawn from a dataset and submitted to Amazon Alexa, Apple Siri, and Google Assistant. The results were good but not excellent: the questions were not especially complex and the assistants often handled them accurately, yet the percentages of correct answers show that much work remains.
The best performer was Google Assistant with 73.19% correct answers, followed by Alexa with 55.38%; Siri came last with 43.17%.
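At its core, a benchmark like this submits each question and tallies the share of correct answers. The sketch below shows that scoring step in simplified form; `ask_assistant` is a hypothetical stand-in for whatever API or device automation actually delivers the question, and the containment check is a deliberate simplification of real answer matching.

```python
def score_benchmark(questions, expected, ask_assistant):
    """Return the percentage of questions answered correctly."""
    correct = 0
    for question, answer in zip(questions, expected):
        response = ask_assistant(question)
        # Simple containment check; real benchmarks use fuzzier matching
        # and human review to judge correctness.
        if answer.lower() in response.lower():
            correct += 1
    return 100.0 * correct / len(questions)

# Toy usage with a canned stand-in assistant:
qs = ["What is the capital of France?", "Who wrote Hamlet?"]
golds = ["Paris", "Shakespeare"]
canned = {"What is the capital of France?": "The capital of France is Paris.",
          "Who wrote Hamlet?": "I'm not sure."}
print(score_benchmark(qs, golds, canned.get))  # prints 50.0
```

Aggregating this percentage over a large, varied question set is what produces headline numbers like the ones above.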
As these figures show, even for the technology giants voice-machine interaction is still far from perfect, and the process will continue to require constant fine-tuning and updating depending on the domain and application.
Training and tuning is the process of reducing errors: reviewing and retesting the model until it reaches an optimal level.
This is an ongoing process that is essential to creating good user experiences – we typically see significant reductions in errors when monitoring and cross-checking techniques are carried out consistently over time.
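The review-and-retest cycle described above can be sketched as a simple loop: evaluate, and if the error rate is still above target, tune and evaluate again. This is a minimal illustration, not Bespoken's actual workflow; `evaluate` and `retrain` are hypothetical placeholders for project-specific testing and tuning steps.

```python
def tune_until_target(model, evaluate, retrain, target_error=0.05, max_rounds=10):
    """Retest and retrain until the error rate meets the target (or rounds run out)."""
    for _ in range(max_rounds):
        error = evaluate(model)
        if error <= target_error:
            return model, error
        # Adjust the model based on the observed failures, then retest.
        model = retrain(model, error)
    return model, evaluate(model)

# Toy usage where each retraining round halves the error rate:
model = {"error": 0.4}
final, err = tune_until_target(model,
                               evaluate=lambda m: m["error"],
                               retrain=lambda m, e: {"error": e / 2})
print(err)  # prints 0.05
```

The cap on rounds matters in practice: error rates plateau, and the loop should surface that rather than spin forever.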