Abstract: Consider end-to-end training of a multi-modal vs. a uni-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or ...
Direct speech Reported speech can could will would must had to While some others don't: Direct speech Reported speech would would should should could could must must might might Pronouns can change in ...