Not small queries then :-) It would take some time to investigate (which I haven't got unfortunately!)
I'd suggest running with set option show_missing_stats on and trace flag 3604 enabled, then checking whether you need statistics on any of the join/filter predicates. Other than that, I would persist with deferred parallel as an option, but as I mentioned, you will also need to enable parallelism at the server level.
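A minimal sketch of that session setup, assuming ASE 15.x. The table-free script just turns on the diagnostics before you run the problem query. The sp_configure values are placeholders to size for your own server, not recommendations, and changing them needs System Administrator privileges.

```sql
-- Route dbcc/diagnostic output to this client session
dbcc traceon(3604)
go
-- Report columns the optimizer wanted statistics for but could not find
set option show_missing_stats on
go
-- Run the problem query here, then review the missing-stats messages
-- and consider update statistics on the columns they name

-- Server-level parallelism; placeholder values only
sp_configure "number of worker processes", 20
go
-- Keep this at or below the number of worker processes
sp_configure "max parallel degree", 4
go
```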
To disable the reformatting strategies, which might be what's hurting you, run the query with an abstract plan hint like this:
plan '(use store_index off)'
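For context, the plan clause goes at the end of the select statement itself. The tables and columns below are hypothetical; only the trailing plan clause is taken from the hint above.

```sql
-- Hypothetical tables; the trailing plan clause switches off
-- the store_index (reformatting) optimization criterion for this query
select o.order_id, c.cust_name
from orders o, customers c
where o.cust_id = c.cust_id
plan '(use store_index off)'
go
```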
BTW, if you run under compatibility mode it won't give you a 12.5.x parallel plan; full compatibility mode will only give you serial execution plans.