I'd suggest monitoring the select/into while it's running (sp_sysmon, monProcessWaits, monProcessActivity, etc.) to look for performance bottlenecks. Find out where most of the 40 mins is going, then concentrate on the biggest time consumer(s).
Worst case scenario ... the majority of your time is spent on disk IOs: 40 mins (2,400 secs) spread over 339K physical reads comes out to ~7ms (avg) service time per physical IO. I'm guessing there are some physical writes in there somewhere, too (see monProcessActivity).
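To make the back-of-the-envelope math explicit, here's a quick sketch in Python. The function names are made up for illustration, and it leans on the same worst-case assumption as above: all 40 mins goes to the 339K physical reads, with writes, CPU and other waits ignored.

```python
def avg_service_time_ms(elapsed_minutes: float, physical_ios: int) -> float:
    """Average elapsed time per physical IO, in milliseconds.

    Assumes (worst case) that the entire elapsed time is spent on disk IO.
    """
    elapsed_ms = elapsed_minutes * 60 * 1000
    return elapsed_ms / physical_ios


def ios_per_second(elapsed_minutes: float, physical_ios: int) -> float:
    """Average physical IO rate over the whole run."""
    return physical_ios / (elapsed_minutes * 60)


if __name__ == "__main__":
    # Numbers from this thread: 40 mins elapsed, 339K physical reads.
    print(f"{avg_service_time_ms(40, 339_000):.2f} ms per physical IO")
    print(f"{ios_per_second(40, 339_000):.0f} physical IOs per second")
```

If monProcessActivity shows a sizable number of physical writes, add them to the IO count and rerun; the average per-IO service time drops accordingly.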
Does sp_sysmon show the server maxing out its outstanding disk IOs? If so, bumping up 'disk i/o structures' and the 'async i/o' settings (e.g., 'max async i/os per engine', 'max async i/os per server') may help.
If you're not maxing out your physical IO settings, can you ramp up the volume of physical IOs by implementing parallel reads/query-processing? [See Chapter 5: Parallel Query Processing for more details on parallel query processing, including some notes about select/into + parallel query processing.]
Is your large IO pool (or its associated APF limit) too small? If so, bumping up the large IO pool size and/or the APF percent may help.
Are you reading-from/writing-to the same device? If so, what happens if you read from one device and write to another device? [Granted, this gets pretty murky real quick if you're using a SAN or some sort of logical volume manager that obfuscates the physical disk activity; you may need to get the SAN admins involved to see if there's a way to speed up your concurrent reads and writes.]