The optimizer's share will contain:
1) The stats used to compile your query (note that there is no shared global stats cache, so if 10 people are optimizing the same query, you get 10x the proc cache usage)
2) Work plans - all the candidate plans generated during optimization that were not discarded for being too expensive. The size depends on query complexity: each plan might be only 10KB, but could easily be ~150KB (probably a good average) or much larger (for DSS queries)
Execution will contain:
1) The final query execution plan - figure 150KB on average
2) Execution metadata
Both have other things as well, but the big things are the stats & plans.
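To put rough numbers on the above, here is a back-of-envelope sketch in Python. It is not anything ASE itself provides: the function name, the stats size and the number of work plans kept are made-up inputs for illustration, and it assumes every concurrent session holds its own copy of the stats, work plans and final plan. Only the ~150KB plan figures come from the estimates above.

```python
def estimate_proc_cache_kb(concurrent_compiles, work_plans_per_query,
                           stats_kb_per_query, work_plan_kb=150,
                           final_plan_kb=150):
    """Rough proc cache estimate in KB (hypothetical helper, not an ASE API)."""
    # Optimizer side: stats plus every work plan that was not discarded.
    optimizer_kb = stats_kb_per_query + work_plans_per_query * work_plan_kb
    # Execution side: the final plan (execution metadata is ignored here).
    execution_kb = final_plan_kb
    # Assumption: nothing is shared, so usage scales with concurrency.
    return concurrent_compiles * (optimizer_kb + execution_kb)


# Assumed inputs: 5 surviving work plans and 50KB of stats per query.
one_user = estimate_proc_cache_kb(1, work_plans_per_query=5, stats_kb_per_query=50)
ten_users = estimate_proc_cache_kb(10, work_plans_per_query=5, stats_kb_per_query=50)
print(one_user, ten_users)
```

With those assumed inputs, one session comes out a bit under 1MB and ten sessions compiling the same query come out at ten times that, which is the 10x effect mentioned under the stats.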
Backup Server - I'm not sure myself. I suspect it needs to track which pages it has read from shared memory, and if so, it probably uses a hash table, which (if you are familiar with cache overhead) is likely fairly big.