I could say [run 'update index statistics' on each table] and [make sure you have statistics on every referenced column in each referenced table], but I'll skip that and guesstimate that the high volume of blanks (63.25% of the table) has left the A.ref1 column with a good bit of skew in its density stats (total density >> range cell density).
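For reference, the stats refresh mentioned above might look roughly like the sketch below. It assumes the tables really are named A and B as in this thread. The last statement is only needed for a join column that is not the leading column of any index, and ref1 is used there purely as a stand-in.

```sql
-- Sketch only: table names A/B and column ref1 are taken from this thread.
-- Refresh stats on every column of every index on each table.
update index statistics A
go
update index statistics B
go
-- For a referenced column that no index leads with, create/refresh
-- column-level stats explicitly (ref1 here is just an illustration).
update statistics A (ref1)
go
```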
Without any SARGs the optimizer has to make some general guesses about how many matching rows it'll find in each table. With a large skew in the density stats the optimizer assumes that for each row in B it will find a large number of matches in A (total density * #rows), so the estimated total IO cost of reading A by index comes out higher than the cost of just doing a table scan.
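To put rough numbers on that estimate, here is a back-of-the-envelope sketch. It assumes total density behaves approximately like the sum of squared value fractions, so the blanks alone contribute about 0.6325 squared. The row count of 1,000,000 for A and the range cell density of 0.00001 are made-up figures for illustration, not values from your optdiag output.

```latex
\begin{align*}
\text{total density} &\gtrsim 0.6325^{2} \approx 0.40 \\
\text{est. matches in A per row of B (skewed)} &\approx 0.40 \times 1{,}000{,}000 = 400{,}000 \\
\text{est. matches in A per row of B (skew removed)} &\approx 0.00001 \times 1{,}000{,}000 = 10
\end{align*}
```

With an estimate in the hundreds of thousands of rows per probe, a table scan of A looks cheaper than the index; with an estimate of around 10, the index wins easily.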
I'd suggest you try running [sp_modifystats 'A','ref1','REMOVE_SKEW_FROM_DENSITY'] to set the total density = range cell density. This should cause the optimizer to think it'll find fewer rows in A for each join from B. ('Course, if you have a lot of queries that join on A.ref1=<space> then removing the skew could lead the optimizer to run a lot of expensive index lookups against A.ref1.) Keep in mind that the skew will come back after any 'update stats' command that processes column ref1, because the densities are recalculated; i.e., you'll need to rerun sp_modifystats/REMOVE_SKEW_FROM_DENSITY after each 'update stats'.
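Since the fix has to be reapplied after every stats refresh, one way to keep the two steps together is a small maintenance batch along these lines. This is a sketch, assuming table A and column ref1 as named in this thread; how you schedule it is up to you.

```sql
-- Sketch only: run these two steps together in your stats maintenance job.
-- Step 1: refresh stats (this recalculates densities and brings the skew back).
update index statistics A
go
-- Step 2: set total density = range cell density for A.ref1 again.
exec sp_modifystats 'A', 'ref1', 'REMOVE_SKEW_FROM_DENSITY'
go
```

You can compare the total density and range cell density for ref1 in optdiag output before and after to confirm the change took effect.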
NOTE: Your old workaround of using stats from a table with no <space> values should lead to range cell density being (roughly) equal to total density ... which is what sp_modifystats/REMOVE_SKEW_FROM_DENSITY does for you. And yes, you could use this in 12.5 to see if you get the same benefit.
As for why the optimizer chose the join order of B->A instead of A->B ... *shrug*. The optimizer should know that B only has enough data for 1 page so why not join A->B? For 1 row in B the performance should be the same regardless of join order, but for more than 1 row in B (but still only 1 data page) the performance of A->B should be better as you only table scan A once.
If you can modify the code, another option you might want to try is switching the optimization goal to allrows_mix or allrows_dss; for these types of joins with highly skewed stats you may find that a merge/hash join is faster (though for largish tables the overhead of the worktables may be prohibitive).
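A session-level test of the optimization goal could look something like the sketch below. The select is a placeholder for your real query, and B.ref1 is an assumed column name since the actual join column in B isn't shown in this thread.

```sql
-- Sketch only: try a different optimization goal for this session.
set plan optgoal allrows_dss      -- or: set plan optgoal allrows_mix
go
set showplan on                   -- show which join type/order gets picked
go
-- Placeholder query; B.ref1 is a hypothetical column name.
select count(*)
from A, B
where A.ref1 = B.ref1
go
set showplan off
go
```

Comparing the showplan output and the IO counts against the current plan should tell you whether a merge or hash join is actually picked and whether it helps.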