Deep-learning based single-channel speech separation has achieved great success, though evaluations have typically been limited to relatively controlled environments with clean, near-field, read speech. This work investigates the robustness of representative separation techniques in more realistic environments with multiple and diverse conditions. To this end, we first construct datasets from the Mixer 6 and CHiME-5 corpora, featuring studio interviews and dinner parties respectively, using a procedure carefully designed to generate synthetic overlapped data suitable both for evaluation and for training deep learning models. Using these new datasets, we demonstrate that these separation techniques degrade substantially under mismatched conditions. While multi-condition training greatly mitigates the degradation in near-field conditions, an important finding is that both matched and multi-condition training still fall significantly short of oracle performance in far-field conditions, which calls for extending existing separation techniques to handle far-field, highly reverberant speech mixtures.