Estimating an accurate noise model is a crucial problem for model-based noise suppression, including approaches based on the vector Taylor series (VTS). Variation in speaker characteristics is also a crucial factor in model-based noise suppression, so a speaker adaptation technique plays an important role. To address the former problem, we have already proposed an unsupervised estimation method for a noise mixture model. Building on that method, this paper proposes a joint processing method that simultaneously achieves speaker adaptation and noise mixture model estimation. This joint processing is realized by using minimum mean squared error (MMSE) estimates of clean speech and noise. Although the VTS-based approach involves a nonlinear transformation, the MMSE estimates make it possible to flexibly estimate accurate parameters for the joint processing without being influenced by that transformation. In the evaluation, the proposed method improved on the results obtained using noise mixture model estimation alone.
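
For readers unfamiliar with the nonlinear transformation mentioned above, the following minimal sketch illustrates the mismatch function commonly used in VTS-based methods and its first-order Taylor expansion. It assumes the standard single-channel log-spectral form y = log(exp(x) + exp(n)), which the abstract does not state explicitly. The function names (`mismatch`, `vts_linearize`, `vts_approx`) are hypothetical and chosen only for illustration, and the sketch does not cover the MMSE estimation or the joint processing itself.

```python
import math


def mismatch(x, n):
    """Assumed log-spectral mismatch function: y = log(exp(x) + exp(n)).

    x is the clean-speech log spectrum and n the noise log spectrum for a
    single channel. The max is factored out for numerical stability.
    """
    m = max(x, n)
    return m + math.log(math.exp(x - m) + math.exp(n - m))


def vts_linearize(x0, n0):
    """First-order Taylor expansion of mismatch() around (x0, n0).

    Returns (y0, g, h), where y0 is the function value at the expansion
    point, g = dy/dx and h = dy/dn. The two partial derivatives sum to 1.
    """
    y0 = mismatch(x0, n0)
    g = math.exp(x0 - y0)  # exp(x0) / (exp(x0) + exp(n0))
    h = 1.0 - g            # exp(n0) / (exp(x0) + exp(n0))
    return y0, g, h


def vts_approx(x, n, x0, n0):
    """Linear approximation of mismatch(x, n) near the point (x0, n0)."""
    y0, g, h = vts_linearize(x0, n0)
    return y0 + g * (x - x0) + h * (n - n0)


if __name__ == "__main__":
    # Compare the exact value with the linearized value near the expansion point.
    exact = mismatch(1.1, 0.4)
    approx = vts_approx(1.1, 0.4, x0=1.0, n0=0.5)
    print(exact, approx)
```

The linearization is exact at the expansion point and degrades as (x, n) moves away from it, which is one way to see why the choice of expansion point matters in VTS-based processing.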