Although IEEE 754-2008 binary128 (with a 15-bit exponent and 113-bit significand, i.e., quadruple-precision) is not currently implemented on x86 in hardware, software emulation is available on some compilers. However, the performance is significantly slower compared to the binary64 operation, which is supported natively in hardware. This study proposes a fast implementation of matrix multiplication on matrices stored in the binary128 format on x86 CPUs. The proposed implementation utilizes the Ozaki scheme, which is an accurate matrix multiplication algorithm proposed by Ozaki et al. in 2012. This scheme enables one to perform most computations using the binary64 matrix multiplication (the DGEMM routine in Basic Linear Algebra Subprograms (BLAS)); it can exploit the high-performance of highly-optimized vendor BLAS. Although the achievable performance depends on the input matrices (the inner-product dimension, the absolute range, and the significand bit length), the proposed implementation can achieve better performance and accuracy compared to naive matrix multiplication performed using the GCC's binary128 emulation in many cases. In addition, we discuss GPU acceleration, performance on reduced precision inputs, an implementation based on binary32 matrix multiplication (SGEMM), application to memory-intensive operations, and the possibility of a distributed parallel implementation.