The performance of machine learning approaches is strongly influenced by the choice of misfit penalty and by the correct setting of penalty parameters, such as the threshold of the Huber function. These parameters are typically chosen using expert knowledge, cross-validation, or black-box optimization, all of which are time-consuming for large-scale applications. We present a data-driven approach that simultaneously solves inference problems and learns the error structure and penalty parameters. We discuss theoretical properties of these joint problems and present algorithms for their solution. We show numerical examples from the piecewise linear-quadratic (PLQ) family of penalties.
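The Huber function is the simplest example of a penalty whose shape is governed by a parameter. It penalizes residuals quadratically up to a threshold kappa and linearly beyond it. The minimal Python sketch below is illustrative and not taken from this work. It evaluates the Huber penalty directly and also in the standard PLQ form, as a supremum of a linear-quadratic function of u over a polyhedral set (here the interval [-kappa, kappa] with unit curvature). The function names `huber` and `huber_plq` are hypothetical.

```python
def huber(r: float, kappa: float) -> float:
    """Huber penalty with threshold kappa > 0: quadratic near zero, linear in the tails."""
    if abs(r) <= kappa:
        return 0.5 * r * r
    return kappa * abs(r) - 0.5 * kappa * kappa


def huber_plq(r: float, kappa: float) -> float:
    """Same penalty in PLQ form: sup over u in [-kappa, kappa] of u*r - u**2 / 2."""
    # The concave quadratic in u is maximized at u = r, clipped to the box [-kappa, kappa].
    u = min(max(r, -kappa), kappa)
    return u * r - 0.5 * u * u


# The threshold kappa decides where residuals stop being penalized quadratically,
# so a large residual (3.0) is charged very differently for different kappa.
for kappa in (0.5, 1.0, 2.0):
    print(kappa, [huber(r, kappa) for r in (0.5, 3.0)])
```

A small kappa makes the penalty behave like the robust absolute-value loss, and a large kappa makes it behave like least squares. This is why the value of kappa, which is exactly the kind of parameter the approach described above aims to learn from data, matters for performance.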