Bjørn-Helge Mevik

Chief Engineer - Scientific Computing Services


Email: b.h.mevik@usit.uio.no

Phone: +47 22840631


Visiting address: Gaustadalléen 23 A, Kristen Nygaards hus, 0373 OSLO

Postal address: Postboks 1059 Blindern, 0316 OSLO


Work areas

Titan HPC cluster (queue system, system software, administration)
The Bioportal (development, administration)
Statistics support

Background

PhD (dr.scient) in Statistics, MSc (cand.scient) in Mathematics

Page, Christian; Baranzini, Sergio E.; Mevik, Bjørn-Helge; Bos, Steffan Daniel; Harbo, Hanne Flinstad & Kulle, Bettina (2015). Assessing the Power of Exome Chips. PLOS ONE. ISSN 1932-6203. 10(10). doi: 10.1371/journal.pone.0139642.
Lind, Marianne; Simonsen, Hanne Gram; Hansen, Pernille; Holm, Elisabeth & Mevik, Bjørn-Helge (2015). Norwegian Words: A lexical database for clinicians and researchers. Clinical Linguistics & Phonetics. ISSN 0269-9206. 29(4), p. 276–290. doi: 10.3109/02699206.2014.999952.
Simonsen, Hanne Gram; Lind, Marianne; Hansen, Pernille; Holm, Elisabeth & Mevik, Bjørn-Helge (2013). Imageability of Norwegian nouns, verbs and adjectives in a cross-linguistic perspective. Clinical Linguistics & Phonetics. ISSN 0269-9206. 27(6-7), p. 435–446. doi: 10.3109/02699206.2012.752527.
Lind, Marianne; Simonsen, Hanne Gram; Hansen, Pernille; Holm, Elisabeth & Mevik, Bjørn-Helge (2013). "Ordforrådet" - en leksikalsk database over et utvalg norske ord ["Ordforrådet": a lexical database of a selection of Norwegian words]. Norsk tidsskrift for logopedi. ISSN 0332-7256. 59(1), p. 18–26.
Liland, Kristian Hovde; Almøy, Trygve & Mevik, Bjørn-Helge (2011). Optimal Baseline Correction for Multivariate Calibration Using Open-Source Software. American Laboratory. ISSN 0044-7749. 43(4), p. 13–16.
Næs, Tormod; Tomic, Oliver; Mevik, Bjørn-Helge & Martens, Harald (2011). Path modelling by sequential PLS regression. Journal of Chemometrics. ISSN 0886-9383. 25(1), p. 28–40. doi: 10.1002/cem.1357.
Kumar, Surendra; Carlsen, Tor; Mevik, Bjørn-Helge; Enger, Pål; Blaalid, Rakel; Shalchian-Tabrizi, Kamran et al. [7 contributors in total] (2011). CLOTU: An online pipeline for processing and clustering of 454 amplicon reads into OTUs followed by taxonomic annotation. BMC Bioinformatics. ISSN 1471-2105. 12. doi: 10.1186/1471-2105-12-182.
Background: The implementation of high throughput sequencing for exploring biodiversity poses high demands on bioinformatics applications for automated data processing. Here we introduce CLOTU, an online and open access pipeline for processing 454 amplicon reads. CLOTU has been constructed to be highly user-friendly and flexible, since different types of analyses are needed for different datasets. Results: In CLOTU, the user can filter out low quality sequences, trim tags, primers, adaptors, perform clustering of sequence reads, and run BLAST against NCBInr or a customized database in a high performance computing environment. The resulting data may be browsed in a user-friendly manner and easily forwarded to downstream analyses. Although CLOTU is specifically designed for analyzing 454 amplicon reads, other types of DNA sequence data can also be processed. A fungal ITS sequence dataset generated by 454 sequencing of environmental samples is used to demonstrate the utility of CLOTU. Conclusions: CLOTU is a flexible and easy to use bioinformatics pipeline that includes different options for filtering, trimming, clustering and taxonomic annotation of high throughput sequence reads. Some of these options are not included in comparable pipelines. CLOTU is implemented on a Linux computer cluster and is freely accessible to academic users through the Bioportal web-based bioinformatics service (http://www.bioportal.uio.no).
Liland, Kristian Hovde; Almøy, Trygve & Mevik, Bjørn-Helge (2010). Optimal Choice of Baseline Correction for Multivariate Calibration of Spectra. Applied Spectroscopy. ISSN 0003-7028. 64(9), p. 1007–1016. doi: 10.1366/000370210792434350.
Baselines are often chosen by visual inspection of their effect on selected spectra. A more objective procedure for choosing baseline correction algorithms and their parameter values for use in statistical analysis is presented. When the goal of the baseline correction is spectra with a pleasing appearance, visual inspection can be a satisfactory approach. If the spectra are to be used in a statistical analysis, objectivity and reproducibility are essential for good prediction. Variations in baselines from dataset to dataset mean we have no guarantee that the best-performing algorithm from one analysis will be the best when applied to a new dataset. This paper focuses on choosing baseline correction algorithms and optimizing their parameter values based on the performance of the quality measure from the given analysis. Results presented in this paper illustrate the potential benefits of the optimization and point out some of the possible pitfalls of baseline correction.
Liland, Kristian Hovde; Mevik, Bjørn-Helge; Rukke, Elling-Olav; Almøy, Trygve & Isaksson, Tomas (2009). Quantitative whole spectrum analysis with MALDI-TOF MS, Part II: Determining the concentration of milk in mixtures. Chemometrics and Intelligent Laboratory Systems. ISSN 0169-7439. 99(1), p. 39–48. doi: 10.1016/j.chemolab.2009.07.008.
The work summarised in this paper presents the second part of a two-paper series on quantitative whole spectrum analysis with MALDI-TOF MS on skimmed milk. In Part I experiments were carried out to search for optimal sample preparation and instrumental settings in terms of signal-to-noise ratios and repeatability. The results were utilised in the present study when trying to predict concentrations of cow, goat and ewe milk in mixed milk samples. Partial least squares regression was combined with suitable pre- and post-processing of spectra and concentration responses. A plotting method was used where predictions are visualised as a mixture design. The objective was to show that MALDI-TOF MS had potential for being used in quantitative analysis without involving peak comparison or other types of expert guided research. Predictions of a validation data set gave promising results with the best RMSEP values ranging from 5.4% (w/w) to 6.5% (w/w), for the different milk types used, and corresponding R²pred values ranging from 94.5% to 96.2%. This indicates that MALDI-TOF is sufficiently accurate and repeatable to be used in practical application for quantitative analysis. Three variable selection strategies based on visual inspections and regression modelling were also evaluated. These were all outperformed, with regard to prediction error, by the use of whole spectra and multivariate regression. The results indicate that multivariate regression on whole spectra can be far more effective than using a few selected variables.
Liland, Kristian Hovde; Mevik, Bjørn-Helge; Rukke, Elling-Olav; Almøy, Trygve; Skaugen, Morten & Isaksson, Tomas (2009). Quantitative whole spectrum analysis with MALDI-TOF MS, Part I: Measurement optimisation. Chemometrics and Intelligent Laboratory Systems. ISSN 0169-7439. 96(2), p. 210–218. doi: 10.1016/j.chemolab.2009.02.003.
The work summarised in this paper presents the first part of a two-paper series on quantitative whole spectrum analysis with MALDI-TOF MS on skimmed milk. To create a reliable foundation for multivariate, quantitative analysis, optimising measurements (Part I) with respect to well chosen criteria is of high importance. This paper describes the whole process from sample acquisition to final mass spectra using skimmed milk from cow, goat and ewe. Experimental design is employed to search for optimal preparation of the milk samples and optimal MALDI-TOF instrumental settings according to well defined quality measures. The results suggest that optimal parameter settings in a quantitative context may differ from the standard settings used to obtain high resolution spectra in traditional mass spectrometry. Improvements from the worst plausible case to optimised factor levels indicate a 6.96 times higher signal-to-noise ratio and 0.63 times lower inter spectra variation (non-repeatability). These results will be the basis for the main experiment with application to quantitative determination of adulterated skimmed milk (Part II).
Kumar, Surendra; Skjæveland, Åsmund; Orr, Russell; Enger, Pål; Ruden, Torgeir Andersen; Mevik, Bjørn-Helge et al. [8 contributors in total] (2009). AIR: A batch-oriented web program package for construction of supermatrices ready for phylogenomic analyses. BMC Bioinformatics. ISSN 1471-2105. 10. doi: 10.1186/1471-2105-10-357.
Background: Large multigene sequence alignments have over recent years been increasingly employed for phylogenomic reconstruction of the eukaryote tree of life. Such supermatrices of sequence data are preferred over single gene alignments as they contain vastly more information about ancient sequence characteristics, and are thus more suitable for resolving deeply diverging relationships. However, as alignments are expanded, increasing numbers of sites with misleading phylogenetic information are also added. Therefore, a major goal in phylogenomic analyses is to maximize the ratio of information to noise; this can be achieved by the reduction of fast evolving sites. Results: Here we present a batch-oriented web-based program package, named AIR, that allows 1) transformation of several single genes to one multigene alignment, 2) identification of evolutionary rates in multigene alignments and 3) removal of fast evolving sites. These three processes can be done with the programs AIR-Appender, AIR-Identifier, and AIR-Remover (AIR), which can be used independently or in a semi-automated pipeline. AIR produces user-friendly output files with filtered and non-filtered alignments where residues are colored according to their evolutionary rates. Other bioinformatics applications linked to the AIR package are available at the Bioportal (www.bioportal.uio.no), University of Oslo; together these greatly improve the flexibility, efficiency and quality of phylogenomic analyses. Conclusions: The AIR program package allows for efficient creation of multigene alignments and better assessment of evolutionary rates in sequence alignments. Removing fast evolving sites with the AIR programs has been employed in several recent phylogenomic analyses resulting in improved phylogenetic resolution and increased statistical support for branching patterns among the early diverging eukaryotes.
Måge, Ingrid; Mevik, Bjørn-Helge & Næs, Tormod (2008). Regression models with process variables and parallel blocks of raw material measurements. Journal of Chemometrics. ISSN 0886-9383. 22(7–8), p. 443–456.
The topic of this paper is regression models based on designed experiments, where additional spectroscopic measurements are also available. This particular case describes a situation with two spectral blocks with no natural order: The blocks are parallel. Three methods are described, which combine least squares regression of the design variables with PCA or PLS on the spectra. The methods' properties are explored in two simulation studies based on real experiments. The results show that the methods are equal when it comes to prediction, but interpretability varies. One of the methods, LS-ParPLS, is especially interesting when it comes to interpretability because it splits the spectral information into two parts: information that is common to both blocks and information that is unique to each block.
Berget, Ingunn; Mevik, Bjørn-Helge & Næs, Tormod (2008). New modifications and applications of fuzzy C means methodology. Computational Statistics & Data Analysis. ISSN 0167-9473. 52(5), p. 2403–2418. doi: 10.1016/j.csda.2007.10.020.
The fuzzy C-means (FCM) algorithm and various modifications of it with focus on practical applications in both industry and science are discussed. The general methodology is presented, as well as some well known and also some less known modifications. It is demonstrated that the simple structure of the FCM algorithm allows for cluster analysis with non-typical and implicitly defined distance measures. Examples are residual distance for regression purposes, prediction sorting and penalised clustering criteria. Specialised applications of fuzzy clustering to be used for a sequential clustering strategy and for semi-supervised clustering are also discussed.
Martens, Harald; Kohler, Achim; Afseth, Nils Kristian; Wold, Jens-Petter; Hersleth, Margrethe; Berget, Ingunn et al. [19 contributors in total] (2007). High-throughput Measurements for Functional Genomics of Milk. Journal of Animal and Feed Sciences. ISSN 1230-1388. 16(1).
Recent developments in analytical technology have simplified a detailed characterization of milk and milk-based samples. A range of powerful new instrumentation types have recently been installed at various institutes at Campus Ås (Norway). At the campus we have recently implemented efficient, multi-channel instrumentation for genomics, transcriptomics, proteomics, biospectroscopy, metabolomics and various quality assessments. The present paper gives an informal outline of various modern analytical tools for characterization of various milk and milk-based samples.
Jørgensen, Kjetil; Mevik, Bjørn-Helge & Næs, Tormod (2007). Combining designed experiments with several blocks of spectroscopic data. Chemometrics and Intelligent Laboratory Systems. ISSN 0169-7439. 88(2).
Two varieties of a new method of analysis are introduced and tested against a standard PLS method. The method is developed for situations where experimental design is combined with spectroscopic measurements at several points in an industrial process. The method is a combination of ordinary least squares (OLS) and PLS and is called LS-PLS. The purpose of the methodology is to extract relevant information from spectroscopic measurements without having to calibrate the instrument against external reference measurements first. We compare the LS-PLS variants with a straightforward PLS modeling of all the available variables and demonstrate that the LS-PLS method gives results that are easier to interpret. The methods are compared by computer simulations based on real data from a cheese making experiment.
Mevik, Bjørn-Helge & Wehrens, Ron (2007). The pls package: Principal component and partial least squares regression in R. Journal of Statistical Software. ISSN 1548-7660. 18(2).
The pls package implements principal component regression (PCR) and partial least squares regression (PLSR) in R (R Development Core Team 2006b), and is freely available from the Comprehensive R Archive Network (CRAN), licensed under the GNU General Public License (GPL). The user interface is modelled after the traditional formula interface, as exemplified by lm. This was done so that people used to R would not have to learn yet another interface, and also because we believe the formula interface is a good way of working interactively with models. It thus has methods for generic functions like predict, update and coef. It also has more specialised functions like scores, loadings and RMSEP, and a flexible cross-validation system. Visual inspection and assessment is important in chemometrics, and the pls package has a number of plot functions for plotting scores, loadings, predictions, coefficients and RMSEP estimates. The package implements PCR and several algorithms for PLSR. The design is modular, so that it should be easy to use the underlying algorithms in other functions. It is our hope that the package will serve well both for interactive data analysis and as a building block for other functions or packages using PLSR or PCR. We will here describe the package and how it is used for data analysis, as well as how it can be used as a part of other packages. Also included is a section about formulas and data frames, for people not used to the R modelling idioms.
Saiz-Abajo, Maria-Jose; Mevik, Bjørn-Helge; Segtnan, Vegard & Næs, Tormod (2005). Ensemble methods and data augmentation by noise addition applied to the analysis of spectroscopic data. Analytica Chimica Acta. ISSN 0003-2670. 533(2), p. 147–159. doi: 10.1016/j.aca.2004.10.086.
Near-infrared spectroscopy has gained great acceptance in the industry due to its multiple applications and versatility. Sometimes, however, the construction of accurate and robust calibration models involves the collection of a large number of samples with related reference analysis that can complicate and prolong the calibration stage. In this paper, ensemble methods and data augmentation by noise simulation have been applied to spectroscopic data in combination with PLSR to obtain robust models able to handle different types of perturbations likely to affect NIR data. Several types of noise have been investigated as well as different ensemble methods focused on obtaining robust PLS models able to predict both the original and the perturbed test data. The suitability of ensemble methods to perform robust calibration models has been investigated and compared to extended multiplicative signal correction (EMSC) and other calibration approaches in a real case of temperature compensation. Extended multiplicative signal correction (EMSC) and ensemble methods seem to be the most appropriate methods yielding the best results in terms of accuracy and prediction ability with a reduced calibration data set.
Segtnan, Vegard; Mevik, Bjørn-Helge; Isaksson, Tomas & Næs, Tormod (2005). Low-cost approaches to robust temperature compensation in near-infrared calibration and prediction situations. Applied Spectroscopy. ISSN 0003-7028. 59(6), p. 816–825. doi: 10.1366/0003702054280586.
The traditional way of handling temperature shifts and other perturbations in calibration situations is to incorporate the non-relevant spectral variation in the calibration set by measuring the samples at various conditions. The present paper proposes two low-cost approaches based on simulation and prior knowledge about the perturbations, and these are compared to traditional methods. The first approach is based on augmentation of the calibration matrix through adding simulated noise on the spectra. The second approach is a correction method that removes the non-relevant variation from new spectra. Neither method demands exact knowledge of the perturbation levels. Using the augmentation method it was found that a few, in this case four, selected samples run under different conditions gave approximately the same robustness as running all the calibration samples under different conditions. For the carbohydrate data set, all robustification methods investigated worked well, including the use of pure water spectra for temperature compensation. For the more complex meat data set, only the augmentation method gave comparable results to the full global model.
Berget, Ingunn; Mevik, Bjørn-Helge; Vebø, Heidi & Næs, Tormod (2005). A strategy for finding relevant clusters; with an application to microarray data. Journal of Chemometrics. ISSN 0886-9383. 19(9), p. 482–491. doi: 10.1002/cem.954.
Cluster analysis is a helpful tool for explorative data analysis of large and complex data. Most clustering methods will, however, find clusters also in random data. An important aspect of cluster analysis is therefore to distinguish real and artificial clusters, as this will make interpretation of the clusters easier. In some cases, certain types of clusters are more interesting than others. When working with gene expression data, examples of such clusters are gene clusters with high between sample variability, and clusters with a certain expression profile. Here we present a strategy with the ability to search for such clusters. The clustering is done sequentially. For each sequence, the data is separated into "interesting" and "rest" using the fuzzy c-means algorithm with noise clustering. The interesting cluster is defined by adding a penalty function to the usual clustering criterion. The penalty function is constructed in such a way that clusters without the interesting properties are given a high penalty. The strategy is presented in a general frame, and can be adjusted by defining different criteria for each type of cluster that is of interest. The methodology is presented in the context of microarray data but can be used for any type of data where cluster analysis may be a helpful tool. The methodology is illustrated with simulated data and microarray data.
Mevik, Bjørn-Helge & Cederkvist, Henrik René (2004). Mean squared error of prediction (MSEP) estimates for principal component regression (PCR) and partial least squares regression (PLSR). Journal of Chemometrics. ISSN 0886-9383. 18, p. 422–429.
This paper presents results from simulations based on real data, comparing several competing mean squared error of prediction (MSEP) estimators on principal component regression (PCR) and partial least squares regression (PLSR): leave-one-out cross-validation, K-fold and adjusted K-fold cross-validation, the ordinary bootstrap estimate, the bootstrap smoothed cross-validation (BCV) estimate and the 0.632 bootstrap estimate. The overall performance of the estimators is compared in terms of their bias, variance and squared error. The results indicate that the 0.632 estimate and leave-one-out cross-validation are preferable when one can afford the computation. Otherwise adjusted 5- or 10-fold cross-validation are good candidates because of their computational efficiency.
Dingstad, Gunvor; Egelandsdal, Bjørg; Mevik, Bjørn-Helge & Færgestad, Ellen Mosleth (2004). Modelling and optimization of quality and costs on empirical data of hearth bread. Lebensmittel-Wissenschaft + Technologie. ISSN 0023-6438. 37(5), p. 527–538. doi: 10.1016/j.lwt.2003.12.002.
Mathematical programming is an optimization technique, which can be used to simultaneously ensure several qualities of a product (cost, chemical composition, product characteristics, etc.). This requires mathematical models for the qualities, and especially different characteristics of a product may be challenging to model. One approach is to make empirical models based on data from experimental designs. In the present paper hearth breads are studied. The protein quality and protein content of the wheat flour have together with the mixing and proving time been found critical for hearth bread characteristics. By adjusting the process settings according to wheat flour properties, hearth breads within acceptable quality limits may be made from very different flours. A mixture-process design was constructed and 99 hearth bread batches were made. Models for hearth bread characteristics and production costs were estimated and optimized by mathematical programming. The study also considers how model uncertainty and different pricing systems of wheat flours and capacity costs influence the optimal solutions.
Mevik, Bjørn-Helge (2004). Assessing the performance of classifiers when classes arise from a continuum. Computational Statistics & Data Analysis. ISSN 0167-9473. 46, p. 689–705. doi: 10.1016/j.csda.2003.09.010.
The situation where classes arise by dividing the range of a continuous response variable into intervals is discussed. The focus is on assessing the performance of classifiers. Due to the underlying continuum, not all misclassifications are equally grave. The probability of misclassification (pmc) is not optimal in this situation. An alternative performance measure, the squared error rate (sqerr), is proposed. It is related to the mean squared error of regression, and penalises misclassifications according to their severity. Also, because of measurement errors in the response variable, there are misallocated class labels in data sets used for training and testing. Estimates of the pmc and the sqerr are developed for this situation. The estimates are tested and compared on a real data set and in a simulation.
Hersleth, Margrethe; Mevik, Bjørn-Helge; Næs, Tormod & Guinard, J.-X. (2003). Effect of contextual factors on liking for wine - use of robust design methodology. Food Quality and Preference. ISSN 0950-3293. 14(7), p. 615–622. doi: 10.1016/S0950-3293(02)00190-8.
This research investigated the effects of context on the acceptability of Chardonnay wines using the robust design methodology. Robust design methods distinguish between two types of design variables: control factors and noise factors. The control factors in this study were enological variables used to make the wines. The noise factors were the contexts in which the wines were evaluated. Eight Chardonnay wines were produced according to an experimental design with or without (1) malolactic fermentation, (2) oak contact, and (3) sugar addition to the finished wine. The wines were served in a laboratory and in a reception room with or without food, and rated for degree of liking on the nine-point hedonic scale by 55 wine consumers. Analyses of variance showed that the control factors and the noise factors had significant, and similar in size, effects on liking. The robust design methodology affords the product designer the ability to better understand the effects of product variation and context variation on product acceptability.
Mevik, Bjørn-Helge (2002). Robustness of robust process optimization. Quality Engineering. ISSN 0898-2112. 14(1), p. 13–23. doi: 10.1081/QEN-100106882.
The article treats sensitivity of robust process optimization (robust design) to model estimation errors and uncertainty of the target value. Model estimation errors due to random error in the response lead to variance and bias of the optimization criterion, and uncertainty in the location of the optimal parameter setting. Methods to assess and compensate for such effects are presented. We introduce a technique for finding confidence regions for the location of the optimal parameter setting. Finally, we describe a way to assess the sensitivity to uncertainty in the target value, and present a strategy for dealing with such cases.
Mevik, Bjørn-Helge & Næs, Tormod (2002). Strategies for classification when classes arise from a continuum. Quality Engineering. ISSN 0898-2112. 15(1), p. 113–126. doi: 10.1081/QEN-120006714.
The situation where classes arise from a continuum is studied. In this situation, both regression and classification can perform the class allocation. It is not obvious how to compare classifiers and regressions, and different performance measures are described and briefly discussed. Several strategies for class allocation in the present situation are discussed and evaluated. Modifications to existing methods are proposed to make them more suitable for the problem. The performance measures and a selection of class allocation methods are tested and compared in simulations, with both low-dimensional data and 100-dimensional spectroscopy-like data. They are also tested on a real spectroscopic data set. The results show that classification by means of an appropriate regression outperforms the classifiers in most situations.
Mevik, Bjørn-Helge; Mosleth, Ellen Færgestad; Ellekjær, Marit Risberg & Næs, Tormod (2001). Using raw material measurements in robust process optimization. Chemometrics and Intelligent Laboratory Systems. ISSN 0169-7439. 55(1-2), p. 133–145. doi: 10.1016/S0169-7439(01)00100-9.
Unwanted variation due to variable raw material quality is often a problem in production processes. Robust process optimization seeks to reduce the effects of such variation by identifying settings of the adjustable factors that make the process less sensitive to the variations. This paper develops a unified framework for studying and developing robust process optimization and process control techniques. We divide the factors of the process into groups based on characterizations of their properties. We also develop a robust process optimization technique for batch-wise processes, called batch-wise robust process optimization, which utilizes all available measurements of raw material qualities at the start of each production batch. The technique achieves a reduction of variability due to variation in raw material qualities, compared to ordinary robust process optimization. Two examples taken from baking of hearth bread illustrate the technique.
Næs, Tormod & Mevik, Bjørn-Helge (2001). Understanding the collinearity problem in regression and discriminant analysis. Journal of Chemometrics. ISSN 0886-9383. 15(4), p. 413–426. doi: 10.1002/cem.676.
This paper presents a discussion of the collinearity problem in regression and discriminant analysis. The paper describes reasons why the collinearity is a problem for the prediction ability and classification ability of the classical methods. The discussion is based on established formulae for prediction errors. Special emphasis is put on differences and similarities between regression and classification. Some typical ways of handling the collinearity problems based on PCA will be described. The theoretical discussion will be accompanied by empirical illustrations.
Rødbotten, Rune; Mevik, Bjørn-Helge & Hildrum, Kjell Ivar (2001). Prediction and classification of tenderness in beef from non-invasive diode array detected NIR spectra. Journal of Near Infrared Spectroscopy. ISSN 0967-0335. 9(3), p. 199–210. doi: 10.1255/jnirs.306.
NIR absorbance spectra of 48 beef samples were recorded 2, 9 and 21 days post mortem in the wavelength range 950–1700 nm with a Zeiss MCS 511 instrument equipped with diode array detector. These spectra were used to predict tenderness of the meat samples when Warner–Bratzler (WB) shear force was used as the reference method. Two types of prediction models were made. The models were either based on NIR spectra alone or NIR spectra in combination with information about post slaughter treatments. Prediction models from NIR spectra alone gave correlation coefficients in the range 0.52–0.83, but when variables for post slaughter treatments were included in the models the correlation coefficients were in the range 0.71–0.85. The additional variables had no effect on the prediction results when tenderness was predicted at the same time as NIR spectra were acquired, but improvements were found when tenderness was forecast later than the spectral acquisition. Based on these prediction models the beef samples were classified into two or three tenderness groups. When the beef samples were classified into two groups, 73–98% of the samples were correctly classified, while 63–75% of the samples were correctly classified when they were allocated into three groups.


Liland, Kristian Hovde & Mevik, Bjørn-Helge (2011). Baseline (R package).
R package available at the Comprehensive R Archive Network. The package contains a collection of baseline correction algorithms, along with a framework and a GUI for optimising baseline algorithm parameters.
Næs, Tormod; Tomic, Oliver; Mevik, Bjørn-Helge & Martens, Harald (2010). Path modelling by sequential PLS regression.
Liland, Kristian Hovde; Mevik, Bjørn-Helge; Rukke, Elling-Olav; Almøy, Trygve; Skaugen, Morten & Isaksson, Tomas (2009). Quantitative whole spectrum analysis with MALDI-TOF MS.
MALDI-TOF MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) has traditionally been used mainly for peak detection, with the aim of identifying proteins and peptides. In recent years, attempts have been made to use MALDI-TOF MS quantitatively, though mostly by comparing selected peaks and selected standards. Our aim is to perform quantitative analysis directly on MALDI-TOF mass spectra of skimmed milk from cow, goat and ewe. To create a reliable foundation for multivariate quantitative analysis, we start by optimising the measurements with respect to well-chosen criteria. Experimental design is used to search for the optimal preparation of the milk samples and the optimal MALDI-TOF instrument settings according to predefined quality measures. The results suggest that optimal parameter settings in a quantitative context may differ from the standard settings used to obtain high-resolution spectra in traditional mass spectrometry. Using the optimised parameter settings, we generate a dataset of 45 different milk mixtures. In the final analysis we will apply different pre-processing and non-linearity compensation techniques, aiming to minimise the uncertainty in predictions of milk concentrations.
Mevik, Bjørn-Helge; Jørgensen, Kjetil; Måge, Ingrid & Næs, Tormod (2007). LS-PLS Regression: Combining Design and Spectral Data as Predictors.
In industry and research, designed experiments are often performed to find the relationship between a set of predictor variables and one or more responses. Often, factors other than those included in the design also influence the results. Information about such factors can be obtained by measuring them with spectroscopic methods. One is then faced with the challenge of analyzing data that combine a design matrix with one or more spectroscopic matrices containing hundreds of highly collinear variables. One answer to this challenge is LS-PLS, a regression method that combines partial least squares regression (PLSR) and ordinary least squares regression (OLSR). The principal idea underlying LS-PLS is first to regress the responses on the design matrix with OLSR, and then to use PLSR to regress "the rest" of the responses (the residuals) onto the spectroscopic data. The spectroscopic blocks can be added serially or in parallel, and can be orthogonalised against the preceding matrices. Finally, the results are combined in a single OLS regression. LS-PLS has several advantages: it gives loadings that are easier to interpret; it is independent of the scaling of the different data matrices; it is simple to understand and implement; it is easily extended to more complex data situations; and it shows how much each matrix contributes when added to the model. In this presentation we describe and illustrate LS-PLS, and compare it with the approach of using a single PLSR on the combined data matrices.
Mevik, Bjørn-Helge; Rukke, Elling-Olav; Skaugen, Morten; Vegarud, Gerd; Devold, Tove Gulbrandsen; Aastveit, Are Halvor et al. [7 contributors] (2007). Kvantitativ bruk av MALDI-TOF massespektroskopi: avsløring av juks i melkeblandinger [Quantitative use of MALDI-TOF mass spectrometry: exposing fraud in milk mixtures].
Mass spectrometry (MS) is well suited for detecting the presence of proteins and peptides, and MALDI-TOF MS is a fast and sensitive MS technique. Traditionally, MS has been used qualitatively, typically by identifying which peaks are present in the spectrum and searching databases for proteins with those masses. It has often been claimed that MS is not a quantitative measurement method: one can determine whether a protein or peptide is present in a sample, but not how much of it there is. At IKBM we are working on using MS quantitatively, and on using the whole spectrum rather than only selected peaks. To achieve this, a number of specific challenges must be handled, such as shifts, intensity variations and baseline variation. In this talk we will go through some of these challenges and propose strategies for handling them. Sheep's milk cheese may only contain a certain proportion of cow's milk. Since sheep's milk is much more expensive than cow's milk, it can be tempting to replace more of the sheep's milk with cow's milk. A fast and reliable method for determining the actual mixing ratio in sheep's milk cheese could be an important tool for importers and regulatory authorities. We will present results from a model study in which we ran MALDI-TOF MS on mixtures of cow's, goat's and sheep's milk, with the aim of predicting the mixing ratio from the mass spectra. The results are promising, and we believe it can be said that MALDI-TOF MS can be used quantitatively.
Mevik, Bjørn-Helge; Rukke, Elling-Olav; Skaugen, Morten; Vegarud, Gerd; Devold, Tove Gulbrandsen; Aastveit, Are Halvor et al. [7 contributors] (2007). Kvantitativ bruk av MALDI-TOF massespektroskopi: avsløring av juks i melkeblandinger [Quantitative use of MALDI-TOF mass spectrometry: exposing fraud in milk mixtures].
Mass spectrometry (MS) is well suited for detecting the presence of proteins and peptides, and MALDI-TOF MS is a fast and sensitive MS technique. Traditionally, MS has been used qualitatively, typically by identifying which peaks are present in the spectrum and searching databases for proteins with those masses. Chemometrics has a stronger focus on the quantitative use of spectra, often relating the whole spectrum to a continuous response. At IKBM we are working on using MS quantitatively, and on using the whole spectrum rather than only selected peaks. To achieve this, a number of specific challenges must be handled, such as shifts, intensity variations and baseline variation. In this talk we will go through some of these challenges and propose strategies for handling them. Sheep's milk cheese may only contain a certain proportion of cow's milk. Since sheep's milk is much more expensive than cow's milk, it can be tempting to replace more of the sheep's milk with cow's milk. A fast and reliable method for determining the actual mixing ratio in sheep's milk cheese could be an important tool for importers and regulatory authorities. We will present results from a model study in which we ran MALDI-TOF MS on mixtures of cow's, goat's and sheep's milk, with the aim of predicting the mixing ratio from the mass spectra.
Mevik, Bjørn-Helge (2007). Using the Whole Mass Spectrum in Data Analysis: Challenges and Possibilities.
Mevik, Bjørn-Helge (2006). Partial Least Squares Regression og Principal Component Regression med R-pakken pls [Partial Least Squares Regression and Principal Component Regression with the R package pls].
Mevik, Bjørn-Helge (2006). The pls package. R News. ISSN 1609-3631. 6(3), p. 12–17.
Mevik, Bjørn-Helge; Jørgensen, Kjetil; Måge, Ingrid & Næs, Tormod (2006). LS-PLS Regression: Combining Design Variables with Blocks of Spectroscopic Measurements.
LS-PLS regression is a general methodology for building regression models on combinations of design variables and blocks of spectroscopic data. It is based on first fitting the design matrix with ordinary least squares (LS) regression, and then compressing the spectroscopic matrices individually with partial least squares regression (PLSR), using the residuals from the previous fits as the response. Finally, the design variables and the compressed blocks are used in an LS regression. The matrices can be added serially or in parallel, and can be orthogonalised against the preceding matrices. The main advantages of the LS-PLS methodology are its simplicity, the explicit fitting of the design variables, and the easy separation of the information gained from each block. It is also easy to extend or generalise to other data types, error structures or fitting methods.
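The three steps described above (LS on the design, PLSR on the residuals, a final LS regression) can be sketched in Python with NumPy for the simplest case of one design matrix and one spectral block added serially, without orthogonalisation. This is a hypothetical illustration, not the authors' implementation: the helper names `pls1_scores` and `ls_pls_fit`, the NIPALS-style PLS1 used for the compression, and the simulated 2×2 factorial design with a single Gaussian-shaped "spectral" factor are all assumptions. The sketch only fits the training data; predicting new samples would also require keeping the PLS weights and loadings.

```python
import numpy as np

def pls1_scores(X, y, n_components):
    """PLS1 score vectors via NIPALS-style deflation (hypothetical helper)."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    scores = []
    for _ in range(n_components):
        w = Xk.T @ yk                        # weights: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # score vector
        p = Xk.T @ t / (t @ t)               # X loadings
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - t * (t @ yk) / (t @ t)     # deflate y
        scores.append(t)
    return np.column_stack(scores)

def ls_pls_fit(D, X, y, n_components):
    """Serial LS-PLS with one design matrix D and one spectral block X (sketch)."""
    n = len(y)
    D1 = np.column_stack([np.ones(n), D])          # design matrix with intercept
    b_design, *_ = np.linalg.lstsq(D1, y, rcond=None)
    resid = y - D1 @ b_design                      # step 1: "the rest" after the LS fit
    T = pls1_scores(X, resid, n_components)        # step 2: compress X by PLSR on residuals
    Z = np.column_stack([D1, T])                   # step 3: final LS on design + scores
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, Z @ coef

# Simulated data: a replicated 2x2 design plus an unmeasured factor c seen only in the "spectra".
rng = np.random.default_rng(42)
D = np.tile([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]], (15, 1))
c = rng.normal(size=60)
profile = np.exp(-((np.arange(50) - 25) ** 2) / 50.0)
X = np.outer(c, profile) + 0.05 * rng.normal(size=(60, 50))
y = 1.5 * D[:, 0] - 0.5 * D[:, 1] + 2.0 * c + 0.1 * rng.normal(size=60)

coef, fitted = ls_pls_fit(D, X, y, n_components=1)
r2 = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Because the design variables enter the final regression explicitly, their coefficients (`coef[1:3]`) can be read off directly, which corresponds to the explicit fitting of design variables listed among the advantages above.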
Berget, Ingunn; Mevik, Bjørn-Helge; Vebø, Heidi & Næs, Tormod (2005). A strategy for finding relevant clusters; with an application to microarray data.
Cluster analysis is a helpful tool for exploratory analysis of large and complex data. Most clustering methods will, however, also find clusters in random data. An important aspect of cluster analysis is therefore to distinguish real from artificial clusters, as this makes the clusters easier to interpret. In some cases, certain types of clusters are more interesting than others. In gene expression data, examples of such clusters are gene clusters with high between-sample variability, or clusters with a certain expression profile. Here we present a strategy for searching for such clusters. The clustering is done sequentially: in each step, the data are separated into "interesting" and "rest" using the fuzzy c-means algorithm with noise clustering. The interesting cluster is defined by adding a penalty function to the usual clustering criterion; the penalty function is constructed so that clusters without the interesting properties are given a high penalty. The strategy is presented in a general framework and can be adjusted by defining different criteria for each type of cluster of interest. The methodology is presented and demonstrated in the context of microarray gene expression analysis, using real and simulated data, but it can be used for any type of data where cluster analysis may be a helpful tool.
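The core of each step described above, separating the data into one cluster and a "rest" with fuzzy c-means and a noise cluster, can be sketched in Python with NumPy. The sketch follows the usual noise-clustering formulation, in which the noise cluster lies at a fixed distance `delta` from every point, so that a point's membership in the single real cluster is 1 / (1 + (d²/delta²)^(1/(m-1))). It deliberately omits the penalty function the paper uses to define the "interesting" cluster; the function name `fcm_noise_one_cluster`, the initialisation at the data mean and the simulated data are assumptions made for illustration.

```python
import numpy as np

def fcm_noise_one_cluster(X, delta, m=2.0, n_iter=100):
    """Fuzzy c-means with one real cluster plus a noise cluster (hypothetical helper).

    The noise cluster is at constant distance `delta` from every point, so the
    membership in the real cluster is u = 1 / (1 + (d^2 / delta^2)^(1/(m-1)))
    and the membership in the noise cluster ("rest") is 1 - u.
    No penalty term is included here.
    """
    expo = 1.0 / (m - 1.0)
    centre = X.mean(axis=0)                          # assumption: start from the overall mean
    for _ in range(n_iter):
        d2 = ((X - centre) ** 2).sum(axis=1)         # squared distances to the centre
        u = 1.0 / (1.0 + (d2 / delta**2) ** expo)    # memberships in the real cluster
        w = u ** m                                   # fuzzified weights
        centre = (w @ X) / w.sum()                   # weighted-mean centre update
    d2 = ((X - centre) ** 2).sum(axis=1)
    u = 1.0 / (1.0 + (d2 / delta**2) ** expo)        # memberships for the final centre
    return centre, u

# Simulated data: a tight group of 50 points near (3, 3) plus 50 uniform background points.
rng = np.random.default_rng(1)
cluster = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(50, 2))
background = rng.uniform(-10.0, 10.0, size=(50, 2))
X = np.vstack([cluster, background])

centre, u = fcm_noise_one_cluster(X, delta=2.0)
```

With these settings the tight group should receive high memberships and most background points low ones; `1 - u` gives each point's membership in the noise cluster ("rest").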
Mevik, Bjørn-Helge & Martens, Harald (2005). Creating 'nice' spectra from 1D-gels, with a focus on alignment.
Dingstad, Gunvor; Mevik, Bjørn-Helge & Færgestad, Ellen (2003). Modelling and optimisation of quality and costs in hearth bread production.
Dingstad, Gunvor; Mevik, Bjørn-Helge & Færgestad, Ellen (2003). Hva koster brødkvalitet? Optimering av kvalitet og kostnad i brød bakt uten form [What does bread quality cost? Optimisation of quality and cost in bread baked without a tin].
Dingstad, Gunvor; Mevik, Bjørn-Helge & Færgestad, Ellen (2003). Hva koster brødkvalitet? [What does bread quality cost?]
Nilsen, Bjørg Narum; Mevik, Bjørn-Helge; Hildrum, Kjell Ivar & Isaksson, Tomas (1999). On line near infrared analysis of whole salmon fillets.


Published Nov. 16, 2011 1:43 PM - Last modified Aug. 19, 2012 9:11 PM