Vplot figures and MS Word

March 16, 2016 Systems No comments

Joe Dellinger, the author of Vplot, suggests adjusting parameters for raster figures when including them in Word documents. He writes:

Wow, working on my SEG abstract I had a helluva time getting my vplot raster figures to look decent in word. Then I realized… wait a minute, it’s doing just the bad things plotters back in the 80’s were doing. I fiddled a little with pixc and greyc, and voila! Beautiful raster figures.

From the Vplot documentation:

  • pixc is used only when dithering is being performed, and also should only be used for hardcopy devices. It alters the grey scale to correct for pixel overlap on the device, which (if uncorrected) causes grey raster images to come out much darker on paper than on graphics displays.

  • greyc is used only when dithering is being performed, and really should only be used for hardcopy devices. It alters the grey scale so that grey rasters come out on paper with the same nonlinear appearance that is perceived on display devices.

The default values are pixc=1 greyc=1. The values used by Joe in his Word document were pixc=1.15 greyc=1.25.

To convert Vplot plots to other forms of graphics, you can use vpconvert.
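
For example, assuming your copy of vpconvert forwards generic pen options such as pixc= and greyc= to the underlying device driver (an assumption; check the local documentation), a raster figure could be converted with something like the following, where the file name and output format are placeholders:

bash$ vpconvert format=eps pixc=1.15 greyc=1.25 figure.vpl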

National academies and reproducible research

March 14, 2016 Links No comments

A high-profile workshop, Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results, was organized by the National Academies of Sciences and the National Science Foundation and took place in Washington, DC, last year. The workshop summary report was recently published by the National Academies Press.

Here is an extract, which lists recommendations from the panel discussion:

  • Establish publication requirements for open data and code. Journal editors and referees should confirm that data and code are linked and accessible before a paper is published. (Keith Baggerly)
  • Clarify strength of evidence for findings. The strength of evidence should be clearly stated for theories and results (in publications, press releases, etc.) to ensure that initial explorations are not misrepresented as being more conclusive than they actually are. (Keith Baggerly)
  • Align incentives. Communities need to examine how to build a culture that rewards researchers who put effort into verifying their own results rather than rushing quickly to publication. (Marcia McNutt)
  • Improve training.
    • Institutions need to make extra efforts to instill in students an ethos of care and reproducibility. (Marcia McNutt)
    • Universities need to change the curriculum to incorporate topics such as version control, code review, and general data management, and communities need to revise their incentives to improve the chances of reproducible, trustworthy research in the future. Steps to improve the future workforce are necessary to maintain public trust in science. (Randy LeVeque)
    • Many graduates are well steeped in open-source software norms and ethics, and they are used to this as a normal way of operating. However, they come into a scientific research setting where codes are not shared, transparent, or open; instead, codes are built in a way that feels haphazard to them. This training disconnect can interfere with mentorship and with their continuation in science. A better understanding of these norms is needed at all levels of research. (Victoria Stodden)
    • Prevention and motivation need to be components of instilling the proper ethos. This could be part of National Institutes of Health (NIH)-mandated ethics courses. (Keith Baggerly)
  • Clarify terminology. A clearer set of terms is needed, especially for teaching students and creating guidelines and best practices. Some examples of how to do this can be found within the uncertainty quantification community, which successfully clarified the terms verification and validation, used almost synonymously 10-15 years ago. (Ronald Boisvert)

Double-sparse dictionary

February 27, 2016 Documentation No comments

A new paper has been added to the collection of reproducible documents: Double sparsity dictionary for seismic noise attenuation

A key step in sparsifying signals is the choice of a sparsity-promoting dictionary. There are two basic approaches to designing such a dictionary: the analytic approach and the learning-based approach. While the analytic approach enjoys the advantage of high efficiency, it lacks adaptivity to various data patterns. The learning-based approach, on the other hand, can adaptively sparsify different datasets but has higher computational complexity and involves no prior-constraint pattern information for particular data. We propose a double sparsity dictionary (DSD) for seismic data in order to combine the benefits of both approaches. We provide two models to learn the DSD: the synthesis model and the analysis model. The synthesis model learns the DSD in the data domain, and the analysis model learns the DSD in the model domain. We give an example of the analysis model and propose to use the seislet transform and the data-driven tight frame (DDTF) as the base transform and adaptive dictionary, respectively, in the DSD framework. DDTF obtains an extra structure regularization by learning dictionaries, while the seislet transform obtains a compensation for the transformation error caused by slope dependency. The proposed DSD aims to provide a sparser representation than the individual transform and dictionary and therefore can help achieve better performance in denoising applications. Although the proposed DSD is less sparse than the seislet transform for the purpose of compression, it outperforms both seislet and DDTF in distinguishing signal from noise. Two simulated synthetic examples and three field data examples confirm the better denoising performance of the proposed approach.

Continuous reproducibility using CircleCI

February 20, 2016 Systems No comments

Continuous Integration (CI) is a powerful software-engineering discipline built around a shared code repository, to which developers contribute frequently (possibly several times per day), and an automated build system that runs testing scripts.

As previously suggested, CI tools can be easily adopted for continuous reproducibility: repeatedly testing whether previously reproducible results remain reproducible after software changes. Continuous reproducibility helps ensure that reproducible documents stay “alive” and continue to be usable.

Numerous tools have appeared in recent years offering CI services in the cloud: Travis CI, Semaphore, Codeship, Shippable, etc. It is hard to choose among them, but I would pick CircleCI. Developed by a San Francisco startup, CircleCI is not fundamentally different from analogous services, but it provides a solid implementation that includes:

  • Integration with GitHub
  • SSH access
  • Sleek user interface
  • Simple configuration via circle.yml file
  • Fast parallel execution

Let us test if it can serve as a good platform for Madagascar’s continuous reproducibility.
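
As a starting point, here is a minimal circle.yml sketch in the CircleCI 1.0 format. The dependency list, installation prefix, and choice of example directory are assumptions rather than a tested configuration:

machine:
  environment:
    RSFROOT: $HOME/rsfroot

dependencies:
  pre:
    # system packages assumed to be enough for a basic Madagascar build
    - sudo apt-get update && sudo apt-get install -y scons libblas-dev liblapack-dev
  override:
    # configure, build, and install Madagascar into RSFROOT
    - ./configure --prefix=$RSFROOT && make install

test:
  override:
    # rerun one reproducible example and fail the build if it breaks
    - source $RSFROOT/share/madagascar/etc/env.sh && cd book/rsf/tutorials/cs && scons

The test step reruns a single reproducible example (here the compressed-sensing tutorial from rsf/tutorials/cs); a full continuous-reproducibility setup would loop over many such directories, possibly across parallel containers.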

Program of the month: sfmig2

February 18, 2016 Programs No comments

sfmig2 implements 2-D prestack Kirchhoff time migration.

The program uses triangle antialiasing filters, described in:

J. F. Claerbout, 1992, Anti Aliasing: SEP-73, Stanford Exploration Project, 371-390.

D. E. Lumley, J. F. Claerbout, and D. Bevc, 1994, Anti-aliased Kirchhoff 3-D migration: SEG Annual Meeting, Expanded Abstracts, 1282-1285.

The following example from sep/aal/gulf shows migration applied to a near-offset section from the Gulf of Mexico.

The amount of antialiasing is controlled by the antialias= parameter.

A half-derivative waveform-correction filter (rho filter) is applied; it is controlled by the rho= parameter.

The program has an adjoint flag adj= and can be used as a linear operator. The default adj=y corresponds to migration; adj=n corresponds to modeling (demigration).

An additional required input is vel= (time-migration velocity). An optional output is gather= (common-offset image gathers).
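
Putting these together, a hypothetical invocation might look as follows (the file names and the antialias value are placeholders, not taken from the gulf example):

bash$ < data.rsf sfmig2 vel=vel.rsf antialias=1 gather=gather.rsf > image.rsf

Running the same command with adj=n and the image as input would perform the corresponding demigration.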

Tutorial on compressed sensing

February 16, 2016 Examples No comments

The example in rsf/tutorials/cs reproduces the tutorial on compressed sensing by Ben Bougher, published in the October 2015 issue of The Leading Edge.


Madagascar users are encouraged to try improving the results.

Program of the month: sfsort

January 16, 2016 Programs No comments

sfsort sorts the input in descending order by absolute value.

It takes either floating-point or complex input. Here is a quick example:

bash$ sfspike n1=10 | sfnoise rep=y seed=2016 > random.rsf
bash$ < random.rsf sfdisfil
   0:       -0.3485      -0.3108       0.7928      0.01292      -0.5301
   5:       -0.4556      -0.2901      -0.7167       -1.209      -0.2871
bash$  < random.rsf sfsort | sfdisfil
   0:         1.209       0.7928       0.7167       0.5301       0.4556
   5:        0.3485       0.3108       0.2901       0.2871      0.01292

To sort in reverse (ascending) order, use ascmode=y:

bash$ < random.rsf sfsort ascmode=y | sfdisfil
   0:       0.01292       0.2871       0.2901       0.3108       0.3485
   5:        0.4556       0.5301       0.7167       0.7928        1.209

sfsort tries to perform sorting in memory but, if the input is too large, it switches to slower out-of-core operations. To control the amount of available memory, use the memsize= parameter.
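
For example, a hypothetical run that caps the in-memory buffer (assuming the usual Madagascar convention that memsize= is given in megabytes; the file names are placeholders):

bash$ < large.rsf sfsort memsize=500 > sorted.rsf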

If the input is multidimensional, and you want to sort the data only up to a certain dimension, use the dim= parameter. In the following example, each of the two rows is sorted independently:

bash$ < random.rsf sfput n1=5 n2=2 | sfsort dim=1 | sfdisfil
   0:        0.7928       0.5301       0.3485       0.3108      0.01292
   5:         1.209       0.7167       0.4556       0.2901       0.2871

sfsort was contributed to Madagascar by Gilles Hennenfent and Henryk Modzelewski from SLIM, UBC. They provide a test example in slim/slimUserManual/sfsort.

Program of the month: sfdivn

December 22, 2015 Programs No comments

sfdivn divides two signals, producing a smooth output. It treats division as inversion and regularizes the inversion using shaping regularization.

The following example from jlu/riesz/sigmoid shows the local dip computed by a smooth division of two components of the Riesz transform.

The denominator file is provided by den=. The shaping regularization is controlled by the smoothness radii rect1=, rect2=, etc., and by the maximum number of iterations niter=. The iterations can be accelerated by using the eps= parameter. To suppress the output of iteration statistics on the screen, use verb=n.
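
Putting the parameters together, a hypothetical smooth division might look like this (the file names, radii, and iteration count are placeholders, not the values used in the Riesz-transform example):

bash$ < num.rsf sfdivn den=den.rsf rect1=10 rect2=10 niter=100 verb=n > ratio.rsf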

Interpolation using nonlinear shaping regularization

November 25, 2015 Documentation No comments

A new paper has been added to the collection of reproducible documents: Seismic data interpolation using nonlinear shaping regularization

Seismic data interpolation plays an indispensable role in common seismic data processing workflows. Iterative shrinkage thresholding (IST) and projection onto convex sets (POCS) can both be considered specific forms of nonlinear shaping regularization. Compared with the linear form of shaping regularization, the nonlinear version can be more adaptive because the shaping operator is not limited to being linear. Using a linear combination operator, we introduce a faster version of nonlinear shaping regularization. The new shaping operator uses information from the previous model to better constrain the current model. Both synthetic and field data examples demonstrate that nonlinear shaping regularization can be effectively used to interpolate irregular seismic data and that the proposed faster version indeed converges noticeably faster.

Ground-roll noise attenuation using local orthogonalization

November 24, 2015 Documentation 4 comments

A new paper has been added to the collection of reproducible documents: Ground-roll noise attenuation using a simple and effective approach based on local bandlimited orthogonalization

Bandpass filtering is a common way to estimate ground-roll noise on land seismic data, because of the relatively low frequency content of ground-roll. However, there is usually a frequency overlap between ground-roll and the desired seismic reflections that prevents bandpass filtering alone from effectively removing ground-roll without also harming the desired reflections. We apply a bandpass filter with a relatively high upper bound to provide an initial, imperfect separation of ground-roll and reflection signal. We then apply a technique called ‘local orthogonalization’ to improve the separation. The procedure is easily implemented, since it involves only bandpass filtering and a regularized division of the initial signal and noise estimates. We demonstrate the effectiveness of the method on an open-source set of field data.