First approach: Self-supervised neural network

This protocol is extracted from research article:

Artificial intelligence for art investigation: Meeting the challenge of separating x-ray images of the Ghent Altarpiece

**Sci Adv**, Aug 30, 2019; DOI: 10.1126/sciadv.aaw7416

Procedure

In our initial attempt, we designed a self-supervised neural network that learns to convert an RGB image (approximately) into an x-ray image. Figure 6 depicts a high-level abstraction of this approach. Explicitly, the approach was based on the following principles:

1. The function *f*_{x}( ⋅ ) :

2. The function *f*_{x} is implemented using a CNN.

3. The function is learned by minimizing $$\| x - (f_{x}(y_{1}) + f_{x}(y_{2})) \|_{F}^{2} \qquad (2)$$ so that, conceptually, the mapping *f*_{x}( ⋅ ) :

4. The input corresponds to patches taken from *y*_{1} and *y*_{2}, and the self-supervision is achieved through optimizing *f*_{x} with respect to the counterpart patch from
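The objective in Eq. 2 can be sketched in a few lines of NumPy. This is only an illustration: `separation_loss` and `channel_mean` are hypothetical names, and `channel_mean` is a trivial stand-in for the CNN *f*_{x}, not the network used in the article.

```python
import numpy as np

def separation_loss(f_x, y1, y2, x):
    """Squared Frobenius norm of x - (f_x(y1) + f_x(y2)), as in Eq. 2."""
    residual = x - (f_x(y1) + f_x(y2))
    return float(np.sum(residual ** 2))

# Hypothetical stand-in for the CNN: average the RGB channels of a patch.
def channel_mean(patch):
    return patch.mean(axis=-1)

rng = np.random.default_rng(0)
y1 = rng.random((64, 64, 3))   # synthetic RGB patch of the first surface
y2 = rng.random((64, 64, 3))   # synthetic RGB patch of the second surface
x = channel_mean(y1) + channel_mean(y2)  # synthetic mixed x-ray patch

loss = separation_loss(channel_mean, y1, y2, x)
```

Because the synthetic `x` is built from the same stand-in mapping, the loss is zero here; with real data the same function would be evaluated on the network output.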

The original images *y*_{1}, *y*_{2}, and *x* were taken as a collection of 64 × 64 patches with an overlap of 52 pixels, resulting in roughly 966 and 3168 patch triplets for details 1 and 2, respectively. That is, the input data were organized as *N* pairs of RGB patches $(y_{1}^{j}, y_{2}^{j}) \in \mathbb{R}^{64\times 64\times 3} \times \mathbb{R}^{64\times 64\times 3}$ with the corresponding target patches *x*^{j} ∈ ℝ
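An overlap of 52 pixels between neighbouring 64-pixel patches corresponds to a stride of 64 − 52 = 12 pixels. A minimal sketch of such a patch extraction is given below; the helper `extract_patches` and the image sizes are hypothetical and chosen only to keep the example small, and the handling of image borders in the article is not known.

```python
import numpy as np

PATCH = 64                 # patch side length, from the text
OVERLAP = 52               # overlap between neighbouring patches, from the text
STRIDE = PATCH - OVERLAP   # 12 pixels

def extract_patches(image, patch=PATCH, stride=STRIDE):
    """Return all full patches of an H x W (x C) image as one stacked array."""
    h, w = image.shape[:2]
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    return np.stack([image[r:r + patch, c:c + patch] for r in rows for c in cols])

# Hypothetical image sizes (assumption, not the sizes of details 1 and 2).
rng = np.random.default_rng(0)
y1 = rng.random((100, 124, 3))
y2 = rng.random((100, 124, 3))
x = rng.random((100, 124))

# Using the same grid for all three images keeps the triplets aligned.
y1_patches, y2_patches, x_patches = (extract_patches(im) for im in (y1, y2, x))
```

Index `j` along the first axis then selects one triplet (`y1_patches[j]`, `y2_patches[j]`, `x_patches[j]`).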

For each of the seven convolutional layers (denoted by *l*_{1}, *l*_{2}, …, *l*_{7}), we performed convolution with masks $\{M_{k,i}\}_{k=1}^{N_{i}}$, where the size of each mask was 5 × 5 × *N*_{i − 1}. Accordingly, the output of each of these layers is *N*_{i} patches of size 64 × 64. We used
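The shape bookkeeping of one such layer can be illustrated with a naive NumPy implementation. This is a sketch under assumptions: zero padding is assumed so that the 64 × 64 size is preserved, the additive per-mask `offsets` term is hypothetical, no activation function is applied, and the number of masks (8) is an arbitrary illustrative value. As in most CNN libraries, the operation is implemented as a cross-correlation.

```python
import numpy as np

def conv_layer(inp, masks, offsets):
    """Apply N_i masks of size 5 x 5 x N_{i-1} to an H x W x N_{i-1} input.

    Zero padding and the additive per-mask offset are assumptions.
    """
    h, w, _ = inp.shape
    k = masks.shape[1]                     # mask side length (5)
    pad = k // 2
    padded = np.pad(inp, ((pad, pad), (pad, pad), (0, 0)))
    out = np.empty((h, w, masks.shape[0]))
    for r in range(h):
        for c in range(w):
            window = padded[r:r + k, c:c + k, :]
            # One response per mask: sum over the 5 x 5 x N_{i-1} window.
            out[r, c] = np.tensordot(masks, window, axes=3) + offsets
    return out

rng = np.random.default_rng(0)
patch = rng.random((64, 64, 3))            # RGB input patch, N_0 = 3
masks = rng.standard_normal((8, 5, 5, 3))  # hypothetical N_1 = 8 masks
offsets = np.zeros(8)
features = conv_layer(patch, masks, offsets)   # shape (64, 64, 8)
```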

The learning process of the neural network aims at finding the best-fitting entries of $\{M_{k,i}\}_{k=1}^{N_{i}}$, as well as *c*_{i, k}. These parameters were optimized with respect to the cost function of Eq. 2, starting from a random initialization and running 300 iterations of stochastic gradient descent. A schematic drawing of the CNN architecture is shown in Fig. 6.
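The optimization loop can be illustrated on a toy problem. The sketch below is not the article's network: a linear map over the RGB channels stands in for the CNN, the data are synthetic, and the learning rate and batch size are hypothetical. Only the structure (random initialization, 300 stochastic gradient steps on the Eq. 2 residual) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: P pixels with RGB values from two surfaces (hypothetical).
P = 2000
y1 = rng.random((P, 3))
y2 = rng.random((P, 3))
w_true = np.array([0.5, 0.3, 0.2])
x = y1 @ w_true + y2 @ w_true            # synthetic mixed x-ray values

def f_x(y, w):
    """Linear stand-in for the CNN: a weighted sum of the RGB channels."""
    return y @ w

def mean_loss(w):
    """Eq. 2 residual, averaged over all pixels."""
    return float(np.mean((x - (f_x(y1, w) + f_x(y2, w))) ** 2))

w = rng.standard_normal(3) * 0.1         # random initialization
initial_loss = mean_loss(w)
lr, batch = 0.05, 32                     # hypothetical hyperparameters
for _ in range(300):                     # 300 iterations, as in the text
    idx = rng.choice(P, size=batch, replace=False)
    residual = x[idx] - (f_x(y1[idx], w) + f_x(y2[idx], w))
    # Gradient of the mean squared residual with respect to w.
    grad = -2.0 * (y1[idx] + y2[idx]).T @ residual / batch
    w -= lr * grad
final_loss = mean_loss(w)
```

Note that only the mixed signal `x` supervises the fit; the two separate outputs `f_x(y1, w)` and `f_x(y2, w)` are never compared with separate targets.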

As a result of the network’s design, the resolution of the output images was the same as that of the input images. As can be seen in Figs. 3 and 4 (column B), this process yielded a seemingly clean reconstruction of *x*_{1} and a substantially worse reconstruction of *x*_{2}. However, even this result already improved upon other techniques designed to deal with the same problem (see Fig. 5). To check how faithful the reconstruction was to the mixed x-ray, we measured the MSE of the difference between the original mixed x-ray image and the sum of the two reconstructed separate x-ray images. The reconstruction MSE achieved by this approach was 0.0094 and 0.0053 (for grayscale values ranging between 0 and 1) when applied to details 1 and 2, respectively. The corresponding reconstruction mean absolute errors were 0.0464 and 0.0297.
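The two fidelity measures can be computed as follows. This is a minimal sketch with a hypothetical helper name and a tiny made-up example; it assumes all images are grayscale arrays scaled to the range 0 to 1.

```python
import numpy as np

def reconstruction_errors(x_mixed, x1_hat, x2_hat):
    """Return (MSE, MAE) between the mixed x-ray and the sum of reconstructions."""
    diff = x_mixed - (x1_hat + x2_hat)
    return float(np.mean(diff ** 2)), float(np.mean(np.abs(diff)))

# Made-up 2 x 2 images, for illustration only.
x_mixed = np.array([[0.5, 0.8], [0.2, 1.0]])
x1_hat = np.array([[0.2, 0.5], [0.1, 0.4]])
x2_hat = np.array([[0.2, 0.3], [0.1, 0.4]])

mse, mae = reconstruction_errors(x_mixed, x1_hat, x2_hat)
```

Both measures only test whether the two reconstructions add up to the mixed image; a low value does not by itself show that each individual reconstruction is correct.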
