RSNA 2023 Abdominal Trauma Detection: Preprocessing the Classification Dataset

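For a patient's CT series, we usually only need the abdominal slices. To reduce the input size and keep only the useful slices, we run organ segmentation and retain the CT slices that contain the liver, spleen, kidneys, or bowel.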

The organ segmentation code is adapted from: PoC Segmentator with relative low execution time. | Kaggle

Read the DICOM series and obtain the segmentation mask

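import os

import cv2
import numpy as np
from infer import volume_and_seg  # from the referenced Kaggle segmentation notebook

# `path` is assumed to be the directory holding one patient's DICOM series.
count = len(os.listdir(path))
# T1 = time.time()
dicom_list, seg, _ = volume_and_seg(path, channels=count, clear_mempool=True, cuda=1)
# T2 = time.time()
# print(T2 - T1)
# Segmentation takes under about 15 s for roughly 1000 CT slices.
dicom_array = np.zeros((seg.shape[0], 512, 512))
# dicom_list is ordered opposite to seg, and each image is rotated by 180 degrees
# relative to its mask, so reverse the order and rotate to line up with seg.
for i in range(seg.shape[0]):
    # pixel_array holds the DICOM pixel values; no other metadata is needed here.
    dicom_array[i] = np.array(dicom_list[seg.shape[0] - 1 - i].pixel_array)
    dicom_array[i] = cv2.rotate(dicom_array[i], cv2.ROTATE_180)
# print(np.shape(dicom_array))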
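
The reordering and rotation described above can also be written as one NumPy slicing step. This is only a minimal sketch: it assumes the pixel arrays have already been stacked into a single (slices, height, width) array, and the name `align_volume_to_seg` is made up for illustration.

```python
import numpy as np


def align_volume_to_seg(volume: np.ndarray) -> np.ndarray:
    """Reverse the slice order and rotate every slice by 180 degrees."""
    # Reversing axis 0 flips the slice order; reversing axes 1 and 2
    # together is equivalent to a 180-degree in-plane rotation.
    return volume[::-1, ::-1, ::-1].copy()


# Tiny synthetic volume: 3 slices of 2x2 pixels.
volume = np.arange(12).reshape(3, 2, 2)
aligned = align_volume_to_seg(volume)
```

Here `aligned[0]` is the last input slice rotated by 180 degrees, which is what the loop with `cv2.rotate(..., cv2.ROTATE_180)` produces slice by slice.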

With dicom_array and the seg mask in hand, the mask needs to be split into one mask per organ.

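# One binary mask per organ, with the same shape as seg.
liver = np.zeros_like(seg, dtype=np.int32)
spleen = np.zeros_like(seg, dtype=np.int32)
kidney = np.zeros_like(seg, dtype=np.int32)
bowel = np.zeros_like(seg, dtype=np.int32)
# Label values in seg: 1 = liver, 2 = spleen, 3 and 4 = kidneys, 5 = bowel.
liver[seg == 1] = 1
spleen[seg == 2] = 1
kidney[seg == 3] = 1  # both kidney labels are merged into one mask
kidney[seg == 4] = 1
bowel[seg == 5] = 1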
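
The same split can be expressed more compactly with `np.isin`. The sketch below is an illustrative alternative, not the original code; the label values are taken from the snippet above, while `ORGAN_LABELS` and `split_organ_masks` are hypothetical names.

```python
import numpy as np

# Label values as used in the snippet above:
# 1 liver, 2 spleen, 3 and 4 kidneys, 5 bowel.
ORGAN_LABELS = {"liver": (1,), "spleen": (2,), "kidney": (3, 4), "bowel": (5,)}


def split_organ_masks(seg):
    """Return a dict mapping each organ name to a binary int32 mask."""
    return {
        name: np.isin(seg, labels).astype(np.int32)
        for name, labels in ORGAN_LABELS.items()
    }


# Small synthetic label volume: 2 slices of 2x2 pixels.
seg_demo = np.array([[[0, 1], [2, 3]], [[4, 5], [0, 0]]])
masks = split_organ_masks(seg_demo)
```

Keeping the label mapping in one dictionary makes it easier to adjust if the segmentation model uses different label values.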

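Each organ now has its own mask. Next comes a filtering step: if an organ's region in a slice is too small, that slice's mask is discarded. This removes segmentation errors and CT slices that carry no useful information.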

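def set_zero_if_less(matrix, count):
    """Zero out every slice of `matrix` that has fewer than `count` pixels equal to 1."""
    for i in range(matrix.shape[0]):
        channel = matrix[i]
        # Too few pixels equal to 1: likely a segmentation error or an uninformative slice.
        if np.sum(channel == 1) < count:
            matrix[i] = 0
    return matrix


liver = set_zero_if_less(liver, 400)
# The spleen is relatively small, so it gets a lower threshold.
spleen = set_zero_if_less(spleen, 90)
kidney = set_zero_if_less(kidney, 400)
bowel = set_zero_if_less(bowel, 400)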
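
For large volumes, the per-slice loop can be replaced by a single reduction over the spatial axes. The following is a minimal sketch of that idea; `zero_small_slices` is a hypothetical name, and unlike the function above it returns a copy instead of modifying its input.

```python
import numpy as np


def zero_small_slices(mask, min_pixels):
    """Return a copy of `mask` with under-sized slices set to zero."""
    # Count the pixels equal to 1 in every slice at once.
    counts = (mask == 1).sum(axis=(1, 2))
    filtered = mask.copy()
    # Boolean indexing on axis 0 zeroes whole slices below the threshold.
    filtered[counts < min_pixels] = 0
    return filtered


demo = np.zeros((3, 4, 4), dtype=np.int32)
demo[0, :2, :2] = 1  # slice 0: 4 organ pixels
demo[1, 0, 0] = 1    # slice 1: 1 organ pixel
filtered = zero_small_slices(demo, min_pixels=3)
```

With a threshold of 3, slice 0 is kept and slice 1 is cleared, while `demo` itself stays unchanged.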

Then build the list of indices of the useful slices for each organ, so that the useless CT slices can be dropped.

def generate_idx(matrix):
    """Return the indices of the slices whose mask contains at least one pixel equal to 1."""
    idx_list = []
    for i in range(matrix.shape[0]):
        # A 1 in the mask means the matching CT slice shows this organ, so keep its index.
        if (matrix[i] == 1).any():
            idx_list.append(i)
    return idx_list


liver_idx = generate_idx(liver)
spleen_idx = generate_idx(spleen)
kidney_idx = generate_idx(kidney)
bowel_idx = generate_idx(bowel)
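
The index list can also be computed without an explicit loop. This is a small sketch using `np.flatnonzero`; `nonempty_slice_indices` is a hypothetical name, and it returns a NumPy array rather than a Python list.

```python
import numpy as np


def nonempty_slice_indices(mask):
    """Indices of the slices that contain at least one pixel equal to 1."""
    has_organ = (mask == 1).any(axis=(1, 2))  # one boolean per slice
    return np.flatnonzero(has_organ)


demo = np.zeros((4, 2, 2), dtype=np.int32)
demo[1, 0, 0] = 1
demo[3, 1, 1] = 1
idx = nonempty_slice_indices(demo)
```

An index array like this can be used directly for fancy indexing, for example `demo[idx]`, to pull out the selected slices.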

Extract the useful CT slices and their segmentation masks, then randomly drop or duplicate slices to obtain an input of shape (128, 512, 512).

def matrix_idx(matrix, idx_list):
    """Select the slices of `matrix` listed in `idx_list`."""
    result = []
    for i in range(len(idx_list)):
        # print(idx_list[i])
        # Keep only the useful slices, using the indices computed above.
        result.append(matrix[idx_list[i]])
    return np.array(result)


def transform_matrix(matrix1, matrix2):
    """Resample the CT slices (matrix1) and masks (matrix2) to 128 slices each."""
    C, W, H = matrix1.shape
    if C >= 128:
        # At least 128 slices: randomly keep 128 of them.
        keep_channels = np.random.choice(range(C), size=128, replace=False)
    else:
        # Fewer than 128 slices: sample with replacement, duplicating slices up to 128.
        keep_channels = np.random.choice(range(C), size=128, replace=True)
    new_matrix1 = matrix1[keep_channels]
    new_matrix2 = matrix2[keep_channels]
    # Upsample the mask 2x along both spatial axes by repeating pixels.
    new_matrix2 = new_matrix2.repeat(2, axis=1).repeat(2, axis=2)
    return new_matrix1, new_matrix2


liver_mask = matrix_idx(liver, liver_idx)
liver_dcm = matrix_idx(dicom_array, liver_idx)
liver_dcm, liver_mask = transform_matrix(liver_dcm, liver_mask)

# Fixed: the spleen mask must be taken from `spleen`, not `liver`.
spleen_mask = matrix_idx(spleen, spleen_idx)
spleen_dcm = matrix_idx(dicom_array, spleen_idx)
spleen_dcm, spleen_mask = transform_matrix(spleen_dcm, spleen_mask)

kidney_mask = matrix_idx(kidney, kidney_idx)
kidney_dcm = matrix_idx(dicom_array, kidney_idx)
kidney_dcm, kidney_mask = transform_matrix(kidney_dcm, kidney_mask)

bowel_mask = matrix_idx(bowel, bowel_idx)
bowel_dcm = matrix_idx(dicom_array, bowel_idx)
bowel_dcm, bowel_mask = transform_matrix(bowel_dcm, bowel_mask)
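
One thing to be aware of: `np.random.choice` returns indices in random order, so the 128 selected slices end up shuffled along the depth axis. The post does not say whether the downstream model depends on slice order. If it does (an assumption), a possible variant is to sort the sampled indices, as in this sketch; `sample_depth_indices` is a hypothetical helper, not part of the original code.

```python
import numpy as np


def sample_depth_indices(num_slices, target=128, rng=None):
    """Sample `target` slice indices and return them in ascending order."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample without replacement when there are enough slices,
    # otherwise with replacement so some slices get duplicated.
    replace = num_slices < target
    idx = rng.choice(num_slices, size=target, replace=replace)
    # Sorting restores the original top-to-bottom order of the slices.
    return np.sort(idx)


idx_many = sample_depth_indices(190, rng=np.random.default_rng(0))
idx_few = sample_depth_indices(93, rng=np.random.default_rng(0))
```

The resulting index array can replace `keep_channels` when indexing both the CT slices and the masks, so the two stay aligned.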

Final result:

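For this patient, 706 slices were read: 190 contain the liver, 93 the spleen, 134 the kidneys, and 417 the bowel. (The bowel has far more slices than the other organs. Would randomly sampling down to 128 slices lose useful information?)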

The whole preprocessing step took 13.7 s on an RTX 3090 Ti GPU.

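Note: the slice selection code was later optimized. If the slice count is at least twice 128, adjacent slices are first fused in pairs and only then randomly discarded, which reduces information loss.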

def transform_matrix(matrix1, matrix2):
    C, W, H = matrix1.shape
    if C >= 128:
        while C >= 2 * 128:
            new_matrix1 = np.zeros((C // 2, matrix1.shape[1], matrix1.shape[2]))
            new_matrix2 = np.zeros((C // 2, matrix2.shape[1], matrix2.shape[2]))
            # Fuse adjacent pairs; with an odd C the last slice is left out.
            for i in range(C // 2):
                new_matrix1[i] = (matrix1[2 * i] + matrix1[2 * i + 1]) / 2  # average each pair of CT slices
                new_matrix2[i] = np.logical_or(matrix2[2 * i], matrix2[2 * i + 1])  # union of each pair of masks
            matrix1 = new_matrix1  # update the arrays
            matrix2 = new_matrix2
            C = matrix1.shape[0]  # update the slice count
        # After fusing, 128 <= C <= 255. Best case: fusing lands exactly on 128 slices
        # and nothing is discarded. Worst case: C == 255, so 127 slices are discarded
        # and 128 are kept.
        keep_channels = np.random.choice(range(C), size=128, replace=False)
    else:
        keep_channels = np.random.choice(range(C), size=128, replace=True)
    new_matrix1 = matrix1[keep_channels]
    new_matrix2 = matrix2[keep_channels]
    new_matrix2 = new_matrix2.repeat(2, axis=1).repeat(2, axis=2)
    return new_matrix1, new_matrix2
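
To see what the fusion loop above does to a given slice count, the arithmetic can be traced without touching any image data. The helper below is a hypothetical sketch (`fusion_plan` is not part of the original code) that mirrors only the counting logic: halve while the count is at least twice the target, then randomly discard the remainder.

```python
def fusion_plan(num_slices, target=128):
    """Return (fusion_rounds, slices_after_fusion, randomly_discarded)."""
    c = num_slices
    rounds = 0
    while c >= 2 * target:
        c //= 2  # an odd count leaves its last slice out of the pairing
        rounds += 1
    # Whatever still exceeds the target is removed by random selection.
    discarded = max(c - target, 0)
    return rounds, c, discarded


plans = {n: fusion_plan(n) for n in (190, 417, 512, 384)}
```

For the 417 bowel slices reported above, this gives one fusion round, 208 fused slices, and 80 random discards, compared with 289 discards without fusion. A count of 512 fuses twice down to exactly 128 with nothing discarded, while 384 fuses once to 192 and still discards 64, so being a multiple of 128 is not by itself enough to avoid discarding.
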
  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
  • Copyrights © 2020-2023 YYz