Cost/Performance of GPU for Mixed Precision Training

Note: GCP bills the K80 per GPU (each board carries 2 GPUs).

Each NVIDIA® K80® board contains two GPUs. K80 GPUs are billed per GPU, not per board.
GCP

AWS

type	GPU	instance type (1 GPU)	price [$/h]	cost [k-yen/month]	FP16 [TFLOPS]	FP32 [TFLOPS]
P3	NVIDIA V100	p3.2xlarge	0.918	66	119	14.9
P2	NVIDIA K80	p2.xlarge	0.27	19	-	4.4
G4	NVIDIA T4	g4dn.xlarge	0.1578	11	65	8.1
G3	NVIDIA M60	g3s.xlarge	0.225	16	9	4.8
(G2)	NVIDIA K520	g2.2xlarge	0.195	14
Inf1	AWS Inferentia	inf1.xlarge	0.1104	8
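
The monthly cost column appears to follow from the hourly price. A minimal sketch of that conversion, assuming continuous use over a 30-day month and an exchange rate of 100 JPY per USD (both are assumptions inferred from the numbers, not stated in these notes; `monthly_cost_kyen` is a hypothetical helper name):

```python
# Hourly prices [$/h] copied from the AWS table above.
PRICE_USD_PER_HOUR = {
    "p3.2xlarge": 0.918,
    "p2.xlarge": 0.27,
    "g4dn.xlarge": 0.1578,
    "g3s.xlarge": 0.225,
    "g2.2xlarge": 0.195,
    "inf1.xlarge": 0.1104,
}

# Assumptions (not stated in the notes): 24 h x 30 days of use, 100 JPY per USD.
HOURS_PER_MONTH = 24 * 30
JPY_PER_USD = 100.0


def monthly_cost_kyen(price_usd_per_hour: float) -> int:
    """Monthly cost in thousands of yen, rounded to the nearest integer."""
    yen = price_usd_per_hour * HOURS_PER_MONTH * JPY_PER_USD
    return round(yen / 1000)


for instance, price in PRICE_USD_PER_HOUR.items():
    print(f"{instance}: {monthly_cost_kyen(price)} k-yen/month")
```

Under these assumptions the results match the table's cost column (66, 19, 11, 16, 14, 8).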

env: Google Colab@2020-10-20
model: Scyclone

T4_AMP : P100_FP32 = 1.66 : 1

cf.
T4 vs P100: ~40% faster training, reported in blog1
Qiita, someone from NVIDIA

Tensor Core: a type of core found on some NVIDIA GPUs of the Volta/Turing/Ampere generations.
It can process tensors directly and is strong at matrix computation.
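
As background for why mixed precision is usable for matrix computation: products of FP16 inputs are commonly accumulated in a wider format (FP32 on the GPU) rather than in FP16. The sketch below illustrates the difference on the CPU using only the Python standard library. It is not Tensor Core code; the Python float (FP64) accumulator stands in for FP32, and all function names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_fp16_accumulate(a, b):
    """Dot product where every intermediate result is rounded to FP16."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + product)
    return acc


def dot_wide_accumulate(a, b):
    """Dot product with FP16 inputs but a wider accumulator.

    The Python float (FP64) accumulator is a stand-in for FP32 here.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_fp16(x) * to_fp16(y)
    return acc


ones = [1.0] * 4096
fp16_sum = dot_fp16_accumulate(ones, ones)
wide_sum = dot_wide_accumulate(ones, ones)
print(fp16_sum, wide_sum)  # the FP16 accumulator stalls at 2048.0; the wide one reaches 4096.0
```

The FP16 accumulator stops growing at 2048 because the spacing between representable FP16 values there is 2, so adding 1 rounds back to the same value.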