The cluster, right now
Live numbers from the BharatCode serving VM: vLLM throughput, queue depth, model list, and GPU telemetry. Current counters come directly from the serving VM, while the graph tracks serving-slot capacity and queue depth over time.
A100 VM 1x NVIDIA A100 40GB
0% load 31C
0% 5-minute averaged serving capacity
0% instantaneous GPU hardware load
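The card above shows a 5-minute averaged figure next to an instantaneous one. The page does not say how the average is computed; the sketch below assumes it is a plain mean over timestamped samples from the last 300 seconds. The class name `WindowedAverage` and the sample values are hypothetical.

```python
from collections import deque


class WindowedAverage:
    """Mean of the samples that fall within the last `window_s` seconds."""

    def __init__(self, window_s=300.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp_s, value):
        """Record a sample and evict everything older than the window."""
        self.samples.append((timestamp_s, value))
        self.total += value
        cutoff = timestamp_s - self.window_s
        while self.samples and self.samples[0][0] <= cutoff:
            _, old_value = self.samples.popleft()
            self.total -= old_value

    def average(self):
        """Return the mean of the retained samples, or 0.0 if there are none."""
        if not self.samples:
            return 0.0
        return self.total / len(self.samples)


# Hypothetical utilization samples in percent, one per minute.
window = WindowedAverage(window_s=300.0)
window.add(0, 100.0)   # a burst of load
window.add(60, 0.0)
window.add(120, 0.0)
print(window.average())  # the burst still counts toward the average
window.add(400, 0.0)     # the samples at t=0 and t=60 have now aged out
print(window.average())
```

Under this assumption the averaged value can stay above zero for up to five minutes after the instantaneous load has dropped to 0%, which is why the two figures can disagree.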
Serving capacity utilization (chart: 0% to 100%, 28 May 01:40 to 30 May 01:45)
Average queue depth (chart: 0 to 3, 28 May 01:40 to 30 May 01:45)
1 Shivani Raja 14,99,51,761
2 pradeep reddy 1,30,86,664
3 Ritvik Sharma 89,54,943
4 Ishaan Kesarwani 34,44,950
5 PANDHRINATH RAHUL 20,02,270
6 Abhishek Keshri 1,59,727
7 vaibhav tiwari 1,894
8 Anonymous 76
1 Shivani Raja 1,157
2 pradeep reddy 799
3 Ritvik Sharma 773
4 Ishaan Kesarwani 753
5 PANDHRINATH RAHUL 703
6 Abhishek Keshri 544
7 vaibhav tiwari 336
8 Anonymous 204
0 running requests
0 waiting requests
8746ms avg latency
0 queue depth
0% capacity used
0% KV cache
0% GPU hardware load
Runtime counters
7,096 completed requests
50,15,86,780 total tokens
49,58,23,273 prompt tokens
57,63,507 generation tokens
1353ms avg time to first token
9ms avg output token interval
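The runtime counters use Indian digit grouping (lakh and crore), and a few derived figures follow from them by simple arithmetic. The sketch below is illustrative only: it assumes all counters cover the same period, and the helper name `parse_grouped` is hypothetical.

```python
def parse_grouped(text):
    """Parse an integer written with comma grouping (Indian or Western)."""
    return int(text.replace(",", ""))


# Values copied from the runtime counters above.
total_tokens = parse_grouped("50,15,86,780")
prompt_tokens = parse_grouped("49,58,23,273")
generation_tokens = parse_grouped("57,63,507")
completed_requests = parse_grouped("7,096")

# Consistency check: prompt and generation tokens add up to the total.
assert prompt_tokens + generation_tokens == total_tokens

# Share of all tokens that were prompt (input) tokens.
prompt_share = prompt_tokens / total_tokens

# Mean tokens per completed request, assuming the counters share one period.
avg_tokens_per_request = total_tokens / completed_requests

# A 9 ms average gap between output tokens corresponds to this per-stream
# decode rate; the 9 ms figure is itself a rounded average.
tokens_per_second_per_stream = 1000 / 9

print(f"prompt share: {prompt_share:.1%}")
print(f"avg tokens per request: {avg_tokens_per_request:,.0f}")
print(f"decode rate per stream: {tokens_per_second_per_stream:.0f} tokens/s")
```

The prompt and generation counters sum exactly to the total, and prompt tokens make up almost all of the traffic, which fits a workload dominated by long-context requests.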
Serving models
bharatcode:qwen36-35b-awq-200k 2,00,000 context
bharatcode:qwen36-35b-q6-256k-vision 2,00,000 context
bharatcode:qwen36-35b-q8-256k 2,00,000 context
GPU
36.4 GB / 40 GB VRAM, 31C, 51.4 W.