The cluster, right now

Live numbers from the BharatCode serving VM: vLLM throughput, queue depth, model list, and GPU telemetry. Current counters come directly from the serving VM, while the graph tracks serving-slot capacity and queue depth over time.
A100 VM
1x NVIDIA A100 40GB
0% load, 31C
0% 5-minute averaged serving capacity
0% instantaneous GPU hardware load
Serving capacity utilization
[Chart: serving capacity utilization, 0% to 100%, 28 May 04:25 to 29 May 04:25]
Average queue depth
[Chart: average queue depth, 0 to 3, 28 May 04:25 to 29 May 04:25]

1 Shivani Raja 13,60,59,651
2 pradeep reddy 1,30,86,664
3 Ritvik Sharma 89,18,583
4 Ishaan Kesarwani 18,40,819
5 PANDHRINATH RAHUL 17,08,775
6 Abhishek Keshri 1,59,727
7 vaibhav tiwari 1,894
8 Anonymous 76

1 Shivani Raja 1,120
2 pradeep reddy 799
3 Ritvik Sharma 771
4 Ishaan Kesarwani 700
5 PANDHRINATH RAHUL 685
6 Abhishek Keshri 544
7 vaibhav tiwari 336
8 Anonymous 204
0 running requests
0 waiting requests
8751ms avg latency
0 queue depth
0% capacity used
0% KV cache
0% GPU hardware load
Runtime counters
6,056 completed requests
43,54,13,097 total tokens
43,05,80,650 prompt tokens
48,32,447 generation tokens
1450ms avg time to first token
9ms avg output token interval
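
The runtime counters can be cross-checked against each other with a little arithmetic. The sketch below is illustrative only: it parses the Indian digit grouping used on this page, confirms that prompt and generation tokens add up to the total, and builds a rough latency estimate from time to first token plus the output token interval. The dashboard does not say how its average latency is computed, so the estimate formula is an assumption, and the helper name `parse_indian` is hypothetical.

```python
def parse_indian(text: str) -> int:
    """Parse a number written with Indian digit grouping, e.g. '43,54,13,097'."""
    return int(text.replace(",", ""))


# Counter values copied from the runtime counters on this page.
completed = parse_indian("6,056")
total_tokens = parse_indian("43,54,13,097")
prompt_tokens = parse_indian("43,05,80,650")
generation_tokens = parse_indian("48,32,447")
ttft_ms = 1450           # avg time to first token
token_interval_ms = 9    # avg output token interval

# Prompt and generation tokens should add up to the total.
tokens_consistent = prompt_tokens + generation_tokens == total_tokens

# Average number of generated tokens per completed request.
avg_generated = generation_tokens / completed

# Assumption: latency is roughly the time to the first token plus one
# interval for each remaining output token. This is not a documented formula.
estimated_latency_ms = ttft_ms + token_interval_ms * (avg_generated - 1)

print(f"token counters consistent: {tokens_consistent}")
print(f"avg generated tokens per request: {avg_generated:.0f}")
print(f"estimated avg latency: {estimated_latency_ms:.0f} ms")
```

Under these assumptions the estimate lands within a few percent of the reported 8751ms average latency, which suggests the counters are mutually consistent, though the averages may not all cover the same set of requests.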
Serving models
bharatcode:qwen36-35b-awq-200k 2,00,000 context
bharatcode:qwen36-35b-q6-256k-vision 2,00,000 context
bharatcode:qwen36-35b-q8-256k 2,00,000 context
GPU
36.4 GB / 40 GB VRAM, 31C, 50.5 W