The cluster, right now

Live numbers from the BharatCode serving VM: vLLM throughput, queue depth, model list, and GPU telemetry. The counters are read directly from the serving VM; the graph tracks serving-slot capacity and queue depth over time.

SERVING LIVE · 1 GPU · bharatcode-a100

A100 VM · 1x NVIDIA A100 40GB

0% load · 31 °C

OPERATIONAL HISTORY · 1-hour avg

0% 1-hour averaged serving capacity

0% instantaneous GPU hardware load

[Chart: serving capacity utilization and average queue depth over time. Range options: 6h, 24h, 7d, 30d. Bucket options: 5m, 15m, 1h.]

TOP TOKEN CONSUMERS

1. Shivani Raja: 14,99,51,761
2. pradeep reddy: 1,30,86,664
3. Ritvik Sharma: 89,54,943
4. Ishaan Kesarwani: 34,44,950
5. PANDHRINATH RAHUL: 20,02,270
6. Abhishek Keshri: 1,59,727
7. vaibhav tiwari: 1,894
8. Anonymous: 76
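The token counts above use Indian digit grouping (lakh and crore), so 14,99,51,761 is 149,951,761 in three-digit grouping. A minimal Python sketch of reading them and computing the top consumer's share of the listed total follows. The helper names are hypothetical and not part of the dashboard, and the total covers only the eight listed rows.

```python
# Token counts copied from the TOP TOKEN CONSUMERS list, in rank order.
LISTED_COUNTS = [
    "14,99,51,761",
    "1,30,86,664",
    "89,54,943",
    "34,44,950",
    "20,02,270",
    "1,59,727",
    "1,894",
    "76",
]


def parse_indian_grouped(text: str) -> int:
    """Drop the grouping commas; the digits themselves are unchanged."""
    return int(text.replace(",", ""))


def format_three_digit(value: int) -> str:
    """Render an integer with conventional three-digit grouping."""
    return f"{value:,}"


counts = [parse_indian_grouped(c) for c in LISTED_COUNTS]
listed_total = sum(counts)
top_share = counts[0] / listed_total  # share among listed rows only

print(format_three_digit(counts[0]))     # 149,951,761
print(format_three_digit(listed_total))  # 177,602,285
print(f"{top_share:.1%}")
```

The listed total is smaller than the "total tokens" runtime counter; the dashboard does not say whether the two cover the same time window, so the sketch does not compare them.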

STEWARDSHIP SCORE

1. Shivani Raja: 1,157
2. pradeep reddy: 799
3. Ritvik Sharma: 773
4. Ishaan Kesarwani: 753
5. PANDHRINATH RAHUL: 703
6. Abhishek Keshri: 544
7. vaibhav tiwari: 336
8. Anonymous: 204

RIGHT NOW · 30/5/2026, 2:53:46 am IST

0 running requests
0 waiting requests
8746 ms avg latency
0 queue depth
0% capacity used
0% KV cache
0% GPU hardware load
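For readers who want to reproduce the request and KV-cache figures, here is a minimal Python sketch that parses Prometheus-style text such as vLLM serves on its `/metrics` endpoint. It is an assumption that this dashboard reads that endpoint. The metric names follow vLLM's documented naming but vary between versions, and the sample text is illustrative rather than captured from the serving VM.

```python
# Illustrative exposition text. Values mirror the RIGHT NOW panel; the metric
# names are an assumption about what the serving VM exports.
SAMPLE_METRICS = """\
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="bharatcode:qwen36-35b-awq-200k"} 0.0
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="bharatcode:qwen36-35b-awq-200k"} 0.0
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="bharatcode:qwen36-35b-awq-200k"} 0.0
"""


def parse_gauges(text: str) -> dict[str, float]:
    """Map metric name to value, ignoring labels and comment lines.

    Assumes no trailing timestamp on sample lines. If a metric appears
    with several label sets, the last one wins.
    """
    gauges: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        gauges[name] = float(value)
    return gauges


def right_now(gauges: dict[str, float]) -> dict[str, float]:
    """Build the panel fields; KV cache usage is a 0..1 fraction upstream."""
    return {
        "running_requests": int(gauges.get("vllm:num_requests_running", 0)),
        "waiting_requests": int(gauges.get("vllm:num_requests_waiting", 0)),
        "kv_cache_percent": 100.0 * gauges.get("vllm:gpu_cache_usage_perc", 0.0),
    }


print(right_now(parse_gauges(SAMPLE_METRICS)))
```

Parsing a string keeps the sketch runnable offline; in practice the text would be fetched over HTTP from the serving host, whose address is not given here.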

Runtime counters

7,096 completed requests
50,15,86,780 total tokens
49,58,23,273 prompt tokens
57,63,507 generation tokens
1353 ms avg time to first token
9 ms avg output token interval
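The runtime counters can be cross-checked with simple arithmetic. The Python sketch below confirms that prompt plus generation tokens equals the total, then estimates average latency as time to first token plus one output-token interval per remaining token. That latency model is an assumption; the dashboard does not state how its average latency is computed, and the 9 ms interval is itself rounded.

```python
# Values copied from the panels above (Indian grouping shown in comments).
COMPLETED_REQUESTS = 7_096
TOTAL_TOKENS = 501_586_780      # 50,15,86,780
PROMPT_TOKENS = 495_823_273     # 49,58,23,273
GENERATION_TOKENS = 5_763_507   # 57,63,507
AVG_TTFT_MS = 1353
AVG_TOKEN_INTERVAL_MS = 9
REPORTED_AVG_LATENCY_MS = 8746


def estimated_latency_ms(ttft_ms: float, interval_ms: float,
                         output_tokens: float) -> float:
    """Assumed model: first token at TTFT, then one token per interval."""
    return ttft_ms + interval_ms * max(output_tokens - 1, 0)


tokens_match = PROMPT_TOKENS + GENERATION_TOKENS == TOTAL_TOKENS
avg_output_tokens = GENERATION_TOKENS / COMPLETED_REQUESTS
estimate = estimated_latency_ms(AVG_TTFT_MS, AVG_TOKEN_INTERVAL_MS,
                                avg_output_tokens)

print(tokens_match)                 # True
print(round(avg_output_tokens, 1))  # 812.2
print(round(estimate))              # 8654
```

Under these assumptions the estimate lands within roughly 1% of the reported 8746 ms, which suggests the panels are mutually consistent, though the averages may not all cover the same window.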

Serving models

bharatcode:qwen36-35b-awq-200k · 2,00,000 context
bharatcode:qwen36-35b-q6-256k-vision · 2,00,000 context
bharatcode:qwen36-35b-q8-256k · 2,00,000 context

GPU

36.4 GB / 40 GB VRAM, 31 °C, 51.4 W.