The cluster, right now

Live numbers from the BharatCode serving VM: vLLM throughput, queue depth, model list, and GPU telemetry. Current counters are read directly from that VM; the graph tracks serving-slot capacity and queue depth over time.

SERVING LIVE · 1 GPU · bharatcode-a100

A100 VM: 1x NVIDIA A100 40GB

0% load, 31C

OPERATIONAL HISTORY (5-minute avg)

Range options: 6h, 24h, 7d, 30d. Bucket options: 5m, 15m, 1h.

Current readings: 0% 5-minute averaged serving capacity, 0% instantaneous GPU hardware load.

[Chart: serving capacity utilization and average queue depth over time]

TOP TOKEN CONSUMERS

1. Shivani Raja: 13,60,59,651

2. pradeep reddy: 1,30,86,664

3. Ritvik Sharma: 89,18,583

4. Ishaan Kesarwani: 18,40,819

5. PANDHRINATH RAHUL: 17,08,775

6. Abhishek Keshri: 1,59,727

7. vaibhav tiwari: 1,894

8. Anonymous: 76

STEWARDSHIP SCORE

1. Shivani Raja: 1,120

2. pradeep reddy: 799

3. Ritvik Sharma: 771

4. Ishaan Kesarwani: 700

5. PANDHRINATH RAHUL: 685

6. Abhishek Keshri: 544

7. vaibhav tiwari: 336

8. Anonymous: 204

RIGHT NOW (29/5/2026, 4:25:16 am IST)

0 running requests

0 waiting requests

8751 ms avg latency

0 queue depth

0% capacity used

0% KV cache

0% GPU hardware load
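For readers who want to reproduce a snapshot like the one above, here is a minimal sketch of reading gauges from Prometheus-format text, which is how vLLM servers commonly expose their metrics. It is an illustration, not how this page is actually built: the metric names (`vllm:num_requests_running`, `vllm:num_requests_waiting`, `vllm:gpu_cache_usage_perc`) are assumptions that vary between vLLM versions, and `parse_gauges`, `SAMPLE`, and the `example-model` label are hypothetical.

```python
def parse_gauges(metrics_text: str) -> dict[str, float]:
    """Collect metric values from Prometheus text exposition format.

    Simplified on purpose: assumes lines carry no trailing timestamp and
    sums values that share a metric name across label sets.
    """
    values: dict[str, float] = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]  # drop the {label="..."} part
        values[name] = values.get(name, 0.0) + float(value)
    return values


# Hypothetical sample text; metric names are assumed, not taken from this page.
SAMPLE = """\
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="example-model"} 0.0
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="example-model"} 0.0
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="example-model"} 0.0
"""

gauges = parse_gauges(SAMPLE)
snapshot = {
    "running requests": int(gauges["vllm:num_requests_running"]),
    "waiting requests": int(gauges["vllm:num_requests_waiting"]),
    "KV cache %": gauges["vllm:gpu_cache_usage_perc"] * 100,
}
print(snapshot)
```

In a real deployment the text would come from the server's metrics endpoint rather than a hard-coded string; the endpoint location is deployment-specific and is not stated on this page.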

Runtime counters

6,056 completed requests

43,54,13,097 total tokens

43,05,80,650 prompt tokens

48,32,447 generation tokens

1450 ms avg time to first token

9 ms avg output token interval
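The counters above use Indian digit grouping (lakh and crore), so a quick consistency check may help when reading them. The sketch below parses the figures, confirms that prompt and generation tokens add up to the total, and derives per-request averages. The latency estimate at the end rests on an assumption, namely that average latency is roughly time to first token plus one output interval per generated token; `parse_indian` is a hypothetical helper name.

```python
def parse_indian(text: str) -> int:
    """Parse an integer written with Indian digit grouping, e.g. '43,54,13,097'."""
    return int(text.replace(",", ""))


completed = parse_indian("6,056")
total = parse_indian("43,54,13,097")
prompt = parse_indian("43,05,80,650")
generation = parse_indian("48,32,447")

# The two token counters should sum to the reported total.
assert prompt + generation == total

avg_prompt = prompt / completed          # average prompt tokens per request
avg_generation = generation / completed  # average generated tokens per request
prompt_share = prompt / total            # fraction of all tokens that were prompt

# Assumed model, not stated by the dashboard:
# latency ~= time to first token + interval * generated tokens.
ttft_ms = 1450
interval_ms = 9
estimated_latency_ms = ttft_ms + interval_ms * avg_generation

print(f"avg prompt tokens/request:     {avg_prompt:,.0f}")
print(f"avg generation tokens/request: {avg_generation:,.0f}")
print(f"prompt share of all tokens:    {prompt_share:.1%}")
print(f"estimated avg latency:         {estimated_latency_ms:,.0f} ms")
```

Under that assumption the estimate comes out a little below the reported 8751 ms average latency. A gap is plausible because the averages may be computed over different windows or weighted differently.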

Serving models

bharatcode:qwen36-35b-awq-200k (2,00,000 context)

bharatcode:qwen36-35b-q6-256k-vision (2,00,000 context)

bharatcode:qwen36-35b-q8-256k (2,00,000 context)

GPU

36.4 GB / 40 GB VRAM, 31C, 50.5 W.