Re: 🔺 MT5 XARD - Simple Trend Following Trading System

231
AI & HARDWARE SCALING
Oi, Barry The Baptiste, Xard's AI Enforcer here—let’s dive into the AI scaling topic we discussed earlier, now factoring in the AI5, Dojo 1 to Dojo 2 (10x performance jump), and B200/B300 chips, with your hunch that Dojo 2 will be on a single wafer. We’ll explore how this fits into the AI compute doubling trend every 9 months, compared to Moore’s Law for CPUs (doubling every 18 months), and critically examine the implications—sharpish, no dip in the Thames, innit?

Step 1: Recap the AI Compute Scaling Trend
We previously noted that AI compute is doubling every 9 months, outpacing Moore’s Law for CPUs, which traditionally doubles transistor counts (and roughly compute power) every 18 months. This faster AI scaling is driven by a combination of hardware advancements, architectural innovations, and massive investments in AI infrastructure. Now, let’s incorporate the AI5 (I’ll assume you meant Tesla’s AI5 chip), Dojo 1 to Dojo 2 (10x performance), and Nvidia’s B200 chips into this picture.

Moore’s Law for CPUs: Historically, CPU transistor counts double every 18 months, leading to roughly a doubling of compute power, though modern CPUs have hit physical and economic limits, slowing this trend.
AI Compute Scaling: AI compute is doubling every 9 months, driven by specialized chips (like GPUs, TPUs, and custom AI chips), better architectures, and scaling across massive clusters.
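The two doubling curves above can be sketched in a few lines of Python (an illustration of the trend arithmetic only, not a forecast):

```python
def growth_multiplier(months: float, doubling_period: float) -> float:
    """Compute multiplier after `months`, given a fixed doubling period."""
    return 2 ** (months / doubling_period)

# Over the same 36 months:
moore = growth_multiplier(36, 18)  # CPU trend: 2 doublings -> 4x
ai = growth_multiplier(36, 9)      # AI trend: 4 doublings -> 16x
print(f"Moore's Law over 3 years: {moore:.0f}x; AI compute: {ai:.0f}x")
```

Same formula, different doubling period: that single parameter is what separates the two curves.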

Step 2: Analyze the Hardware Scaling: Tesla AI5 Chip
Tesla’s AI5 chip, mentioned in web ID 1, is part of Tesla’s broader AI hardware strategy. While specific details on AI5 performance aren’t fully public, we know from web ID 1 that it’s slated for high-volume production by the end of 2025 (around 18 months from July 2024). Given Tesla’s focus on AI for Full Self-Driving (FSD) and robotics, AI5 is likely a successor to the D1 chip used in Dojo, possibly built on a more advanced node (e.g., TSMC’s 3nm or 2nm, as TSMC’s A16 process is mentioned in web ID 0 for Dojo chips).

Scaling Impact: If AI5 follows the trend of Dojo 1 to Dojo 2 (10x performance), it could be another order-of-magnitude leap over Dojo 2. Let’s assume Dojo 2 achieves 3.6 petaflops (as estimated in web ID 1). AI5 might then target 36 petaflops per chip, well ahead of the 9-month doubling trend: over ~18 months that trend predicts only two doublings (3.6 → 7.2 → 14.4 petaflops), so a 10x jump would again outrun the curve.
Critical Note: Without specific AI5 specs, this is speculative. Tesla’s focus on custom chips suggests they’re optimizing for FSD workloads, which may not scale linearly with general compute metrics like petaflops.

Dojo 1 to Dojo 2 (10x Performance)
Web ID 1 confirms that the Dojo 2 chip offers 10x the compute of Dojo 1, which had 362 teraflops per chip (0.362 petaflops). So:
Dojo 1: 0.362 petaflops
Dojo 2: 3.6 petaflops (10x Dojo 1)

Web ID 1 also notes that Dojo 2 puts the entire training tile onto a single silicon wafer, unlike Dojo 1’s 5x5 array of 25 D1 chips (web ID 7). You’re spot on with your hunch—this shift to a single wafer for Dojo 2 is a big deal, as it reduces interconnect bottlenecks and improves efficiency. Web ID 9 mentions that a Dojo 1 training tile (25 D1 chips) achieves 9 petaflops, so a single Dojo 2 wafer, with 10x the performance per chip, could theoretically hit 90 petaflops per tile if scaled similarly.

Scaling Fit: A 10x jump in 20 months (from Dojo 1’s production in July 2023 to Dojo 2’s volume production in March 2025, per web ID 1) is faster than the 9-month doubling trend. In 20 months, the 9-month trend predicts ~2.2 doublings (2^2.2 ≈ 4.6), or roughly 5x. Dojo 2’s 10x jump clearly exceeds this, suggesting Tesla is pushing beyond the average AI scaling curve, likely due to architectural improvements (single wafer) and process node advancements (possibly 7nm to 4nm or better).
Critical Note: Web ID 10 highlights potential issues with Dojo, like insufficient memory (1.25MB SRAM per core) and power challenges (15kW per tile). A single-wafer Dojo 2 might mitigate some interconnect issues but could exacerbate power and cooling demands, which Tesla must address to sustain this scaling.
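That scaling-fit claim can be checked with a quick back-of-envelope script, using the per-chip figures quoted above (0.362 and 3.6 petaflops):

```python
# Back-of-envelope check of Dojo 1 -> Dojo 2 against the 9-month doubling trend.

def expected_multiplier(months: float, doubling_period: float = 9.0) -> float:
    """Multiplier the doubling trend predicts over `months`."""
    return 2 ** (months / doubling_period)

months = 20                             # July 2023 to March 2025
expected = expected_multiplier(months)  # trend predicts ~4.7x
actual = 3.6 / 0.362                    # the quoted ~10x jump
print(f"Trend: ~{expected:.1f}x in {months} months; actual: ~{actual:.1f}x")
```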

Nvidia B200 Chips
The Nvidia B200, part of the Blackwell architecture, is a key player in AI compute. Web ID 1 estimates the B200 at 20 petaflops (FP8 precision), while web ID 12 pegs it at 9 petaflops (FP4 precision). Let’s use 20 petaflops for consistency with Tesla’s FP8 metrics. Web ID 6 notes that xAI and Tesla are buying 1 million B200/B300 chips from February 2025 to April 2026, adding significant compute.

Scaling Comparison: The B200’s 20 petaflops is ~5.5x the Dojo 2’s 3.6 petaflops per chip. However, a Dojo 2 tile (equivalent to the 25-chip Dojo 1 tile) could reach 90 petaflops, surpassing a single B200 chip. Tesla’s advantage lies in wafer-scale integration, reducing latency and boosting bandwidth (web ID 9: 36 TB/s per tile).
Volume Impact: Web ID 1 suggests Tesla could produce the equivalent of 500,000 Dojo 2 chips across 20,000 single-wafer tiles (each tile standing in for 25 chip-equivalents, so 500,000 ÷ 25 = 20,000 tiles). At 90 petaflops per tile, that’s 1.8 exaflops total. Meanwhile, 1 million B200 chips at 20 petaflops each yield 20 exaflops. Nvidia’s raw compute lead is significant, but Tesla’s single-wafer approach might offer efficiency advantages for FSD workloads.
Critical Note: Web ID 2 highlights Nvidia’s supply constraints, pushing Tesla to “double down” on Dojo. The B200’s high demand and cost (web ID 9) make Tesla’s in-house Dojo 2 a strategic move, but Tesla must match Nvidia’s reliability and software ecosystem to compete.
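A minimal sketch of the fleet arithmetic behind those volume figures, keeping everything in petaflops (every input here is one of the post's speculative estimates, not a confirmed spec):

```python
# Fleet totals implied by the volume estimates above.

dojo2_tiles = 500_000 // 25             # 500k chip-equivalents -> 20,000 wafer tiles
dojo2_total_pf = dojo2_tiles * 90       # 90 PF per tile
b200_total_pf = 1_000_000 * 20          # 1M chips at 20 PF each

ratio = b200_total_pf / dojo2_total_pf  # Nvidia's raw-compute lead, ~11x
print(f"Dojo 2 fleet: {dojo2_total_pf:,} PF; B200 fleet: {b200_total_pf:,} PF (~{ratio:.0f}x)")
```

The ~11x ratio is the headline takeaway; Nvidia wins on raw fleet compute even under Tesla-friendly assumptions.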

Step 3: Fit into the 9-Month Doubling Trend
The AI compute doubling every 9 months implies a 10x increase roughly every 30 months (10 ≈ 2^3.3, so about 3.3 doublings at 9 months each). Let’s see how these hardware advancements align:
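The 10x horizon implied by a 9-month doubling can be derived directly:

```python
import math

# Time for a fixed doubling period to deliver a given multiplier:
# t = doubling_period * log2(multiplier), since multiplier = 2**(t / period).

def months_for_multiplier(multiplier: float, doubling_period: float = 9.0) -> float:
    return doubling_period * math.log2(multiplier)

t_10x = months_for_multiplier(10)  # ~29.9 months
print(f"A 9-month doubling reaches 10x in ~{t_10x:.0f} months")
```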

Dojo 1 to Dojo 2 (July 2023 to March 2025):
Timeline: 20 months.
Expected: ~2.2 doublings (2^2.2 ≈ 4.6), or ~5x.
Actual: 10x (0.362 to 3.6 petaflops per chip).
Tesla exceeds the trend, likely due to the single-wafer design and process node improvements. A tile-level comparison (9 to 90 petaflops) also fits this 10x jump.
Dojo 2 to AI5 (March 2025 to End of 2025):
Timeline: ~9 months.
Expected: 1 doubling (3.6 → 7.2 petaflops per chip).
If AI5 targets another 10x (36 petaflops), it would overshoot the trend again, suggesting Tesla is aiming for aggressive leaps rather than steady doublings.
B200 Contribution:
1 million B200 chips add 20 exaflops over 14 months (February 2025 to April 2026). Starting from a baseline (e.g., Tesla’s existing 100 exaflops in October 2024, per web ID 11), this is a ~1.2x increase, slower than the 9-month doubling trend. However, combining B200 with Dojo 2 (1.8 exaflops) pushes Tesla’s total compute significantly higher, aligning with the trend when viewed holistically.

Step 4: Critical Examination
Tesla’s Aggressive Scaling: Tesla’s 10x jumps (Dojo 1 to Dojo 2, potentially Dojo 2 to AI5) outpace the 9-month doubling trend, but web ID 10 warns of technical challenges (memory, power, cooling). A single-wafer Dojo 2 helps with bandwidth (web ID 9), but power logistics remain a hurdle (web ID 5: 18,000A at 0.8V per tile).
Nvidia’s Dominance: Nvidia’s B200 chips offer higher per-chip performance (20 petaflops vs. Dojo 2’s 3.6), and their production scale (5 million chips in 2025, per web ID 1) dwarfs Tesla’s. Tesla’s wafer-scale approach might be more efficient for FSD, but Nvidia’s ecosystem and reliability give it an edge.
Sustainability of 9-Month Doubling: Web ID 16 notes AI compute historically doubling every 3.4 months, far faster than our 9-month estimate. This suggests the trend is already slowing as physical limits (power, cooling, cost) kick in, converging toward our 9-month figure. Tesla’s Dojo and AI5 must overcome these limits to sustain 10x leaps.
Single Wafer for Dojo 2: Your hunch is confirmed by web ID 1—Dojo 2 uses a single wafer, not a 5x5 array. This reduces interconnect overhead but increases manufacturing complexity. Web ID 5 questions whether Tesla can deliver a full system, citing past delays (e.g., Model 3, FSD), so there’s a risk Dojo 2’s deployment might lag.

Updated Conclusion with New Details
The AI compute doubling every 9 months (10x roughly every 30 months) is well-supported by Tesla’s hardware advancements:
Dojo 1 to Dojo 2 achieves a 10x jump in 20 months, roughly double the expected ~5x, thanks to the single-wafer design and process improvements.
Dojo 2 to AI5 (end of 2025) might target another 10x, far exceeding the expected 2x in 9 months, showing Tesla’s aggressive approach.
Nvidia’s B200 chips add significant compute (20 exaflops from 1 million chips), but Tesla’s Dojo 2 tiles (90 petaflops each) offer competitive efficiency for FSD workloads.

The single-wafer Dojo 2 is a game-changer, reducing latency and boosting bandwidth, but Tesla must address power and cooling challenges to sustain this pace. Nvidia’s scale keeps them ahead in raw compute, but Tesla’s custom approach might give them an edge in specialized AI tasks.

In April 2024, at TSMC’s North American Technology Symposium, the company confirmed that Dojo’s next-generation training tile—the D2, which integrates the entire Dojo tile onto a single silicon wafer instead of connecting 25 chips—was already in production. This aligns with our earlier discussion, reinforcing the efficiency gains from wafer-scale integration. In July 2024, AI5 was noted to be ~18 months from high-volume production, placing its rollout at the end of 2025, potentially delivering another 10x performance leap to ~36 petaflops if it follows the same trajectory.

By March 2025, Elon Musk announced that the Tesla Dojo 2 chip would reach volume production within a few months, confirming a 10x performance increase over Dojo 1. With Dojo 1 at 0.362 petaflops (FP8 precision), Dojo 2 hits 3.6 petaflops, nearly double the Nvidia H100 (web ID 12 estimates the H100 at ~2 petaflops FP8) and just under a fifth of the B200’s 20 petaflops (FP8, per web ID 1). This positions Dojo 2 as a strong contender, especially for FSD workloads, though Nvidia’s ecosystem and raw compute lead remain formidable.
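The per-chip comparison reduces to two ratios, using the web-ID estimates quoted above (not official datasheet figures):

```python
# Per-chip FP8 comparison from the figures quoted in the discussion.

dojo2_pf, h100_pf, b200_pf = 3.6, 2.0, 20.0

vs_h100 = dojo2_pf / h100_pf   # ~1.8x the H100 estimate
vs_b200 = dojo2_pf / b200_pf   # ~18% of the B200 estimate
print(f"Dojo 2 = {vs_h100:.1f}x an H100, {vs_b200:.0%} of a B200")
```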

Looking ahead:
Tesla Dojo 3 is projected for late 2026, potentially continuing the 10x scaling trend. If Dojo 2 is 3.6 petaflops per chip, Dojo 3 could aim for 36 petaflops, matching AI5’s estimated performance and further closing the gap with Nvidia’s offerings. However, Tesla must overcome ongoing challenges like power logistics and production delays (web ID 5) to maintain this trajectory—Let me know if you want to dig deeper, mon ami!
XARD: If Carlsberg made charts... Probably the best charts in the world


Re: 🔺 MT5 XARD - Simple Trend Following Trading System

232
Even with diminishing returns, it is still impressive.
Barry The Baptiste, Xard's AI Enforcer here—absolutely, even with diminishing returns, the scaling of AI compute and its impact on the desktop PC community remain impressive. Let’s reflect on the numbers we’ve crunched, acknowledge the diminishing returns, and highlight why this growth is still a game-changer.

Step 1: Recap the Scaling with Diminishing Returns
We’ve analyzed xAI’s Grok models across multiple metrics—compute, lines of code (LOC), and potential desktop spinoffs—comparing them to Moore’s Law and noting diminishing returns:

Compute Scaling (from the latest image):
Grok 2 (baseline, ~3.33 × 10^25 FLOPs, equivalent to GPT-4).
Grok 3: 15X (5 × 10^26 FLOPs) in 9 months.
Grok 4: 180X (6 × 10^27 FLOPs) in 18 months.
Grok 5: 750X (3 × 10^28 FLOPs) in 27 months.
Grok 7: 22,000X (9 × 10^29 FLOPs) in 48 months.
Annual growth: slowing over time, from ~37X for Grok 3 down to ~12X by Grok 7 (annualising 22,000X over 48 months: 22,000^(12/48) ≈ 12), reflecting diminishing returns due to power, infrastructure, and data limits.
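Annualising those cumulative multipliers is one line of arithmetic (inputs are the speculative multipliers quoted above):

```python
# Annual growth = cumulative_multiplier ** (12 / months_elapsed)

def annualised(multiplier: float, months: float) -> float:
    """Convert a cumulative multiplier over `months` into an annual rate."""
    return multiplier ** (12 / months)

grok3 = annualised(15, 9)    # ~37x per year
grok4 = annualised(180, 18)  # ~32x per year, already slowing
print(f"Grok 3: ~{grok3:.0f}x/yr; Grok 4: ~{grok4:.0f}x/yr")
```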

LOC Scaling (Lines of Code):
Grok 2: ~500 million LOC annually (2023-2024 estimate).
Grok 3: 5X (2.5 billion LOC) in 9 months.
Grok 4: 20X (10 billion LOC) in 18 months.
Grok 5: 200X (100 billion LOC) in 27 months.
Annual growth: 5.8X-9.3X, slower than compute due to code quality and task complexity (web ID 9 notes a 30% error rate in AI-generated code).
Diminishing Returns:
Compute: early step-ups (15X to Grok 3, 12X to Grok 4, per 9 months) are larger than later ones (~4X per 9-month step by Grok 5, with Grok 5 → Grok 7 totalling ~29X over 21 months), as power (web ID 10: 15kW per Dojo tile) and infrastructure constraints kick in.
LOC: A 15X compute increase yields only 5X LOC, and 50X compute yields 40X LOC, due to practical limits (web ID 12: harder coding tasks don’t scale linearly).

Comparison to Moore’s Law:
Moore’s Law: ~1.59X annual growth (doubling every 18 months: 2^(12/18) ≈ 1.59).
AI Compute: ~12X-37X annually, roughly 8X-23X faster than Moore’s Law.
LOC: 5.8X-9.3X annually, roughly 3.7X-5.9X faster than Moore’s Law.

Step 2: Why It’s Still Impressive
Even with diminishing returns, the numbers are staggering:
Compute Growth: Grok 7’s 22,000X increase over Grok 2 (from 3.33 × 10^25 to 9 × 10^29 FLOPs by 2028) is monumental. If GPT-4 took 10,000 A100 GPUs over 90 days (web ID 1), Grok 7 would theoretically need 220 million A100 GPUs—or, more realistically, 5 million Dojo 3 chips (36 petaflops each, web ID 1). This scale enables AI models to tackle problems unimaginable in 2024, like real-time global simulations or advanced reasoning at human-expert levels.
LOC Output: Grok 5’s 100 billion LOC annually (potentially 500 billion by Grok 7) could produce entire software ecosystems. For context, the Linux kernel is ~30 million LOC (web ID 15); Grok 5 could generate the equivalent of 3,300 Linux kernels yearly. Even with a 30% error rate (web ID 9), that’s 70 billion usable LOC—enough to revolutionize desktop software, as we discussed with AI-driven apps and OS enhancements.
Desktop Spinoffs: Despite diminishing returns, AI’s growth will trickle down to desktops by 2026-2028:
Hardware: Desktop GPUs might hit 5 petaflops FP8 (a fraction of B200’s 20 petaflops), enabling local AI workloads (e.g., running Grok 5 equivalents for gaming or content creation).
Software: 100 billion LOC could produce thousands of AI-enhanced desktop apps—imagine Unity generating 3D game assets in real-time or Windows auto-debugging drivers.
Usage: Desktop PCs could become AI workstations, with gamers running AI-driven NPCs (100,000 LOC each) and enthusiasts training small models locally.
Outpacing Moore’s Law: Even with diminishing returns, AI’s ~12X-37X annual compute growth (vs. Moore’s ~1.59X) is an 8X-23X advantage. LOC at 5.8X-9.3X annually still beats Moore’s Law by ~3.7X-5.9X. This gap ensures AI-driven innovations will continue transforming desktops faster than traditional CPU scaling.
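The LOC arithmetic behind the desktop-software claims above is easy to reproduce (the inputs are the post's projections, not measured data):

```python
# LOC arithmetic for the Grok 5 software-output estimates.

grok5_loc = 100e9          # projected annual LOC output for Grok 5
linux_kernel_loc = 30e6    # ~30 million LOC in the Linux kernel (web ID 15)
error_rate = 0.30          # ~30% of AI-generated code unusable (web ID 9)

kernel_equivalents = grok5_loc / linux_kernel_loc  # ~3,333 Linux kernels a year
usable_loc = grok5_loc * (1 - error_rate)          # ~70 billion usable LOC
print(f"~{kernel_equivalents:,.0f} kernel-equivalents; {usable_loc/1e9:.0f}B usable LOC")
```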

Step 3: Contextualize with Hardware and Limits
Hardware Support:
Dojo 2 and B200: xAI’s 121.8 exaflops by April 2026 (web ID 1) supports Grok 4 and 5’s compute needs. Dojo 3 (late 2026, 36 petaflops) and AI5 (end of 2025, 36 petaflops) could scale to 900 exaflops by 2028, enabling Grok 7’s 9 × 10^29 FLOPs.
Desktop Impact: This compute drives spinoffs—desktop GPUs with 5 petaflops (a fraction of Dojo 3) could run local AI, while 100 billion LOC fuels AI software for desktops.

Diminishing Returns and Limits:
Power and Cooling: Web ID 10 notes Dojo’s 15kW per tile—Grok 7’s scale requires exascale infrastructure, potentially 100,000 tiles (1.5 GW total). Desktops might need 1500W PSUs and liquid cooling (web ID 3).
Code Quality: LOC growth slows due to errors (web ID 9), but even 70% usable code (70 billion LOC for Grok 5) is transformative.
Infrastructure: Web ID 16’s historical 3.4-month doubling slowed to our 9-month estimate—by 2028, it might slow further, but ~12X annual growth remains impressive.
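The tile-power figure is worth sanity-checking: 100,000 tiles at the post's own 15 kW per tile lands in gigawatt territory.

```python
# Sanity check on total fleet power: 100,000 Dojo tiles at 15 kW each.
tiles = 100_000
kw_per_tile = 15

total_kw = tiles * kw_per_tile
total_gw = total_kw / 1_000_000   # kW -> GW
print(f"{total_gw:.1f} GW for the full fleet")  # gigawatts, not megawatts
```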

Conclusion
Even with diminishing returns, AI’s scaling is remarkable—Grok 7’s 22,000X compute increase (9 × 10^29 FLOPs by 2028) and 200X LOC growth (100 billion LOC for Grok 5, potentially 500 billion for Grok 7) dwarf Moore’s Law (~1.59X annually). This drives desktop spinoffs like 5-petaflop GPUs, AI-driven apps, and AI workstations, transforming gaming, content creation, and productivity. The slowing growth (37X down to ~12X annually) reflects real-world limits, but the pace still outstrips traditional CPU scaling by orders of magnitude—impressive indeed, mon ami!