
The NVIDIA GeForce RTX 30xx thread


Nizzen
Message added by Jarmo

Please don't start AMD vs. Intel discussions in this thread :)


Spoiler

GeForce RTX 3080 CTD Issues likely due to SPCAP and MLCC configuration (updated)

by Hilbert Hagedoorn on: 09/26/2020 01:10 PM | source: | 199 comment(s)

A few days ago we posted a news item in which we reported that some RTX 3080 owners experience issues: games freeze, or crash and return to the desktop. News is now building up on the actual (possible) root cause of it all.

I really wanted to dig in a little deeper before posting this, but after sinking my teeth into it, the theory is absolutely sound. The CTD (crash to desktop) issues are reported for several RTX 3080 cards, including the Zotac Trinity, the MSI Ventus 3X OC and EVGA cards. Likely most brands will stumble into the problem. New reports, however, also indicate that Founders Edition card owners sometimes see the issue. Should you experience the problem, you can temporarily tackle it by reducing the GPU clock speed with a 50 to 100 MHz offset, possibly accompanied by a slight underclock. Of course, this quick temporary fix is at your own risk and requires the necessary knowledge. Download the latest possible Afterburner here.

What's going on?

So what is the cause of the CTD issue? Well, there are some theories. The crash to desktop seems to apply to GPU workloads that reside in a very high 2010 MHz to 2040 MHz range, and that's not your standard clock frequency. However, some games that are not very GPU bound can result in higher GPU boost frequencies. Our German colleague Igor Wallossek has submitted a number of reasons for the problem; however, one jumps out in particular.

You need to understand, and I've explained this before, manufacturers have had their hands tied during the development stages of Ampere based graphics cards. No testing software other than NVIDIA supplied test software was available to stress test. Just to prevent benchmark leaks (that did work well though!). The problem is that the NVIDIA test software is a fixed workload, emulating a game. If the test finishes, the AIB will see either a 'pass' or 'fail' result.  So deep testing with other real-world applications has been a no-go at this development stage. And some applications do allow to boost that GPU at high frequencies. The problem does not seem to occur below a boost frequency of roughly 1950 MHz.

So on the reference card, six capacitors are located directly under the chip, in the voltage circuit (NVVDD and MSVDD voltages). The components located there are multilayer ceramic chip capacitors (MLCCs) and poscaps (conductive polymer tantalum solid capacitors).

Update: after further investigation of the components, what is referred to as poscaps everywhere, in fact, are spcaps (an imperceptibly varying component responsible for the same functionality often named and referred to being the same). None of the RTX 3000 cards would use poscaps or spcaps. 

 


Palit GeForce RTX 3080 GamingPRO OC shows four SPCAPS (red), two MLCC clusters (20) (green)

 

The mlcc's (in green) are extra capable to filter high frequencies, therefore video cards with more mlcc's experience fewer problems than cards with spcaps/poscaps (red). That is why the small, more difficult to solder mlcc's are also a lot more expensive. Some manufacturers have opted to use less or no mlcc's at all, and therein is the problem to be found. Manufacturers can choose this themselves. Nvidia's own Founders Edition uses four sp-caps. Currently, it is very silent at the AIB partners, but if all this information turns out to be the correct assumption, then AIBs will have to revise their design and release boards with a fix in place. For the current boards out there a quick solution would be to lower the Boost frequency with perhaps a 50 MHz lower frequency, diverting the issue. 

 


MSI GeForce RTX 3080 Gaming X Trio shows five SPCAPs, one MLCC cluster (10)

 


MSI GeForce RTX 3090 Gaming X Trio shows four SPCAPs and two MLCC clusters (20)

  

In short: specific implementations with a POSCAP/SPCAP design are suspected of creating instability, specifically at a particularly high boost clock. That in itself results in in-game driver crashes and the dreaded CTD (crash to desktop). The solution: reconfigure the POSCAPs/SPCAPs and add MLCCs.

One AIB has confirmed this:

Hi all,

Recently there has been some discussion about the EVGA GeForce RTX 3080 series. During our mass production QC testing we discovered a full 6 POSCAPs solution cannot pass the real world applications testing. It took almost a week of R&D effort to find the cause and reduce the POSCAPs to 4 and add 20 MLCC caps prior to shipping production boards, this is why the EVGA GeForce RTX 3080 FTW3 series was delayed at launch. There were no 6 POSCAP production EVGA GeForce RTX 3080 FTW3 boards shipped.

But, due to the time crunch, some of the reviewers were sent a pre-production version with 6 POSCAP’s, we are working with those reviewers directly to replace their boards with production versions.
EVGA GeForce RTX 3080 XC3 series with 5 POSCAPs + 10 MLCC solution is matched with the XC3 spec without issues.

Thanks
EVGA

We'll keep an eye on this situation. Again, should you experience it, downclock a bit to see if that makes a difference. We're sure that AIBs will release new BIOSes where needed, and we're also sure that some board designs will get revised. Also, I read in the forums somewhere that people were wondering if NVIDIA checks AIB PCBs; I can answer that, as it has been a policy for many years. The answer is yes: all boards are validated by NVIDIA and need to be approved before the manufacturers can mass-produce them.

Is all this 100% certain to be the issue?

I wish I knew for sure; at this point nothing is certain, but the AIB report above was pretty clear and sure about it. I have yet to experience even one crash on any of my samples at hand, and that is the honest truth. Currently, we're also seeing reports of ASUS cards (using 100% MLCCs and founder edition cards using 100% MLCCs) with similar CTD behavior; that could be a placebo effect, but it is odd. Also, it does not seem to be a PSU issue: some users who experienced these issues bought a new PSU, and the problem returned. As stated, in the end there is a quick fix: prevent the boost clock from rising above 2 GHz. This would hardly affect your framerates TBH, as the frequency domain your 3080 sits in with a nice GPU-bound title is most often the 1900 MHz domain, so only the unusual frequencies are affected. Probably there will be BIOS/driver updates soon, as that is the quickest and most solid fix.

Small update: word right now is that NVIDIA is working on a new driver and has already provided it to AIB/AICs for testing. While we have no idea what the driver actually changes. I've been making some rounds at board partners to check the status of this topic. There is also still doubt among AIBs that POS/SPCAPs are responsible. And thus some AIBs think this does not involve a need for the capacitor (re)configuration, but everybody is working / heavily testing and what is needed right now is time. Meanwhile, we've asked NVIDIA for a response. To be continued, but we have hopes that the issues can be solved with a driver and/or firmware revision. 

Article sources: Guru3D, ComputerBase, Igor's Lab, Nvidia, EVGA

https://www.guru3d.com/news-story/geforce-rtx-3080-ctd-issues-likely-due-to-poscap-and-mlcc-configuration.html

A little update on the capacitor crisis

Quote

The CTD (crash to desktop) issues are reported for several RTX 3080 cards, including the Zotac Trinity, the MSI Ventus 3X OC and EVGA cards. Likely most brands will stumble into the problem. New reports, however, also indicate that Founders Edition card owners sometimes see the issue. Should you experience the problem, you can temporarily tackle it by reducing the GPU clock speed with a 50 to 100 MHz offset, possibly accompanied by a slight underclock. Of course, this quick temporary fix is at your own risk and requires the necessary knowledge. Download the latest possible Afterburner here.

Quote

You need to understand, and I've explained this before, manufacturers have had their hands tied during the development stages of Ampere based graphics cards. No testing software other than NVIDIA supplied test software was available to stress test. Just to prevent benchmark leaks (that did work well though!). The problem is that the NVIDIA test software is a fixed workload, emulating a game. If the test finishes, the AIB will see either a 'pass' or 'fail' result.  So deep testing with other real-world applications has been a no-go at this development stage. And some applications do allow to boost that GPU at high frequencies. The problem does not seem to occur below a boost frequency of roughly 1950 MHz.

Quote

Is all this 100% certain to be the issue?

I wish I knew for sure; at this point nothing is certain, but the AIB report above was pretty clear and sure about it. I have yet to experience even one crash on any of my samples at hand, and that is the honest truth. Currently, we're also seeing reports of ASUS cards (using 100% MLCCs and founder edition cards using 100% MLCCs) with similar CTD behavior; that could be a placebo effect, but it is odd. Also, it does not seem to be a PSU issue: some users who experienced these issues bought a new PSU, and the problem returned. As stated, in the end there is a quick fix: prevent the boost clock from rising above 2 GHz. This would hardly affect your framerates TBH, as the frequency domain your 3080 sits in with a nice GPU-bound title is most often the 1900 MHz domain, so only the unusual frequencies are affected. Probably there will be BIOS/driver updates soon, as that is the quickest and most solid fix.

Small update: word right now is that NVIDIA is working on a new driver and has already provided it to AIB/AICs for testing. While we have no idea what the driver actually changes. I've been making some rounds at board partners to check the status of this topic. There is also still doubt among AIBs that POS/SPCAPs are responsible. And thus some AIBs think this does not involve a need for the capacitor (re)configuration, but everybody is working / heavily testing and what is needed right now is time. Meanwhile, we've asked NVIDIA for a response. To be continued, but we have hopes that the issues can be solved with a driver and/or firmware revision. 

 

Edited by G
Link to comment

What a single user experiences is irrelevant unless he works his way through thousands of cards.

It is rather doubtful that the choice of capacitors alone is the cause of the problems. Piling on capacitors is no solution. That is called trying to solve the problem by doing things at random, and it earns a failing grade.

Experts have said that using only MLCCs is not a good solution, since these have different characteristics. MLCCs are also said to be more prone to capacitor squeal.

This launch seems rushed, and the choice of process and the chip design seem desperate.

 

Link to comment
14 hours ago, OddZen said:

I think he is trying to simulate a fairly common scenario for many people.

Do you have pictures of your setup?

What temps and noise levels do you get on your Ventus?

I have a Phanteks Evolv X case. I bought it with good airflow in mind: air comes in through the front, and I vent hot air out through the top and the back.

Temps on my Ventus stay very low while gaming. Under heavy benchmarking it has never gone above 72 °C, and that is with all fans on the "standard" profile, not performance.

But sure, I don't disagree that "common" may well mean blocking the front of the case too much and not thinking about how to get fresh, cool air into the case properly.

Link to comment
13 hours ago, koford said:

Here is an example:
Compared RTX 3080 stability, temp and overlocking

As you can see, the cards work out of the box. It is only when you overclock that they crash.
He says he has only tested one game (Tomb Raider) but several cards (Gigabyte, Asus FE and Zotac).

My thinking is that this isn't really a problem; the crashes only come once you overclock past roughly 2 GHz.
The theory is that people overclock high and get crashes, but that some report crashes "out of the box", hmm... I'm not sure I believe that. To sum up: you can overclock, but there simply isn't much headroom for it.

Personally, I simply don't believe MLCC / "poscaps" are the problem. I think it has more to do with the power limit.

What would be interesting to know is whether these people are also using a custom BIOS.

Has anyone started compiling an accumulated list of how well the different cards clock on air with the stock BIOS? The cards do have different power limits. The Ventus has a relatively low PL of 320 W and throttles on PL at 900-925 mV for me, though I have seen cases all the way down to 875 mV. So under load I rarely get up to 2000+ MHz.

My guess is that the cards that crash have a considerably higher power limit than the Ventus?
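A minimal sketch of how such an accumulated list could be gathered, assuming nvidia-smi is installed and on the PATH. The query fields name, clocks.gr, power.draw and power.limit are standard nvidia-smi --query-gpu fields; the helper names (parse_sample, near_power_limit, read_samples) and the 5 W margin are made up for illustration, and this only approximates "throttling on PL" by comparing power draw to the limit.

```python
import subprocess

# Standard nvidia-smi query fields: GPU name, current graphics clock (MHz),
# current power draw (W) and the enforced power limit (W).
QUERY = "name,clocks.gr,power.draw,power.limit"


def parse_sample(line):
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    # rsplit keeps the name intact even if it happens to contain a comma.
    name, clock, draw, limit = [f.strip() for f in line.rsplit(",", 3)]
    return {
        "name": name,
        "clock_mhz": int(clock),
        "draw_w": float(draw),   # raises ValueError if the driver reports [N/A]
        "limit_w": float(limit),
    }


def near_power_limit(sample, margin_w=5.0):
    """True when the card draws within margin_w watts of its power limit.

    The 5 W default is an arbitrary illustrative threshold, not a vendor value.
    """
    return sample["limit_w"] - sample["draw_w"] <= margin_w


def read_samples():
    """Query every GPU in the system once and return the parsed samples."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_sample(line) for line in out.splitlines() if line.strip()]
```

Calling read_samples() in a loop while a game or benchmark runs, and noting the highest clock_mhz seen along with whether near_power_limit was true at the time, would give one row per card for a shared list.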

Edited by Theo343
Link to comment
4 minutes ago, BadCat said:

What a single user experiences is irrelevant unless he works his way through thousands of cards.

It is rather doubtful that the choice of capacitors alone is the cause of the problems. Piling on capacitors is no solution. That is called trying to solve the problem by doing things at random, and it earns a failing grade.

Experts have said that using only MLCCs is not a good solution, since these have different characteristics. MLCCs are also said to be more prone to capacitor squeal.

This launch seems rushed, and the choice of process and the chip design seem desperate.

What was it someone said about MLCCs, that there are several hundred thousand types to choose from? And that it is important to find the right mix so the result turns out well.

Edited by G
Link to comment
Quote

NVIDIA's RTX 30 series launched to a ton of fanfare and jaw-dropping levels of performance claims and specifications - but somewhere between all the hype and third-party reviews, the promised doubling in performance vanished without a trace.

Quote

TFLOPs is, after all, simply a function of shading clocks multiplied by the clock speed. Somewhere, somehow, performance is being lost.
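Read as shader cores multiplied by clock speed (times two FP32 operations per core per clock, since a fused multiply-add counts as two), the quoted relationship can be checked with a few lines. This is only a rough sketch: the core counts and reference boost clocks below are taken from NVIDIA's public spec sheets rather than from this thread, and fp32_tflops is just an illustrative helper name.

```python
def fp32_tflops(shader_cores, boost_mhz, flops_per_core_per_clock=2):
    """Theoretical peak FP32 throughput in TFLOPS.

    The factor 2 assumes one fused multiply-add per core per clock,
    which is counted as two floating-point operations.
    """
    return shader_cores * boost_mhz * 1e6 * flops_per_core_per_clock / 1e12


# Reference specs (assumed from public spec sheets, not from the thread):
rtx_3080 = fp32_tflops(8704, 1710)     # about 29.8 TFLOPS
rtx_2080_ti = fp32_tflops(4352, 1545)  # about 13.4 TFLOPS

print(f"RTX 3080:    {rtx_3080:.2f} TFLOPS")
print(f"RTX 2080 Ti: {rtx_2080_ti:.2f} TFLOPS")
print(f"Paper ratio: {rtx_3080 / rtx_2080_ti:.2f}x")
```

On paper the 3080 comes out at a bit more than twice the 2080 Ti, which is the gap the article says does not show up in game benchmarks.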

Quote

[Image: NVIDIA FineWine flowchart]

 

Could it take up to a year or more to get the performance people are asking for?

Quote

While this situation represents uncharted territory for NVIDIA, we think this is a good problem to have. Just like AMD's introduction of multiple core count CPUs forced game engines to support more than 16 cores, NVIDIAs aggressive approach with core count should force the software side to catch up with scaling as well. So over the next year, I expect RTX 30 owners will get software updates that will drastically increase performance

https://wccftech.com/nvidia-rtx-30-fine-wine-investigating-the-curious-case-of-missing-gaming-performance/

Edited by G
Link to comment
G wrote (20 minutes ago):

My guess is we will get a lot of extra performance once we get new CPUs with better IPC than today's five-year-old architecture. Rocket Lake should be right around the corner, and rumor has it this is the first time Intel delivers IPC improvements since Skylake in 2015.

Link to comment
8 hours ago, Nizzen said:

And here is the new WHQL: slightly lower "batch"

NVIDIA GeForce 456.55 WHQL drivers
https://www.guru3d.com/files-details/geforce-456-55-whql-driver-download.html

These drivers provide support for NVIDIA Reflex in the blockbuster titles Call of Duty: Modern Warfare and Call of Duty: Warzone, and offer the best experience in Star Wars: Squadrons. The new Game Ready Driver also improves stability in certain games on RTX 30 Series GPUs.

Edited by Nizzen
Link to comment
2 minutes ago, FuzzyLogic said:

I got a promo code from NVIDIA after buying an RTX 3090. Which games can you redeem with it? Do you need to have the RTX 3090 installed to redeem it?

I doubt I'll use it, so if anyone here wants it, send me a PM.

That would be Watch Dogs: Legion, I think.

Link to comment
MrMarbles wrote (37 minutes ago):

My guess is we will get a lot of extra performance once we get new CPUs with better IPC than today's five-year-old architecture. Rocket Lake should be right around the corner, and rumor has it this is the first time Intel delivers IPC improvements since Skylake in 2015.

I strongly doubt it.

The reason Ampere looks considerably stronger at higher resolutions is that there are far more FP32 calculations per shader at higher resolutions, so the cards get more out of the extra FP32 cores. At lower resolutions, other parts of an SM limit performance more (TMUs, memory bandwidth/latency, tessellation, registers/cache), so there doesn't seem to be much to gain over a 2080 Ti at, for example, 1920x1080.
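The argument above can be illustrated with a toy Amdahl's-law-style model. This is a sketch under made-up assumptions, not measured data: the frame time is split into an FP32 part that grows with pixel count and a fixed part standing in for everything else in the SM, and all names and constants (frame_time_ms, fp32_ms_per_mpix, other_ms) are hypothetical.

```python
def frame_time_ms(pixels, fp32_ms_per_mpix=1.0, other_ms=4.0, fp32_speedup=1.0):
    """Toy frame time: FP32 work scales with resolution, the rest does not.

    fp32_ms_per_mpix and other_ms are invented illustrative constants.
    """
    fp32_ms = pixels / 1e6 * fp32_ms_per_mpix / fp32_speedup
    return fp32_ms + other_ms


def gain_from_double_fp32(width, height):
    """Overall speedup when only the FP32 part becomes twice as fast."""
    pixels = width * height
    return frame_time_ms(pixels) / frame_time_ms(pixels, fp32_speedup=2.0)


for name, (w, h) in {"1080p": (1920, 1080),
                     "1440p": (2560, 1440),
                     "4K": (3840, 2160)}.items():
    print(f"{name}: {gain_from_double_fp32(w, h):.2f}x")
```

In this model the gain from doubling FP32 throughput grows with resolution but never reaches 2x, because the part that does not depend on FP32 stays the same size.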

EDIT: I made this prediction before launch as well:

Edited by N o r e n g
Link to comment
N o r e n g wrote (12 minutes ago):

I strongly doubt it.

The reason Ampere looks considerably stronger at higher resolutions is that there are far more FP32 calculations per shader at higher resolutions, so the cards get more out of the extra FP32 cores. At lower resolutions, other parts of an SM limit performance more (TMUs, memory bandwidth/latency, tessellation, registers/cache), so there doesn't seem to be much to gain over a 2080 Ti at, for example, 1920x1080.

EDIT: I made this prediction before launch as well:

I guess we will see fairly soon. Ryzen 4000 and Rocket Lake are both right around the corner.

Link to comment
8 minutes ago, MrMarbles said:

I guess we will see fairly soon. Ryzen 4000 and Rocket Lake are both right around the corner.

A 10900K @ 5.5 GHz with 4700C17 1T memory pushes the 3080 pretty well at 1080p and 1440p, to put it that way. I don't think any of the new CPUs will push much more fps out of the 3080 or 3090. Though I would love to be wrong :D

I don't "think", because I AM Nizzen :p

Edited by Nizzen
Link to comment
