OVH Community, your new community space.

SLA/noSLA 192 195 196 199


bjo
23.04.08, 18:02
I'm wondering that too. By the iowait? By the number of connections to the SAN? According to pendulum there is only one connection anyway.

Edit:
top - 19:54:22 up 1 day, 9:34, 3 users, load average: 3.15, 2.36, 1.56
Tasks: 129 total, 2 running, 127 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 0.0%id, 99.3%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 474872k total, 466108k used, 8764k free, 26744k buffers
Swap: 0k total, 0k used, 0k free, 40792k cached
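
For anyone who wants to track the iowait share without keeping top open, here is a minimal Python sketch. It assumes a Linux box where the first line of /proc/stat is the aggregate "cpu" line with the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). The function name is made up for illustration, and the sample counters in the comment are invented, not taken from the RPS above.

```python
def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples.

    Each argument is the aggregate 'cpu ...' line from /proc/stat,
    taken a few seconds apart.
    """
    def counters(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
        return [int(value) for value in parts[1:]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    # Fields 0..7: user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time (fields 8+) is already included in user, so it is left out.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


# Invented sample: almost all elapsed ticks landed in the iowait column.
# iowait_percent("cpu 100 0 50 1000 200 0 10 0",
#                "cpu 100 0 53 1000 1193 0 13 0")
```

To use it live, read the first line of /proc/stat, sleep a few seconds, read it again, and pass both lines in. A value that stays near 100 would match the 99.3%wa shown in the top output above.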

whyte
23.04.08, 18:00
So how can you tell whether you are in SLA or noSLA mode?

edit: The Manager says "Bandwidth type: SLA" - but I suspect that refers to the bandwidth, not the connection to the iSCSI?

bjo
23.04.08, 15:29
I also have just one connection to 198. Maybe my server went into noSLA after backup-manager was stupid (or I was, because I thought it would recognize symlinks) and ran into a deadlock after working recursively and creating a 2GB file.

MDGeist
15.04.08, 18:18
Quote from pendulum
Seems that this is the case. If I stress the hdd for ~10 minutes, the RPS will run into a deadlock. All this with a single iscsi connection (there is never more than one, what is Oles talking about?). They probably think that this method works because the RPSs with a higher load will lock up but are not reported as down in the so called monitoring system.
That method is wrong.
This is what those words look like:

http://www.abload.de/img/deadblockry9.jpg

...in the monitoring.

Since the RPS is still pingable but not responsive over SSH, it doesn't count as down for the monitoring unless I manually hard reboot it...

Far from good, imho...


Date 2008-04-15 15:38:30, geg made Extensive diagnosis: abuse
Abuse with <18mbits ... cool :|
guess (r)torrent hates iscsi ?!

Edit :
Seems like my password was changed -AGAIN- ... so not only was the machine down for 5 hours, once again I can't use it...

Great feature, this 100ms rps ban

strex
15.04.08, 18:17
That's bad, OVH.

If you can't build the RPS with good performance and your iSCSI is slow, then forget the RPS. The RPS is too slow to work on in the shell. That isn't our fault.

I think OVH should concentrate on the root servers.

pendulum
15.04.08, 17:33
Seems that this is the case. If I stress the hdd for ~10 minutes, the RPS will run into a deadlock. All this with a single iscsi connection (there is never more than one, what is Oles talking about?). They probably think that this method works because the RPSs with a higher load will lock up but are not reported as down in the so called monitoring system.
That method is wrong.

MDGeist
15.04.08, 15:44
It almost looks as if this lag method kills the RPS... Or I'm just totally unlucky again, with two 4h+ downtimes already...

piespy
15.04.08, 00:08
Quote from MDGeist
Also, about many connections: what about, let's say, a database? Small connections, but lots of them => throttled as well?
You have to distinguish connections to the server and the DB, and connections to the iSCSI. Normally you would only have a single iSCSI connection, opened by the kernel/iSCSI client. Someone who has multiple connections open to the iSCSI device is definitely doing something fishy.

Even if you open many DB connections or downloads, the kernel has to serialize it all into one iSCSI connection, just like you can only transfer one thing to or from a normal HD at any time. Using read/write cache it may look like many things are being read or written at once, but in reality it's always one after another.

So I don't even know why one would have multiple connections to the iSCSI if they weren't intentionally trying to break it, unless it's a failure mode of the iSCSI client--maybe if the device doesn't respond it tries a new connection over and over again which destabilizes it even more. But that's just a guess. A "netstat -pan|grep iscsi" should show just one connection in normal operation.
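
The "just one connection" check can also be scripted. Below is a minimal Python sketch that counts established iSCSI connections in netstat-style output. Unlike the grep above, it matches on the remote port rather than the program name, and it assumes the target listens on 3260, the registered default iSCSI port; whether the RPS targets actually use that port is an assumption. The function name, the sample text and the addresses in it (documentation ranges) are made up for illustration.

```python
from typing import List


def iscsi_connections(netstat_output: str, port: int = 3260) -> List[str]:
    """Return the remote endpoints of ESTABLISHED TCP connections to `port`.

    Expects lines shaped like the output of `netstat -tn`:
    proto recv-q send-q local-address foreign-address state
    """
    endpoints = []
    for line in netstat_output.splitlines():
        cols = line.split()
        # Skip headers, blank lines and anything that is not a TCP row.
        if len(cols) < 6 or not cols[0].startswith("tcp"):
            continue
        foreign, state = cols[4], cols[5]
        if state == "ESTABLISHED" and foreign.rsplit(":", 1)[-1] == str(port):
            endpoints.append(foreign)
    return endpoints


# Invented sample output: one iSCSI session, one SSH session, one listener.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:40512        198.51.100.7:3260       ESTABLISHED
tcp        0     48 192.0.2.10:22           203.0.113.5:51234       ESTABLISHED
"""

print(len(iscsi_connections(SAMPLE)))  # 1 connection in this sample
```

If this ever reports more than one endpoint, that would point to the reconnect failure mode guessed at above rather than to normal operation.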

MDGeist
14.04.08, 20:23
edit :

Not sure if this is because of noSLA, but my RPS has already died twice today...

Not responding to any input, but still pingable...

What I don't understand is:
what do you consider heavy usage of the SAN?

I did a single download at 7mb/s and it crashed the SAN... So no way to use 100mbit?

Also, about many connections: what about, let's say, a database? Small connections, but lots of them => throttled as well?

And 100ms of extra wait time for DB responses could very quickly lead to another deadbombed RPS...

oles@ovh.net
14.04.08, 20:20
SLA/noSLA: 192/195 196/199

Hello,

we have just finished setting up SLA and NoSLA on the iSCSI servers for the RPS. We have switched some customers with very heavy disk usage to NoSLA mode. This was the case, for example, for file-sharing servers that opened many simultaneous connections to the iSCSI server.

From now on, a customer who uses the disks very intensively will be switched to NoSLA (for the time being the switch is still done by hand).

What do SLA and NoSLA mean?
---------------------------
In NoSLA mode we put delays on the iSCSI server's responses. Instead of answering immediately, we add 100 ms of lag. That is all. In SLA mode there is no delay.
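
To get a feel for what 100 ms of lag can mean for disk throughput, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are not stated in this post: that the lag applies to every single request, that a fixed number of requests is in flight at once, and that request size and base service time are constant. The 64 KiB request size and the 1 ms service time are hypothetical illustration values, not measurements from the RPS platform.

```python
def max_throughput_mib_s(request_kib: float,
                         service_ms: float,
                         added_lag_ms: float = 0.0,
                         queue_depth: int = 1) -> float:
    """Upper bound on throughput when each request waits service + lag.

    With `queue_depth` requests in flight, at most
    queue_depth / (service + lag) requests complete per second.
    """
    per_request_s = (service_ms + added_lag_ms) / 1000.0
    requests_per_s = queue_depth / per_request_s
    return requests_per_s * request_kib / 1024.0


# Hypothetical numbers: 64 KiB requests, 1 ms base service time, one in flight.
sla = max_throughput_mib_s(64, 1.0)           # 62.5 MiB/s
nosla = max_throughput_mib_s(64, 1.0, 100.0)  # about 0.62 MiB/s
print(f"SLA: {sla:.2f} MiB/s, NoSLA: {nosla:.2f} MiB/s")
```

Under these assumptions a client that issues one request at a time drops to roughly 10 requests per second. Clients that keep more requests in flight would be hit less, which is why the queue depth is left as a parameter.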

We have added the option on the 192/195 and 196/199 networks. The way the RPS, SLA and NoSLA work together is (in my opinion) not bad at all. But as always, I would like to hear your feedback.

Best regards

Octave