Fault tolerance and replication.
Jamesman
Member
Posts: 8
Joined: Aug 15th, 2005
Fault tolerance and replication.
Jan 24th, 2006 at 10:06pm
Hello,

I'm in the process of designing a mirror site for our hub and spoke network.  Basically, I need to mirror our Sesame database to a different location in the event that our "hub" site goes down.  We envision having minimal downtime with a database that's fairly up to date.  My challenge isn't finding software that can perform this, but making sure that whatever ends up on the other side is usable to Sesame.

Scenario:  I have 130 users working in the same Sesame database in a central location.  The site hosting Sesame loses power and all connections are lost.  While this was happening, the database was replicating (byte-level) in real time to our mirror location.  After the smoke clears, I access our mirror server to start the replicated version of the database.  I receive errors saying that the database cannot be opened for one reason or another.

I have tested the scenario above and have received mixed results.  I've seen solutions with SQL databases where scripts were making live backups of the database every half hour while a replication product was making copies (non-byte level) of the backup files to the mirror site/server.  This solved the problem of needing expensive software to replicate locked records.  Has there been any thought given to this with Sesame?  Any ideas or tools I can use?  I need a solution to this because all of our users are spread out around the country and come in through terminal servers.  We have a failover farm at another location.

Thanks.
  
 
The Cow
YaBB Administrator
Posts: 2530
Joined: Nov 22nd, 2002
Re: Fault tolerance and replication.
Reply #1 - Jan 25th, 2006 at 1:15am
1. How many simultaneous users do you expect?
2. How many simultaneous "writers" (i.e., users making changes, adding, and deleting records, as opposed to searching and examining results)?
3. Which OSs are we dealing with?
4. Is there any allowable downtime?

Unlike a web server, which uses temporary connections, Sesame requires a constant connection between the client and the server. That may well change in future versions. But in the meantime, you will also have to consider some kind of connection management, so that when one server goes offline, the other can pick up the connections and attempt to restore context.

A Sesame application fileset can be copied out from under a Sesame server, and typically it is okay. It will be "locked", but in most cases a sunlock operation before redeployment will take care of that. The exception to this is during very lengthy write operations to the server wherein a lot of records are being changed in sequence or in parallel.

To avoid this, your best recourse would be to suspend (not kill) the Sesame server process. Sesame has only one process with multiple threads. When the main process is suspended, the threads will suspend along with it. Unix has commands for doing this. I do not know whether any flavor of Windows does.
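
For reference, on Unix the suspend and resume described above is normally done with the STOP and CONT signals. A minimal sketch, assuming a `ps` that supports `-o stat=`; the function names are made up for illustration and nothing here is specific to Sesame:

```shell
#!/bin/bash
# Hypothetical helpers: suspend and resume a process by PID.

# Prints the first letter of the process state (T = stopped, S = sleeping, R = running).
proc_state() {
    ps -o stat= -p "$1" | tr -d ' ' | cut -c1
}

# Suspends the process. SIGSTOP cannot be caught or ignored, and it stops
# every thread in the process.
suspend_proc() {
    kill -STOP "$1"
}

# Resumes a process that was suspended with suspend_proc.
resume_proc() {
    kill -CONT "$1"
}
```

Something like `suspend_proc "$(pidof -s sesame)"` followed later by `resume_proc` with the same PID would be the typical use, assuming the server process is named `sesame`.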

After the process is suspended the clients connected to the server will remain connected and context will be sustained. Timing is important. There are tools on both Windows and Unix to determine if Sesame (any process) has the files open. On Unix these can be automated using shell scripting. I have not seen any tools on Windows that can be automated (they have a GUI interface and do not return values to scripts).

Sesame only has the file open when actually writing. Even with many simultaneous writing users, the windows of opportunity should be very frequent and (from a computer's perspective of time) lengthy.

On Unix, it may well be doable to create a shell script, kicked off by cron at some interval, that waits for a moment when Sesame does not have the files open, and then suspends Sesame, copies the files (use something like rsync), and re-enables Sesame, all within a few moments.
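
A sketch of the cron side of this, assuming the copy logic lives in a script at a made-up path (`/usr/local/sesame/bin/replicate.sh`) and runs every 30 minutes; neither detail comes from this thread. The `mkdir` lock is only there so that a slow run cannot overlap the next one, and `run_locked` is an invented name:

```shell
#!/bin/bash
# Example crontab entry (hypothetical path and interval):
#   */30 * * * * /usr/local/sesame/bin/replicate.sh

# Runs a command only if no other run holds the lock directory.
# mkdir is atomic: it fails if the directory already exists.
run_locked() {
    local lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    local status=$?
    rmdir "$lockdir"
    return "$status"
}
```

Inside the cron script, the copy step would then be wrapped as `run_locked /tmp/sesame-replicate.lock some_copy_command`, where both names are placeholders.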

If there is sufficient demand, I can add a suspend command to the administration interface in Sesame 2.0.
  

Mark Lasersohn, Programmer, Lantica Software, LLC
 
Jamesman
Member
Posts: 8
Joined: Aug 15th, 2005
Re: Fault tolerance and replication.
Reply #2 - Jan 25th, 2006 at 6:18pm
1. When we go live, it will be around 150.
2. Almost all will be making changes.  Only a few will be restricted to running reports, examining records, etc.
3. The primary live backend is Red Hat ES 4.0 while the mirror is located offsite running Windows 2000 server.
4. Yes, we can be pretty flexible with this.

I'm running a demo of a product called Filereplicationpro (http://www.filereplicationpro.com/).  It will run on any OS that supports Java 1.4.1, which makes it perfect for replicating between Linux and Windows.  My first tests have revealed that it works pretty well on its own.  I currently have it configured to do byte-level replication as changes are detected on the primary server.  I simulated a disaster by dropping the network connection on the primary Red Hat server.  I connected to the mirrored server at the other location and started the Sesame server using the replicated database.  I was able to pull up new records I created just before I pulled the plug.  However, during a few tests, I received an error indicating that the database could not be opened.  After trying to unlock, the database would load to about 99% and then give me a similar error.  Looking at the logs, I would then see info about records missing.  I was never able to open it.  My goal is to make sure this never happens in a real disaster situation.

I'm not the developer of the application but a systems engineer trying to implement it.  Please forgive my lack of Sesame lingo.  You're saying that it's possible for the Sesame file set to be copied during live operations?  What exactly is the "Sunlock" process, or did you mean unlock process?  Is this the administrative tool that gives you the locked or unlocked status and allows you to unlock databases?

Putting the Sesame process in suspend mode sounds interesting.  With my current user load (which will be growing), how fast must I perform this so that users don't see or have any problems?  Is it even possible to do this with so many people writing to the database at once?  What will happen if someone is trying to save a record during this process?

I'll begin looking into this as it seems to be the best solution.  Thanks for all your help on this.  Test, test, test.

  
 
The Cow
YaBB Administrator
Posts: 2530
Joined: Nov 22nd, 2002
Re: Fault tolerance and replication.
Reply #3 - Jan 25th, 2006 at 6:37pm
Quote:
I'm not the developer of the application but a systems engineer trying to implement it.  Please forgive my lack of Sesame lingo.  You're saying that it's possible for the Sesame file set to be copied during live operations?  What exactly is the "Sunlock" process, or did you mean unlock process?  Is this the administrative tool that gives you the locked or unlocked status and allows you to unlock databases?

Because Sesame does not actually keep files open unless it is specifically writing to them, it sets a flag in the file indicating that the file is in use. Sunlock is a program that can clear that flag.
Quote:

Putting the Sesame process in suspend mode sounds interesting.  With my current user load (which will be growing), how fast must I perform this so that users don't see or have any problems?

Depends entirely on what they are doing. If they are in the middle of a lengthy non-interactive operation, you can probably go quite a while before that user notices. If another user is stepping through records, they will probably notice any slowdown or stutter.
Quote:

Is it even possible to do this with so many people writing to the database at once?  What will happen if someone is trying to save a record during this process?


As I mentioned, you will need to make sure that the file is not open by Sesame server before you attempt suspending the process.
  

 
The Cow
YaBB Administrator
Posts: 2530
Joined: Nov 22nd, 2002
Re: Fault tolerance and replication.
Reply #4 - Jan 25th, 2006 at 6:45pm
BTW: the command in Unix to check what processes are actively using a particular file is "fuser". Or you can use "lsof".
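
A small sketch of wrapping those two commands so a script can test for "file in use". It assumes Linux versions of `fuser` and `lsof`, and falls back to scanning `/proc` when neither is installed; the function name `file_in_use` is made up here:

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if any process has the file open.
file_in_use() {
    local file=$1
    if command -v fuser >/dev/null 2>&1; then
        # fuser prints the PIDs of processes using the file on stdout.
        [ -n "$(fuser "$file" 2>/dev/null)" ]
    elif command -v lsof >/dev/null 2>&1; then
        # lsof exits 0 when it lists at least one open instance of the file.
        lsof -- "$file" >/dev/null 2>&1
    else
        # Linux-only fallback: look for a file descriptor pointing at the file.
        local target fd
        target=$(readlink -f "$file") || return 2
        for fd in /proc/[0-9]*/fd/*; do
            [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] && return 0
        done
        return 1
    fi
}
```

Without root privileges these tools generally only see processes owned by the same user, so a check like this should run as the user that owns the server process (or as root).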
« Last Edit: Jan 25th, 2006 at 8:19pm by The Cow »  

 
The Cow
YaBB Administrator
Posts: 2530
Joined: Nov 22nd, 2002
Re: Fault tolerance and replication.
Reply #5 - Jan 25th, 2006 at 9:01pm
Twenty-some years on Unix, but I'm still not much for shell scripting. The general logic of the script would be:

Code
Get the PID for Sesame with "pidof"
while not done and have not tried enough
  check if Sesame has files open (use lsof and grep, or fuser)
  if the files are not open
    suspend Sesame with "kill -s SIGSTOP sesame_pid"
    copy the .db and .dat
    re-enable Sesame with "kill -s SIGCONT sesame_pid"
    set done to true
  end if
end while
 


  

 
The Cow
YaBB Administrator
Posts: 2530
Joined: Nov 22nd, 2002
Re: Fault tolerance and replication.
Reply #6 - Jan 25th, 2006 at 9:41pm
I said:
Quote:
If there is sufficient demand, I can add a suspend command to the administration interface in Sesame 2.0.


Turns out this has already been added to Sesame 2.0.
  

 
Jamesman
Member
Posts: 8
Joined: Aug 15th, 2005
Re: Fault tolerance and replication.
Reply #7 - Jan 25th, 2006 at 10:55pm
Quote:
Twenty-some years on Unix, but I'm still not much for shell scripting. The general logic of the script would be:

Code
Get the PID for Sesame with "pidof"
while not done and have not tried enough
  check if Sesame has files open (use lsof and grep, or fuser)
  if the files are not open
    suspend Sesame with "kill -s SIGSTOP sesame_pid"
    copy the .db and .dat
    re-enable Sesame with "kill -s SIGCONT sesame_pid"
    set done to true
  end if
end while
 




Thank you very much.  I'm working on a script applying the logic above.  When I have it finished and working, I'll post it.  It's good to hear that suspension will be in version 2.0.



  
 
Jamesman
Member
Posts: 8
Joined: Aug 15th, 2005
Re: Fault tolerance and replication.
Reply #8 - Feb 7th, 2006 at 9:32pm
Ok,

This seems to work pretty well.  If any Unix geeks out there have a better way, please let me know.  I'm not the greatest when it comes to scripting.

Code
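# Copy the Sesame data files at a moment when the server does not have them open.
pid=$(pidof -s sesame) || exit 1   # -s: a single PID; exit if Sesame is not running
dbpath=/usr/local/sesame/Data/IL-SYS/database.db
dbpath2=/usr/local/sesame/Data/IL-SYS/database.dat

# Succeeds if the Sesame process has either data file open.
files_open() {
    /usr/sbin/lsof -p "$pid" -Fn | grep -q -F -e "$dbpath" -e "$dbpath2"
}

# Resume the server when the script exits, so a failed copy cannot leave it suspended.
trap 'kill -CONT "$pid" 2>/dev/null' EXIT

while true; do
    if ! files_open; then
        kill -STOP "$pid"
        # Check again once suspended: a write may have started in between.
        if ! files_open; then
            rsync -q "$dbpath" "$dbpath2" /usr/local/sesame/replicated/
            kill -CONT "$pid"
            break
        fi
        kill -CONT "$pid"
    fi
    sleep 5
done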
 




Thanks,
  
 
The Cow
YaBB Administrator
Posts: 2530
Joined: Nov 22nd, 2002
Re: Fault tolerance and replication.
Reply #9 - Feb 7th, 2006 at 9:37pm
Looks good to me (also no expert on shell scripting). Is every 5 seconds a bit of overkill?
  

 
Jamesman
Member
Posts: 8
Joined: Aug 15th, 2005
Re: Fault tolerance and replication.
Reply #10 - Feb 7th, 2006 at 9:46pm
It probably is.  It should be more like 15 or so.  I was pushing it a little.
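
One way to make the polling interval adjustable and to add the "have not tried enough" limit from the pseudocode earlier in the thread is a small retry helper. This is only a sketch; the name `wait_for` and its argument order are invented here:

```shell
#!/bin/bash
# Hypothetical helper: run a check every <interval> seconds until it succeeds,
# giving up after <max_tries> attempts.
# Usage: wait_for <interval> <max_tries> <command> [args...]
wait_for() {
    local interval=$1 max_tries=$2 try
    shift 2
    for ((try = 1; try <= max_tries; try++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if ((try < max_tries)); then
            sleep "$interval"
        fi
    done
    return 1
}
```

With a 15-second interval and, say, 20 attempts, `wait_for 15 20 some_check` gives up after roughly five minutes instead of looping forever; `some_check` stands for whatever command tests that the data files are closed.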
  