Member Avatar for diafol

Hi all, I've been developing a GoogleCal app for a month or so in my spare time. I've ended up with a shed full of session variables, just so I can keep all the data alive and create an import file for GoogleCal at the end of the process. I purposely didn't use a DB as I thought I could get it all done with session vars. I had no idea that the 'simple' app would bloat and become rather unwieldy wrt storage. Although 'DB-ing' the app would streamline it, I'm unsure about performance. Currently, the session data is stored as a multidimensional array (with possibly 1000 pieces of data - mainly integers and small strings).

Would the strain on the webserver (using sessions) affect performance to a greater extent than making greater demands on the db server? I would welcome any insights wrt this. Thanks.

YES! 1000 variables can affect the web server a lot!

It really depends on what they are being used for and when they are being used. If you use the correct MySQL (or other database) query, then you can fetch just the value you want without filling the server up with variables. If all of the variables are in use all of the time, then either way you still have a lot of data to handle.

Hope this helps
Kieran :)

Member Avatar for diafol

Yeah, cheers Kieran, it's along the lines of my fears. I think I'll 'DB-ify' the app. Keeping track of the multidimensionals is giving me a nosebleed. God knows what I'd do if I had to tinker with it a few months down the line.

Just another question, while I'm at it. I don't want to make a login, so I assume I'm going to have to use the session id as a foreign key for the related data in my tables. Can anybody see a problem with this?

The idea is this:

User starts the process - create the sid
Add data via part1 form - add sid to DB and then relevant part1 data to table(s).
Add data via part2 form (etc) - add more data
[data can be edited or deleted at various points]
Finalize the data (last part of form) - create an iCal text file for import to GoogleCal.
User downloads file and session-related data deleted from DB and session unset.
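
Roughly the sort of thing I have in mind for the part1 step - the table and column names here are just placeholders I'm playing with, not a final schema:

    <?php
    // Sketch only: assumes a `sessions` table (session_id PK, last_impression)
    // and a hypothetical `part1_data` table - adjust names/columns to suit.
    session_start();
    $sid = session_id();   // created on the user's first request

    $db = new PDO('mysql:host=localhost;dbname=timetable', 'user', 'pass');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Register the session once, so the FK has something to point at.
    $stmt = $db->prepare('INSERT IGNORE INTO sessions (session_id, last_impression) VALUES (?, NOW())');
    $stmt->execute(array($sid));

    // Part 1 form data keyed on the session id.
    $stmt = $db->prepare('INSERT INTO part1_data (session_id, periods_per_day, label) VALUES (?, ?, ?)');
    $stmt->execute(array($sid, (int)$_POST['periods_per_day'], $_POST['label']));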

Again any insights/pitfalls would be gratefully received.

What type of data are you collecting in these forms?

Member Avatar for diafol

Integers mainly:

part1: 5 integers, 1 string (timetable periods for school)
part2: 4 integers (start hour, start min, end hour, end min) for each period (max periods = 60), so max 240 integers + max 20 unix timestamps (start/end teaching blocks) = max 260 integers
part3: no max for class data - 2 small strings (label, location) and 1 long string (description). Say an average of 15 classes -> 30 small strings, 15 long.
part4: combine classes, weekcycles and teaching blocks -> max 600 periods (for 10 teaching blocks), each with a start hour, start min, end hour, end min and class id
[max 3000 integers]

Although it seems the above could be simplified, school timetables are a pain as different days can have different start times, etc, so each period needs its own time data.

There are a few other vars, but these are trivial.

Here's an idea of the period hierarchy:

no. periods per day [max 10]
no. days per week [max 6]
no. weeks in a cycle (e.g. fortnightly timetable) [max 4]
no. teaching blocks (e.g. half-terms or semesters) [max 10]

The only related data so far is the period.class_id / class.class_id

I've started DB-ifying a clone of the site - it seems much neater wrt keeping tabs on the period data. However, I'm still wondering about the foreign key - would session_id be appropriate, or a hash of the session, or a normal auto-increment id (linked to a session id in the sessions table)? There is no login - I want to avoid that, so session security isn't a massive issue at the moment.

Although I've been tinkering with php/mysql for a while, this is beyond the amount of data I'm used to dealing with 'in one go' and I'm struggling to find the appropriate methodology.

I don't think you will have any problems with that amount of data.

I would use database-based sessions. Pretty much, the session array would be serialized and put into the database with the session id as the primary key.

Sessions are inherently insecure, so there is some risk, but you can alleviate some of that by regenerating the session id on each page load (session_regenerate_id()). Also, storing sessions in the database makes them a little more secure anyway.
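
For example, a minimal sketch of moving sessions into MySQL with PHP's SessionHandlerInterface (PHP 5.4+) - the `sessions` table and its columns here are just assumptions:

    <?php
    // Minimal database-backed session handler (sketch).
    // Assumes: sessions(session_id CHAR(32) PK, data TEXT, last_impression DATETIME).
    class DbSessionHandler implements SessionHandlerInterface
    {
        private $db;

        public function __construct(PDO $db) { $this->db = $db; }

        public function open($savePath, $sessionName) { return true; }
        public function close() { return true; }

        public function read($id)
        {
            $stmt = $this->db->prepare('SELECT data FROM sessions WHERE session_id = ?');
            $stmt->execute(array($id));
            $row = $stmt->fetch(PDO::FETCH_ASSOC);
            return $row ? $row['data'] : '';
        }

        public function write($id, $data)
        {
            // REPLACE relies on session_id being the primary key.
            $stmt = $this->db->prepare('REPLACE INTO sessions (session_id, data, last_impression) VALUES (?, ?, NOW())');
            return $stmt->execute(array($id, $data));
        }

        public function destroy($id)
        {
            $stmt = $this->db->prepare('DELETE FROM sessions WHERE session_id = ?');
            return $stmt->execute(array($id));
        }

        public function gc($maxlifetime)
        {
            $stmt = $this->db->prepare('DELETE FROM sessions WHERE last_impression < DATE_SUB(NOW(), INTERVAL ? SECOND)');
            return $stmt->execute(array($maxlifetime));
        }
    }

    $db = new PDO('mysql:host=localhost;dbname=timetable', 'user', 'pass');
    session_set_save_handler(new DbSessionHandler($db), true);
    session_start();
    session_regenerate_id(true);   // fresh id each request, as mentioned above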

Personally, I would go with a database. Session data that large, even though it shouldn't hinder the server since it is just a flat file, does hinder the user's experience if it gets too large: the server has to read the entire session file every time the page is loaded and overwrite the entire file for every minuscule adjustment, whereas with a database you can surgically adjust and fetch data only when necessary.

The same rule also applies to database-stored sessions - I wouldn't do it, mainly because the overhead of reading and overwriting the entire serialized blob just to change a single character is a monster.

Member Avatar for diafol

Thanks kk/robb, that's cleared up quite a few of my 'blank spots'. However, I'm a little confused wrt storing session data. I'm aware that you can store all the data for a session in a serialized field, and I have been looking at that. But would approx 1000 vars/values in one field have a negative impact on performance if I just wanted to change one or two of them? I'm pretty much looking at 2 models:

1. the serialized session model (just one table: session_id as PK, session_data holding the serialized data, last_impression for garbage cleanup)

2. a 'normal' related model: session_id (PK)/last_impression, and then about 3 tables using session_id as FK. This will require maybe 1000 records to be added at a time and then deleted in one go. I can't see a problem with a couple of users doing this, but for, say, 20 concurrent users, would this pose a problem on a vanilla-flavoured shared-hosting site?
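
For the 'deleted in one go' part of model 2, I'm picturing a cleanup along these lines (the table names `sessions`, `classes` and `periods` are just my working assumptions), run from a cron job or at the start of a request:

    <?php
    // Garbage cleanup sketch for the related model: remove all rows belonging
    // to sessions that haven't been touched for a while.
    $db = new PDO('mysql:host=localhost;dbname=timetable', 'user', 'pass');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $cutoff = date('Y-m-d H:i:s', strtotime('-2 hours'));   // assumed expiry window
    $stmt = $db->prepare('SELECT session_id FROM sessions WHERE last_impression < ?');
    $stmt->execute(array($cutoff));

    foreach ($stmt->fetchAll(PDO::FETCH_COLUMN) as $sid) {
        // One delete per related table, then the session row itself.
        // Table names come from a hard-coded list, so interpolating them is safe here.
        foreach (array('periods', 'classes', 'sessions') as $table) {
            $del = $db->prepare("DELETE FROM $table WHERE session_id = ?");
            $del->execute(array($sid));
        }
    }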

Sorry if I sound obtuse, and the answer may very well have been given already. I just want to be sure. I don't want to start a second data storage re-write!

Thanks.

This is the trade-off with this methodology: either you take the time to serialize the data on each pass, or you add, update, and remove a lot of records constantly.

Honestly, I am not sure which would be more efficient. I know from experience that a MySQL database can handle a ton of data. I recently created a database with 1.4 million records and it has no problem reading that data, but I don't know how it would perform under constant updates and deletes.

Maybe if you have the time, create both and run some benchmarks. I am kind of curious as to which is faster.

After thinking about it, I would go with the database, non-serialized solution. If you only need to update 1 or 2 fields out of 1000, then an SQL query will be better than using serialize() and unserialize() (which are both pretty slow) to load the entire array into memory.
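
To illustrate the difference (the table and column names below are only examples, not your actual schema):

    <?php
    // Targeted update (related model): touch only the row that changed.
    $db = new PDO('mysql:host=localhost;dbname=timetable', 'user', 'pass');
    session_start();
    $sid = session_id();

    $stmt = $db->prepare(
        'UPDATE periods SET start_hour = ?, start_min = ?
         WHERE session_id = ? AND block = ? AND cycle = ? AND day = ? AND period = ?'
    );
    $stmt->execute(array(9, 15, $sid, 1, 1, 3, 2));

    // Serialized model: the whole array makes a round trip just to change one value.
    $stmt = $db->prepare('SELECT session_data FROM sessions WHERE session_id = ?');
    $stmt->execute(array($sid));
    $data = unserialize($stmt->fetchColumn());          // load all ~1000 values
    $data['periods'][1][1][3][2]['start'] = '09:15';    // change one of them
    $stmt = $db->prepare('UPDATE sessions SET session_data = ? WHERE session_id = ?');
    $stmt->execute(array(serialize($data), $sid));      // write the lot back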

Member Avatar for diafol

Thanks kk, I really value your opinion/honesty. Frankly, I'd be more comfortable with the usual related model - it's what I'm used to. The serialize option would probably mean that I keep my current session structure, but simply move it to a DB - and the structure itself is cause for a nosebleed! I reckon a compound PK (session_id/block/cycle/day/period) should suffice in my 'periods' table, with an FK class_id. It's all taking shape now, at last.
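
Something along these lines is what I'm sketching for the 'periods' table - the column types here are guesses rather than a final design:

    <?php
    // Sketch of the compound-PK periods table; column types are assumptions.
    $db = new PDO('mysql:host=localhost;dbname=timetable', 'user', 'pass');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $db->exec("
        CREATE TABLE IF NOT EXISTS periods (
            session_id CHAR(32) NOT NULL,
            block      TINYINT  NOT NULL,   -- teaching block (max 10)
            cycle      TINYINT  NOT NULL,   -- week in cycle (max 4)
            day        TINYINT  NOT NULL,   -- day of week (max 6)
            period     TINYINT  NOT NULL,   -- period in day (max 10)
            class_id   INT      NULL,       -- FK to classes.class_id
            start_hour TINYINT  NOT NULL,
            start_min  TINYINT  NOT NULL,
            end_hour   TINYINT  NOT NULL,
            end_min    TINYINT  NOT NULL,
            PRIMARY KEY (session_id, block, cycle, day, period)
        ) ENGINE=InnoDB
    ");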

I'll leave this open a little while longer before marking it solved, but I think I'm there. Thanks again.

Member Avatar for diafol

OK done - thanks all.
