Help Parsing PLZ

Question

Philmagrill 0 Newbie Poster

16 Years Ago

I am creating a program in C that finds out the weather and changes the desktop background accordingly. So far I have managed to connect to the internet and download the BBC RSS feed to a file, which I can then open. I have almost managed to get it to edit the registry to change the desktop background.

At the moment I am trying to parse the file to extract the word "sunny" from an XML line:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char word[128];
    char *p, prefix;
    int i;
    char string[] = "<title>Saturday: sunny, Max Temp: 16&#xB0;C (61&#xB0;F), Min Temp: 4&#xB0;C (39&#xB0;F)</title>";

    prefix = ',';

    /* Find the first comma in the string. */
    p = strchr(string, prefix);

    if (p == NULL) {
        printf("No %c found.\n", prefix);
    }
    else {
        i = (int)(p - string);
        printf("Found weather at position: %d\n", i + 1);
        /* Copy everything before the comma and terminate it. */
        strncpy(word, &string[0], i);
        word[i] = '\0';
        printf("Weather is: [%s]\n", word);
    }
    return 0;
}

I have managed to get it to extract everything before the comma

eg Weather is: <title>Saturday: sunny

How do I get it to parse the ":" and just leave the word sunny?

(Also, I have just manually written in the string. How would I get it to find that line in a big XML file? Would I have to parse the whole file, or can I somehow select that line number or something?)

Thank you

Phil

c xml

5 Contributors
18 Replies
109 Views
2 Days Discussion Span
Latest Post 16 Years Ago Latest Post by jephthah

All 18 Replies

Aia 1,977 Nearly a Posting Maven

16 Years Ago

>sunny╠╠╠╠╠╠╠
>do you know why this is?
Most likely because no null terminator '\0' is finishing the string.

jephthah 1,888 Posting Maven

16 Years Ago

never mind.

Aia 1,977 Nearly a Posting Maven

16 Years Ago

> Thanks guys, its working. ur gods of the internets.
> Phil

GREAT eBAYER»-(¯`v´¯)-»YOU ‹(•¿•)› POSITIVE w/ 5*-:¦:-THANX-COMEBACK.A++++++++++

[Edit] Oops! Wrong internet.

jephthah 1,888 Posting Maven

16 Years Ago

so you're hardcoding a solution that relies on some unaffiliated third-party website maintaining precisely the same formatting and whitespace of their webpages.

yeah....... i'm gonna have to go ahead and say that's a Bad Idea.

EDIT: no wait, on second thought, I'm gonna go ahead and say that's a F--KING STUPID IDEA.

nucleon quit coding shiite for this poor noob to shoot himself in the face with.

OP: do NOT attempt to parse a website based on the number of NEWLINES the source code contains.

DO parse a website based on the keyword or tag that marks the precise location where the text that you are searching for is located.

in other words: use fgets() to read each line, then for each line use strstr() to search for the location of the keyword or tag that marks what you're looking for.

furthermore, you don't even know if there's going to be a newline between the tag and the value you want, so you may actually have to look on the subsequent line(s)

if you don't know how to use strstr(), look it up, it's declared in <string.h>.

.
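For anyone reading along, the fgets() plus strstr() approach described above could be sketched roughly like this. The helper name find_after_tag and the 512-byte line buffer are assumptions made for illustration, not anything from the thread. It also assumes the tag and its value sit on the same line, which (as noted above) a real feed does not guarantee, and it stops at the first match, which in a real feed may be a different <title> than the one you want.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read fp line by line and copy whatever follows the
   first occurrence of tag (on the same line) into out.
   Returns 1 if the tag was found, 0 otherwise. */
int find_after_tag(FILE *fp, const char *tag, char *out, size_t outsz)
{
    char line[512];   /* assumed maximum line length */

    if (outsz == 0)
        return 0;

    while (fgets(line, sizeof line, fp)) {
        const char *hit = strstr(line, tag);
        if (hit) {
            hit += strlen(tag);            /* step past the tag itself */
            strncpy(out, hit, outsz - 1);
            out[outsz - 1] = '\0';         /* strncpy may not terminate */
            return 1;
        }
    }
    return 0;   /* reached end of file without seeing the tag */
}
```

Called as find_after_tag(file, "<title>", buf, sizeof buf), this leaves the rest of the matching line in buf, ready for the colon/comma extraction discussed elsewhere in the thread. Lines longer than the buffer get split by fgets(), so a tag straddling the split would be missed.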

nucleon 114 Posting Pro in Training

16 Years Ago

> yeah, that was really abrasive

Nah, not really. I just had to get used to your style. The fact is that my avatar should be a foot in a mouth.

jephthah commented: mine too. without a doubt. +6


nucleon 114 Posting Pro in Training · Answer 1 · 2009-04-18T22:00:10+00:00

OP> How do i get it to parse the ":" and just leave the word sunny

/* This assumes there are no other colons or commas
       before the ones you want. */

    char word[MAX_WORD];
    char *pColon, *pComma;

    pColon = strchr (string, ':');
    if (!pColon) { /* No colon found; handle error. */ }

    pComma = strchr (string, ',');
    if (!pComma) { /* No comma found; handle error. */ }

    if (pComma < pColon) { /* Comma before colon; handle error */ }

    /* Move pColon past the colon. You may also wish to move
        pColon forward past any spaces here. */
    ++pColon;

    strncpy (word, pColon, pComma - pColon);

OP> how would i get it to find that line in a big XML file

If it is always the same line number, just eat lines up to there, then read the line you want. If it's not always on the same line number you will have to search through the text for an identifying pattern, either reading line-by-line or reading the entire text file into memory first. In this situation, with an XML or HTML file, I usually read the entire text into memory, then search that as one big string. Regular expressions are helpful here, but not necessary.
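As a rough illustration of the "read the entire text into memory, then search it as one big string" option, here is a minimal sketch. The names read_all and after_tag are made up for this example, and it assumes the file contains no embedded '\0' bytes (otherwise strstr() would stop early).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read everything from fp into a malloc'd,
   NUL-terminated buffer. Returns NULL on allocation failure.
   The caller must free() the result. */
char *read_all(FILE *fp)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);

    if (!buf)
        return NULL;

    /* Always leave one byte spare for the terminator. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len + 1 == cap) {              /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    return buf;
}

/* Hypothetical helper: pointer just past the first tag in text, or NULL. */
const char *after_tag(const char *text, const char *tag)
{
    const char *hit = strstr(text, tag);
    return hit ? hit + strlen(tag) : NULL;
}
```

With the whole feed in one buffer, it no longer matters which line the tag is on or whether the value wraps onto the next line.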

Philmagrill 0 Newbie Poster · Answer 2 · 2009-04-18T22:58:28+00:00

Hi, thank you for the help, it worked. I didn't know you could subtract them like that. However, it has produced a load of random characters after the word sunny.

eg [ sunny╠╠╠╠╠╠╠]

do you know why this is?

Thank you

Phil

nucleon 114 Posting Pro in Training · Answer 3 · 2009-04-19T00:43:46+00:00

Yes, I forgot the following after strncpy: word[pComma - pColon] = 0;
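Putting the colon/comma search and the missing terminator together, a self-contained version might look like the sketch below. The function name extract_word and its return convention are assumptions for illustration. It differs slightly from the fragment earlier in the thread: it looks for the comma only after the colon, skips spaces after the colon, checks the buffer size, and uses memcpy() since the length is already known.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the text between the first ':' and the next ','
   into word, skipping spaces after the colon.
   Returns 1 on success, 0 if a delimiter is missing or word is too small. */
int extract_word(const char *s, char *word, size_t wordsz)
{
    const char *pColon = strchr(s, ':');
    const char *pComma;
    size_t len;

    if (!pColon)
        return 0;                     /* no colon found */

    ++pColon;                         /* move past the colon */
    while (*pColon == ' ')
        ++pColon;                     /* and past any spaces */

    pComma = strchr(pColon, ',');
    if (!pComma)
        return 0;                     /* no comma after the colon */

    len = (size_t)(pComma - pColon);
    if (len >= wordsz)
        return 0;                     /* would overflow word */

    memcpy(word, pColon, len);
    word[len] = '\0';                 /* the terminator that was missing */
    return 1;
}
```

Without the final terminator the copied bytes are followed by whatever happened to be in the array, which is where the run of ╠ characters came from.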

Philmagrill 0 Newbie Poster · Answer 4 · 2009-04-19T20:35:27+00:00

Thanks guys, its working. ur gods of the internets.

Phil

Philmagrill 0 Newbie Poster · Answer 5 · 2009-04-19T22:52:42+00:00

Hey guys

I've managed to load the whole file into a memory buffer. The line I want the program to read is always on the same line number, so is there a C function that can be used to read just line number 35 of the XML, which I can then load into a string?

Thank you

phil

nucleon 114 Posting Pro in Training · Answer 6 · 2009-04-20T02:40:48+00:00

You can use this as char* pLine = SkipNChars (xmlText, 34, '\n'); to skip the first 34 lines in a string buffer.

char* SkipNChars (char* str, size_t n, char c) {
    while (n--)
        do {
             if (!*str) return str; /* Returns end of string */
        } while (*str++ != c);
    return str; /* Returns one past nth char c */
}

Philmagrill 0 Newbie Poster · Answer 7 · 2009-04-20T05:00:54+00:00

What's SkipNChars? I can't seem to find any info on it. Do I need a certain library for it?

Thank You
Phil

nucleon 114 Posting Pro in Training · Answer 8 · 2009-04-20T06:51:39+00:00

> Whats SkipNChars?

That's it right there. No guarantees. I'm unsure about the design decision to return the str pointer (pointing to the null character) when the end of string is reached; it could return a NULL ptr instead. It should also be renamed; maybe SkipNDelims.

But forget about that one. Here's a simplified version, just for newlines.

char* SkipNewlines (char* str, size_t n) {
    while (n--)
        do {
             if (!*str) return str; /* Returns pointer to '\0' */
        } while (*str++ != '\n');
    return str; /* Returns one past nth '\n' */
}

Use it like this:

char buffer[] = "here is your text\nline after line\netc....\n";
/* Init a char pointer. */
char *pLine = buffer;
/* Skip the first 34 lines. */
pLine = SkipNewlines (pLine, 34);
/* Now pLine points to beginning of 35th line;
    or the end of the string (the '\0')
    if there were less than 35 lines. */
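To get the line into its own string, as originally asked, one possible follow-up is sketched below. The helper copy_line is hypothetical; skip_newlines repeats the function above (with const added) so the block stands alone. It inherits the same weakness: it depends entirely on the line number staying fixed.

```c
#include <stddef.h>
#include <string.h>

/* Same idea as SkipNewlines above, repeated so this block is self-contained. */
static const char *skip_newlines(const char *str, size_t n)
{
    while (n--)
        do {
            if (!*str) return str;    /* ran out of text */
        } while (*str++ != '\n');
    return str;
}

/* Hypothetical helper: copy 1-based line number lineno of text into out,
   without its trailing newline.
   Returns 1 on success, 0 if the line is missing or does not fit. */
int copy_line(const char *text, size_t lineno, char *out, size_t outsz)
{
    const char *start;
    size_t len;

    if (lineno == 0)
        return 0;

    start = skip_newlines(text, lineno - 1);
    if (!*start)
        return 0;                     /* fewer lines than requested */

    len = strcspn(start, "\n");       /* length up to newline or end */
    if (len >= outsz)
        return 0;

    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

For the case in the thread that would be copy_line(xmlText, 35, line, sizeof line).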

nucleon 114 Posting Pro in Training · Answer 9 · 2009-04-20T22:10:04+00:00

jephthah is correct, in his abrasive way. :) As he himself might say, my "solution" (actually my code for your solution) is both half-assed and of the git-er-done variety.

It is better to search for some identifying string in whatever line it happens to be in, and strstr() is useful in that regard.

jephthah 1,888 Posting Maven · Answer 10 · 2009-04-20T22:25:11+00:00

yeah, that was really abrasive. sorry. :( i shouldn't take my issues out on people here. you were, after all, just giving him what he asked.

the problem is that the OP's approach itself is flawed. in a particularly bad way because it will work -- for a few weeks or a few months -- then stop working as soon as someone on the other end updates their webpage.

.

Philmagrill 0 Newbie Poster · Answer 11 · 2009-04-21T03:58:32+00:00

lol, nucleon gave me exactly what I asked for. I had already thought about the possibility of the XML changing, though I thought that because the RSS feed is from the BBC they probably wouldn't change it, plus it's for a uni project so it only needs to work for a couple of weeks. I think I will have a go at both ways; it will give me something to write about in the report. Cheers guys

Thanks
Phil

Dave Sinkula 2,398 long time no c Team Colleague · Answer 12 · 2009-04-21T04:13:12+00:00

Sometimes I choose sscanf . Ballpark?

#include <stdio.h>
#include <ctype.h>

void parse_title(const char *text)
{
   char word[128];

   /* skip leading whitespace, if any */
   while ( isspace((unsigned char)*text) )
   {
      ++text;
   }

   /* look for formatted input */
   if ( sscanf(text, "<title>%*[^:]: %127[^,]", word) == 1 )
   {
      printf("word = \"%s\"\n", word);
   }
}

int main(void)
{
   const char filename[] = "file.xml";
   FILE *file = fopen(filename, "r");
   if ( file )
   {
      char line[256];
      while ( fgets(line, sizeof line, file) )
      {
         parse_title(line);
      }
      fclose(file);
   }
   else
   {
      perror(filename);
   }
   return 0;
}

/* file.xml
<results>
  <item>
    <title>Saturday: sunny, Max Temp: 16&#xB0;C (61&#xB0;F), Min Temp: 4&#xB0;C (39&#xB0;F)</title>
    <link>http://www.google.com</link>
    <description>Just another line of text.</description>
  </item>
  <item>
    <title>Saturday: partly cloudy, Max Temp: 16&#xB0;C (61&#xB0;F), Min Temp: 4&#xB0;C (39&#xB0;F)</title>
    <link>http://www.google.com</link>
    <description>Just another line of text.</description>
  </item>
</results>
*/

/* my output
word = "sunny"
word = "partly cloudy"
*/

jephthah 1,888 Posting Maven · Answer 13 · 2009-04-21T04:16:55+00:00

> lol, nucleon gave me exactly what I asked for. I had already thought about the possibility of the XML changing, though I thought that because the RSS feed is from the BBC they probably wouldn't change it, plus it's for a uni project so it only needs to work for a couple of weeks.

look, it's not a "style point". you have zero control over the website format or RSS feed or whatever third-party thing you're collecting. it would be a critical flaw to rely on "newlines"

the fact that this is a "uni project" is irrelevant. why would you do it the wrong way when you can do it the right way?

EDIT:

oh, look. aren't you lucky. Dave S. just handed you a giftwrapped solution
