The C language has one of the most powerful string handling
capabilities of any general-purpose computer language.
A string is a one-dimensional array of characters terminated
by a zero byte ('\0').
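Because a string carries no separate length field, functions such as strlen() find the length by counting characters up to the zero byte. The following is a minimal sketch of that idea; the name my_strlen() is hypothetical, chosen so as not to clash with the library's own strlen(), and real library versions are usually more heavily optimised.

```c
#include <stddef.h>

/* Hypothetical re-implementation of strlen(): count characters
   until the terminating zero byte is reached. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

For an array declared as char buf[] = "System 5";, my_strlen(buf) returns 8 while sizeof buf is 9, the extra byte being the terminator.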
Strings may be initialised in two ways: either in the source
code, where they are assigned a constant value, as in:
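int main(void)
{
    const char *p = "System 5";      /* pointer to a string literal (read-only) */
    char name[] = "Test Program";    /* array initialised with a copy of the literal */
    return 0;
}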
or at run time by the function strcpy(), which has the
prototype:
char *strcpy(char *destination, const char *source);
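strcpy() copies the string pointed to by source, including its
terminating zero byte, into the location pointed to by destination.
strcpy() does not check that destination is large enough, so the
caller must ensure that it is, as in the following example: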
#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[50];
    strcpy(name, "Servile Software");
    printf("Name equals %s\n", name);
    return 0;
}
C also allows direct access to each individual byte of the
string, so the following is quite permissible:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[50];
    strcpy(name, "Servile Software");
    printf("Name equals %s\n", name);
    /* Replace first byte with lower case 's' */
    name[0] = 's';
    printf("Name equals %s\n", name);
    return 0;
}
The ANSI C standard defines the following functions for use
with strings, declared in the header <string.h>:
char *strcat(char *dest, const char *src)
    Appends string src to the end of string dest.
char *strchr(const char *s, int c)
    Returns a pointer to the first occurrence of character c within s,
    or NULL if c is not found.
int strcmp(const char *s1, const char *s2)
    Compares strings s1 and s2, returning
    < 0 if s1 is less than s2
    == 0 if s1 and s2 are the same
    > 0 if s1 is greater than s2
int strcoll(const char *s1, const char *s2)
    Compares strings s1 and s2 according to the collating sequence set
    by setlocale(), returning < 0, 0 or > 0 as for strcmp().
char *strcpy(char *dest, const char *src)
    Copies string src, including its terminating zero byte, into string dest.
size_t strcspn(const char *s1, const char *s2)
    Returns the length of the initial segment of string s1 that consists
    entirely of characters not in string s2.
size_t strlen(const char *s)
    Returns the length of string s, not counting the terminating zero byte.
char *strncat(char *dest, const char *src, size_t len)
    Appends at most len characters from string src to the end of string
    dest, and then adds a terminating zero byte.
int strncmp(const char *s1, const char *s2, size_t len)
    Compares at most len characters of string s1 with string s2,
    returning < 0, 0 or > 0 as for strcmp().
char *strncpy(char *dest, const char *src, size_t len)
    Copies exactly len characters into dest: the characters of src,
    padded with zero bytes if src is shorter than len. If src is len
    characters or longer, dest is NOT zero-terminated.
char *strpbrk(const char *s1, const char *s2)
    Returns a pointer to the first character in string s1 that occurs in
    string s2, or NULL if there is none.
char *strrchr(const char *s, int c)
    Returns a pointer to the last occurrence of c within string s, or
    NULL if c is not found.
size_t strspn(const char *s1, const char *s2)
    Returns the length of the initial segment of string s1 that consists
    entirely of characters in string s2.
char *strstr(const char *s1, const char *s2)
    Returns a pointer to the first occurrence of string s2 within string
    s1, or NULL if string s2 is not found in string s1.
char *strtok(char *s1, const char *s2)
    Returns a pointer to the next token in string s1, where tokens are
    separated by any of the delimiter characters in string s2. Returns
    NULL if no more tokens are found.
(size_t is the unsigned integer type the standard uses for sizes and lengths.)
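Since the search functions return pointers into the original string (or NULL), they combine naturally in loops. As a minimal sketch, a hypothetical helper count_char() can call strchr() repeatedly, resuming one character past each match; note the special case for c == '\0', because strchr() will happily "find" the terminator itself.

```c
#include <string.h>

/* Hypothetical helper: count occurrences of c in s using strchr().
   Each call resumes one character past the previous match. */
size_t count_char(const char *s, int c)
{
    size_t count = 0;

    if (c == '\0')
        return 0;   /* strchr() would find the terminator itself */
    while ((s = strchr(s, c)) != NULL)
    {
        count++;
        s++;        /* step past this match */
    }
    return count;
}
```

For example, count_char("Servile Software", 'e') returns 3, while strspn("   word", " ") returns 3 (the length of the leading run of spaces) and strcspn("key=value", "=") returns 3 (the length of "key").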
The ANSI standard also defines functions in <stdlib.h>, such as
atoi(), atof() and strtod(), for converting strings into numbers,
while sprintf() converts numbers into strings.
Some C compilers include functions to convert strings to upper and lower case, but these functions are not defined in the ANSI standard. However, the ANSI standard does define the functions toupper() and tolower() in <ctype.h>, which return their integer argument converted to upper and lower case
respectively. By using these functions we can create our own ANSI-compatible
versions:
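#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Note: some compilers already supply strupr() and strlwr() in
   <string.h>; there, these definitions may clash with the library's. */
void strupr(char *source)
{
    char *p = source;
    while (*p)
    {
        /* Cast to unsigned char: toupper() needs a value in that range */
        *p = (char)toupper((unsigned char)*p);
        p++;
    }
}

void strlwr(char *source)
{
    char *p = source;
    while (*p)
    {
        *p = (char)tolower((unsigned char)*p);
        p++;
    }
}

int main(void)
{
    char name[50];
    strcpy(name, "Servile Software");
    printf("Name equals %s\n", name);
    strupr(name);
    printf("Name equals %s\n", name);
    strlwr(name);
    printf("Name equals %s\n", name);
    return 0;
}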
Unlike some other computer languages, C does not impose a maximum length on a string; a string is limited only by the size of the memory that holds it. However, some CPUs restrict the maximum size of a block of memory. For example, the 8088 family of CPUs, as used by the IBM PC, imposes a limit of 64K bytes on a segment of memory.
The following example program reverses all the characters in a string:
#include <stdio.h>
#include <string.h>

char *strrev(char *s)
{
    /* Reverses the order of all characters in a string except the */
    /* terminating zero byte, and returns a pointer to the string */
    char *start = s;
    char *end;
    char tmp;

    /* An empty string needs no work (s + strlen(s) - 1 would point before it) */
    if (*s == '\0')
        return s;
    /* Set pointer 'end' to last character in string */
    end = s + strlen(s) - 1;
    /* Swap characters, working inwards from both ends */
    while (s < end)
    {
        tmp = *end;
        *end = *s;
        *s = tmp;
        end--;
        s++;
    }
    return start;
}

int main(void)
{
    char text[100];
    char *p;
    strcpy(text, "This is a string of data");
    p = strrev(text);
    printf("%s\n", p);
    return 0;
}
The standard function strtok() is a very powerful tool for extracting substrings from within a single string. It is used where the substrings are separated by known delimiters, such as the commas in the following example:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char data[50];
    char *p;
    strcpy(data, "RED,ORANGE,YELLOW,GREEN,BLUE,INDIGO,VIOLET");
    p = strtok(data, ",");
    while (p)
    {
        puts(p);
        p = strtok(NULL, ",");
    }
    return 0;
}
The same program can be written with a for() loop:
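#include <stdio.h>
#include <string.h>

int main(void)
{
    char data[50];
    char *p;
    strcpy(data, "RED,ORANGE,YELLOW,GREEN,BLUE,INDIGO,VIOLET");
    /* p must be assigned the first token, or the loop test reads an uninitialised pointer */
    for (p = strtok(data, ","); p; p = strtok(NULL, ","))
    {
        puts(p);
    }
    return 0;
}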
The two versions do exactly the same job; they simply follow
different programming styles.
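Initially, you call strtok() with a pointer to the string to be
parsed and a second string that contains the delimiter characters.
strtok() skips any leading delimiters, returns a pointer to the start of
the first token, and overwrites the delimiter that ends the token with a
zero byte. Subsequent calls to strtok() can be made in a loop, passing NULL
as the string to be parsed, and strtok() will return each following token
in turn, or NULL when none remain. Because strtok() writes into the string
and remembers its position internally between calls, the string must be
modifiable (not a string literal), and only one string can be tokenised
at a time.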
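One consequence of this design is that adjacent delimiters are treated as a single separator, so strtok() never returns an empty token. The sketch below makes this visible; count_tokens() is a hypothetical helper, and it works on a local copy because strtok() modifies its argument (the 256-byte buffer is an assumption of this sketch).

```c
#include <string.h>

/* Hypothetical helper: count the tokens strtok() finds in text.
   strtok() writes zero bytes into its argument, so we work on a copy.
   It uses strtok()'s internal state, so it must not be called while
   another strtok() sequence is in progress. */
int count_tokens(const char *text, const char *delims)
{
    char copy[256];
    char *p;
    int count = 0;

    if (strlen(text) >= sizeof copy)
        return -1;              /* too long for this sketch */
    strcpy(copy, text);
    for (p = strtok(copy, delims); p != NULL; p = strtok(NULL, delims))
        count++;
    return count;
}
```

With this helper, "a,,b," yields 2 tokens and ",,," yields none. If empty fields matter (for example in comma-separated data), strtok() is the wrong tool and the string has to be scanned with strchr() or strcspn() instead.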
Since strtok() accepts any number of delimiter characters in its second parameter, we can use it as the basis of a simple word counting program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    FILE *fp;
    char buffer[256];
    char *p;
    long count;
    /* Words are defined as separated by the characters */
    /* \t(tab) \n(newline) , ; : . ! ? ( ) - and [space] */
    const char *delims = "\t\n,;:.!?()- ";

    if (argc != 2)
    {
        fputs("ERROR: Usage is wordcnt <file>\n", stderr);
        exit(EXIT_FAILURE);
    }
    /* Open file for reading */
    fp = fopen(argv[1], "r");
    /* Check the open was okay */
    if (!fp)
    {
        fputs("ERROR: Cannot open source file\n", stderr);
        exit(EXIT_FAILURE);
    }
    /* Initialise word count */
    count = 0;
    /* Read a line at a time; fgets() returns NULL at EOF or on error. */
    /* A line longer than the buffer is read in pieces, so a word that */
    /* straddles a piece boundary may be counted as two words. */
    while (fgets(buffer, sizeof buffer, fp) != NULL)
    {
        /* count words in received line */
        p = strtok(buffer, delims);
        while (p)
        {
            count++;
            p = strtok(NULL, delims);
        }
    }
    /* Finished reading. Was it due to an error? */
    if (ferror(fp))
    {
        fputs("ERROR: Reading source file\n", stderr);
        fclose(fp);
        exit(EXIT_FAILURE);
    }
    /* Reading finished due to EOF, quite valid so print count */
    printf("File %s contains %ld words\n", argv[1], count);
    fclose(fp);
    return 0;
}
All ANSI C compilers provide sprintf() for converting numbers to strings. However, sprintf() is a multi-purpose formatting function, and it is therefore comparatively large and slow. The following function, ITOS(), accepts two parameters: the first is a signed long integer and the second is a pointer to a character string. It writes the decimal representation of the integer into the memory pointed to by the character pointer. As with sprintf(), ITOS() does not check that the target string is long enough to hold the result of the conversion, so you must ensure that it is.
Example function for copying a signed integer into a string:
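void ITOS(long x, char *ptr)
{
    /* Convert a signed decimal integer to a zero-terminated string.
       Digits are generated least significant first, so every digit
       (including embedded and trailing zeros) is produced for any
       value of long, then copied out in the correct order. */
    char digits[sizeof(unsigned long) * 3];   /* ample for any unsigned long */
    int n = 0;
    unsigned long ux;

    /* Check sign; negate in unsigned arithmetic so LONG_MIN is safe */
    if (x < 0)
    {
        *ptr++ = '-';
        ux = 0UL - (unsigned long)x;
    }
    else
        ux = (unsigned long)x;
    do
    {
        digits[n++] = (char)('0' + ux % 10);
        ux /= 10;
    }
    while (ux != 0);
    /* Copy digits most significant first, then terminate */
    while (n > 0)
        *ptr++ = digits[--n];
    *ptr = '\0';
}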
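Where a C99 compiler is available, snprintf() provides the bounds checking that ITOS() and sprintf() lack: it never writes more than the given number of bytes (including the zero byte) and returns the number of characters the complete result would have needed. A minimal sketch, with the helper name long_to_str() being hypothetical:

```c
#include <stdio.h>

/* Hypothetical bounded conversion built on C99 snprintf().
   Returns 0 on success, or -1 if buf was too small; when size > 0,
   buf always holds a zero-terminated (possibly truncated) result. */
int long_to_str(long x, char *buf, size_t size)
{
    int needed = snprintf(buf, size, "%ld", x);

    if (needed < 0 || (size_t)needed >= size)
        return -1;
    return 0;
}
```

The trade-off is the one described above: snprintf() is safer, but it carries the full formatting machinery of the printf() family.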
To convert a string to a floating point number, C provides two
functions, atof() and strtod(). atof() has the prototype:
double atof(const char *s);
strtod() has the prototype:
double strtod(const char *s, char **endptr);
Both functions skip any leading white space and then convert as much of the string as forms a valid number, stopping at the first character that cannot be part of it. The difference between the two functions is that if strtod() is passed a non-NULL pointer for parameter endptr, it stores there a pointer to the first character in the string that was not converted. strtod() also sets errno to ERANGE if the value is out of range, whereas atof() gives undefined behaviour in that case. Because of this better error reporting, strtod() is often preferred to atof().
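A common pattern is to use endptr to insist that the whole string is a number, rather than accepting a numeric prefix. The sketch below shows this; parse_double() is a hypothetical wrapper, not a standard function.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: convert s to a double, succeeding only if
   the whole string is a valid number. Returns 1 on success, 0 on failure. */
int parse_double(const char *s, double *out)
{
    char *end;
    double value;

    errno = 0;
    value = strtod(s, &end);
    if (end == s)          /* no characters were converted at all */
        return 0;
    if (*end != '\0')      /* trailing characters not part of the number */
        return 0;
    if (errno == ERANGE)   /* value out of range for a double */
        return 0;
    *out = value;
    return 1;
}
```

With atof(), the inputs "12abc" and "abc" would silently produce 12.0 and 0.0; here both are rejected.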
To convert a string to an integer, use atoi(), which has the
prototype:
int atoi(const char *s);
atoi() does not check for overflow: if the result cannot be
represented as an int, the behaviour is undefined.
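atol() is a similar function but returns a long. Alternatively, you can use strtol() and strtoul(), which have better error checking: like strtod(), they report where conversion stopped through an endptr parameter, and they set errno to ERANGE when the value overflows.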
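As a sketch of that error checking, the hypothetical helper parse_int() below converts a decimal string to an int with strtol(), rejecting empty input, trailing characters, and values that fit in neither a long nor an int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper: convert s to an int in base 10 using strtol().
   Returns 1 on success, 0 on failure. */
int parse_int(const char *s, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return 0;                      /* no digits, or junk after them */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                      /* too large for a long or an int */
    *out = (int)value;
    return 1;
}
```

The separate INT_MIN/INT_MAX test matters on systems where long is wider than int, since strtol() only reports overflow of a long.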
Human languages record information as 'text', which consists of words, figures and punctuation, with the words made up of upper case and lower case letters. Processing text with a computer is a commonly required task, and yet quite a difficult one.
The string processing functions defined by ANSI C are, by their nature, sensitive to case: the letter 'A' is seen as distinct from the letter 'a'. This is the first problem that the programmer must overcome. Fortunately, both Borland's Turbo C compilers and Microsoft's C compilers include case-insensitive forms of the string functions.
stricmp() for example is the case insensitive form of strcmp(), and strnicmp() is the case insensitive form of strncmp().
If you are concerned about writing portable code, then you must restrict yourself to the ANSI C functions, and write your own case insensitive functions using the tools provided.
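For example, a portable case-insensitive counterpart of strcmp() needs only tolower() from <ctype.h>. The sketch below is one way to write it; the name ci_strcmp() is hypothetical, and it deliberately avoids names such as stricmp(), because the C standard reserves names beginning with "str" followed by a lower-case letter for future library use.

```c
#include <ctype.h>

/* Hypothetical portable case-insensitive string comparison.
   Returns < 0, 0 or > 0 like strcmp(), comparing characters
   after converting both to lower case. */
int ci_strcmp(const char *s1, const char *s2)
{
    int c1, c2;

    do
    {
        /* Cast to unsigned char: passing a negative char to tolower() is undefined */
        c1 = tolower((unsigned char)*s1++);
        c2 = tolower((unsigned char)*s2++);
    }
    while (c1 == c2 && c1 != '\0');
    return c1 - c2;
}
```

The loop stops at the first differing character or at the end of both strings, so ci_strcmp("Servile", "SERVILE") returns 0.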
Here is a simple implementation of a case-insensitive version
of strstr(). The function makes a copy of each parameter string, converts
both copies to upper case, and then calls the standard strstr() on the copies.
The offset of the target string within the copy is the same as within
the original, and so the result can be returned relative to the parameter
string. The function relies on the strupr() function defined earlier.
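char *stristr(char *s1, char *s2)
{
    /* Case-insensitive strstr(); uses strupr() defined earlier */
    char c1[1000];
    char c2[1000];
    char *p;

    /* Refuse strings that would overflow the local copies */
    if (strlen(s1) >= sizeof c1 || strlen(s2) >= sizeof c2)
        return NULL;
    strcpy(c1, s1);
    strcpy(c2, s2);
    strupr(c1);
    strupr(c2);
    p = strstr(c1, c2);
    if (p)
        return s1 + (p - c1);   /* same offset within the original string */
    return NULL;
}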
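The copying approach is limited by its fixed 1000-byte buffers. An alternative, sketched below under the hypothetical name ci_strstr(), compares characters in place with tolower(), so it works on strings of any length. This is a plain character-by-character search rather than the copy-and-convert method above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical case-insensitive strstr() that compares in place,
   so it needs no fixed-size copies of its arguments. */
char *ci_strstr(const char *s1, const char *s2)
{
    size_t i;

    if (*s2 == '\0')
        return (char *)s1;            /* like strstr(), an empty s2 matches at once */
    for (; *s1 != '\0'; s1++)
    {
        /* Compare s2 against the text starting at s1; a mismatch at
           s1's terminator stops the comparison before reading past it */
        for (i = 0; s2[i] != '\0'; i++)
            if (tolower((unsigned char)s1[i]) != tolower((unsigned char)s2[i]))
                break;
        if (s2[i] == '\0')
            return (char *)s1;        /* every character of s2 matched */
    }
    return NULL;
}
```

Like strstr(), it returns a pointer into s1, so ci_strstr("This is a Test", "TEST") points at the 'T' of "Test".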
The following function, word_in(), scans string s1 looking for the word held in s2. The word must occur as a complete word, not simply as a character pattern inside a longer word, for the function to return true (non-zero). It makes use of the stristr() function described previously.
#include <ctype.h>

int word_in(char *s1, char *s2)
{
    /* return non-zero if s2 occurs as a whole word in s1 */
    char *p;
    char *q;
    int ok;

    /* An empty word never matches (and would loop forever below) */
    if (*s2 == '\0')
        return 0;
    ok = 0;
    q = s1;
    do
    {
        /* Locate the next occurrence of s2 in s1 */
        p = stristr(q, s2);
        if (p)
        {
            /* Found */
            ok = 1;
            /* The previous character must not be a letter */
            if (p > s1 && isalpha((unsigned char)p[-1]))
                ok = 0;
            /* Move p past the match */
            p += strlen(s2);
            /* The following character must not be a letter */
            if (isalpha((unsigned char)*p))
                ok = 0;
        }
        q = p;
    }
    while (p && !ok);
    return ok;
}
Some more useful functions for dealing with text follow. truncstr() truncates a string by removing its last num characters:
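void truncstr(char *p, int num)
{
    /* Truncate string by losing last num characters; */
    /* if num is at least the length, the string becomes empty */
    size_t len = strlen(p);

    if (num <= 0)
        return;
    if ((size_t)num >= len)
        p[0] = '\0';
    else
        p[len - num] = '\0';
}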
trim() removes trailing spaces from the end of a string:
void trim(char *text)
{
    /* Remove trailing spaces; safe for the empty string, since we */
    /* never index before the first character */
    size_t len = strlen(text);

    while (len > 0 && text[len - 1] == ' ')
        text[--len] = '\0';
}
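strlench() changes the length of a string by adding or deleting
characters at its start. A positive num opens a gap of num characters
(still holding the old characters, ready to be overwritten), and the
caller must ensure there is room for them; a negative num deletes -num characters: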
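void strlench(char *p, int num)
{
    /* Change length of string by adding or deleting characters at p */
    size_t len = strlen(p);

    if (num > 0)
        /* Shift the string, including its zero byte, num places right */
        memmove(p + num, p, len + 1);
    else
    {
        size_t del = (size_t)(0 - num);
        if (del > len)
            del = len;   /* cannot delete more than the string holds */
        /* Shift the remaining tail and its zero byte left over the deleted part */
        memmove(p, p + del, len - del + 1);
    }
}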
strins() inserts string q at position p, which may point into the middle of a larger string:
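void strins(char *p, char *q)
{
    /* Insert string q at p; the buffer must have room for it */
    size_t len = strlen(q);

    strlench(p, (int)len);   /* open a gap of len characters */
    memcpy(p, q, len);       /* fill it, without copying q's zero byte */
}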
strchg() replaces all occurrences of one substring with
another within a target string:
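void strchg(char *data, char *s1, char *s2)
{
    /* Replace all occurrences of s1 with s2; data must have room for the result */
    char *p = data;
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);

    if (len1 == 0)
        return;   /* an empty s1 would match everywhere */
    while ((p = strstr(p, s1)) != NULL)
    {
        /* Delete original string */
        strlench(p, 0 - (int)len1);
        /* Insert replacement string */
        strins(p, s2);
        /* Resume after the replacement, so an s2 that contains s1 */
        /* cannot cause an endless loop */
        p += len2;
    }
}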