The actions of <i>strtok</i> can easily be coded using <i>strspn</i> and <i>strcspn</i>.<p><a href="https://groups.google.com/forum/message/raw?msg=comp.lang.c/ZhXAlw6VZsA/_Y5evTIkf6kJ" rel="nofollow">https://groups.google.com/forum/message/raw?msg=comp.lang.c/...</a> [2001]<p><a href="https://groups.google.com/forum/message/raw?msg=comp.lang.c/ff0xFqRPH_Y/Cen0mgciXn8J" rel="nofollow">https://groups.google.com/forum/message/raw?msg=comp.lang.c/...</a> [2011 repost]<p><i>strspn(s, bag)</i> calculates the length of the prefix of string <i>s</i> which consists only of the characters in string <i>bag</i>. <i>strcspn(s, bag)</i> calculates the length of the prefix of <i>s</i> consisting of characters <i>not</i> in <i>bag</i>.<p>The <i>bag</i> is like a one-character regex class: <i>strspn(s, "abcd")</i> calculates the length of the token at the front of input <i>s</i> matching the regex [abcd]* , and <i>strcspn(s, "abcd")</i> does the same for [^abcd]* .
strtok is one of the silliest parts of the standard library (and there are many bad ones). It's broken. It's not thread-safe (yes, there is strtok_r). It's needlessly hard to use. And it writes null bytes into the input array. That last point makes it unfit for most use cases, including non-trivial tokenization where you want, e.g., to split "a+1" into three tokens.<p>If you program in C, please just write those four obvious lines yourself.
I recommend ksplit/ksplit_core from Heng Li’s excellent klib kstring.{h,c}[0]. It modifies the string in place, adding null terminators, and provides a list of offsets into the string. This gives you the flexibility of accessing tokens by index without paying the cost of copying or memory allocation.<p>[0] <a href="https://github.com/attractivechaos/klib" rel="nofollow">https://github.com/attractivechaos/klib</a>
I have an obsession with unsafe example code:<p><pre><code> strcpy(str,"abc,def,ghi");
token = strtok(str,",");
printf("%s \n",token);
</code></pre>
Even if the author knows how many tokens are returned, I would prefer a check for NULL here, since a good fraction of readers might not read further than this bad example.
Well, yes, using strtok works if the data happens to be structured in a certain simple way.
Very often you want to do something more advanced though, and using regex for matching tokens is then necessary.
A lot of experience shows that the string tokenization in Open Object Rexx is darned useful. E.g., for many years, IBM's internal computing was from about 3600 <i>mainframe</i> computers around the world running VM/CMS with a lot of <i>service machines</i> written in Rexx. Rexx is no toy but a powerful, polished, scripting language and really good at handling strings.<p>A little example of some Rexx code with some string parsing is in<p><a href="https://news.ycombinator.com/item?id=18648999" rel="nofollow">https://news.ycombinator.com/item?id=18648999</a>
> Next, strtok is not thread-safe. That's because it uses a static buffer internally. So, you should take care that only one thread in your program calls strtok at a time.<p>I wonder why strtok() does not use an output parameter similar to scanf() — and return the number of tokens. Something like:<p><pre><code> int strtok(char *str, char *delim, char **tokens);
</code></pre>
Granted, it would involve dynamic memory allocation and the implementation that immediately comes to mind would be less efficient than the current implementation, but surely it’s worth eliminating the kind of bugs the current strtok() can introduce?<p>Does anyone here have the historical perspective?
Another approach, besides library calls and flex, is re2c. It preprocesses the source code and inlines regular-expression matching where you need it. It's very powerful in combination with goto.
AFAIK strtok has had restrict on both args since C99. And the safe variants strtok_s and esp. wcstok_s are missing. Strings are Unicode nowadays, not ASCII.<p><a href="https://en.cppreference.com/w/c/string/byte/strtok" rel="nofollow">https://en.cppreference.com/w/c/string/byte/strtok</a>