Python-Help needed

06/03/2019 01:38 ηøℓι#1
EN:
Hey, I have to write a function that tokenizes the input "sentence".
Currently I do that with:
Code:
x = sentence.split()
This code works fine.
But now I have to make sure that punctuation (. , : etc.) gets treated as separate tokens, so that e.g.:
sentence = "Hi, Ich bins."

is NOT just:

["Hi,", "Ich", "bins."] (as it would be with .split())

but that the output is:

["Hi", ",", "Ich", "bins", "."]

How can I achieve that?
Apparently I need to use a for-loop for it, but I have no clue how.
(I cannot use NLTK or regex)

06/03/2019 05:25 cypher#2
Ooops no regex. Ok sorry.
06/03/2019 16:06 0xFADED#3
Quote:
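Originally Posted by ηøℓι
Apparently I need to use a for-loop for it, but I have no clue how.
(I cannot use NLTK or regex)
I don't know Python, but basically you just need to loop through the characters and buffer them until you hit a space or a punctuation mark. When that happens, emit the buffered word as a token (if the buffer is not empty) and, for punctuation, emit the mark itself as a token too.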
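
For reference, that buffering loop might look roughly like this in Python. It is only a sketch: the function name tokenize and the PUNCTUATION set are assumptions made for illustration, and the set should be adjusted to whatever marks the assignment actually requires.

```python
# Assumed set of punctuation marks; extend it as needed.
PUNCTUATION = ".,:;!?"


def tokenize(sentence):
    """Split a sentence into words and single punctuation tokens (no regex, no NLTK)."""
    tokens = []
    buffer = ""  # collects the characters of the current word
    for ch in sentence:
        if ch.isspace():
            # Whitespace ends the current word, if there is one.
            if buffer:
                tokens.append(buffer)
                buffer = ""
        elif ch in PUNCTUATION:
            # Punctuation ends the current word and becomes its own token.
            if buffer:
                tokens.append(buffer)
                buffer = ""
            tokens.append(ch)
        else:
            buffer += ch
    # Flush a word that runs to the end of the sentence.
    if buffer:
        tokens.append(buffer)
    return tokens


print(tokenize("Hi, Ich bins."))  # ['Hi', ',', 'Ich', 'bins', '.']
```

Because every character is inspected individually, this also separates punctuation in the middle of a word or several marks in a row (e.g. "what?!"), which a split-based approach would not do.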

Here's an example in C++:
06/04/2019 01:47 elmarcia#4
Quote:
Originally Posted by ηøℓι
EN:
Hey, I have to write a function that tokenizes the input "sentence".
Currently I do that with:
Code:
x = sentence.split()
This code works fine.
But now I have to make sure that punctuation (. , : etc.) gets treated as separate tokens, so that e.g.:
sentence = "Hi, Ich bins."

is NOT just:

["Hi,", "Ich", "bins."] (as it would be with .split())

but that the output is:

["Hi", ",", "Ich", "bins", "."]

How can I achieve that?
Apparently I need to use a for-loop for it, but I have no clue how.
(I cannot use NLTK or regex)

If you read the Python docs for the str.split method, you will find that you can specify a separator character; something like this should do the trick:
Code:
# Named "text" rather than "str" so the built-in str type is not shadowed.
text = "im a string, and have punctuation too."

print(text.split(" "))
06/04/2019 11:13 0xFADED#5
Quote:
Originally Posted by elmarcia
you will find that you can specify a separator character
Read the question again, elmarcia.
OP specifically asked for a solution that treats punctuation as if it were enclosed in whitespace.
Split is not enough here: it will yield the incorrect result ["string,"] where ["string", ","] was expected.
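
A quick check in Python shows what splitting on a space produces for that example string. The variable names are chosen only for illustration:

```python
sentence = "im a string, and have punctuation too."

# Splitting on a single space leaves punctuation attached to the preceding word.
tokens = sentence.split(" ")
print(tokens)  # ['im', 'a', 'string,', 'and', 'have', 'punctuation', 'too.']
```

The comma and the full stop never show up as tokens of their own, which is exactly what the question rules out.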
06/04/2019 11:58 ηøℓι#6
Quote:
Originally Posted by 0xFADED
I don't know Python, but basically you just need to loop through the characters and buffer them until you hit a space or punctuation.

Here's an example in C++:
Thank you, my solution in the end looked like this:
(the code might be weird, but hey.. it works! )




and the official / easiest solution seems to be:
Code:
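def tokenize(x):                           # x = input sentence; the function name is assumed
    emptylist = []

    for y in x.split():                    # y = one whitespace-separated word
        # Split a single trailing punctuation mark off into its own token.
        if y[-1] in ".,!?;":
            emptylist.extend([y[:-1], y[-1]])
        else:
            emptylist.append(y)

    return emptylist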
Glad this could be solved, thank you for helping!