Let's say a normal sample with its oto already configured looks like this:
Well, the longer the white zone (the vowel part) is, the more human the very
long notes will sound.
UTAU has to stretch the vowel if the sample is shorter than the length the note needs. For example, if the sample has a one-second vowel and you want a two-second note, UTAU has to make the vowel part longer by artificially stretching the sample.
But, of course, this only applies to really long notes. Samples with vowels of around 0.5-2 seconds are probably more than enough, but vowels of 0.2 seconds or less may start to be a noticeable problem.
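To get a rough feel for those numbers, here is a tiny Python sketch of the arithmetic. It is not how UTAU actually stretches samples internally; it just assumes the whole note is filled by the vowel part, and the name stretch_factor is made up for this example.

```python
def stretch_factor(vowel_seconds, note_seconds):
    """How many times longer the vowel must become to fill the note.

    Simplified assumption: the whole note is filled by the vowel part
    (the consonant part of the sample is ignored).
    """
    if vowel_seconds <= 0:
        raise ValueError("vowel_seconds must be positive")
    # A vowel that is already long enough needs no stretching at all.
    return max(1.0, note_seconds / vowel_seconds)


# Compare a few vowel lengths against the same two-second note.
for vowel in (1.0, 0.5, 0.1):
    factor = stretch_factor(vowel, 2.0)
    print(f"{vowel:.1f} s vowel, 2 s note: stretched x{factor:.1f}")
```

Under this simple model, a one-second vowel only has to double for a two-second note, while a 0.1-second vowel has to grow about twenty times, which fits the idea that very short vowels are where the stretching becomes noticeable.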
As an example, the cute robotic Uta Utane has a vowel of around 0.1 seconds, while the very cool "new" Aoi Celestine sounds very human with vowels just a bit over 0.5 seconds (though Defoko may be a bad example, because her voice didn't come from a real person in the first place).
If you plan on recording long samples for that, look again at the picture of the oto config at the start.
You can see a red line going up and down around the center (not the vertical lines of the oto config). Well, if you make the sample long but that line strays a lot from the center, then don't record your samples that way. That frequency thing (not sure what exactly it is!) is more important than the sample being long.
If you want to sound more human, try the "blanket over your recording" trick. It really helps a lot to improve the recording.