Let me clarify my first hastily/poorly written response:<br><br>1. The target parser for attacks using these techniques could be a SQL query engine, a web user agent, etc. The target parser, however, is not necessarily where the decoding occurs.
<br><br>2. Somewhere along the path from HTTP protocol --> to app untrusted entry point --> to parser, there are several possible layers of decoding. These could include:
<br><br>+ Web Server itself<br><br>+ Web Server plugin<br><br>+ Canonicalization in framework (e.g., some .NET modules)<br><br>+ Canonicalization steps in web code.<br><br>+ Decoding and interpretation by shell scripts and the like.
<br><br>+ Decoding certain encoding types for normalization (see this a lot in PHP, or cookies base64 file-system encoded, etc.)<br><br>+ etc. <br><br>This means that:<br><br>3. It is possible for an app to have one or more layers of canonicalization/conversion, allowing for even crazy things like double and triple-encoding, which IDS/IPS do not handle at all over HTTP (heck, only a year or two ago most of the WAFs didn't handle these properly; I could walk through all but Teros with simple encoding attacks):
<br><br>+ Web Server does Hex URL, some forms of full-width<br><br>+ Shell script converts shellcode or some hex form<br><br>+ final interpreter decodes UTF-7 to UTF-8 to normalize before outputting to parser<br><br>so you could have
<br><br>1. Hex URL<br>2. Full-Width Unicode<br>3. Shellcode Hex payload<br>4. UTF-7<br><br>All decoded in order, potentially, to get down to your canonicalized attack that works for the specific parser you are targeting. Now the above four-step example would be a crazy rare case, but I've seen one instance of double/triple-decoding just recently in a major production site that was fairly insane. One of our developers keeps asking me over and over:
<br><br>"Why in the world do these tests work?"<br><br>As a side note: I suspect it's related to that college-java-programmer supplied IDE:<br><br>- control-c<br>- control-v<br> <br>-ae<br><br><div><span class="gmail_quote">
On 5/21/07, <b class="gmail_sendername">
Arian J. Evans</b> <<a href="mailto:arian.evans@anachronic.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">arian.evans@anachronic.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>1. You are missing what I consider to be the major point.</div>
<div> </div>
<div>2. I don't know the context of the CERT advisory; there are more encoding types than just full-width that IDS today don't decode (that are of interest to us as well), but...</div>
<div> </div>
<div>3. The question we need to ask ourselves is one of canonicalization. In monolithic J2EE projects and modern cobbled-together web code (PHP is notoriously dirty for this), there are *multiple* layers of canonicalization that often occur specific to particular untrusted entry points. This stuff is really hard to find (initially) in source code.
</div>
<div> </div>
<div>You will find that sometimes you can even double-encode your attacks, and they get decoded/canonicalized to their common ASCII or UTF-8 (or whatever format) before they reach the parser (query engine, browser, shell script, SMTP relay, whatever parser you are targeting).
</div>
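<div> </div>
<div>A minimal sketch of that layering, in Python purely for illustration. Everything here is an assumption made up for the example: the naive single-quote filter, the two percent-decoding layers, and the final Unicode compatibility normalization (NFKC) step. It is not taken from any real product or site; it only shows how a payload can pass an inspection point and still arrive canonicalized at the parser.</div>

```python
import unicodedata
from urllib.parse import quote, unquote


def naive_filter(raw):
    """Hypothetical IDS-style check: flag input containing a literal single quote."""
    return "'" in raw


def canonicalize(value, url_layers=2):
    """Assumed application path: N percent-decoding layers, then NFKC folding."""
    for _ in range(url_layers):
        value = unquote(value)
    # NFKC maps the full-width apostrophe U+FF07 to the ASCII apostrophe U+0027.
    return unicodedata.normalize("NFKC", value)


# Payload written with a full-width apostrophe, then percent-encoded twice.
payload = "\uff07 OR 1=1"
double_encoded = quote(quote(payload, safe=""), safe="")

# What a filter sitting in front of each layer would see.
seen_on_wire = naive_filter(double_encoded)
seen_after_one_decode = naive_filter(unquote(double_encoded))
seen_after_two_decodes = naive_filter(unquote(unquote(double_encoded)))

# What the target parser finally receives.
reaches_parser = canonicalize(double_encoded)
```

<div>In this sketch the filter never sees a plain single quote at any of the three inspection points, yet the string handed to the parser starts with one. Whether a given application really has such layers is exactly the thing you have to test for.</div>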
<div> </div>
<div>It's fair to be skeptical about this though, Brian. It's not common to find where these attacks work, and I find that few people go beyond buzzwords and encoding-attack-technobafflegab when discussing this subject in the security "consultant" space.
</div>
<div> </div>
<div>Guess it's finally time for a paper on this,</div>
<div> </div>
<div>-- <br>Arian Evans<br>solipsistic software security sophist<br><br>"I love deadlines. I like the whooshing sound they make as they fly by." - Douglas Adams <br> </div><div><span>
<div><span class="gmail_quote">On 5/21/07, <b class="gmail_sendername">Brian Eaton</b> <<a href="mailto:eaton.lists@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">eaton.lists@gmail.com
</a>> wrote:</span>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;">Has anyone had a look at the full-width unicode encoding trick discussed here?<br><br><a href="http://www.kb.cert.org/vuls/id/739224" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">
http://www.kb.cert.org/vuls/id/739224</a><br><br>AFAICT, this technique could be useful for a homograph attack. I<br>don't think it's useful for much else. However, a few vendors have<br>reacted already, so I may be missing something important.
<br><br>Here's why I think the attack is mostly harmless:<br><br>Let's say an attacker wants to use this technique to hide a SQL<br>injection attack. They decide to use a full-width encoding for single<br>quote, 0xff 0x07. They successfully bypass the IDS, because the IDS
<br>is only scanning for normal single quotes. (You can see the encodings<br>and their graphical representation here:<br><a href="http://www.unicode.org/charts/PDF/UFF00.pdf" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">
http://www.unicode.org/charts/PDF/UFF00.pdf</a>
)<br><br>If the SQL engine is processing queries in Unicode, then 0xff 0x07<br>will be treated as a normal unicode character, not a single quote.<br>The sequence 0xff 0x07 is not equivalent to 0x27, the real single<br>quote value. No SQL injection occurs.
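<br><br>The byte values used in this argument can be checked with a short sketch (Python is my choice for illustration; the variable names are made up, and UTF-16BE is assumed as the two-byte "Unicode" form being described):

```python
# Full-width apostrophe, U+FF07, versus the ASCII single quote, 0x27.
FULLWIDTH_APOSTROPHE = "\uff07"
ASCII_QUOTE = b"\x27"

# Two-byte form (assumed UTF-16, big-endian): 0xff 0x07
utf16_be = FULLWIDTH_APOSTROPHE.encode("utf-16-be")

# UTF-8 form: 0xef 0xbc 0x87
utf8 = FULLWIDTH_APOSTROPHE.encode("utf-8")

# Conversion to ASCII with substitution yields a question mark.
ascii_fallback = FULLWIDTH_APOSTROPHE.encode("ascii", errors="replace")

# None of these byte strings contains the real single-quote byte.
contains_quote = [ASCII_QUOTE in b for b in (utf16_be, utf8, ascii_fallback)]
```

This only covers plain encoding conversions; it says nothing about any extra normalization step an application might apply.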
<br><br>If the SQL engine is processing queries in UTF-8, then 0xff 0x07 will<br>be converted from Unicode to UTF-8: 0xef 0xbc 0x87. Again, the engine<br>does not recognize 0xef 0xbc 0x87 as equivalent to 0x27.<br><br>If the SQL engine is processing queries in ASCII or ISO-8859-1, the
<br>conversion from unicode to the code page used by the engine will fail.<br>Either the engine will give up on the query, or it might substitute a<br>question mark (?) for the unconvertible character.<br><br>To summarize: I think half-width and full-width unicode characters are
<br>characters that happen to have the same graphical representation as<br>other characters, but don't carry any special significance outside of<br>that graphical representation. The graphical representation can be<br>
important in homograph attacks, but otherwise I don't see this<br>technique as particularly useful to an attacker.<br><br>Any comments on what I may have missed?<br><br>Regards,<br>Brian<br><br>----------------------------------------------------------------------------
<br>Join us on IRC: <a href="http://irc.freenode.net" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">irc.freenode.net</a> #webappsec<br><br>Have a question? Search The Web Security Mailing List Archives:
<br><a href="http://www.webappsec.org/lists/websecurity/" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">http://www.webappsec.org/lists/websecurity/
</a><br><br>Subscribe via RSS:<br><a href="http://www.webappsec.org/rss/websecurity.rss" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">http://www.webappsec.org/rss/websecurity.rss</a> [RSS Feed]<br>
</blockquote></div><br><br clear="all"><br>
</span></div></blockquote></div>