Decoding HTML entities in Emacs/Elisp

Question:

Some websites like to encode all of their text as HTML entities, so instead of seeing text like:

So I'm looking 

you get something like:

&#83;&#111;&#32;&#73;&#39;&#109;&#32;&#108;&#111;&#111;&#107;&#105;&#110;&#103;

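I would like to know whether Emacs has a built-in way to convert the encoded text back to regular text, or whether I should declare my own mapping of strings ("&#83" => "S" ...) and decode it manually using that map.

Any pointers would be appreciated.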

+2

By the way: those are not HTML entities but Unicode entities; that is a different thing. See http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_reference_overview – ty812 2009-10-27 15:58:26

I don't know whether there is a built-in function, but this small function can do the job:

(defun my-insert-encode-entities-string (str)
  "Return STR with every character encoded as a decimal reference like \"&#83;\"."
  (mapconcat
   (lambda (char) (format "&#%d;" char))
   (string-to-list str)
   ""))

If you only want HTML encoding, use url-insert-entities-in-string instead.
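As a rough illustration of the difference, the sketch below calls url-insert-entities-in-string from the url-util library that ships with Emacs. As far as I can tell, it only escapes the four markup characters &, <, > and the double quote, rather than turning every character into a numeric reference. The sample string is made up for this example.

```elisp
(require 'url-util)

;; Escapes only the markup-significant characters; everything else is kept.
(url-insert-entities-in-string "Tom & Jerry <b>")
;; => "Tom &amp; Jerry &lt;b&gt;"
```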

+0

This function is wrong: you don't want to format to %d, you want to take the %d and format it as a char. – 2009-10-30 04:03:10

+0

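@Federico: I'm not sure I see your point. Calling '(my-insert-encode-entities-string "So I'm looking")' returns exactly the same result that you provided. The variable 'char' holds the current character represented as an integer, so in this case I think it doesn't matter whether '%s' or '%d' is used. – 2009-10-30 07:59:41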

+0

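@Török: In the question you can see that I'm looking for "...a way to convert the encoded text to regular text", so there has been a slight misunderstanding. Your solution converts regular text into encoded text :) I wrote this http://gist.github.com/222709 to solve it, but it's obviously not as clean as your original suggestion. – 2009-10-30 20:49:31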
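For the direction the question actually asks about (encoded text back to regular text), a minimal sketch might look like the following. The function name my-decode-numeric-entities is made up for this example and does not come from the thread or the linked gist. It assumes every reference holds a valid character code, and it also accepts the hexadecimal form (&#x53;) as a convenience.

```elisp
;; Hypothetical helper; the name is an assumption for this example.
(defun my-decode-numeric-entities (str)
  "Return STR with decimal (&#83;) and hex (&#x53;) references decoded."
  (replace-regexp-in-string
   "&#\\(?:\\([0-9]+\\)\\|[xX]\\([0-9a-fA-F]+\\)\\);"
   (lambda (match)
     ;; While this function runs, the match data refers to MATCH.
     (let ((dec (match-string 1 match))
           (hex (match-string 2 match)))
       (string (if dec
                   (string-to-number dec)
                 (string-to-number hex 16)))))
   ;; FIXEDCASE and LITERAL are t so the decoded text is inserted verbatim.
   str t t))

;; (my-decode-numeric-entities "&#83;&#111;&#32;&#73;&#39;&#109;")
;; => "So I'm"
```

Named entities such as &amp; are left untouched by this sketch.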

I wrote this function to handle non-numeric (named) Unicode entities, in case anyone needs it.

(defun html-entities-to-unicode (string)
  "Return STRING with named HTML entities such as \"&eacute;\" replaced by their characters."
  (let* ((plist '(Aacute "Á" aacute "á" Acirc "Â" acirc "â" acute "´" AElig "Æ" aelig "æ" Agrave "À" agrave "à" alefsym "ℵ" Alpha "Α" alpha "α" amp "&" and "∧" ang "∠" apos "'" aring "å" Aring "Å" asymp "≈" atilde "ã" Atilde "Ã" auml "ä" Auml "Ä" bdquo "„" Beta "Β" beta "β" brvbar "¦" bull "•" cap "∩" ccedil "ç" Ccedil "Ç" cedil "¸" cent "¢" Chi "Χ" chi "χ" circ "ˆ" clubs "♣" cong "≅" copy "©" crarr "↵" cup "∪" curren "¤" Dagger "‡" dagger "†" darr "↓" dArr "⇓" deg "°" Delta "Δ" delta "δ" diams "♦" divide "÷" eacute "é" Eacute "É" ecirc "ê" Ecirc "Ê" egrave "è" Egrave "È" empty "∅" emsp "\u2003" ensp "\u2002" Epsilon "Ε" epsilon "ε" equiv "≡" Eta "Η" eta "η" eth "ð" ETH "Ð" euml "ë" Euml "Ë" euro "€" exist "∃" fnof "ƒ" forall "∀" frac12 "½" frac14 "¼" frac34 "¾" frasl "⁄" Gamma "Γ" gamma "γ" ge "≥" gt ">" harr "↔" hArr "⇔" hearts "♥" hellip "…" iacute "í" Iacute "Í" icirc "î" Icirc "Î" iexcl "¡" igrave "ì" Igrave "Ì" image "ℑ" infin "∞" int "∫" Iota "Ι" iota "ι" iquest "¿" isin "∈" iuml "ï" Iuml "Ï" Kappa "Κ" kappa "κ" Lambda "Λ" lambda "λ" lang "〈" laquo "«" larr "←" lArr "⇐" lceil "⌈" ldquo "“" le "≤" lfloor "⌊" lowast "∗" loz "◊" lrm "\u200E" lsaquo "‹" lsquo "‘" lt "<" macr "¯" mdash "—" micro "µ" middot "·" minus "−" Mu "Μ" mu "μ" nabla "∇" nbsp "\u00A0" ndash "–" ne "≠" ni "∋" not "¬" notin "∉" nsub "⊄" ntilde "ñ" Ntilde "Ñ" Nu "Ν" nu "ν" oacute "ó" Oacute "Ó" ocirc "ô" Ocirc "Ô" OElig "Œ" oelig "œ" ograve "ò" Ograve "Ò" oline "‾" omega "ω" Omega "Ω" Omicron "Ο" omicron "ο" oplus "⊕" or "∨" ordf "ª" ordm "º" oslash "ø" Oslash "Ø" otilde "õ" Otilde "Õ" otimes "⊗" ouml "ö" Ouml "Ö" para "¶" part "∂" permil "‰" perp "⊥" Phi "Φ" phi "φ" Pi "Π" pi "π" piv "ϖ" plusmn "±" pound "£" Prime "″" prime "′" prod "∏" prop "∝" Psi "Ψ" psi "ψ" quot "\"" radic "√" rang "〉" raquo "»" rarr "→" rArr "⇒" rceil "⌉" rdquo "”" real "ℜ" reg "®" rfloor "⌋" Rho "Ρ" rho "ρ" rlm "\u200F" rsaquo "›" rsquo "’" sbquo "‚" scaron "š" Scaron "Š" sdot "⋅" sect "§" shy "\u00AD" Sigma "Σ" sigma "σ" sigmaf "ς" sim "∼" spades "♠"
                  sub "⊂" sube "⊆" sum "∑" sup "⊃" sup1 "¹" sup2 "²" sup3 "³" supe "⊇" szlig "ß" Tau "Τ" tau "τ" there4 "∴" Theta "Θ" theta "θ" thetasym "ϑ" thinsp "\u2009" thorn "þ" THORN "Þ" tilde "˜" times "×" trade "™" uacute "ú" Uacute "Ú" uarr "↑" uArr "⇑" ucirc "û" Ucirc "Û" ugrave "ù" Ugrave "Ù" uml "¨" upsih "ϒ" Upsilon "Υ" upsilon "υ" uuml "ü" Uuml "Ü" weierp "℘" Xi "Ξ" xi "ξ" yacute "ý" Yacute "Ý" yen "¥" yuml "ÿ" Yuml "Ÿ" Zeta "Ζ" zeta "ζ" zwj "\u200D" zwnj "\u200C"))
         ;; Look the entity name up in PLIST; unknown names and numeric references are returned unchanged.
         (get-function (lambda (s) (or (plist-get plist (intern (substring s 1 -1))) s))))
    ;; FIXEDCASE and LITERAL are t so the replacement is inserted verbatim.
    (replace-regexp-in-string "&[^; ]*;" get-function string t t)))

I wrote the following, which does what you need, @federico-builes. (I needed the same thing.)

(defun ajs-decimal-escapes-to-unicode (start end)
  "Convert escapes like '&#955;' to Unicode like 'λ'.
Operates on the active region or the whole buffer."
  (interactive (list (point) (mark)))
  (or (use-region-p)
      (setq start (point-min) end (point-max)))
  (insert (replace-regexp-in-string
           ;; Require at least one digit so that "&#;" is left alone.
           "&#[0-9]+;"
           (lambda (match)
             (format "%c" (string-to-number (substring match 2 -1))))
           ;; Delete the text and return it; the decoded copy is inserted at point.
           (filter-buffer-substring start end t)
           ;; FIXEDCASE and LITERAL: insert the decoded character verbatim.
           t t)))

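@konr's reply was helpful, thanks! I've also been enjoying An Introduction to Programming in Emacs Lisp. This is the first possibly useful Lisp I've written. I would appreciate feedback, even on things like whitespace; thanks!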
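In the spirit of the feedback requested above, here is one possible variant, offered as a sketch rather than a definitive rewrite. It edits the buffer in place with re-search-forward and replace-match instead of deleting and re-inserting the text, and it takes its bounds from region-beginning and region-end so the order of point and mark does not matter. The name my-decimal-escapes-to-unicode is hypothetical, and the code assumes every reference holds a valid character code.

```elisp
;; Hypothetical variant; the name is an assumption for this example.
(defun my-decimal-escapes-to-unicode (start end)
  "Convert escapes like \"&#955;\" between START and END to characters.
Interactively, operate on the active region or else the whole buffer."
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (point-min) (point-max))))
  (save-excursion
    ;; A marker keeps tracking the end of the range as the text shrinks.
    (let ((end-marker (copy-marker end)))
      (goto-char start)
      (while (re-search-forward "&#\\([0-9]+\\);" end-marker t)
        ;; FIXEDCASE and LITERAL are t: insert the character verbatim.
        (replace-match (string (string-to-number (match-string 1))) t t))
      ;; Detach the marker so it no longer slows down buffer edits.
      (set-marker end-marker nil))))
```

Because the bounds are ordinary arguments, the same function can be called from Lisp on any range without an active region.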