HTML Status and Version Check


Post by 1of7470000000 » Friday, 9 February 2018, 00:28

    import re

    import urllib3

    http = urllib3.PoolManager()
    # Named "url" rather than "input" so the built-in input() is not shadowed.
    url = input("Insert URL & press ENTER: ")
    r = http.request("GET", url)
    print("HTTP status code:", r.status)
    if r.status == 200:
        print("200 OK. This is the standard response for successful HTTP requests. The actual response will depend on the request method used. In a GET request, the response will contain an entity corresponding to the requested resource (this was a GET request). In a POST request, the response will contain an entity describing or containing the result of the action.")
    elif r.status == 201:
        print("201 Created. The request has been fulfilled, resulting in the creation of a new resource.")
    elif r.status == 202:
        print("202 Accepted. The request has been accepted for processing, but the processing has not been completed. The request might or might not be eventually acted upon, and may be disallowed when processing occurs.")
    elif r.status == 203:
        print("203 Non-Authoritative Information (since HTTP/1.1). The server is a transforming proxy (e.g. a Web accelerator) that received a 200 OK from its origin, but is returning a modified version of the origin's response.")
    elif r.status == 204:
        print("204 No Content. The server successfully processed the request and is not returning any content.")
    elif r.status == 205:
        print("205 Reset Content. The server successfully processed the request, but is not returning any content. Unlike a 204 response, this response requires that the requester reset the document view.")
    elif r.status == 206:
        print("206 Partial Content (RFC 7233). The server is delivering only part of the resource (byte serving) due to a range header sent by the client. The range header is used by HTTP clients to enable resuming of interrupted downloads, or split a download into multiple simultaneous streams.")
    elif r.status == 207:
        print("207 Multi-Status (WebDAV; RFC 4918). The message body that follows is an XML message and can contain a number of separate response codes, depending on how many sub-requests were made.")
    elif r.status == 208:
        print("208 Already Reported (WebDAV; RFC 5842). The members of a DAV binding have already been enumerated in a preceding part of the (multistatus) response, and are not being included again.")
    elif r.status == 226:
        print("226 IM Used (RFC 3229). The server has fulfilled a request for the resource, and the response is a representation of the result of one or more instance-manipulations applied to the current instance.")
    elif r.status == 400:
        print("400 Bad Request. The server cannot or will not process the request due to an apparent client error (e.g., malformed request syntax, size too large, invalid request message framing, or deceptive request routing).")
    elif r.status == 401:
        print("401 Unauthorized (RFC 7235). Similar to 403 Forbidden, but specifically for use when authentication is required and has failed or has not yet been provided. The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource. Check 'Basic access authentication and Digest access authentication'. 401 semantically means 'unauthenticated', i.e. the user does not have the necessary credentials. Note: Some sites issue HTTP 401 when an IP address is banned from the website (usually the website domain) and that specific address is refused permission to access a website.")
    elif r.status == 403:
        print("403 Forbidden. The request was valid, but the server is refusing action. The user might not have the necessary permissions for a resource, or may need an account of some sort.")
    elif r.status == 404:
        print("404 Not Found. The HTTP 404 Not Found (pronounced 'four oh four') error message is a Hypertext Transfer Protocol (HTTP) standard response code, in computer network communications, to indicate that the client was able to communicate with a given server, but the server could not find what was requested. The website hosting server will typically generate a '404 Not Found' web page when a user attempts to follow a broken or dead link; hence the 404 error is one of the most recognizable errors encountered on the World Wide Web.")
    elif r.status == 405:
        print("405 Method Not Allowed. A request method is not supported for the requested resource; for example, a GET request on a form that requires data to be presented via POST, or a PUT request on a read-only resource.")
    elif r.status == 406:
        # Fixed: the original read "print:(...)", which never called print().
        print("406 Not Acceptable. The requested resource is capable of generating only content not acceptable according to the Accept headers sent in the request. See Content negotiation.")
    elif r.status == 407:
        print("407 Proxy Authentication Required (RFC 7235). The client must first authenticate itself with the proxy.")
    elif r.status == 408:
        print("408 Request Timeout. The server timed out waiting for the request. According to HTTP specifications: The client did not produce a request within the time that the server was prepared to wait. The client MAY repeat the request without modifications at any later time.")
    elif r.status == 409:
        print("409 Conflict. Indicates that the request could not be processed because of conflict in the request, such as an edit conflict between multiple simultaneous updates.")
    elif r.status == 410:
        print("410 Gone. Indicates that the resource requested is no longer available and will not be available again. This should be used when a resource has been intentionally removed and the resource should be purged. Upon receiving a 410 status code, the client should not request the resource in the future. Clients such as search engines should remove the resource from their indices. Most use cases do not require clients and search engines to purge the resource, and a '404 Not Found' may be used instead.")
    elif r.status == 411:
        print("411 Length Required. The request did not specify the length of its content, which is required by the requested resource.")
    elif r.status == 412:
        print("412 Precondition Failed (RFC 7232). The server does not meet one of the preconditions that the requester put on the request.")
    elif r.status == 413:
        print("413 Payload Too Large (RFC 7231). The request is larger than the server is willing or able to process. Previously called 'Request Entity Too Large'.")
    elif r.status == 414:
        print("414 URI Too Long (RFC 7231). The URI provided was too long for the server to process. Often the result of too much data being encoded as a query-string of a GET request, in which case it should be converted to a POST request. Called 'Request-URI Too Long' previously.")
    elif r.status == 415:
        print("415 Unsupported Media Type. The request entity has a media type which the server or resource does not support. For example, the client uploads an image as image/svg+xml, but the server requires that images use a different format.")
    elif r.status == 416:
        print("416 Range Not Satisfiable (RFC 7233). The client has asked for a portion of the file (byte serving), but the server cannot supply that portion. For example, if the client asked for a part of the file that lies beyond the end of the file. Called 'Requested Range Not Satisfiable' previously.")
    elif r.status == 417:
        print("417 Expectation Failed. The server cannot meet the requirements of the Expect request-header field.")
    elif r.status == 418:
        print("418 I'm a teapot (RFC 2324). This code was defined in 1998 as one of the traditional IETF April Fools' jokes, in RFC 2324, Hyper Text Coffee Pot Control Protocol, and is not expected to be implemented by actual HTTP servers. The RFC specifies this code should be returned by teapots requested to brew coffee. This HTTP status is used as an Easter egg in some websites, including Google.com.")
    elif r.status == 421:
        print("421 Misdirected Request (RFC 7540). The request was directed at a server that is not able to produce a response (for example because of a connection reuse).")
    elif r.status == 422:
        print("422 Unprocessable Entity (WebDAV; RFC 4918). The request was well-formed but was unable to be followed due to semantic errors.")
    elif r.status == 423:
        print("423 Locked (WebDAV; RFC 4918). The resource that is being accessed is locked.")
    elif r.status == 424:
        print("424 Failed Dependency (WebDAV; RFC 4918). The request failed because it depended on another request and that request failed (e.g., a PROPPATCH).")
    elif r.status == 426:
        print("426 Upgrade Required. The client should switch to a different protocol such as TLS/1.0, given in the Upgrade header field.")
    elif r.status == 428:
        print("428 Precondition Required (RFC 6585). The origin server requires the request to be conditional. Intended to prevent the 'lost update' problem, where a client GETs a resource's state, modifies it, and PUTs it back to the server, when meanwhile a third party has modified the state on the server, leading to a conflict.")
    elif r.status == 429:
        print("429 Too Many Requests (RFC 6585). The user has sent too many requests in a given amount of time. Intended for use with rate-limiting schemes.")
    elif r.status == 431:
        print("431 Request Header Fields Too Large (RFC 6585). The server is unwilling to process the request because either an individual header field, or all the header fields collectively, are too large.")
    elif r.status == 451:
        print("451 Unavailable For Legal Reasons (RFC 7725). A server operator has received a legal demand to deny access to a resource or to a set of resources that includes the requested resource. The code 451 was chosen as a reference to the novel Fahrenheit 451 (see the Acknowledgements in the RFC).")
    elif r.status == 500:
        print("500 Internal Server Error. A generic error message, given when an unexpected condition was encountered and no more specific message is suitable.")
    elif r.status == 501:
        print("501 Not Implemented. The server either does not recognize the request method, or it lacks the ability to fulfil the request. Usually this implies future availability (e.g., a new feature of a web-service API).")
    elif r.status == 502:
        print("502 Bad Gateway. The server was acting as a gateway or proxy and received an invalid response from the upstream server.")
    elif r.status == 503:
        print("503 Service Unavailable. The server is currently unavailable (because it is overloaded or down for maintenance). Generally, this is a temporary state.")
    elif r.status == 504:
        # Fixed: the original compared against 50, so 504 was never matched.
        print("504 Gateway Timeout. The server was acting as a gateway or proxy and did not receive a timely response from the upstream server.")
    elif r.status == 505:
        print("505 HTTP Version Not Supported. The server does not support the HTTP protocol version used in the request.")
    elif r.status == 506:
        print("506 Variant Also Negotiates (RFC 2295). Transparent content negotiation for the request results in a circular reference.")
    elif r.status == 507:
        print("507 Insufficient Storage (WebDAV; RFC 4918). The server is unable to store the representation needed to complete the request.")
    elif r.status == 508:
        print("508 Loop Detected (WebDAV; RFC 5842). The server detected an infinite loop while processing the request (sent in lieu of 208 Already Reported).")
    elif r.status == 510:
        print("510 Not Extended (RFC 2774). Further extensions to the request are required for the server to fulfil it.")
    elif r.status == 511:
        print("511 Network Authentication Required (RFC 6585). The client needs to authenticate to gain network access. Intended for use by intercepting proxies used to control access to the network (e.g., 'captive portals' used to require agreement to Terms of Service before granting full Internet access via a Wi-Fi hotspot).")
    elif r.status == 300:
        print("300 Multiple Choices. Indicates multiple options for the resource from which the client may choose (via agent-driven content negotiation). For example, this code could be used to present multiple video format options, to list files with different filename extensions, or to suggest word-sense disambiguation.")
    elif r.status == 301:
        print("301 Moved Permanently. This and all future requests should be directed to the given URI.")
    elif r.status == 302:
        print("302 Found. This is an example of industry practice contradicting the standard. The HTTP/1.0 specification (RFC 1945) required the client to perform a temporary redirect (the original describing phrase was 'Moved Temporarily'), but popular browsers implemented 302 with the functionality of a 303 See Other. Therefore, HTTP/1.1 added status codes 303 and 307 to distinguish between the two behaviours. However, some Web applications and frameworks use the 302 status code as if it were the 303.")
    elif r.status == 303:
        print("303 See Other (since HTTP/1.1). The response to the request can be found under another URI using the GET method. When received in response to a POST (or PUT/DELETE), the client should presume that the server has received the data and should issue a new GET request to the given URI.")
    elif r.status == 304:
        print("304 Not Modified (RFC 7232). Indicates that the resource has not been modified since the version specified by the request headers If-Modified-Since or If-None-Match. In such case, there is no need to retransmit the resource since the client still has a previously-downloaded copy.")
    elif r.status == 305:
        print("305 Use Proxy (since HTTP/1.1). The requested resource is available only through a proxy, the address for which is provided in the response. Many HTTP clients (such as Mozilla and Internet Explorer) do not correctly handle responses with this status code, primarily for security reasons.")
    elif r.status == 306:
        print("306 Switch Proxy. No longer used. Originally meant 'Subsequent requests should use the specified proxy.'")
    elif r.status == 307:
        print("307 Temporary Redirect (since HTTP/1.1). In this case, the request should be repeated with another URI; however, future requests should still use the original URI. In contrast to how 302 was historically implemented, the request method is not allowed to be changed when reissuing the original request. For example, a POST request should be repeated using another POST request.")
    elif r.status == 308:
        print("308 Permanent Redirect (RFC 7538). The request and all future requests should be repeated using another URI. 307 and 308 parallel the behaviors of 302 and 301, but do not allow the HTTP method to change. So, for example, submitting a form to a permanently redirected resource may continue smoothly.")

    # General description of the status class (1xx to 5xx)
    if 100 <= r.status <= 199:
        print("GENERAL: HTTP response code between 100 and 199. 1xx informational response. An informational response indicates that the request was received and understood. It is issued on a provisional basis while request processing continues. It alerts the client to wait for a final response. The message consists only of the status line and optional header fields, and is terminated by an empty line. As the HTTP/1.0 standard did not define any 1xx status codes, servers must not send a 1xx response to an HTTP/1.0 compliant client except under experimental conditions.")
    elif 200 <= r.status <= 299:
        print("GENERAL: HTTP response code between 200 and 299. 2xx, this class of status codes indicates the action requested by the client was received, understood and accepted.")
    elif 300 <= r.status <= 399:
        print("GENERAL: HTTP response code between 300 and 399. 3xx, this class of status code indicates the client must take additional action to complete the request. Many of these status codes are used in URL redirection. A user agent may carry out the additional action with no user interaction only if the method used in the second request is GET or HEAD. A user agent may automatically redirect a request. A user agent should detect and intervene to prevent cyclical redirects.")
    elif 400 <= r.status <= 499:
        print("GENERAL: HTTP response code between 400 and 499. 4xx, this class of status code is intended for situations in which the error seems to have been caused by the client. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents should display any included entity to the user.")
    elif 500 <= r.status <= 599:
        print("GENERAL: HTTP response code between 500 and 599. 5xx, this means in general, the server failed to fulfil a request. Response status codes beginning with the digit '5' indicate cases in which the server is aware that it has encountered an error or is otherwise incapable of performing the request. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and indicate whether it is a temporary or permanent condition. Likewise, user agents should display any included entity to the user. These response codes are applicable to any request method.")
    else:
        print("Something went wrong. Please try again.")

    print("---- HTML 4 OR 5 ----")
    r = http.request("GET", url)
    # Only the first 50 bytes of the response body are inspected.
    check_site = str(r.data[0:50])
    pattern = r"<!DOCTYPE HTML"
    if re.search(pattern, check_site):
        print("Site is HTML 5")
    else:
        pattern2 = r"<!doctype html"
        if re.search(pattern2, check_site):
            print("Site is HTML 5")
        else:
            print("Site is HTML 4")

Post by Sirius3 » Friday, 9 February 2018, 08:24

@1of7470000000: Is this just your public gist? You should replace the long if chain with a dictionary. With re.IGNORECASE you can also search case-insensitively, because "DocType" is allowed too. The doctypes of all other HTML versions also begin with "<!DOCTYPE HTML", so that is not a discriminator.
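
The dictionary suggestion could look roughly like the sketch below. It is only an illustration: the table is shortened to three entries with abbreviated texts, and the names STATUS_TEXT, CLASS_TEXT and describe_status are made up for this example. Codes missing from the table fall back to the reason phrase from the standard library's http.HTTPStatus, and the general class is derived from the first digit instead of a second if chain.

```python
from http import HTTPStatus

# Hypothetical, shortened table; a real script would list every code it explains.
STATUS_TEXT = {
    200: "200 OK. Standard response for successful HTTP requests.",
    404: "404 Not Found. The server could not find the requested resource.",
    504: "504 Gateway Timeout. No timely response from the upstream server.",
}

# One entry per status class, keyed by the first digit of the code.
CLASS_TEXT = {
    1: "1xx informational response",
    2: "2xx success",
    3: "3xx redirection",
    4: "4xx client error",
    5: "5xx server error",
}


def describe_status(status):
    """Return (specific text, class text) for an integer HTTP status code."""
    specific = STATUS_TEXT.get(status)
    if specific is None:
        try:
            # Fall back to the short reason phrase known to the standard library.
            specific = f"{status} {HTTPStatus(status).phrase}"
        except ValueError:
            specific = f"{status} (no description available)"
    general = CLASS_TEXT.get(status // 100, "unknown status class")
    return specific, general


if __name__ == "__main__":
    for code in (200, 204, 299, 700):
        print(describe_status(code))
```

A single lookup also avoids the kind of slip that is easy to make in a long chain, such as one mistyped comparison value.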

Post by 1of7470000000 » Friday, 9 February 2018, 21:02

Hello,
no, of course not (I had to google "gist" first). I just thought that if I showed the scripts I mentioned in my introduction, people could better gauge my (modest) level. And I may well learn something from the feedback.

Looking up the status codes with a dictionary is a really great idea. I'll remember that (it will also come in handy for similar situations/problems in the future). That alone has already helped a lot. It's logical, really: if you have, say, 20 possible cases, each of which should produce a specific piece of information, a dict is the very best solution. Thank you. And also for the re.IGNORECASE flag.

Honestly, how does one find out about things like this? Probably only by doing, collecting feedback, and learning from it.

The other HTML versions also start with <!DOCTYPE HTML>? Hmm. I looked it up right away. Odd; quote from the course:
"When writing HTML documents, one of the first new features that you'll notice is the type declaration: <!DOCTYPE HTML>"

Well, I don't want to argue, of course, only to explain how I arrived at the assumption in the first place. Once I have found a clear discriminator (googled again, another word learned), I will rewrite it. But I don't think that's so easy. The <nav> tag, for example, would be an indicator. But just because a page has no <nav> tag doesn't have to mean it is HTML 4. Tricky. Once the basic framework is in place, all that's needed is one indicator to search for.
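
To make the point about the doctype concrete, here is a small sketch using re.IGNORECASE. The two sample strings and the helper name has_doctype_prefix are assumptions for illustration; the HTML 4.01 Strict doctype shown is the one published by the W3C. Both samples match the same prefix, which is why the prefix alone cannot tell the versions apart.

```python
import re

# Case-insensitive match for the start of a doctype declaration.
DOCTYPE_PREFIX = re.compile(r"<!DOCTYPE\s+html", re.IGNORECASE)

# Illustrative sample documents (assumptions, not taken from a real site).
html5_sample = "<!doctype html><html><head></head><body></body></html>"
html4_sample = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd"><html></html>'
)


def has_doctype_prefix(markup):
    """Return True if the first 100 characters contain '<!DOCTYPE html' in any case."""
    return DOCTYPE_PREFIX.search(markup[:100]) is not None


if __name__ == "__main__":
    print(has_doctype_prefix(html5_sample))  # matches
    print(has_doctype_prefix(html4_sample))  # matches as well
```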

If that's okay, I'll leave this up for now. If just one other person reads it and takes something away without repeating the same mistake, it was worth it.
And the code snippet can basically be used to search for anything else as well. So it's not completely pointless.

Thanks for your feedback.

Post by Sirius3 » Saturday, 10 February 2018, 10:43

I wrote that "<!DOCTYPE HTML" can also occur in HTML 4 pages. And in case of doubt, when no doctype is given, everything is HTML5 or some other HTML whose behaviour in ambiguous situations is not clearly defined. To tell them apart you need a parser that assesses the quality of the HTML and can then perhaps offer an estimate.
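
As a first step in that direction, the complete doctype declaration can be read with the standard library's html.parser instead of matching only its first characters. The following is a rough sketch, not a reliable version detector: the class name DoctypeSniffer, the function classify_doctype and the returned labels are invented for this example, and a page can declare one doctype while using markup from another version.

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Remember the first doctype declaration seen while parsing."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def classify_doctype(markup):
    """Return a coarse label for the doctype found in markup (illustrative labels)."""
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    if sniffer.doctype is None:
        return "no doctype"
    # Lowercase and collapse whitespace so the comparison is layout-independent.
    normalized = " ".join(sniffer.doctype.lower().split())
    if normalized == "doctype html":
        return "HTML5-style doctype"
    if "dtd html 4" in normalized:
        return "HTML 4 doctype"
    if "dtd xhtml" in normalized:
        return "XHTML doctype"
    return "other doctype"


if __name__ == "__main__":
    print(classify_doctype("<!DOCTYPE html><html></html>"))
    print(classify_doctype("<html><body>no declaration</body></html>"))
```

The "no doctype" result is returned as its own label because, as noted above, a missing declaration does not settle the question either way.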

Post by 1of7470000000 » Tuesday, 13 February 2018, 01:32

Hello,

based on the constructive feedback I have rewritten the program. The dictionary idea was brilliant.

With the new information (that I cannot yet reliably distinguish HTML 4 from HTML 5) I can't leave that part in. It's not that I wouldn't like to be able to program it, but I currently lack an unambiguous indicator. The regex import is then no longer needed either. In any case, it was a good exercise in searching HTML documents for something.

I have also incorporated the other feedback into the program: identifiers named as clearly as possible, everything kept as simple as possible, and thanks to the comments I should still understand it in 6 months.

You wrote that one needs a good parser. That's what BS4 does, if I understood correctly.

It parses the HTML document. I could, for example, display it as an HTML document with prettify. Hm, but (at my current level of knowledge) that doesn't allow any conclusions about the HTML version.

Maybe I'll find a way yet. Until then it is just an HTTP status code check that outputs information as detailed as I could manage. I make no secret of it: the information is simply taken from Wikipedia. But it is extensible.

I already had the idea of searching for several things that did not exist in HTML 4 yet, e.g. <article>, <nav>, <aside>, and so on. But on a relatively plain page the program could be just as wrong with that. Difficult.
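
The idea of looking for several elements that HTML 4 did not have could be sketched with the standard library's html.parser as follows. The tag set is a small, non-exhaustive selection, and the names HTML5_ONLY_TAGS, Html5TagCollector and html5_tags_used are assumptions for this example. The limitation described above still applies: an empty result only means none of these tags were used, not that the page is HTML 4.

```python
from html.parser import HTMLParser

# A few elements introduced with HTML5 (selection for illustration, not exhaustive).
HTML5_ONLY_TAGS = {"article", "aside", "footer", "header", "main", "nav", "section"}


class Html5TagCollector(HTMLParser):
    """Collect the HTML5-only tag names that occur in a document."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in HTML5_ONLY_TAGS:
            self.found.add(tag)


def html5_tags_used(markup):
    """Return the set of HTML5-only tags found in markup."""
    collector = Html5TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.found


if __name__ == "__main__":
    sample = "<body><NAV></NAV><div><article>text</article></div></body>"
    print(sorted(html5_tags_used(sample)))
```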

Or should I delete the whole post and repost the program under a correct name?
Or should I take it out entirely?

    # Python 3.6

    # required library import
    import urllib3

    # create a PoolManager instance to make requests
    http = urllib3.PoolManager()

    # ask for the site (named "url" so the built-in input() is not shadowed)
    url = input("Insert URL & press ENTER: ")

    # request the website; the response is stored in r
    r = http.request("GET", url)

    # the dictionary with the status codes
    status_codes_dictionary = {
        200: "200 OK. This is the standard response for successful HTTP requests. The actual response will depend on the request method used. In a GET request, the response will contain an entity corresponding to the requested resource (this was a GET request). In a POST request, the response will contain an entity describing or containing the result of the action.",
        201: "201 Created. The request has been fulfilled, resulting in the creation of a new resource.",
        202: "202 Accepted. The request has been accepted for processing, but the processing has not been completed. The request might or might not be eventually acted upon, and may be disallowed when processing occurs.",
        203: "203 Non-Authoritative Information (since HTTP/1.1). The server is a transforming proxy (e.g. a Web accelerator) that received a 200 OK from its origin, but is returning a modified version of the origin's response.",
        204: "204 No Content. The server successfully processed the request and is not returning any content.",
        205: "205 Reset Content. The server successfully processed the request, but is not returning any content. Unlike a 204 response, this response requires that the requester reset the document view.",
        206: "206 Partial Content (RFC 7233). The server is delivering only part of the resource (byte serving) due to a range header sent by the client. The range header is used by HTTP clients to enable resuming of interrupted downloads, or split a download into multiple simultaneous streams.",
        207: "207 Multi-Status (WebDAV; RFC 4918). The message body that follows is an XML message and can contain a number of separate response codes, depending on how many sub-requests were made.",
        208: "208 Already Reported (WebDAV; RFC 5842). The members of a DAV binding have already been enumerated in a preceding part of the (multistatus) response, and are not being included again.",
        226: "226 IM Used (RFC 3229). The server has fulfilled a request for the resource, and the response is a representation of the result of one or more instance-manipulations applied to the current instance.",
        300: "300 Multiple Choices. Indicates multiple options for the resource from which the client may choose (via agent-driven content negotiation). For example, this code could be used to present multiple video format options, to list files with different filename extensions, or to suggest word-sense disambiguation.",
        301: "301 Moved Permanently. This and all future requests should be directed to the given URI.",
        302: "302 Found. This is an example of industry practice contradicting the standard. The HTTP/1.0 specification (RFC 1945) required the client to perform a temporary redirect (the original describing phrase was 'Moved Temporarily'), but popular browsers implemented 302 with the functionality of a 303 See Other. Therefore, HTTP/1.1 added status codes 303 and 307 to distinguish between the two behaviours. However, some Web applications and frameworks use the 302 status code as if it were the 303.",
        303: "303 See Other (since HTTP/1.1). The response to the request can be found under another URI using the GET method. When received in response to a POST (or PUT/DELETE), the client should presume that the server has received the data and should issue a new GET request to the given URI.",
        304: "304 Not Modified (RFC 7232). Indicates that the resource has not been modified since the version specified by the request headers If-Modified-Since or If-None-Match. In such case, there is no need to retransmit the resource since the client still has a previously-downloaded copy.",
        305: "305 Use Proxy (since HTTP/1.1). The requested resource is available only through a proxy, the address for which is provided in the response. Many HTTP clients (such as Mozilla and Internet Explorer) do not correctly handle responses with this status code, primarily for security reasons.",
        306: "306 Switch Proxy. No longer used. Originally meant 'Subsequent requests should use the specified proxy.'",
        307: "307 Temporary Redirect (since HTTP/1.1). In this case, the request should be repeated with another URI; however, future requests should still use the original URI. In contrast to how 302 was historically implemented, the request method is not allowed to be changed when reissuing the original request. For example, a POST request should be repeated using another POST request.",
        308: "308 Permanent Redirect (RFC 7538). The request and all future requests should be repeated using another URI. 307 and 308 parallel the behaviors of 302 and 301, but do not allow the HTTP method to change. So, for example, submitting a form to a permanently redirected resource may continue smoothly.",
        400: "400 Bad Request. The server cannot or will not process the request due to an apparent client error (e.g., malformed request syntax, size too large, invalid request message framing, or deceptive request routing).",
        401: "401 Unauthorized (RFC 7235). Similar to 403 Forbidden, but specifically for use when authentication is required and has failed or has not yet been provided. The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource. Check 'Basic access authentication and Digest access authentication'. 401 semantically means 'unauthenticated', i.e. the user does not have the necessary credentials. Note: Some sites issue HTTP 401 when an IP address is banned from the website (usually the website domain) and that specific address is refused permission to access a website.",
        403: "403 Forbidden. The request was valid, but the server is refusing action. The user might not have the necessary permissions for a resource, or may need an account of some sort.",
        404: "404 Not Found. The HTTP 404 Not Found (pronounced 'four oh four') error message is a Hypertext Transfer Protocol (HTTP) standard response code, in computer network communications, to indicate that the client was able to communicate with a given server, but the server could not find what was requested. The website hosting server will typically generate a '404 Not Found' web page when a user attempts to follow a broken or dead link; hence the 404 error is one of the most recognizable errors encountered on the World Wide Web.",
        405: "405 Method Not Allowed. A request method is not supported for the requested resource; for example, a GET request on a form that requires data to be presented via POST, or a PUT request on a read-only resource.",
        406: "406 Not Acceptable. The requested resource is capable of generating only content not acceptable according to the Accept headers sent in the request. See Content negotiation.",
        407: "407 Proxy Authentication Required (RFC 7235). The client must first authenticate itself with the proxy.",
        408: "408 Request Timeout. The server timed out waiting for the request. According to HTTP specifications: The client did not produce a request within the time that the server was prepared to wait. The client MAY repeat the request without modifications at any later time.",
        409: "409 Conflict. Indicates that the request could not be processed because of conflict in the request, such as an edit conflict between multiple simultaneous updates.",
        410: "410 Gone. Indicates that the resource requested is no longer available and will not be available again. This should be used when a resource has been intentionally removed and the resource should be purged. Upon receiving a 410 status code, the client should not request the resource in the future. Clients such as search engines should remove the resource from their indices. Most use cases do not require clients and search engines to purge the resource, and a '404 Not Found' may be used instead.",
        411: "411 Length Required. The request did not specify the length of its content, which is required by the requested resource.",
        412: "412 Precondition Failed (RFC 7232). The server does not meet one of the preconditions that the requester put on the request.",
        413: "413 Payload Too Large (RFC 7231). The request is larger than the server is willing or able to process. Previously called 'Request Entity Too Large'.",
        414: "414 URI Too Long (RFC 7231). The URI provided was too long for the server to process. Often the result of too much data being encoded as a query-string of a GET request, in which case it should be converted to a POST request. Called 'Request-URI Too Long' previously.",
        415: "415 Unsupported Media Type. The request entity has a media type which the server or resource does not support. For example, the client uploads an image as image/svg+xml, but the server requires that images use a different format.",
        416: "416 Range Not Satisfiable (RFC 7233). The client has asked for a portion of the file (byte serving), but the server cannot supply that portion. For example, if the client asked for a part of the file that lies beyond the end of the file. Called 'Requested Range Not Satisfiable' previously.",
        417: "417 Expectation Failed. The server cannot meet the requirements of the Expect request-header field.",
        418: "418 I'm a teapot (RFC 2324). This code was defined in 1998 as one of the traditional IETF April Fools' jokes, in RFC 2324, Hyper Text Coffee Pot Control Protocol, and is not expected to be implemented by actual HTTP servers. The RFC specifies this code should be returned by teapots requested to brew coffee. This HTTP status is used as an Easter egg in some websites, including Google.com.",
        421: "421 Misdirected Request (RFC 7540). The request was directed at a server that is not able to produce a response (for example because of a connection reuse).",
        422: "422 Unprocessable Entity (WebDAV; RFC 4918). The request was well-formed but was unable to be followed due to semantic errors.",
        423: "423 Locked (WebDAV; RFC 4918). The resource that is being accessed is locked.",
  57. 424: "424 Failed Dependency (WebDAV; RFC 4918). The request failed because it depended on another request and that request failed (e.g., a PROPPATCH).",
  58. 426: "426 Upgrade Required. The client should switch to a different protocol such as TLS/1.0, given in the Upgrade header field.",
  59. 428: "428 Precondition Required (RFC 6585).The origin server requires the request to be conditional. Intended to prevent the 'lost update' problem, where a client GETs a resource's state, modifies it, and PUTs it back to the server, when meanwhile a third party has modified the state on the server, leading to a conflict.",
  60. 429: "429 Too Many Requests (RFC 6585). The user has sent too many requests in a given amount of time. Intended for use with rate-limiting schemes.",
  61. 431: "431 Request Header Fields Too Large (RFC 6585). The server is unwilling to process the request because either an individual header field, or all the header fields collectively, are too large.",
  62. 451: "451 Unavailable For Legal Reasons (RFC 7725). A server operator has received a legal demand to deny access to a resource or to a set of resources that includes the requested resource.The code 451 was chosen as a reference to the novel Fahrenheit 451 (see the Acknowledgements in the RFC).",
  63. 500: "500 Internal Server Error. A generic error message, given when an unexpected condition was encountered and no more specific message is suitable.",
  64. 501: "501 Not Implemented. The server either does not recognize the request method, or it lacks the ability to fulfil the request. Usually this implies future availability (e.g., a new feature of a web-service API).",
  65. 502: "502 Bad Gateway. The server was acting as a gateway or proxy and received an invalid response from the upstream server.",
  66. 503: "503 Service Unavailable. The server is currently unavailable (because it is overloaded or down for maintenance). Generally, this is a temporary state.",
  67. 504: "504 Gateway Timeout. The server was acting as a gateway or proxy and did not receive a timely response from the upstream server.",
  68. 505: "505 HTTP Version Not Supported The server does not support the HTTP protocol version used in the request. ",
  69. 506: "506 Variant Also Negotiates (RFC 2295). Transparent content negotiation for the request results in a circular reference.",
  70. 507: "507 Insufficient Storage (WebDAV; RFC 4918). The server is unable to store the representation needed to complete the request.",
  71. 508: "508 Loop Detected (WebDAV; RFC 5842). The server detected an infinite loop while processing the request (sent in lieu of 208 Already Reported).",
  72. 510: "510 Not Extended (RFC 2774). Further extensions to the request are required for the server to fulfil it.",
  73. 511: "511 Network Authentication Required (RFC 6585). The client needs to authenticate to gain network access. Intended for use by intercepting proxies used to control access to the network (e.g., 'captive portals' used to require agreement to Terms of Service before granting full Internet access via a Wi-Fi hotspot).",
  74. }

# print detailed information
print(status_codes_dictionary[r.status])

# general information
if r.status >= 100 and r.status <= 199:
    print("GENERAL: HTML Response-code between 100 and 199. 1xx Informational responses. An informational response indicates that the request was received and understood. It is issued on a provisional basis while request processing continues. It alerts the client to wait for a final response. The message consists only of the status line and optional header fields, and is terminated by an empty line. As the HTTP/1.0 standard did not define any 1xx status codes, servers must not send a 1xx response to an HTTP/1.0 compliant client except under experimental conditions.")
elif r.status >= 200 and r.status <= 299:
    print("GENERAL: HTML Response-code between 200 and 299. 2xx, this class of status codes indicates the action requested by the client was received, understood and accepted.")
elif r.status >= 300 and r.status <= 399:
    print("GENERAL: HTML Response-code between 300 and 399. 3xx, this class of status code indicates the client must take additional action to complete the request. Many of these status codes are used in URL redirection. A user agent may carry out the additional action with no user interaction only if the method used in the second request is GET or HEAD. A user agent may automatically redirect a request. A user agent should detect and intervene to prevent cyclical redirects.")
elif r.status >= 400 and r.status <= 499:
    print("GENERAL: HTML Response-code between 400 and 499. 4xx, this class of status code is intended for situations in which the error seems to have been caused by the client. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents should display any included entity to the user.")
elif r.status >= 500 and r.status <= 599:
    print("GENERAL: HTML Response-code between 500 and 599. 5xx, this means in general, the server failed to fulfil a request. Response status codes beginning with the digit '5' indicate cases in which the server is aware that it has encountered an error or is otherwise incapable of performing the request. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and indicate whether it is a temporary or permanent condition. Likewise, user agents should display any included entity to the user. These response codes are applicable to any request method.")

# EOF
kbr
User
Posts: 779
Joined: Wednesday, 15 October 2008, 09:27
Location: Düsseldorf

Re: Html status and version check

Post by kbr » Tuesday, 13 February 2018, 08:05

Why not treat the program as an exercise in structured programming? For my taste there is still far too much text in the code. You could move it out of the source and read it into a suitable data structure. The detailed information, for example, as a list:

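# Assumed file layout: one description per line (line 1 = 1xx, line 2 = 2xx, ...)
with open('detailed_info.txt') as fobj:
    detailed_infos = fobj.readlines()
try:
    # r is the urllib3 response from the original script
    print(detailed_infos[r.status // 100 - 1])
except IndexError:
    print('Unknown status: ', r.status)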


For the individual status codes you could play around with configparser. That way you also get a feel for it.
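
One way to try the configparser idea is sketched below. The section name `status`, the sample entries, and the helper `load_status_texts` are assumptions made for illustration and are not part of the original script. The INI text is inlined so the snippet runs on its own; in practice it would probably sit in a separate file and be loaded with `parser.read()`.

```python
import configparser

# Hypothetical INI content with shortened sample texts (assumption:
# one option per status code inside a single [status] section).
INI_TEXT = """
[status]
200 = OK. Standard response for successful HTTP requests.
404 = Not Found. The server could not find what was requested.
500 = Internal Server Error. A generic error message.
"""


def load_status_texts(ini_text):
    """Parse INI text and return a dict mapping int status codes to texts."""
    # interpolation=None keeps '%' characters in the texts from being
    # treated as interpolation markers.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {int(code): text for code, text in parser["status"].items()}


status_texts = load_status_texts(INI_TEXT)
print(status_texts.get(404, "Unknown status"))
```

Converting the option names to `int` keeps the lookup compatible with an integer status such as `r.status`.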
Sirius3
User
Posts: 7052
Joined: Sunday, 21 October 2012, 17:20

Re: Html status and version check

Post by Sirius3 » Tuesday, 13 February 2018, 08:13

@1of7470000000: HTML has no status; the status comes from the HTTP protocol. `input` shadows the built-in function `input` and is also too generic a name for a URL. You can also write the general status information as a dictionary keyed by the hundreds. You should use dict.get or catch the KeyError so that unknown codes can be handled as well.
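
A minimal sketch of both suggestions, assuming shortened placeholder texts instead of the full descriptions. The names `STATUS_DETAILS`, `STATUS_CLASSES`, and `describe_status` are hypothetical; in the original script the argument would be `r.status`.

```python
# Placeholder texts; the full descriptions from the original post would go here.
STATUS_DETAILS = {
    200: "200 OK.",
    404: "404 Not Found.",
}

# General information keyed by the hundreds digit of the status code.
STATUS_CLASSES = {
    1: "1xx: informational response.",
    2: "2xx: success.",
    3: "3xx: redirection.",
    4: "4xx: client error.",
    5: "5xx: server error.",
}


def describe_status(status):
    """Return (detail, general) texts, with fallbacks for unknown codes."""
    # dict.get returns the fallback instead of raising KeyError.
    detail = STATUS_DETAILS.get(status, f"No detailed text for status {status}.")
    general = STATUS_CLASSES.get(status // 100, "Unknown status class.")
    return detail, general


for code in (200, 404, 799):
    detail, general = describe_status(code)
    print(detail)
    print(general)
```

Because both lookups go through `dict.get`, a code missing from either dictionary produces a fallback message rather than a traceback.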
